Yeah, we've thought of that. The backend logs don't show the calls arriving until after the delay has finished. We also have other mirror instances of nginx (with the same configs) that succeed the whole time. Our test scripts curl the same backend service three ways: via the "working" nginx proxy, via the "flaky" nginx proxy, and directly. They loop multiple times per second, and both the "working" proxy (until it turns flaky) and the direct calls to the backend always succeed, even while the "flaky" proxy reliably shows the error.
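For anyone who wants to reproduce the comparison, here is a rough sketch of that kind of three-way probe loop in Python. Our real scripts use curl, so treat this as an illustration only: the URLs, the `/health` path, and the function names (`fetch`, `probe_once`, `run`) are all placeholders, not our actual setup.

```python
import time
import urllib.error
import urllib.request

# Hypothetical endpoints; substitute the real proxy and backend URLs.
TARGETS = {
    "working-proxy": "http://working-proxy.example/health",
    "flaky-proxy": "http://flaky-proxy.example/health",
    "backend-direct": "http://backend.example:8080/health",
}


def fetch(url, timeout=5.0):
    """Return (status, elapsed_seconds); status is an HTTP code or an error name."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status = exc.code
    except OSError as exc:
        # Connection refused, DNS failure, timeout, and similar.
        status = type(exc).__name__
    return status, time.monotonic() - start


def probe_once(targets, fetcher=fetch):
    """Hit every target once and return {name: (status, elapsed_seconds)}."""
    return {name: fetcher(url) for name, url in targets.items()}


def run(targets, rounds, interval=0.25, fetcher=fetch):
    """Probe all targets `rounds` times, printing one line per target."""
    for _ in range(rounds):
        stamp = time.strftime("%H:%M:%S")
        for name, (status, elapsed) in probe_once(targets, fetcher).items():
            print(f"{stamp} {name:15s} status={status} elapsed={elapsed:.3f}s")
        time.sleep(interval)
```

Calling `run(TARGETS, rounds=20)` prints a timestamped status and latency for each path, which makes it easy to line up a stall on one proxy against the other two. Note that this sketch probes the targets one after another, so a stalled request delays the rest of that round; running each target in its own loop or thread would avoid that.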
Still digging... thanks for the continued brainstorm!