I’ve almost completed the v2 => v3 upgrade, but now I’m stuck on one last problem. Convox / K8s is returning a 502 error for my app’s homepage (and only the homepage). Everything else works fine and returns a 200 response (health checks, app pages, sign-in page, etc.).
I can also see the 200 response in the logs, but openresty seems to have a problem with my homepage, and is dropping the 200 response and returning a 502. This isn’t happening with v2 racks, it’s only happening on v3. (I’m running an identical application on both v2 and v3 as a practice run for the migration.)
I can’t find any more detailed logs to show why this is happening. The only thing I can think of is that my home page includes a lot of preload “link” headers. Maybe this isn’t supported and is causing an error?
There should be no issues with the response time. It only takes 40ms for the application to return a response. Other pages are slower but still working fine.
I removed these link headers and the page loaded fine. I reverted the change, and the 502 Bad Gateway error came back. (Just to triple-check that this was definitely the issue.)
I strongly suspect this is the same issue a bunch of us have been seeing: the default ingress controller ships with default settings that only allow 4k of headers.
I had some more accidental downtime today due to this same issue. I am setting up a Content Security Policy (CSP) with a fairly large number of domains, and my server started responding with 502 errors because my response headers became too large.
I should really try adding some e2e tests. Maybe I can set up a Rails middleware to check the header size during the integration test suite and crash if it’s too large.
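Something along these lines might work as a starting point. It's only a rough sketch: the class name `HeaderSizeCheck` and the `limit:` option are made up for illustration, and the 4096-byte default is just the "4k" figure mentioned above, not a value I've confirmed for the ingress controller. The size is an approximation of the header block on the wire and doesn't count the status line.

```ruby
# Hypothetical Rack middleware: raises when the response headers look too
# large. The 4096-byte default is an assumption taken from the "4k" figure
# in this thread, not a confirmed ingress setting.
class HeaderSizeCheck
  class HeadersTooLarge < StandardError; end

  def initialize(app, limit: 4096)
    @app = app
    @limit = limit
  end

  def call(env)
    status, headers, body = @app.call(env)
    size = self.class.header_bytes(headers)
    if size > @limit
      raise HeadersTooLarge,
            "response headers are ~#{size} bytes (limit #{@limit}) " \
            "for #{env['PATH_INFO']}"
    end
    [status, headers, body]
  end

  # Approximates the wire size as "Name: value\r\n" per header line.
  # The status line is not included.
  def self.header_bytes(headers)
    headers.sum do |name, value|
      # Multiple values may arrive as an Array or as a "\n"-joined String.
      values = value.is_a?(Array) ? value : value.to_s.split("\n")
      values.sum { |v| name.to_s.bytesize + 2 + v.to_s.bytesize + 2 }
    end
  end
end
```

In a Rails app this could probably be enabled for tests only, e.g. `config.middleware.insert_before 0, HeaderSizeCheck` in `config/environments/test.rb`. Putting it at the top of the stack should mean it sees headers added by other middleware (like the CSP header), and the exception should reach the integration test instead of being rendered as an error page.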
@mark Do you know if these changes are persistent, or do you need to set this every time you deploy, or after you update Convox?
I noticed that it’s an app-specific setting configured within the Convox app and database, so it’s actually not even part of my Terraform config or .tfstate file. So I’m not too sure how temporary this is, or whether I can expect downtime with 502 errors at some point when it gets reset.