Convox Community

Health checks failing with 403

Is there anything outside an application that can cause a 403 error on a health check?

2019-08-19T16:04:26Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) (instance i-0755ae09b91a8135d) (port 32768) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:805781165890:targetgroup/produ-Balan-13DQHTN22DFY1/570a8e44b2600cbe) due to (reason Health checks failed with these codes: [403])

I have one service (the frontend/api) that is failing health checks with a 403 error and constantly restarting. I can’t tell what is causing this, and I don’t think the problem is coming from the app, as all endpoints appear to work fine in the few seconds when the app is actually running.

After it has restarted I’m able to get a 200 via the browser (if I’m quick enough) before the health check runs and restarts it.

I’ve gone back to the previous release (from about a month ago) and that works, but diffing the two codebases, I can’t see anything relevant that has changed. I upgraded from Rails 5 to Rails 6, but as above, the /health URL does work for me, and it is very simple in routes.rb:

get '/health', to: proc { |_env| [200, headers, ['OK']] }

So I don’t think a Rails change could affect that.

Any tips would be greatly appreciated.

Zach

The log below shows the health check working for me. Note also that the 403s don’t show up in the logs:

2019-08-19T16:21:17Z service/api:RDCADHUUXWD/9db118997cb3 Exiting

2019-08-19T16:21:29Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) has started 1 tasks: (task 429db65b-f740-4ef6-b3aa-b213ba757dd5).

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 Puma starting in single mode...

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 * Version 3.12.0 (ruby 2.6.0-p0), codename: Llamas in Pajamas

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 * Min threads: 1, max threads: 5

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 * Environment: production

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 * Listening on tcp://0.0.0.0:80

2019-08-19T16:21:30Z service/api:RDCADHUUXWD/b213ba757dd5 Use Ctrl-C to stop

2019-08-19T16:21:38Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) registered 1 targets in (target-group arn:aws:elasticloadbalancing:us-east-1:805781165890:targetgroup/produ-Balan-13DQHTN22DFY1/570a8e44b2600cbe)

2019-08-19T16:21:48Z service/api:RDCADHUUXWD/b213ba757dd5 I, [2019-08-19T16:21:48.881587 #1] INFO -- : [4e72559a-322c-4dce-a124-91159b4cd7bd] Started GET "/health" for 67.245.151.25 at 2019-08-19 16:21:48 +0000

2019-08-19T16:21:56Z service/api:RDCADHUUXWD/b213ba757dd5 - Gracefully stopping, waiting for requests to finish

2019-08-19T16:21:56Z service/api:RDCADHUUXWD/b213ba757dd5 === puma shutdown: 2019-08-19 16:21:56 +0000 ===

2019-08-19T16:21:56Z service/api:RDCADHUUXWD/b213ba757dd5 - Goodbye!

2019-08-19T16:21:56Z service/api:RDCADHUUXWD/b213ba757dd5 Exiting

2019-08-19T16:21:56Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) has stopped 1 running tasks: (task 429db65b-f740-4ef6-b3aa-b213ba757dd5).

2019-08-19T16:21:56Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) deregistered 1 targets in (target-group arn:aws:elasticloadbalancing:us-east-1:805781165890:targetgroup/produ-Balan-13DQHTN22DFY1/570a8e44b2600cbe)

2019-08-19T16:21:56Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) (instance i-02e52b61f35966976) (port 32812) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:805781165890:targetgroup/produ-Balan-13DQHTN22DFY1/570a8e44b2600cbe) due to (reason Health checks failed with these codes: [403])

2019-08-19T16:22:06Z system/aws/ecs (service production-gazette-ServiceApi-IJF0ZX2TN611-Service-1EIO0UCOL76YZ) has started 1 tasks: (task 856e5ddc-9d7f-44f9-a255-bae83baad69f).

2019-08-19T16:22:09Z service/api:RDCADHUUXWD/bae83baad69f => Booting Puma

Update … I switched out the api service for another one and it worked fine, so it does appear to be the api codebase (and probably the Rails 6 upgrade).

Why the 403 only occurs for the TargetGroup’s ping and doesn’t show up in the service’s logs is beyond me…
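
One possible explanation, which nothing in this thread confirms: Rails 6 added a host-checking middleware (ActionDispatch::HostAuthorization) that can reject a request based on its Host header before the request reaches the router. If a filter like that sits in front of the request logger, a rejected request would get a 403 and never produce a "Started GET" line. The sketch below is plain Ruby, not Rails code. The class names, the allowed host and the probe's IP address are all hypothetical, and it assumes the health check sends an IP address as its Host header rather than the app's domain.

```ruby
# Stand-in for the /health route from the post (headers are illustrative).
health_app = proc { |_env| [200, { 'Content-Type' => 'text/plain' }, ['OK']] }

# Hypothetical request logger, standing in for whatever writes the
# 'Started GET "/health"' lines seen in the service logs.
class RequestLogger
  attr_reader :lines

  def initialize(app)
    @app = app
    @lines = []
  end

  def call(env)
    @lines << "Started #{env['REQUEST_METHOD']} \"#{env['PATH_INFO']}\""
    @app.call(env)
  end
end

# Hypothetical host filter placed in FRONT of the logger. It answers 403
# itself, so rejected requests never reach the logger or the app.
class HostFilter
  def initialize(app, allowed_hosts)
    @app = app
    @allowed_hosts = allowed_hosts
  end

  def call(env)
    host = env['HTTP_HOST'].to_s.sub(/:\d+\z/, '') # drop any :port suffix
    unless @allowed_hosts.include?(host)
      return [403, { 'Content-Type' => 'text/plain' }, ['Blocked host']]
    end

    @app.call(env)
  end
end

logger = RequestLogger.new(health_app)
stack  = HostFilter.new(logger, ['www.example.com']) # assumed allowed host

# A browser request carries the domain name in its Host header.
browser = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/health',
            'HTTP_HOST' => 'www.example.com' }
# Assumed shape of a health check probe: an IP and port as the Host header.
probe = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/health',
          'HTTP_HOST' => '10.0.1.23:32768' }

browser_status = stack.call(browser).first
probe_status   = stack.call(probe).first

puts "browser: #{browser_status}, probe: #{probe_status}, logged: #{logger.lines.length}"
```

If this is what is happening, requesting /health with a Host header set to the instance's IP should reproduce the 403 by hand, and the app's config.hosts setting would be the first place to look.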