Agents are causing my deploys to get into a deadlock and roll back

nathan.f77 · June 17, 2019, 7:59pm

I’ve added a Datadog agent to each of my instances, and now my deploys regularly fail because ECS doesn’t have enough room to start the agent on the instances. This leaves my CloudFormation stack in a failed state, and the update rollback often fails too.

So far my solution is to manually SSH into the instance and run “docker kill” to kill one of my worker containers, and hope that ECS will use the free CPU / RAM to start up an agent. This is a really ugly workaround and I’ve had to do it about 3 times so far. (But sometimes the deploy is fine, if the agent can get in there first.)

Can Convox / ECS be configured so that my agents will evict another container if there isn’t enough room? I’ve tried tweaking the CPU / RAM for my web/worker containers, but I can’t figure out the right balance, and the other Convox agents swoop in and use up the resources first.
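
To make the "right balance" concrete, here is a rough sketch of the headroom arithmetic, assuming ECS places containers by their reserved CPU units (1024 per vCPU) and reserved memory in MiB. The function name and every number in the example are hypothetical, not values from my rack.

```python
def max_app_containers(instance_cpu, instance_mem,
                       agent_cpu, agent_mem,
                       app_cpu, app_mem):
    """Return how many app containers fit on one instance
    while still leaving room for one agent container.

    CPU values are ECS CPU units, memory values are MiB.
    All inputs are reservations, not measured usage.
    """
    if app_cpu <= 0 or app_mem <= 0:
        raise ValueError("app reservations must be positive")

    # Set aside the agent's reservation first.
    free_cpu = instance_cpu - agent_cpu
    free_mem = instance_mem - agent_mem
    if free_cpu < 0 or free_mem < 0:
        return 0  # the agent alone does not fit

    # The tighter of the two resources decides the count.
    return min(free_cpu // app_cpu, free_mem // app_mem)


# Hypothetical numbers: a 2 vCPU instance with 3884 MiB registered,
# an agent reserving 128 CPU units / 256 MiB,
# and web/worker containers reserving 512 CPU units / 1024 MiB each.
print(max_app_containers(2048, 3884, 128, 256, 512, 1024))
```

If the web/worker containers scheduled on an instance exceed this count, the agent has nowhere to go until one of them stops, which matches the deadlock I’m seeing.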
