Convox Community

Agents are causing my deploys to get into a deadlock and roll back

I’ve added a Datadog agent to each of my instances, and now my deploys regularly fail because ECS doesn’t have enough room to start the agent on my instances. This puts my CloudFormation stack into a failed state, and the update rollback often fails too.

So far my solution is to manually SSH into the instance and run “docker kill” to kill one of my worker containers, and hope that ECS will use the free CPU / RAM to start up an agent. This is a really ugly workaround and I’ve had to do it about 3 times so far. (But sometimes the deploy is fine, if the agent can get in there first.)

Can Convox / ECS be configured so that my agents will kick out a running container if there isn’t enough room? I’ve tried tweaking the CPU / RAM for my web/worker containers, but I can’t figure out the right balance, and the other Convox agents swoop in to use up the resources.
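
To make the resource problem concrete, here is a minimal sketch of the placement check that seems to be failing. It assumes ECS only places a task on an instance when the task's reserved CPU and memory both fit in whatever the running containers have left over. All names (`Reservation`, `agent_fits`) and all numbers are hypothetical; they are not real settings and not part of any Convox or ECS API.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    cpu_units: int  # ECS CPU units, where 1024 units = one vCPU
    memory_mb: int  # reserved memory in MiB


def agent_fits(instance, running, agent):
    """Return True if the agent's reservation fits in the instance's leftover capacity."""
    free_cpu = instance.cpu_units - sum(r.cpu_units for r in running)
    free_mem = instance.memory_mb - sum(r.memory_mb for r in running)
    return agent.cpu_units <= free_cpu and agent.memory_mb <= free_mem


# Hypothetical capacities and reservations, for illustration only.
instance = Reservation(cpu_units=2048, memory_mb=3900)
web = Reservation(cpu_units=512, memory_mb=1024)
worker = Reservation(cpu_units=512, memory_mb=1024)
agent = Reservation(cpu_units=128, memory_mb=256)

# Workers got there first and filled the instance: no room for the agent.
print(agent_fits(instance, [web, worker, worker, worker], agent))

# After killing one worker (the manual workaround), the agent fits.
print(agent_fits(instance, [web, worker, worker], agent))
```

Under these assumptions, the deploy only succeeds if the web/worker reservations leave at least the agent's CPU and memory free on every instance, which is the balance I haven't been able to find.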