Convox's ECS task placement strategy

I would like to better understand how Convox sets up a rack with regard to ECS task placement strategies…

Specifically, I think I understand that ECS decides where to place a new task based on various variables (e.g., memory reservation, CPU, whether Fargate is used or not, service scheduling strategy, task placement strategy, etc.); however, we’re having a hard time figuring out how Convox sets up the whole cluster.

@ddollar I would appreciate it if you could shed a little light on this, please.

Reading https://github.com/convox/rack/blob/dc44e3787c08af9cfcd2811ddaeaa769651bfef7/provider/aws/formation/service.json.tmpl#L390-L410, I came up with the following questions:

  • What is an “agent”, and how can I tell whether it’s enabled?
  • If the agent is enabled, what is the DAEMON scheduling strategy supposed to do?
  • If the agent is NOT enabled, how does ECS behave when placing tasks on EC2 instances?

I hope I’m asking the right questions here, but the gist of what I am trying to understand is this: when I scale an app’s count/CPU/memory, I would like to have a reasonably good understanding of how my rack will handle the scale operation, taking into account all the other apps running on the same rack, any scheduled tasks that may be running, etc.

I ended up in this rabbit hole because recently we’ve been getting inconsistent command=<...> result=failure reason=RESOURCE:MEMORY errors on some scheduled tasks that weren’t being placed, even though, as far as we can tell, some EC2 instances still had memory available for them…

Thanks!

/cc: @stefano

An agent is simply a service whose task must be deployed on every EC2 instance of the cluster (set with agent: true in convox.yml, see https://docs.convox.com/application/convox-yml). The DAEMON strategy instructs ECS to always place one task for that service on each EC2 instance.
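To make the agent/non-agent split concrete, here is a minimal sketch of the two shapes an ECS service definition can take. The property names (SchedulingStrategy, DesiredCount, PlacementStrategies, Type, Field) are those of the CloudFormation AWS::ECS::Service resource, and the spread fields are the ones for AZ and instance spreading described in the next paragraph of this reply. The function name ecs_service_properties and the branching on agent are hypothetical; this is not the actual code of the Convox template, only an illustration of what such a template would emit. Note that ECS does not accept a desired count or placement strategy for DAEMON services, which is why the agent branch omits them.

```python
from typing import Any


def ecs_service_properties(agent: bool, count: int) -> dict[str, Any]:
    """Return a hypothetical subset of AWS::ECS::Service properties.

    Property names follow the CloudFormation AWS::ECS::Service resource;
    the branching on `agent` is an illustration, not Convox's template.
    """
    if agent:
        # DAEMON: ECS runs exactly one copy of the task on every container
        # instance, so no DesiredCount or placement strategy is specified.
        return {"SchedulingStrategy": "DAEMON"}
    return {
        "SchedulingStrategy": "REPLICA",
        "DesiredCount": count,
        # Spread across Availability Zones first, then across instances.
        "PlacementStrategies": [
            {"Type": "spread", "Field": "attribute:ecs.availability-zone"},
            {"Type": "spread", "Field": "instanceId"},
        ],
    }
```

For an agent service, scaling the count has no effect on placement: the number of tasks tracks the number of instances in the cluster.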

For non-agent services, the placement strategies dictate that the tasks should first be spread across AZs, then, within a single AZ, across instances.
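A small simulation can show what "AZ first, then instance" means in practice. This is a simplified model, assuming each new task goes to the AZ currently running the fewest tasks of the service and then to the least-loaded instance within that AZ; the instance IDs are made up, ties are broken alphabetically (real ECS tie-breaking may differ), and resource checks are ignored here.

```python
from collections import Counter


def spread_placement(instances: dict[str, str], count: int) -> list[str]:
    """Assign `count` tasks to instances: spread by AZ, then by instance.

    `instances` maps a (hypothetical) instance ID to its Availability Zone.
    Ties are broken alphabetically; real ECS tie-breaking may differ.
    """
    per_az = Counter()
    per_instance = Counter()
    placements = []
    zones = sorted(set(instances.values()))
    for _ in range(count):
        # Level 1: the AZ currently running the fewest tasks of this service.
        az = min(zones, key=lambda z: (per_az[z], z))
        # Level 2: within that AZ, the instance with the fewest tasks.
        candidates = sorted(i for i, z in instances.items() if z == az)
        instance = min(candidates, key=lambda i: (per_instance[i], i))
        per_az[az] += 1
        per_instance[instance] += 1
        placements.append(instance)
    return placements
```

With two instances in us-east-1a and one in us-east-1b, placing four tasks yields ["i-a1", "i-b1", "i-a2", "i-b1"]: the single us-east-1b instance receives two tasks, because balancing AZs takes priority over balancing instances. An uneven number of instances per AZ can therefore concentrate load on the smaller AZ.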

If a task cannot be placed according to those constraints due to insufficient memory or CPU, then you will get the error messages you mentioned.
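One detail that may explain RESOURCE:MEMORY failures when instances still seem to have free memory: ECS compares a task's memory reservation against each instance's remaining registered memory, i.e. registered memory minus the reservations of tasks already placed there, not against actual usage. (Per the ECS docs, a container's memoryReservation is what gets subtracted when set; otherwise its memory value is used.) The sketch below is a hypothetical model with illustrative numbers in MiB; the Instance class, the place function, and the "most headroom" choice are assumptions for the example, not how ECS picks among fitting instances.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical model of an ECS container instance (memory in MiB)."""
    instance_id: str
    registered_memory: int  # memory the ECS agent registered for tasks
    reservations: list[int] = field(default_factory=list)  # per placed task

    @property
    def remaining_memory(self) -> int:
        # Reservations, not actual usage, are subtracted from registered memory.
        return self.registered_memory - sum(self.reservations)


def place(instances: list[Instance], task_memory: int) -> str:
    """Return the chosen instance ID, or an ECS-style failure reason."""
    fits = [i for i in instances if i.remaining_memory >= task_memory]
    if not fits:
        return "RESOURCE:MEMORY"
    # Arbitrary choice for this sketch: the instance with the most headroom.
    return max(fits, key=lambda i: i.remaining_memory).instance_id


cluster = [
    Instance("i-1", 3800, [1024, 1024, 1024]),  # 728 MiB unreserved
    Instance("i-2", 3800, [2048, 1024]),        # 728 MiB unreserved
]
```

Here the cluster has 1456 MiB unreserved in total, yet place(cluster, 1024) returns "RESOURCE:MEMORY" because no single instance can fit the task, while place(cluster, 512) succeeds. Fragmented reservations, possibly combined with the spread constraints above, could be one reason scheduled tasks fail intermittently.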

At least that’s how I understand it.

@crohr I appreciate you taking the time to answer. Thank you! :pray:

That pretty much confirms what I had gathered from the CloudFormation template generated by Convox.