IAM Error upgrading rack

Updating from version: 20190126182547

Anyone familiar with this issue?

```
CREATE_IN_PROGRESS VolumeTarget0
CREATE_IN_PROGRESS VolumeTarget1
CREATE_FAILED VolumeTarget0 The IAM identity making this call has an IAM policy that is too large. Reduce the size of the policy and try again. (Service: AmazonElasticFileSystem; Status Code: 400; Error Code: BadRequest; Request ID: 21104a41-6c06-11e9-8946-35b35ddaba77)
CREATE_FAILED VolumeTarget1 The IAM identity making this call has an IAM policy that is too large. Reduce the size of the policy and try again. (Service: AmazonElasticFileSystem; Status Code: 400; Error Code: BadRequest; Request ID: 214d5345-6c06-11e9-8946-35b35ddaba77)
CREATE_FAILED SpotInstancesLifecycleInterrupting Resource creation cancelled
UPDATE_FAILED BuildInstances Resource update cancelled
UPDATE_ROLLBACK_IN_PROGRESS staging-vessel The following resource(s) failed to create: [VolumeTarget1, SpotInstancesLifecycleInterrupting, VolumeTarget0]. The following resource(s) failed to update: [BuildInstances].
```

Unfortunately this seems to have put my rack into limbo too:

<rack> is in UPDATE_ROLLBACK_FAILED state and can not be updated. status code: 400,

Does this mean I need to destroy the whole rack and rebuild?

I am having the same issue and need to know how to fix it.

Also seeing this. I'm trying to continue the rollback in the AWS CloudFormation console and then will try to remove the build instance.
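
For anyone who'd rather script it than click through the console, this is roughly the same operation with boto3 (a sketch only; the stack name `staging-vessel` and the `BuildInstances` logical ID are taken from the events above, so substitute your own, and only skip a resource if it is actually blocking the rollback):

```python
# Sketch: continue a rollback stuck in UPDATE_ROLLBACK_FAILED, optionally
# skipping the resource that keeps blocking it. Assumes boto3 credentials and
# region are already configured; the stack name and logical ID below come from
# the events above and should be replaced with your own.
import time
import boto3

cfn = boto3.client("cloudformation")

cfn.continue_update_rollback(
    StackName="staging-vessel",
    ResourcesToSkip=["BuildInstances"],  # only if this resource is blocking the rollback
)

# Poll until the stack settles (it should end in UPDATE_ROLLBACK_COMPLETE)
while True:
    status = cfn.describe_stacks(StackName="staging-vessel")["Stacks"][0]["StackStatus"]
    print(status)
    if not status.endswith("_IN_PROGRESS"):
        break
    time.sleep(15)
```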

Has anyone had any luck since then upgrading racks stuck like this? I was able to fix the failed rollback per Matt's suggestion, but I still can't update the rack. I just tried two oldish racks that were both on the same version: one in the Canada region and another in the Australia region. The Australian one completed without any issues, but the Canadian one still gets stuck with the error: The IAM identity making this call has an IAM policy that is too large. Reduce the size of the policy and try again. (Service: AmazonElasticFileSystem; Status Code: 400; Error Code: BadRequest; Request ID: *). I've tried it a few times and also manually set the max AZs to 2, as the "Error when changing the MaxAvailabilityZones on a rack" thread suggested. It still won't let me update.


Nope - I’ve tried multiple times.
My rack must be about 10 months out of date now, but I’ve had some other major things to focus on this year and clearly there’s no simple solution to this.

I need to get this sorted, as I'm extremely nervous that older racks won't keep up with evolving AWS changes and things will come tumbling down one day.

Out of curiosity, is it only a single rack that won't update for you as well? We have 4 racks (3 production in different regions + 1 staging), and only the production rack in the CA region fails. Are you having a similar experience?

This is a known bug in AWS IAM which is unfortunately outside of Convox's control. Amazon have promised a fix but haven't given a timeline for it :frowning_face:

The suggested workaround is to go into the CloudFormation stack for your rack and manually update the version parameter to something more recent than your current version. Apply it, let it finish, then try running the update again; sometimes it will then just work. Either way, please also raise a ticket with AWS to encourage them to fix it!
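
If you prefer to script that console change, this is roughly what it looks like with boto3 (a sketch under a few assumptions: the rack stack's parameter is literally named `Version`, `myrack` stands in for your stack name, and the new version value is a placeholder; check your stack's Parameters tab and required capabilities before running anything):

```python
# Sketch: re-apply the rack's CloudFormation stack with only the version
# parameter changed, keeping the current template and all other parameters.
# Assumptions: the parameter is named "Version", "myrack" is your stack name,
# and NEW_VERSION is a placeholder -- substitute a real release version.
import boto3

STACK = "myrack"
NEW_VERSION = "REPLACE_WITH_A_NEWER_RELEASE"

cfn = boto3.client("cloudformation")

current = cfn.describe_stacks(StackName=STACK)["Stacks"][0]["Parameters"]

params = [
    {"ParameterKey": p["ParameterKey"], "ParameterValue": NEW_VERSION}
    if p["ParameterKey"] == "Version"
    else {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
    for p in current
]

cfn.update_stack(
    StackName=STACK,
    UsePreviousTemplate=True,
    Parameters=params,
    # Rack stacks manage IAM resources; passing both capabilities is harmless.
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)
```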

Thanks for getting back on this, @ed_convox, and for proposing a solution. Shame it's on the AWS side, and yeah, I think it's their policy never to provide a release estimate, as they've always said the same to me.

In the interim, do you mean we should update the version parameter of the rack itself, or something to do with AWS / IAM?

@scott - to answer your question, it is two different racks with the same problem, both in the London region.

FYI, I had to run the update and let it fail first. If I tried to set the version in CloudFormation to one I hadn't attempted to install yet, it would throw an error about something missing in S3. I'm guessing the releases are stored in S3 and pulled from there?

I am updating one version at a time to see if at some point it starts working again, but so far no luck. Every version I try to apply goes through the motions for a minute, then fails, fails to roll back, and I have to continue the rollback manually. Then I go in and change the CloudFormation version manually, wait for that to update, and try the next one. When I do this, is there anything important that I am missing from these updates? Are there migrations that aren't being applied with this method?
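
In case it helps anyone else doing this loop, here's a quick way to pull just the failure reasons out of the latest attempt instead of scrolling the console events (a sketch; `myrack` is a placeholder stack name):

```python
# Sketch: print the failed events from the most recent update attempt
# (newest first) so you can see exactly which resources were rejected.
import boto3

cfn = boto3.client("cloudformation")

events = cfn.describe_stack_events(StackName="myrack")["StackEvents"]

for e in events[:50]:  # the API returns the newest events first
    if e["ResourceStatus"].endswith("_FAILED"):
        print(e["Timestamp"], e["LogicalResourceId"],
              e["ResourceStatus"], e.get("ResourceStatusReason", ""))
```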

I spent most of the day going version by version, and at no point did it actually update. At the very least, changing the version manually didn't update the ECS agent version. So my questions are twofold:

  1. Is there a way to manually update anything to get around this issue? I get that it is an issue on AWS's side, but where is the IAM policy that it is rejecting? I couldn't find anything in my IAM console with a lot of permissions that would be causing the issue. If I had an idea where to look, I might be able to work around this myself. I really doubt AWS is going to fix this anytime soon… (see the sketch after this list for where I've been looking)
  2. Are there migrations I missed by hacking the version like this that I need to apply some other way?
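
On question 1: as far as I understand it, CloudFormation makes the EFS call with whatever credentials the stack update runs under (your own user/role, or a service role if the stack has one), so those are the policies to look at. For what it's worth, here's a rough way to see which roles are carrying the most policy text (a sketch only; it covers roles but not users, the sizes are raw JSON lengths rather than whatever measure EFS applies, and the exact limit it enforces isn't something I could find documented):

```python
# Sketch: rough audit of how much policy JSON each IAM role carries, to help
# spot the identity whose policies EFS considers "too large". Sizes are raw
# JSON lengths of each document, so treat them as relative, not exact.
import json
import boto3

iam = boto3.client("iam")

for page in iam.get_paginator("list_roles").paginate():
    for role in page["Roles"]:
        name = role["RoleName"]
        total = 0

        # Inline policies attached directly to the role
        for policy_name in iam.list_role_policies(RoleName=name)["PolicyNames"]:
            doc = iam.get_role_policy(RoleName=name, PolicyName=policy_name)["PolicyDocument"]
            total += len(json.dumps(doc))

        # Managed policies attached to the role
        for attached in iam.list_attached_role_policies(RoleName=name)["AttachedPolicies"]:
            arn = attached["PolicyArn"]
            version_id = iam.get_policy(PolicyArn=arn)["Policy"]["DefaultVersionId"]
            doc = iam.get_policy_version(PolicyArn=arn, VersionId=version_id)["PolicyVersion"]["Document"]
            total += len(json.dumps(doc))

        if total:
            print(f"{total:>7}  {name}")
```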