BUG: VPC endpoints break rack updates when using an existing VPC

Hi,

We believe we have hit a bug in the new “VPC endpoints” feature while attempting to update two of our racks from 20220310121318 to 20220427210019.

We believe the bug was introduced in a series of releases starting with 20220328135225, released by heronrs (Heron Rossi) on GitHub (is that you, @heron?).

Relevant CloudFormation events highlighting the issue are:

convox-rack-rtc              UPDATE_ROLLBACK_IN_PROGRESS    The following resource(s) failed to create: [CFEndpoint, KMSEndpoint, ECSEndpoint]. The following resource(s) failed to update: [InstancesLifecycleHandler].
InstancesLifecycleHandler    UPDATE_FAILED                  Resource update cancelled
ECSEndpoint                  CREATE_FAILED                  private-dns-enabled cannot be set because there is already a conflicting DNS domain for ecs.us-east-1.amazonaws.com in the VPC vpc-dfa0a*** (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter; Request ID: 6e5ebffe-d686-4891-adbb-f0873ba6e***; Proxy: null)
KMSEndpoint                  CREATE_FAILED                  private-dns-enabled cannot be set because there is already a conflicting DNS domain for kms.us-east-1.amazonaws.com in the VPC vpc-dfa0a*** (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter; Request ID: 92e592d5-8af1-42c6-bbaf-cda7ee05b***; Proxy: null)
CFEndpoint                   CREATE_FAILED                  private-dns-enabled cannot be set because there is already a conflicting DNS domain for cloudformation.us-east-1.amazonaws.com in the VPC vpc-dfa0a*** (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter; Request ID: da32833d-99fd-42f5-9040-4e6151a27***; Proxy: null)
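To confirm the conflict before retrying an update, one could list which interface endpoints in the shared VPC already have private DNS enabled. Below is a minimal bash sketch, assuming the AWS CLI is installed and configured and that the racks live in us-east-1. The helpers `existing_private_dns_services` and `check_conflicts` are hypothetical names invented for this illustration, not part of Convox; the three services are taken from the failing events above.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic sketch (not part of Convox): report which of the
# rack's endpoint services already have a private-DNS-enabled interface
# endpoint in a VPC. Assumes the AWS CLI is installed and configured.

# Region is an assumption; override with REGION=... if needed.
REGION="${REGION:-us-east-1}"
# Services whose endpoints failed in the CloudFormation events.
SERVICES=(cloudformation kms ecs)

# Print full service names (e.g. com.amazonaws.us-east-1.ecs) that have a
# private-DNS-enabled endpoint in the given VPC, one per line.
existing_private_dns_services() {
  local vpc_id="$1"
  aws ec2 describe-vpc-endpoints \
    --region "$REGION" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query 'VpcEndpoints[?PrivateDnsEnabled==`true`].ServiceName' \
    --output text | tr '\t' '\n'
}

# For each service, print "conflict: <svc>" if another private-DNS endpoint
# for it already exists in the VPC, otherwise "ok: <svc>".
check_conflicts() {
  local vpc_id="$1" existing svc
  existing="$(existing_private_dns_services "$vpc_id")"
  for svc in "${SERVICES[@]}"; do
    if grep -qxF "com.amazonaws.${REGION}.${svc}" <<<"$existing"; then
      echo "conflict: ${svc}"
    else
      echo "ok: ${svc}"
    fi
  done
}
```

Running `check_conflicts <vpc-id>` against the shared VPC would be expected to print `conflict:` for each service already covered by the other rack's endpoints. Note that a private hosted zone associated with the VPC for the same domain may also trigger this error, and the sketch does not check for that.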

A little bit of context so that you can better understand how our racks are set up:

There are two racks (A and B), and they share a single VPC (i.e., B's ExistingVpc parameter is set to the ID of A's VPC, and both racks use the same VPCCIDR).

A was created long before B. When we created B, we used the following command:

convox rack install aws \
  --name <rack-name> \
  ExistingVpc="<vpc-id>" \
  VPCCIDR="10.0.0.0/16" \
  Subnet0CIDR="10.0.7.0/24" \
  Subnet1CIDR="10.0.8.0/24" \
  Subnet2CIDR="10.0.9.0/24" \
  SubnetPrivate0CIDR="10.x.10.0/24" \
  SubnetPrivate1CIDR="10.x.11.0/24" \
  SubnetPrivate2CIDR="10.x.12.0/24" \
  InternetGateway="<igw-id>"

Until this update, however, we had always updated B first and then A. In this update cycle, for whatever reason, we updated A first (successfully), and B then failed to update with the errors shown above.

I guess the new “VPC endpoints” feature still has some rough edges.

Would it be possible to issue a fix relatively quickly? This is blocking our production environment from updating to newer releases.

Thank you!

Hey @asd, thanks for reporting this. I’ll take a look and get back to you ASAP.

Hey @heron, I appreciate you! Feel free to loop me in if you need more feedback.

Hi @asd, just letting you know that we’re working on a solution. You can track the PR here: “Don't create VPC endpoint if exist” by Twsouza, Pull Request #3541 in convox/rack on GitHub.
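The idea in the PR title, skipping endpoint creation when an endpoint for that service already exists, can be illustrated with a small bash sketch. This is a hypothetical illustration of the rule only, not the code from PR #3541, whose actual implementation is not shown in this thread; `endpoints_to_create` is an invented helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "create only if missing" rule for a rack's
# interface endpoints. NOT the implementation from PR #3541.

# Usage: endpoints_to_create "<existing services, one per line>" svc1 svc2 ...
# Prints each requested service that is not already present, one per line.
endpoints_to_create() {
  local existing="$1" svc
  shift
  for svc in "$@"; do
    # Whole-line, fixed-string match against the existing list.
    if ! grep -qxF "$svc" <<<"$existing"; then
      echo "$svc"
    fi
  done
}
```

Applied to the shared VPC, whichever rack updates first would find no existing endpoints and create all three, while the second would find cloudformation, kms and ecs already present and create none, so the update order would no longer matter.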

Hi @heron, I just saw that #3541 was closed via #3542.

Does that mean this issue is now solved and we can attempt to update our rack? I assume not, because there hasn’t been a Convox release containing this PR yet. Am I missing something?

Thank you!

Hi @asd, we released the fix for the problem you reported; you can update your racks to Release 20220607201406 (convox/rack on GitHub).
If you have any questions, feel free to reach out.

Thank you! I will try the new release ASAP. I appreciate you taking the time to fix this.