Need help with CloudFormation error while trying to upgrade Redis version

nathan.f77 · September 29, 2020, 4:48pm

Hello,

I’m running into an error in CloudFormation when I try to upgrade my Redis resource. I’m trying to run this command:

convox rack resources update redis-**** EngineVersion=5.0.6 AutomaticFailoverEnabled=true NumCacheClusters=2 --wait

This worked fine on my staging rack, but I’m getting UPDATE_FAILED for my production rack:

The subnet IDs subnet-********,subnet-******** are in use. (Service: AmazonElastiCache; Status Code: 400; Error Code: SubnetInUse; Request ID: ********; Proxy: null)

I’m still running an older version of the convox rack (gen 2): 20191120232059

$ convox rack resources info redis-****
Name     redis-****
Type     redis
Status   running
Options  AutomaticFailoverEnabled=false
         Database=0
         Encrypted=false
         InstanceType=cache.t2.medium
         NumCacheClusters=1

Does anyone have any idea how I might be able to fix this?

I’ve considered creating a new app-level Redis resource and swapping the REDIS_URL during a deploy, but that will cause some downtime, since my worker processes always update much faster than the web processes.

Any help would be greatly appreciated!

matt · February 10, 2021, 2:48pm

Hey @nathan.f77 did you manage to resolve this?

nathan.f77 · February 11, 2021, 9:40am

Hi @matt, yes, fortunately I was able to resolve this issue. I reached out to AWS support and got a few more details about the cause:

---> To identify the cause behind the aforementioned error, I checked the latest update call made on the production stack using our internal tools. I noticed that an API call, ‘ModifyCacheSubnetGroup’, was initiated during the latest stack update to modify the subnetIds in the CloudFormation resource ‘CacheSubnetGroup’. Please refer to [1] for more information on ‘ModifyCacheSubnetGroup’.
Old subnetIds :
****************************
subnet-ab6f****, subnet-57d3****, subnet-7cb3****
****************************
New subnetIds :
****************************
subnet-9b3a****, subnet-6de8****, subnet-2022****
****************************

I was not able to resolve the subnet problem for the Convox Redis resource, so I ended up migrating to a new Redis cluster.

I figured out how to perform a zero-downtime migration by running two sets of worker containers at the same time, one for each Redis cluster. So I did something like this:

1. Set OLD_REDIS_URL to the current REDIS_URL. (Don’t promote.)
2. a) Add another set of worker containers (the worker_old service) that always perform jobs from OLD_REDIS_URL.
   b) Add a new Redis resource to convox.yml.
   c) Deploy.
3. Set REDIS_URL to the new Redis URL and promote the release.
4. Remove the worker_old service from convox.yml and deploy the change. Now all web and worker containers are using the new Redis.
5. Delete the redis rack resource.
6. Unset OLD_REDIS_URL.

This way, I was running two sets of workers at the same time, one for each Redis instance, so I could perform a zero-downtime migration without losing any background jobs.
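
For anyone following along, the steps above could be sketched roughly as the commands below. This is a dry run that only prints the commands and is not the exact sequence I ran. NEW_REDIS_URL and OLD_RESOURCE are placeholder values, the convox.yml edits still have to be made by hand between steps, and the commands assume you run them from the app's directory so convox can infer the app name.

```shell
#!/bin/sh
# Dry-run sketch: prints the convox commands for each step instead of running them.
# NEW_REDIS_URL and OLD_RESOURCE are placeholders, not values from this thread.
NEW_REDIS_URL="${NEW_REDIS_URL:-redis://new-redis.example:6379/0}"
OLD_RESOURCE="${OLD_RESOURCE:-redis-****}"

migration_plan() {
  # 1. Copy the current URL into OLD_REDIS_URL. No --promote, so the release is not promoted yet.
  echo 'convox env set OLD_REDIS_URL="$(convox env get REDIS_URL)"'
  # 2. After adding worker_old and the new Redis resource to convox.yml, deploy.
  echo "convox deploy"
  # 3. Point REDIS_URL at the new cluster and promote the release.
  echo "convox env set REDIS_URL=$NEW_REDIS_URL --promote"
  # 4. After removing worker_old from convox.yml, deploy again.
  echo "convox deploy"
  # 5. Delete the old rack-level Redis resource.
  echo "convox rack resources delete $OLD_RESOURCE"
  # 6. Drop the temporary variable (this creates a release that still needs promoting).
  echo "convox env unset OLD_REDIS_URL"
}

migration_plan
```

Printing the plan first makes it easy to review the order before touching production. The important part is that worker_old keeps reading from OLD_REDIS_URL until it is removed in step 4.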
