Need help with CloudFormation error while trying to upgrade Redis version

Hello,

I’m running into an error in CloudFormation when I try to upgrade my Redis resource. I’m trying to run this command:

convox rack resources update redis-**** EngineVersion=5.0.6 AutomaticFailoverEnabled=true NumCacheClusters=2 --wait

This worked fine on my staging rack, but I’m getting UPDATE_FAILED for my production rack:

The subnet IDs subnet-********,subnet-******** are in use. (Service: AmazonElastiCache; Status Code: 400; Error Code: SubnetInUse; Request ID: ********; Proxy: null)

I’m still running an older version of the Convox rack (gen 2): 20191120232059

$ convox rack resources info redis-****
Name     redis-****
Type     redis
Status   running
Options  AutomaticFailoverEnabled=false
         Database=0
         Encrypted=false
         InstanceType=cache.t2.medium
         NumCacheClusters=1

Does anyone have any idea how I might be able to fix this?

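I’ve considered creating a new app-level Redis resource and swapping the REDIS_URL during a deploy, but that would cause some downtime: my worker processes always update much faster than the web processes, so for a while the web processes would still be enqueuing jobs to the old Redis after the workers had already moved to the new one.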

Any help would be greatly appreciated!

Hey @nathan.f77 did you manage to resolve this?

Hi @matt, yes, fortunately I was able to resolve this issue. I reached out to AWS support and got a few more details about the cause:

---> To identify the cause behind the aforementioned error, I checked the latest update call made on the production stack using our internal tools. I noticed that an API call, ‘ModifyCacheSubnetGroup’, was initiated during the latest stack update to modify the subnetIds in the CloudFormation resource ‘CacheSubnetGroup’. Please refer to [1] for more information on ‘ModifyCacheSubnetGroup’.
Old subnetIds: subnet-ab6f****, subnet-57d3****, subnet-7cb3****
New subnetIds: subnet-9b3a****, subnet-6de8****, subnet-2022****
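
To make the AWS reply concrete, here is a small Python sketch that compares the two masked subnet lists quoted above. It only uses the values from the reply; the variable names are made up for illustration. The interpretation in the comments is an assumption: the update appears to swap out every subnet in the group, and the old subnets presumably still host the existing cache nodes, which would be consistent with the SubnetInUse error.

```python
# Masked subnet IDs copied from the AWS support reply above.
old_subnets = {"subnet-ab6f****", "subnet-57d3****", "subnet-7cb3****"}
new_subnets = {"subnet-9b3a****", "subnet-6de8****", "subnet-2022****"}

# Subnets the stack update would drop from the CacheSubnetGroup.
removed = old_subnets - new_subnets
# Subnets that would remain in the group after the update.
kept = old_subnets & new_subnets

# Assumption: the running cache nodes still live in the old subnets,
# so dropping all of them is what ElastiCache refuses to do.
print("removed:", sorted(removed))
print("kept:", sorted(kept))
```

Nothing is kept, so the update is a full replacement of the subnet group's subnets rather than a small change.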

I was not able to resolve the subnet problem for the Convox Redis resource, so I ended up migrating to a new Redis cluster.

I worked out a zero-downtime migration by running two sets of worker containers at the same time, one for each Redis cluster. So I did something like this:

  1. Set OLD_REDIS_URL to the current value of REDIS_URL. (Don’t promote.)

  2. a) Add another set of worker containers (a worker_old service) that always process jobs from OLD_REDIS_URL.
     b) Add a new Redis resource to convox.yml.
     c) Deploy.

  3. Set REDIS_URL to the new Redis URL and promote the release.

  4. Remove the worker_old service from convox.yml and deploy the change. Now all web and worker containers are using the new Redis.

  5. Delete the old Redis rack resource.

  6. Unset OLD_REDIS_URL.

This way, I was running two sets of workers at the same time, one for each Redis instance, so I could perform a zero-downtime migration without losing any background jobs.
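
For anyone following the same steps, here is a minimal Python sketch of why no jobs get lost. It is a toy model, not the actual setup: each Redis cluster is replaced by an in-memory queue, the URLs are made-up placeholders, and redis_url_for, enqueue and work are hypothetical helpers. Only the names REDIS_URL, OLD_REDIS_URL and worker_old come from the steps above.

```python
from collections import deque

OLD_URL = "redis://old.example.internal:6379/0"  # hypothetical placeholder
NEW_URL = "redis://new.example.internal:6379/0"  # hypothetical placeholder

# Each Redis cluster is modelled as a plain in-memory job queue.
queues = {OLD_URL: deque(), NEW_URL: deque()}
processed = []


def redis_url_for(service, env):
    """worker_old is pinned to OLD_REDIS_URL; everything else follows REDIS_URL."""
    return env["OLD_REDIS_URL"] if service == "worker_old" else env["REDIS_URL"]


def enqueue(job, env):
    """The web service pushes a job to whatever REDIS_URL currently points at."""
    queues[redis_url_for("web", env)].append(job)


def work(service, env):
    """Drain every job currently visible to the given worker service."""
    queue = queues[redis_url_for(service, env)]
    while queue:
        processed.append(queue.popleft())


# Steps 1-2: OLD_REDIS_URL mirrors REDIS_URL and worker_old is deployed.
env = {"REDIS_URL": OLD_URL, "OLD_REDIS_URL": OLD_URL}
enqueue("job-1", env)
enqueue("job-2", env)

# Step 3: REDIS_URL points at the new cluster; job-1 and job-2 are still on the old one.
env = {"REDIS_URL": NEW_URL, "OLD_REDIS_URL": OLD_URL}
enqueue("job-3", env)

work("worker", env)      # picks up job-3 from the new cluster
work("worker_old", env)  # drains job-1 and job-2 from the old cluster

# Assumption: step 4 (removing worker_old) is only safe once this is True.
old_queue_drained = not queues[OLD_URL]
print(processed, old_queue_drained)
```

The check at the end reflects an assumption rather than something stated in the steps: before removing worker_old and deleting the old resource, it is worth confirming that the old Redis no longer holds pending jobs.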