Feedback on v3 Racks?

With the Kubernetes-based Racks having been out for a few weeks now, we’re wondering who has tried them. If you have, how have you found them? If you haven’t, what’s stopped you?

Free credits are available from many of the cloud providers: $200 from Azure, $300 from GCP, and the AWS free tier. Digital Ocean is simply extremely competitive on price (look out for an upcoming blog post on this topic!). All of this makes it almost frictionless to get a Rack going for testing.

Many users are now running their production apps on v3 Racks and seeing the benefits: deploys, rollbacks, and scaling can be an order of magnitude faster; running a staging Rack on a cheaper cloud provider makes a lot of sense; and support for custom load balancing opens up more service configuration options.

Is there anything else you’d like to see in the new Racks?

Thanks :blush:

  • Support for RDS (database resources) / ElastiCache (Redis resources) / other cloud provider equivalents?

  • Support for AWS Certificate Manager based SSL certificates (or other cloud provider equivalents). While Let’s Encrypt is a great tool, it’s not a perfect fit for all use cases, and sometimes you need something else.

  • Support for using spot instances (AWS) and equivalents on other clouds out of the box

  • The documentation needs more work.

    • It would be nice to have the documentation mention gRPC and HTTP/2. They aren’t very obscure, but they were completely unsupported in older Racks and should be possible one way or another with v3. Do we need a custom balancer using raw TCP ports to support HTTP/2 (gRPC requires HTTP/2), or does the default load balancer now support HTTP/2? (See the sketch after this list.)

    • How well does Convox play with manual Kubernetes management? Can I use kubectl to schedule work directly into the cluster without Convox having any issues? For instance, if I wanted to deploy a Kubernetes operator from a vendor like MongoDB or Elastic … or start using a tool like KubeDB to get more database/resource options in my Kubernetes cluster (see their supported tools list). There’s a kubectl sketch of what I mean after this list.

    • The blog post talks about Terraform power users being able to modify the config:

      This means most people will just use our installer which performs all the Terraform magic for you, but if you are a Terraform power user and you want to customise your installation, you can do so very easily.

      How does one do this? The install documentation has only one mention of Terraform, “Install Terraform” (which is repeated six times, once in each development and production install guide). There’s nothing more about how Terraform fits in, or how it can be used in a more advanced way to customise anything. (In my case I’m looking at this in order to use AWS Spot Instances for my development cluster; see the Terraform sketch after this list.)

    • The CLI reference docs don’t seem to be built with any kind of “code first” documentation pipeline. e.g. the CLI reference page for the rack subcommand has no mention of the install sub-subcommand or its options (install-related commands get a short mention in the sub-pages). Having proper, complete documentation of all the commands, subcommands, sub-subcommands, and their options is very important.
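
On the HTTP/2 point, if the default v3 routing layer is nginx-ingress, then the stock nginx-ingress answer is the backend-protocol annotation rather than raw TCP ports. A sketch of what I mean; the ingress name and namespace are placeholders for whatever the Rack actually generates, and I haven’t verified this against a v3 Rack:

```sh
# Ask nginx-ingress to speak gRPC (which rides on HTTP/2) to the backend.
# "web" and "<rack>-<app>" are placeholders, not known Convox names.
kubectl annotate ingress web -n <rack>-<app> \
  nginx.ingress.kubernetes.io/backend-protocol=GRPC --overwrite
```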
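
And on the kubectl question, the pattern I’d hope works is isolating anything non-Convox in its own namespace, e.g. for a vendor operator:

```sh
# Keep vendor workloads out of the namespaces Convox manages.
kubectl create namespace mongodb
# mongodb-operator.yaml stands in for the vendor's install manifest.
kubectl apply -n mongodb -f mongodb-operator.yaml
kubectl get pods -n mongodb   # check it scheduled cleanly
```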
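
For the Spot Instance case specifically, here’s roughly the shape I’d expect the customisation to take if the cluster were built on the terraform-aws-modules/eks registry module. This is a sketch against v11 of that module, not something I’ve confirmed the Convox templates actually support; every value is a placeholder:

```hcl
provider "aws" {
  region = "us-east-1" # placeholder
}

variable "vpc_id" {}
variable "subnets" { type = list(string) }

# Hypothetical spot worker group, in the shape terraform-aws-modules/eks
# (v11.x) accepts.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "11.0.0"

  cluster_name = "convox-dev" # placeholder
  vpc_id       = var.vpc_id
  subnets      = var.subnets

  worker_groups = [
    {
      name               = "spot"
      instance_type      = "m5.large"
      spot_price         = "0.05" # cap at/below the on-demand rate
      asg_max_size       = 3
      kubelet_extra_args = "--node-labels=lifecycle=spot"
    }
  ]
}
```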

That’s all I can think of at the moment, but I’ll probably think of more once I’ve started using it more heavily in the next few weeks.



Thanks @sam, that’s great feedback! I’ll definitely look at the docs for you and we’ll discuss the other bits :blush:

More doc feedback for you @ed_convox

  • The documentation only covers installation using the command line; it should also mention the web UI (and probably explain what happens to the Terraform state in that case).
  • The documentation doesn’t include any explanation of how to add a v3 rack that was created with the CLI into the web console. (I worked it out with an educated guess, but it really should be in the documentation so people don’t have to guess at basic setup tasks.)

Edit (more stuff)

  • The documentation for deployment could do with more of the “obvious headings” that were in the v2 docs, like the “Creating an application” section. The info is in there, but this is another instance of the v3 docs needing work on usability.
  • After importing a v3 rack created via the CLI, there appear to be no usage statistics in the dashboard. (I have not created a v3 rack using the dashboard to see if this differs.) If this is normal, it should be documented somewhere.

Hey @sam, thanks for the further feedback! On those points:

  • We explicitly tried to keep the Console stuff out of the initial docs to keep things cleaner and (hopefully) less confusing for people new to the platform. Initially, v3 installation wasn’t available in the Console either, but it is now! Console docs are available from within the Console itself, but they could also do with beefing up.
  • rack mv reference docs are now in there: https://docs.convox.com/reference/cli/rack#rack-mv but I agree a fuller prose explanation would be good!
  • What other ‘obvious headings’ would you be looking for? We do have the Local Development page (https://docs.convox.com/tutorials/local-development) which covers getting you up and running with an app, and I’m bringing in the Example Application documentation (and extending it) shortly.
  • stats for v3 Racks are coming shortly :blush:

For us, it was the following:

  • The cost. The blog post seems to miss the AWS NAT cost, which we found to be about $100/month (quick math below this list). Perhaps we configured something wrong or calculated it wrong, but in talking with David, it seems 3 NAT gateways are the default, and they’re not cheap.
  • The lack of metrics. Since v3 Racks aren’t using the ALB, we lose the ALB routing metrics. On top of that, Convox metrics don’t work. We have app metrics, but with AWS we need metrics from the point of ingestion: e.g. we need to answer the question, is AWS sending back 502s before the requests ever reach our apps? Without that, we’re flying blind. We estimated the cost of adding metrics via Datadog at $200/month for us. For a Rack with a base cost of $244 (by our estimation), adding the analytics meant both extra cost and extra maintenance. The Convox stats are very helpful, but for us the critical part is knowing whether the load balancer/ingress point is responding with 50x before the request hits our app server. The current v2 stats don’t show that (though they are useful in other regards).
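
For the curious, that figure lines up with AWS’s published NAT Gateway pricing: assuming the us-east-1 rate of $0.045/hour, 3 gateways × $0.045 × 730 hours ≈ $99/month, before the additional per-GB data-processing charge.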

Other than those two things, we really loved the v3 Racks. I would echo the question about manually configuring k8s: we had to do some custom configuration to make Datadog work for us, and we knew that if we updated the Rack version we’d have to reapply those changes, which we didn’t love. But we kinda assumed that mucking around sometimes would be inevitable.


Speaking of stats, metrics, and so on: the current v3 documentation for DataDog is broken. Not because the documentation is wrong, but because v3 cannot deploy DataDog correctly.

The current code for deploying an agent makes a DaemonSet and does things mostly correctly, but it also tries to set up some service-related stuff that is (at least in the case of DataDog) unnecessary. The result is 3 scheduled pods but only one running pod, as the other two fail to schedule, producing this error from the cluster scheduler:

```
Warning FailedScheduling 57s (x5661 over 5d21h) default-scheduler 0/3 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match node selector, 3 Insufficient pods.
```

In addition, even if that were working, a normal Convox-based “agent service” can’t support the Kubernetes Cluster Agent features, and also can’t enable the DataDog Network Performance Monitoring features, because there’s no way to deploy multi-container pods (which Network Performance Monitoring uses).

As a workaround I’m using the official DataDog Helm charts, and I’m also looking at the forthcoming DataDog operator. So better support for / integration with Helm and Operators is definitely rising in importance the more I use v3 Racks.
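
For anyone else hitting this, the Helm route is roughly as follows; the release name is arbitrary and $DD_API_KEY stands in for a real key:

```sh
# The official Datadog chart manages its own DaemonSet, Cluster Agent,
# and multi-container pods: everything the Convox agent path can't express.
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=$DD_API_KEY \
  --set clusterAgent.enabled=true
```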


More bugs/feedback for you @ed_convox. In addition to the DataDog agent bug, the v3 convox CLI seems unable to update rack parameters. It updates the local config files and terraform, but executes them (using terraform under the hood) in such a way that `convox rack param` fails: terraform discovers the existing resources instead of applying the change as a modification.

I could be wrong since I’m relatively new to terraform and I’m trying to reverse engineer these things from the go source code and the templates in the convox repo, but…

From the look of it, this issue might be fixed in the Terraform AWS EKS module (https://github.com/terraform-aws-modules/terraform-aws-eks/pull/625), but that requires AWS provider >= 2.52.0 (the Convox terraform templates/sources specify ~> 2.49) and, I think, specifically designating that you want the EKS module from the registry (per the instructions at https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/11.0.0), which I don’t think the current terraform templates do.
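
If I’m reading it right, the change in the templates would look something like this; this is my guess, not anything taken from the Convox repo:

```hcl
variable "region" {}

# Presumed fix: raise the provider floor and pull the EKS module
# from the registry instead of vendoring the templates.
provider "aws" {
  version = "~> 2.52" # the templates currently say ~> 2.49
  region  = var.region
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "11.0.0"
  # ...existing cluster inputs carried over unchanged...
}
```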

Regardless of that, it’s definitely some kind of underlying terraform issue, since manually executing terraform apply in the root of the local terraform state folder produces the same error. So it looks pretty clear that it’s something to do with the terraform state that convox is trying to apply.
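
For reproduction, this is all I’ve been doing (run from wherever the CLI put the rack’s local terraform files):

```sh
# From the rack's local terraform directory:
terraform plan        # shows terraform trying to *create* resources
                      # that already exist, instead of modifying them
terraform state list  # what terraform believes it already manages
```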


Is there any additional info around this? I am looking into deploying a MongoDB Kubernetes operator with our setup.

    Is there anything else you’d like to see in the new Racks?

I have some feature requests for v3 racks. Here are the main ones:

  • Support for setting proxy-buffer-size in the ingress-nginx config. I manually set this annotation on the web ingress: `nginx.ingress.kubernetes.io/proxy-buffer-size: 32k` (exact command below). It would be very nice if I could set this as a rack param. (I was seeing a lot of 502 errors with the default 4k buffers.)
  • Support for DNS01 challenges in cert-manager for Let’s Encrypt (sketch below).
    • This is really handy when migrating to a new Convox rack, since I can issue SSL certs before switching the DNS records. It would make it much easier to set up blue/green deployments for safe Convox release updates.
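
For reference, the manual workaround I mentioned above (ingress name and namespace are placeholders for whatever the Rack generated):

```sh
# Bump nginx's proxy buffers on the app's ingress until this is a rack param.
kubectl annotate ingress web -n <rack>-<app> \
  nginx.ingress.kubernetes.io/proxy-buffer-size=32k --overwrite
```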
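
And the shape of the cert-manager DNS01 config I’d want the Rack to generate; a minimal sketch for Route53, with names, email, and the credentials wiring as placeholders (the exact apiVersion depends on your cert-manager release):

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com              # placeholder
    privateKeySecretRef:
      name: letsencrypt-dns-account-key
    solvers:
      - dns01:
          route53:
            region: us-east-1           # placeholder
            # credentials omitted: use an IAM role or an
            # accessKeyID / secretAccessKeySecretRef pair here
```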