Convox API in K8S Rack crashes

I think I’ve found a reason why the local rack is so unreliable for me. The convox api service seems to become unhealthy and get restarted a lot:

$ kubectl describe pod/api-767f6d7b9-8j84k -n convox
Name:               api-767f6d7b9-8j84k
Namespace:          convox
Priority:           0
PriorityClassName:  <none>
Node:               docker-desktop/192.168.65.3
Start Time:         Thu, 31 Jan 2019 10:22:33 +0100
Labels:             app=system
                    pod-template-hash=767f6d7b9
                    rack=convox
                    service=api
                    system=convox
Annotations:        scheduler.alpha.kubernetes.io/critical-pod:
Status:             Running
IP:                 10.1.0.6
Controlled By:      ReplicaSet/api-767f6d7b9
Containers:
  main:
    Container ID:  docker://70cc356c01b8cb6900cbdb22ab559e3831dfebc09248fbbf4083fad14e3b0751
    Image:         convox/rack:20190130162938
    Image ID:      docker-pullable://convox/rack@sha256:98ea788786614da93cd39efea2750040593513066fcff267d07a1854ca590a00
    Port:          5443/TCP
    Host Port:     0/TCP
    Args:
      rack
    State:          Running
      Started:      Thu, 31 Jan 2019 14:25:47 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 31 Jan 2019 14:21:24 +0100
      Finished:     Thu, 31 Jan 2019 14:25:44 +0100
    Ready:          True
    Restart Count:  15
    Liveness:       http-get https://:5443/check delay=5s timeout=3s period=5s #success=1 #failure=3
    Readiness:      http-get https://:5443/check delay=0s timeout=3s period=5s #success=1 #failure=3
    Environment Variables from:
      env-api  ConfigMap  Optional: false
    Environment:
      DATA:         /data
      DEVELOPMENT:  false
      ID:           5e63eec1-eedc-42e7-8aac-57fd68a47965
      IMAGE:        convox/rack:20190130162938
      RACK:         convox
      VERSION:      20190130162938
    Mounts:
      /data from data (rw)
      /var/run/docker.sock from docker (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rack-token-8xgpl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/rack/convox
    HostPathType:  DirectoryOrCreate
  docker:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  rack-token-8xgpl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rack-token-8xgpl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                     Message
  ----     ------     ----                   ----                     -------
  Normal   Pulled     7m17s (x16 over 4h6m)  kubelet, docker-desktop  Container image "convox/rack:20190130162938" already present on machine
  Normal   Killing    7m17s (x11 over 3h5m)  kubelet, docker-desktop  Killing container with id docker://main:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Created    7m16s (x15 over 4h6m)  kubelet, docker-desktop  Created container
  Normal   Started    7m15s (x15 over 4h6m)  kubelet, docker-desktop  Started container
  Warning  Unhealthy  7m7s (x27 over 4h5m)   kubelet, docker-desktop  Liveness probe failed: Get https://10.1.0.6:5443/check: dial tcp 10.1.0.6:5443: connect: connection refused
  Warning  Unhealthy  7m6s (x36 over 4h5m)   kubelet, docker-desktop  Readiness probe failed: Get https://10.1.0.6:5443/check: dial tcp 10.1.0.6:5443: connect: connection refused
  Warning  Unhealthy  3m5s (x24 over 3h38m)  kubelet, docker-desktop  Liveness probe failed: Get https://10.1.0.6:5443/check: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  3m3s (x30 over 3h38m)  kubelet, docker-desktop  Readiness probe failed: Get https://10.1.0.6:5443/check: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
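
Two numbers in this output are worth decoding. The following is a minimal sketch using only values copied from the describe output above; the variable names are illustrative. It assumes the usual convention that an exit code above 128 means the process was terminated by signal (code - 128), and that the kubelet restarts a container after `#failure` consecutive failed liveness probes.

```shell
# "Exit Code" from the Last State section of the describe output.
exit_code=143
# By convention, codes above 128 mean "terminated by signal (code - 128)".
signal=$((exit_code - 128))
echo "signal number: $signal"    # 15 is SIGTERM

# Liveness probe settings from the describe output: period=5s, #failure=3.
period=5
failures=3
# Rough time an unresponsive API has before the kubelet restarts it.
restart_window=$((period * failures))
echo "approximate seconds of failed probes before a restart: $restart_window"
```

Signal 15 (SIGTERM) is consistent with the "Container failed liveness probe" kill events: the process did not crash on its own, it was stopped by the kubelet after roughly 15 seconds of failed or timed-out checks.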

Next up is to figure out why. The logs I get with kubectl logs deployments/api -n convox do not give much information: they just end abruptly and then start again, without any errors.
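
One thing that may help here: after a restart, kubectl logs shows the new container by default, so the last lines before the kill can be missed. A sketch of a small helper that asks for the previous (terminated) container instead is below. The function name `previous_api_logs` is hypothetical; `--previous` and `--tail` are standard kubectl logs flags, and the namespace is taken from the output above.

```shell
# Hypothetical helper: show the last lines logged by the previous
# (terminated) container of a pod in the convox namespace.
previous_api_logs() {
  pod="$1"
  kubectl logs "$pod" -n convox --previous --tail=100
}
```

For the pod in this thread it would be called as `previous_api_logs api-767f6d7b9-8j84k`. Whether the rack logs anything useful just before it is killed is not something this thread confirms.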

Would it help to scale the api out to multiple pods?

I’ve scaled it up now with this command:

kubectl scale --replicas=3 deployments/api -n convox

so we’ll see if it makes any difference.
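
To check whether scaling changes anything, the restart count of each api pod can be compared over time. A minimal sketch, assuming the `service=api` label shown in the describe output is present on every replica (the function name `api_pods` is hypothetical):

```shell
# Hypothetical helper: list the api pods in the convox namespace.
# The default output includes a RESTARTS column for each pod.
api_pods() {
  kubectl get pods -n convox -l service=api
}
```

Running `api_pods` a few times over an hour or so should show whether the RESTARTS values keep climbing on all replicas or only on some of them.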

Ran into the same issue. The kubectl scale command worked for me. Did it ever work for you, @robert?

I’m having the exact same issue here as well. I tried scaling the deployment, but now I get a new error in the build phase:

ERROR: EOF

@robert @xtagon @epigrammarwebservice

Did you ever resolve this?
I assume it is related to:
"ERROR: dial tcp: lookup api.convox.svc.cluster.local: too many open files"
and to the fact that the convox CLI does not respect the "--no-sync" flag…
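
If the "too many open files" error does come from a process on the local machine (an assumption; the thread does not confirm where it originates), the file descriptor limits of the shell that launches the convox CLI are a reasonable first thing to look at. A minimal sketch:

```shell
# Limits on open file descriptors for the current shell.
# Processes started from this shell inherit them.
soft_limit=$(ulimit -S -n)
hard_limit=$(ulimit -H -n)
echo "soft limit: $soft_limit"
echo "hard limit: $hard_limit"
```

If the soft limit is low, `ulimit -n 4096` raises it for the current shell session, provided the hard limit allows it. The value 4096 is only an example, and this would not help if the limit is being hit inside the cluster rather than on the workstation.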