SnarfCode/argocd-traefik-fix.md
2026-05-21 12:49:21 -04:00

# ArgoCD Ingress Fix - Traefik Bad Gateway
## Environment
- **Cluster**: RKE2 managed by Rancher
- **Ingress Controller**: Traefik (kube-system namespace)
- **ArgoCD Version**: v3.4.2 (Helm chart argo-cd-9.5.14)
- **Namespace**: infrastructure
- **Hostname**: argo.snarfnet.net
## Problem
After deploying ArgoCD, accessing `https://argo.snarfnet.net` returned a **502 Bad Gateway** from Traefik.
## Root Cause
Two issues were identified:
### 1. Service TargetPort Mismatch
The ArgoCD server was listening on port **8080**, but the Kubernetes service had `targetPort: 8081`. This was corrected by patching the service to point both ports (80 and 443) to targetPort 8080.
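The exact patch command is not recorded here. As a minimal sketch, one way to express the correction is a JSON Patch generated from the service's current port list. The helper name `build_target_port_patch` and the sample port list are illustrative assumptions, not output captured from the cluster.

```python
import json

def build_target_port_patch(ports, listen_port):
    """Return JSON Patch operations that point every mismatched port at listen_port.

    `ports` mirrors the `spec.ports` list of a Service manifest.
    """
    ops = []
    for index, port in enumerate(ports):
        if port.get("targetPort") != listen_port:
            ops.append({
                "op": "replace",
                "path": f"/spec/ports/{index}/targetPort",
                "value": listen_port,
            })
    return ops

# Assumed state before the fix: both service ports target 8081.
current_ports = [
    {"name": "http", "port": 80, "targetPort": 8081},
    {"name": "https", "port": 443, "targetPort": 8081},
]
patch = build_target_port_patch(current_ports, 8080)
print(json.dumps(patch))
```

The printed list is in the format accepted by `kubectl patch svc argocd-server -n infrastructure --type=json -p '<ops>'`.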
### 2. Traefik Protocol Mismatch (Primary Issue)
The ArgoCD service defined two ports:
```yaml
ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8080
```
The Ingress resource routed traffic to port 80, but Traefik's Kubernetes provider saw the port named `https` (443) on the service and automatically selected it, connecting to the backend using **HTTPS**:
```
"servers":[{"url":"https://10.42.1.76:8080"}]
```
However, ArgoCD was configured to run in insecure mode (`server.insecure: true`), meaning it only served plain **HTTP** on port 8080. Traefik's HTTPS connection to an HTTP backend resulted in the Bad Gateway.
Working services (Gitea, Jenkins, etc.) did not have this problem because they only exposed a single HTTP port with no `https` named port to confuse Traefik.
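The difference between the ArgoCD service and the working ones can be checked mechanically. The sketch below is a simplified model of the behavior observed in this incident, not Traefik's source code. The function name `ports_inferred_as_https` is hypothetical, the trigger condition (a port name starting with `https`, or port number 443) is an assumption, and the single-port example uses a made-up port number.

```python
def ports_inferred_as_https(service_ports):
    """Return the service ports that a name/number heuristic would treat as HTTPS."""
    flagged = []
    for port in service_ports:
        name = port.get("name", "")
        if name.startswith("https") or port.get("port") == 443:
            flagged.append(port)
    return flagged

# Port list of the argocd-server service as shown above.
argocd_ports = [
    {"name": "http", "port": 80, "targetPort": 8080},
    {"name": "https", "port": 443, "targetPort": 8080},
]
# Hypothetical service exposing a single plain-HTTP port.
single_http_port = [{"name": "http", "port": 3000, "targetPort": 3000}]

print(ports_inferred_as_https(argocd_ports))
print(ports_inferred_as_https(single_http_port))
```

Any service for which this returns a non-empty list while its backend only speaks plain HTTP is a candidate for the same 502.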
## Fix
Removed the `https` (port 443) entry from the `argocd-server` service, leaving only the HTTP port:
```yaml
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
```
This forced Traefik to use `http://` when connecting to the backend, which matched ArgoCD's insecure mode.
After the change, Traefik's internal service config showed:
```
"servers":[{"url":"http://10.42.1.76:8080"}]
```
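To compare many backends at once instead of reading raw JSON, the scheme of each server URL can be extracted from the API response. This sketch assumes the response is a JSON list of service objects with a `loadBalancer.servers[].url` field, consistent with the fragments shown above. The service names and the second address in the sample are made up.

```python
import json
from urllib.parse import urlsplit

def backend_schemes(services_json):
    """Map each service name to the set of URL schemes used by its servers."""
    result = {}
    for service in json.loads(services_json):
        servers = service.get("loadBalancer", {}).get("servers", [])
        result[service.get("name", "<unnamed>")] = {
            urlsplit(server["url"]).scheme for server in servers if "url" in server
        }
    return result

# Fabricated sample in the assumed response shape.
sample = json.dumps([
    {"name": "argocd-example@kubernetes",
     "loadBalancer": {"servers": [{"url": "https://10.42.1.76:8080"}]}},
    {"name": "gitea-example@kubernetes",
     "loadBalancer": {"servers": [{"url": "http://10.42.1.80:3000"}]}},
])

for name, schemes in backend_schemes(sample).items():
    print(name, sorted(schemes))
```

A service whose scheme set contains `https` while its pod only serves HTTP stands out immediately.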
## Permanent Fix for Helm Upgrades
To prevent the Helm chart from recreating the 443 port on future upgrades, use one of these approaches:
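### Option A: Annotate the Service
Add this annotation to the `argocd-server` Service so Traefik always uses HTTP regardless of service port names. Traefik reads its `service.*` annotations from the Service object rather than from the `argo-ing` Ingress, and the annotation should be set through the chart's values so that it survives upgrades: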
```yaml
metadata:
  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: http
```
### Option B: Helm Values
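Configure the chart to not expose the HTTPS service port. The values below only keep ArgoCD in insecure mode and the service as `ClusterIP`; they do not remove port 443 by themselves, so check the chart documentation for the exact key that controls the HTTPS service port, as it varies by version: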
```yaml
configs:
  params:
    server.insecure: true
server:
  service:
    type: ClusterIP
```
## Debugging Steps That Led to the Fix
1. Verified the pod was running and healthy (`1/1 Ready`)
2. Confirmed the pod was listening on port 8080 via `/proc/net/tcp6`
3. Tested direct pod connectivity from another pod in the cluster — returned HTTP 200
4. Queried Traefik's internal API at `http://127.0.0.1:9000/api/http/services`
5. Discovered Traefik was using `https://` to connect to the backend
6. Compared with working services (Gitea, Jenkins) which all used `http://`
7. Identified the `https` named port on the service as the cause
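Step 2 relies on `/proc/net/tcp6`, where ports are hex-encoded and the LISTEN state appears as `0A`. A small parser makes that output readable. The function name `listening_ports` is illustrative, and the sample text is fabricated and truncated to the first columns.

```python
def listening_ports(proc_net_tcp):
    """Return the sorted TCP ports in LISTEN state from /proc/net/tcp or tcp6 text."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the kernel's code for TCP_LISTEN
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return sorted(ports)

# Fabricated sample: one listening socket on 0x1F90 and one established connection.
sample = """\
  sl  local_address                         remote_address                        st tx_queue rx_queue
   0: 00000000000000000000000000000000:1F90 00000000000000000000000000000000:0000 0A 00000000:00000000
   1: 00000000000000000000000000000000:1F90 00000000000000000000000000000000:C350 01 00000000:00000000
"""
print(listening_ports(sample))  # [8080]
```

Hex `1F90` decodes to 8080, which matches the port the ArgoCD server was found listening on.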
## Key Takeaway
Traefik's Kubernetes Ingress provider infers the backend protocol from the service port name. A port named `https` causes Traefik to connect using HTTPS, regardless of what port number the Ingress backend specifies. When running ArgoCD in insecure mode behind a TLS-terminating reverse proxy, ensure the service does not expose an `https` named port, or use the `traefik.ingress.kubernetes.io/service.serversscheme` annotation to override the behavior.