Deploying a FastAPI Service to EKS with GitOps and Zero Long-Lived AWS Keys
Getting a Python service to production on Kubernetes involves more moving parts than it first appears: package management, multi-stage Docker builds, IAM federation, CI/CD, Helm charts, and GitOps wiring. This post walks through the full setup end to end, including the two production gotchas that cost me time.
The Stack
- API: FastAPI
- Packaging: Python with uv (fast package manager + virtual env)
- Container: Docker (multi-stage build)
- Registry: AWS ECR
- Cluster: AWS EKS (Kubernetes)
- CI/CD: GitHub Actions
- GitOps: ArgoCD + Helm
Part 1: Running Locally with uv
The project uses uv as a modern, fast replacement for pip + virtualenv. You don't need to create a virtual environment manually — uv handles it.
# Install dependencies and create .venv automatically
uv sync
# Run the API server
uv run uvicorn main:app --reload
# Run tests
uv run pytest
uv sync reads pyproject.toml and uv.lock, creates a .venv, and installs all dependencies. The uv run prefix ensures commands use that virtual environment without needing to activate it.
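For reference, a minimal pyproject.toml for a service like this might look as follows. The package names and version pins are illustrative, not the project's actual manifest; uv reads runtime dependencies from `[project]` and dev-only tools from `[dependency-groups]`:

```toml
[project]
name = "your-service"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = [
    "fastapi>=0.115",
    "uvicorn[standard]>=0.30",
]

[dependency-groups]
# Installed by plain `uv sync`, skipped by `uv sync --no-dev` in the Docker build
dev = [
    "pytest>=8.0",
    "ruff>=0.6",
]
```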
Part 2: Docker — Multi-Stage Build
Some dependencies require compiling C extensions at install time. The build environment needs build-essential, but that shouldn't be in the final image.
# Stage 1: Builder — has gcc/make to compile C extensions
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim AS builder
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project # deps only (layer cache)
COPY main.py cli.py ./
COPY app/ ./app/
RUN uv sync --frozen --no-dev # install project
# Stage 2: Runtime — slim, no build tools
FROM python:3.13-slim-bookworm AS runtime
WORKDIR /app
RUN groupadd --gid 1000 appuser && \
useradd --uid 1000 --gid 1000 --no-create-home appuser
COPY --from=builder /app/.venv /app/.venv
COPY main.py ./
COPY app/ ./app/
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
USER appuser
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Three decisions worth calling out:
- Two-stage build: compiler tools stay in the builder stage, runtime image stays small and secure
- Non-root user: runs as appuser (uid 1000), not root
- Layer caching: dependencies are copied and installed before application code, so code changes don't invalidate the slow compilation step
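A .dockerignore complements the layer-caching strategy: keeping the local .venv, Git history, and test artifacts out of the build context makes COPY layers smaller and cache hits more likely. An illustrative one (entries are typical, not taken from the repo):

```
# Never copy the host virtualenv -- the builder stage creates its own
.venv/
.git/
__pycache__/
*.pyc
tests/
.github/
```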
Part 3: AWS Infrastructure Setup
EKS Cluster
I used eksctl to create the cluster. It's a purpose-built CLI (not a wrapper around the aws CLI — it uses the AWS SDK directly) that handles VPC, subnets, security groups, IAM roles, and the EC2 nodegroup in one shot.
eksctl create cluster \
--name your-service \
--region us-east-1 \
--version 1.29 \
--nodegroup-name workers \
--node-type t3.medium \
--nodes 2 \
--nodes-min 2 \
--nodes-max 4 \
--managed
These three tools serve completely different purposes and it's worth being clear on which one to reach for:
| Tool | Talks to | Used for |
|---|---|---|
| eksctl | AWS APIs | Create/delete clusters, nodegroups |
| kubectl | Kubernetes API server | Deploy apps, check pods, port-forward |
| aws CLI | AWS APIs | IAM, ECR, security groups, everything else |
Once the cluster is created, you rarely need eksctl again. Day-to-day work is kubectl and aws CLI.
ECR Repository
aws ecr create-repository \
--repository-name your-service \
--region us-east-1 \
--image-scanning-configuration scanOnPush=true
One ECR repository per service is the standard pattern. The repository URI goes into helm/your-service/values.yaml.
OIDC Federation (No Long-Lived AWS Keys)
Instead of storing AWS access keys as GitHub secrets, I used OIDC federation — GitHub Actions gets a short-lived token from AWS via a trust relationship.
# 1. Register GitHub as an identity provider (once per account)
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
# 2. Create IAM role with trust policy scoped to your repo
aws iam create-role \
--role-name github-actions-ecr \
--assume-role-policy-document file://trust-policy.json
# 3. Attach ECR permissions
aws iam attach-role-policy \
--role-name github-actions-ecr \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser
The trust policy allows only your specific repo to assume the role:
{
  "Condition": {
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
    }
  }
}
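That Condition is only a fragment. A complete trust-policy.json has the shape below (the account ID and repo are placeholders; the `aud` condition pins the token audience to STS, and the `sub` condition scopes the role to one repository):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
        }
      }
    }
  ]
}
```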
The only GitHub secret needed is AWS_ACCOUNT_ID — no access keys.
Part 4: CI/CD with GitHub Actions
The pipeline runs four jobs in sequence on every push to main:
lint → test → build-and-push → update-helm-values
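The chain is wired with `needs:` in the workflow file. A skeleton (file layout and step bodies are illustrative; the `permissions` blocks are required in any case — `id-token: write` for the OIDC exchange with AWS, `contents: write` for the commit back to the repo):

```yaml
# .github/workflows/deploy.yml (skeleton; step bodies elided)
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ruff check / format steps here
  test:
    needs: lint                # runs only after lint succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # pytest steps here
  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      id-token: write          # required for OIDC federation with AWS
      contents: read
    steps:
      - uses: actions/checkout@v4
      # ECR login, docker build/push steps here
  update-helm-values:
    needs: build-and-push
    runs-on: ubuntu-latest
    permissions:
      contents: write          # allows pushing the values.yaml commit
    steps:
      - uses: actions/checkout@v4
      # sed + git commit steps here
```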
Job 1 & 2: Lint and Test
- run: uv run ruff check .
- run: uv run ruff format --check .
- run: uv run pytest --cov=app --cov-report=xml
Job 3: Build and Push to ECR
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
    aws-region: us-east-1

# GitHub Actions expressions can't slice strings, so the short SHA
# is computed in a shell step, not inline as ${{ github.sha::7 }}
- name: Set image tag
  run: |
    echo "IMAGE_TAG=${GITHUB_SHA::7}" >> "$GITHUB_ENV"
    echo "image_tag=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

- name: Build and push Docker image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: |
      ${{ env.ECR_URI }}:${{ env.IMAGE_TAG }}
      ${{ env.ECR_URI }}:latest
    cache-from: type=registry,ref=${{ env.ECR_URI }}:latest
    cache-to: type=inline
The image is tagged with the first 7 characters of the Git SHA (e.g., 4565a6a).
Job 4: Update Helm Values (GitOps Trigger)
After pushing the image, CI updates values.yaml with the new tag and commits it back to the repo:
- run: |
    sed -i "s/^  tag: .*/  tag: \"${{ needs.build-and-push.outputs.image_tag }}\"/" \
      helm/your-service/values.yaml
- run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add helm/your-service/values.yaml
    git commit -m "ci: update image tag to $IMAGE_TAG"
    git push
This commit is what triggers ArgoCD to detect drift and sync.
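The sed one-liner is terse and whitespace-sensitive. As a sketch of a more readable alternative (not part of the actual pipeline), the same tag bump can be done with a few lines of standard-library Python:

```python
import re


def bump_image_tag(yaml_text: str, new_tag: str) -> str:
    """Replace the value of the first indented `tag:` key in values.yaml.

    A regex is enough here because the chart has exactly one such key;
    for anything more complex, reach for a real YAML parser instead.
    """
    return re.sub(
        r'^(\s+tag:\s*).*$',      # capture the indentation and key
        rf'\g<1>"{new_tag}"',     # keep them, swap in the quoted new tag
        yaml_text,
        count=1,
        flags=re.MULTILINE,
    )


# Usage, mirroring what the sed step does:
# with open("helm/your-service/values.yaml", "r+") as f:
#     updated = bump_image_tag(f.read(), "4565a6a")
#     f.seek(0); f.write(updated); f.truncate()
```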
Part 5: Helm Chart
The Helm chart lives at helm/your-service/ and templates a Deployment, Service, HorizontalPodAutoscaler, ServiceAccount, and a Secret placeholder. values.yaml is the single configuration knob:
replicaCount: 2

image:
  repository: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service
  tag: "4565a6a"  # auto-updated by CI

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  API_KEY:
    valueFrom:
      secretKeyRef:
        name: your-service-secret
        key: api-key
The API key is injected from a Kubernetes Secret created manually:
kubectl create secret generic your-service-secret \
--from-literal=api-key=<YOUR_API_KEY> \
-n your-service
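On the application side, that secret simply arrives as an environment variable. A fail-fast accessor (a sketch; only the `API_KEY` variable name comes from the chart above) makes a missing or misnamed Secret surface as a crash loop at startup rather than as 500s on the first request:

```python
import os


def require_env(name: str) -> str:
    """Return the value of a required environment variable.

    Raises at startup if the variable is missing or empty, so a
    misconfigured Kubernetes Secret is caught immediately.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


# At module import / app startup:
# API_KEY = require_env("API_KEY")
```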
Part 6: ArgoCD — GitOps Deployments
ArgoCD is installed on the same EKS cluster and watches the GitHub repo. When values.yaml changes (after CI updates the image tag), ArgoCD detects the drift and runs helm upgrade automatically.
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: your-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<OWNER>/your-service.git
    targetRevision: main
    path: helm/your-service
  destination:
    server: https://kubernetes.default.svc
    namespace: your-service
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual changes to match Git
The full deployment flow once everything is wired up:
Developer pushes code
│
▼
GitHub Actions: lint → test → build → push to ECR
│
▼
CI commits new image tag to values.yaml
│
▼
ArgoCD detects Git change (polls every 3 min)
│
▼
ArgoCD runs helm upgrade with new values
│
▼
Kubernetes rolls out new pods (rolling update)
│
▼
Old pods terminate after new pods are healthy
ArgoCD doesn't replace Helm — it uses Helm under the hood. The difference: Helm is imperative (you run helm upgrade), ArgoCD is declarative (you commit to Git and ArgoCD converges the cluster). ArgoCD adds drift detection, selfHeal, and rollback via the UI on top of that.
Lessons Learned
eksctl has credential issues with AWS SSO
If you use SSO or temporary session credentials, eksctl (which is built on the AWS Go SDK) may fail to refresh tokens. Workaround:
eval "$(aws configure export-credentials --format env)"
This exports the current credentials as environment variables that the Go SDK can read.
Bootstrap problem: nodes need ECR read permissions before ArgoCD can deploy
When ArgoCD first tried to deploy, pods went into ImagePullBackOff — the nodegroup IAM role didn't have permission to pull from ECR. The fix:
aws iam attach-role-policy \
--role-name <NodeInstanceRole> \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
This needs to be set up before the first deploy, not after pods start failing.
Layer caching matters for compile-heavy dependencies
Splitting the COPY + RUN uv sync into two steps — deps first, then application code — means code changes don't re-trigger the slow C compilation step. A build that takes 4 minutes cold runs in 30 seconds when only the app code changed.
The Final Flow
Once everything is set up, the developer workflow is:
git push origin main
Everything else is automatic:
- GitHub Actions lints, tests, builds, and pushes the image
- CI commits the new image tag to values.yaml
- ArgoCD detects the Git change and syncs
- Kubernetes rolls out new pods with zero downtime
To verify:
kubectl get pods -n your-service -o wide
argocd app get your-service
kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health
Summary
| Component | Purpose |
|---|---|
| uv | Fast Python package manager, replaces pip + venv |
| FastAPI | API framework |
| Docker (multi-stage) | Small, secure runtime image |
| ECR | Docker image registry |
| EKS | Managed Kubernetes cluster |
| OIDC federation | Keyless AWS auth from GitHub Actions |
| GitHub Actions | CI: lint → test → build → push → update Helm |
| Helm | Kubernetes manifest templating |
| ArgoCD | GitOps: auto-sync cluster state to Git |
The key insight of this setup: Git is the source of truth. No one runs kubectl apply or helm upgrade manually. The cluster always reflects what's in the repo.