From Local Dev to Production: Deploying a Python API to AWS EKS with GitOps
I recently went through the full journey of taking a Python API — built with FastAPI — from running locally to a production-grade deployment on AWS EKS, complete with a CI/CD pipeline via GitHub Actions and GitOps-based deployments via ArgoCD. This post walks through every step, including the mistakes I made along the way.
The Stack
- API: FastAPI
- Packaging: Python with uv (fast package manager + virtual env)
- Container: Docker (multi-stage build)
- Registry: AWS ECR
- Cluster: AWS EKS (Kubernetes)
- CI/CD: GitHub Actions
- GitOps: ArgoCD + Helm
Part 1: Running Locally with uv
The project uses uv as a modern, fast replacement for pip + virtualenv. You don't need to create a virtual environment manually — uv handles it.
# Install dependencies and create .venv automatically
uv sync
# Run the API server
uv run uvicorn main:app --reload
# Run tests
uv run pytest
uv sync reads pyproject.toml and uv.lock, creates a .venv, and installs all dependencies. The uv run prefix ensures commands use that virtual environment without needing to activate it.
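For reference, a minimal pyproject.toml for this kind of project might look like the following (an illustrative sketch; the exact dependencies and dev tools are assumptions based on the rest of this post):

# pyproject.toml (illustrative sketch, not the project's actual file)
[project]
name = "your-service"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "uvicorn",
]

[dependency-groups]
dev = ["pytest", "ruff"]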
Part 2: Docker — Multi-Stage Build
Some dependencies may require compiling C extensions at install time. This means the build environment needs build-essential, but we don't want that in the final image.
# Stage 1: Builder — has gcc/make to compile C extensions
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim AS builder
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project # deps only (layer cache)
COPY main.py cli.py ./
COPY app/ ./app/
RUN uv sync --frozen --no-dev # install project
# Stage 2: Runtime — slim, no build tools
FROM python:3.13-slim-bookworm AS runtime
WORKDIR /app
RUN groupadd --gid 1000 appuser && \
useradd --uid 1000 --gid 1000 --no-create-home appuser
COPY --from=builder /app/.venv /app/.venv
COPY main.py ./
COPY app/ ./app/
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
USER appuser
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Key decisions:
- Two-stage build: compiler tools stay in the builder, runtime image stays small and secure
- Non-root user: runs as appuser (uid 1000), not root
- Layer caching: dependencies copied and installed before application code, so code changes don't invalidate the deps layer
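Before wiring up CI, it's worth building and smoke-testing the image locally (the image tag and health path here are assumptions based on the rest of this post):

# Build the image and run it locally
docker build -t your-service:local .
docker run --rm -p 8000:8000 your-service:local

# In another terminal, hit the health endpoint
curl http://localhost:8000/api/v1/health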
Part 3: AWS Infrastructure Setup
EKS Cluster
I used eksctl to create the cluster. eksctl is a purpose-built CLI (not a wrapper around aws CLI — it uses the AWS SDK directly) that handles the full cluster setup: VPC, subnets, security groups, IAM roles, and EC2 nodegroup.
eksctl create cluster \
--name your-service \
--region us-east-1 \
--version 1.29 \
--nodegroup-name workers \
--node-type t3.medium \
--nodes 2 \
--nodes-min 2 \
--nodes-max 4 \
--managed
eksctl vs kubectl vs aws CLI — these serve very different purposes:
| Tool | Talks to | Used for |
|---|---|---|
| eksctl | AWS APIs | Create/delete clusters, nodegroups |
| kubectl | Kubernetes API server | Deploy apps, check pods, port-forward |
| aws CLI | AWS APIs | IAM, ECR, security groups, everything else |
Once the cluster is created, you rarely need eksctl again. Day-to-day work is kubectl and aws CLI.
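eksctl writes cluster credentials to your kubeconfig by default; if you ever need to regenerate them (for example, on another machine), the aws CLI can do it:

aws eks update-kubeconfig --name your-service --region us-east-1
kubectl get nodes   # should list the two t3.medium workers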
ECR Repository
aws ecr create-repository \
--repository-name your-service \
--region us-east-1 \
--image-scanning-configuration scanOnPush=true
One ECR repository per service is the standard pattern. The repository URI (<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service) goes into helm/your-service/values.yaml.
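CI authenticates via OIDC (next section), but for pushing an image from your laptop you can log Docker into ECR like this:

aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com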
OIDC Federation (No Long-Lived AWS Keys)
Instead of storing AWS access keys as GitHub secrets, I used OIDC federation — GitHub Actions gets a short-lived token from AWS via a trust relationship.
# 1. Register GitHub as an identity provider (once per account)
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
# 2. Create IAM role with trust policy scoped to your repo
aws iam create-role \
--role-name github-actions-ecr \
--assume-role-policy-document file://trust-policy.json
# 3. Attach ECR permissions
aws iam attach-role-policy \
--role-name github-actions-ecr \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser
The trust policy allows only your specific repo to assume the role:
{
"Condition": {
"StringLike": {
"token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
}
}
}
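For completeness, here is what a full trust-policy.json built around that condition could look like (a sketch; substitute your account ID and repo owner):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
        }
      }
    }
  ]
}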
The only GitHub secret needed is AWS_ACCOUNT_ID — no access keys.
Part 4: CI/CD with GitHub Actions
The pipeline has four jobs that run in sequence on every push to main:
lint → test → build-and-push → update-helm-values
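The jobs chain via needs:. A minimal skeleton of the workflow might look like this (a sketch with the real steps elided; the permissions block is what enables keyless AWS auth and the commit back to the repo):

# .github/workflows/ci.yml (skeleton; each job's real steps are shown below)
name: CI
on:
  push:
    branches: [main]
permissions:
  id-token: write   # lets the job request an OIDC token to assume the AWS role
  contents: write   # lets the final job commit the updated values.yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "ruff steps (Job 1)"
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - run: echo "pytest step (Job 2)"
  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - run: echo "ECR login + build/push (Job 3)"
  update-helm-values:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - run: echo "sed + git commit (Job 4)"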
Job 1 & 2: Lint and Test
- run: uv run ruff check .
- run: uv run ruff format --check .
- run: uv run pytest --cov=app --cov-report=xml
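These steps assume uv is already on the runner; the usual way to get it is the official astral-sh/setup-uv action (the version pins here are assumptions):

- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- run: uv sync --frozen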
Job 3: Build and Push to ECR
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
aws-region: us-east-1
- name: Compute short image tag
  run: echo "IMAGE_TAG=${GITHUB_SHA::7}" >> "$GITHUB_ENV"

- name: Build and push Docker image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: |
      ${{ env.ECR_URI }}:${{ env.IMAGE_TAG }}
      ${{ env.ECR_URI }}:latest
    cache-from: type=registry,ref=${{ env.ECR_URI }}:latest
    cache-to: type=inline
The image is tagged with the first 7 characters of the Git SHA (e.g., 4565a6a). GitHub Actions expressions can't slice strings, so the short tag is computed in a shell step and exposed via $GITHUB_ENV.
Job 4: Update Helm Values (GitOps Trigger)
This is the GitOps glue. After pushing the image, CI updates values.yaml with the new tag and commits it back to the repo:
- run: |
    sed -i "s/^  tag: .*/  tag: \"${{ needs.build-and-push.outputs.image_tag }}\"/" \
      helm/your-service/values.yaml
- run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add helm/your-service/values.yaml
    git commit -m "ci: update image tag to ${{ needs.build-and-push.outputs.image_tag }}"
    git push
This commit triggers ArgoCD to detect a drift and sync.
Part 5: Helm Chart
The Helm chart lives at helm/your-service/ and templates:
- Deployment — runs the API pods
- Service — ClusterIP on port 80 → 8000
- HorizontalPodAutoscaler — scales pods 2–10 based on CPU
- ServiceAccount
- Secret — placeholder (the actual secret is created manually via kubectl)
values.yaml is the single knob for configuration:
replicaCount: 2
image:
repository: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service
tag: "4565a6a" # auto-updated by CI
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
env:
API_KEY:
valueFrom:
secretKeyRef:
name: your-service-secret
key: api-key
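Inside the chart, the Deployment template consumes these values. A representative excerpt (a sketch, not the chart's actual template) looks like:

# helm/your-service/templates/deployment.yaml (excerpt, illustrative)
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: your-service
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8000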
The API key is injected from a Kubernetes Secret created manually:
kubectl create secret generic your-service-secret \
--from-literal=api-key=<YOUR_API_KEY> \
-n your-service
Part 6: ArgoCD — GitOps Deployments
ArgoCD is installed on the same EKS cluster and watches the GitHub repo. When values.yaml changes (after CI updates the image tag), ArgoCD detects the drift and runs helm upgrade automatically.
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: your-service
namespace: argocd
spec:
  project: default
  source:
repoURL: https://github.com/<OWNER>/your-service.git
targetRevision: main
path: helm/your-service
destination:
server: https://kubernetes.default.svc
namespace: your-service
syncPolicy:
automated:
prune: true # delete resources removed from Git
selfHeal: true # revert manual changes to match Git
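Installing ArgoCD and registering the Application is a one-time step (the install manifest URL is the standard one from the ArgoCD docs):

# Install ArgoCD into its own namespace
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Register the Application
kubectl apply -f argocd/application.yaml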
How ArgoCD works
ArgoCD runs inside the cluster as a set of pods. It polls your Git repo every ~3 minutes. When it detects a difference between what's in Git and what's running in the cluster, it syncs — running the equivalent of helm upgrade to bring the cluster in line with Git.
Developer pushes code
│
▼
GitHub Actions: lint → test → build → push to ECR
│
▼
CI commits new image tag to values.yaml
│
▼
ArgoCD detects Git change (polls every 3 min)
│
▼
ArgoCD runs helm upgrade with new values
│
▼
Kubernetes rolls out new pods (rolling update)
│
▼
Old pods terminate after new pods are healthy
ArgoCD vs Helm directly
ArgoCD doesn't replace Helm — it uses Helm under the hood. The difference: Helm is imperative (you run helm upgrade), ArgoCD is declarative (you commit to Git and ArgoCD converges). ArgoCD also adds drift detection, selfHeal, and rollback via the UI.
Part 7: Kubernetes Concepts That Came Up
Port Forwarding
kubectl port-forward doesn't connect to the node's public IP. It tunnels through the Kubernetes API server:
Your laptop → EKS API Server (HTTPS:443) → Pod (private IP)
The pod never needs to be publicly exposed. The API server endpoint is what's publicly accessible, and your ~/.kube/config already has the credentials to reach it.
kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health
Node Security
The EKS nodes were assigned public IPs (the VPC subnets have mapPublicIpOnLaunch=true). This sounds scary, but the security group only allows:
- Inbound from other nodes in the same group
- Inbound from the EKS control plane
No 0.0.0.0/0 rules — the public IPs are unreachable from the internet. For proper security, nodes should be in private subnets, but the security group rules make it safe in practice.
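You can verify this yourself: kubectl shows the external IPs, and the aws CLI shows the inbound rules (the security group name filter here is an assumption based on eksctl's default naming):

# External IPs of the nodes
kubectl get nodes -o wide

# Inbound rules of the node security group
aws ec2 describe-security-groups \
  --filters Name=group-name,Values='eksctl-your-service-*' \
  --query 'SecurityGroups[].IpPermissions' \
  --region us-east-1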
Pod Autoscaling vs Node Scaling
Two different levels of scaling:
- HPA (Horizontal Pod Autoscaler) — adds more pod replicas when CPU exceeds 70%. Runs on existing nodes. Already configured.
- Cluster Autoscaler — adds more EC2 nodes when pods can't be scheduled due to lack of capacity. Requires separate installation.
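To watch the HPA that the chart created (the resource name is assumed to match the service):

kubectl get hpa -n your-service                      # current vs. target CPU, replica count
kubectl describe hpa your-service -n your-service    # recent scaling events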
To manually scale nodes:
aws eks update-nodegroup-config \
--cluster-name your-service \
--nodegroup-name workers \
--scaling-config minSize=2,maxSize=6,desiredSize=3 \
--region us-east-1
Multiple Nodegroups
You'd create multiple nodegroups for different workload types (e.g., spot instances for batch, GPU nodes for ML). Pods are assigned to nodegroups via nodeSelector labels — not manually per pod:
# On the nodegroup (set at creation time)
--node-labels workload=spot
# On the pod/deployment
nodeSelector:
workload: spot
For a single API service, one nodegroup is sufficient.
Lessons Learned
1. eksctl has credential issues with certain AWS credential types
If you use SSO or temporary session credentials, eksctl (Go SDK) may fail to refresh tokens. Workaround:
eval "$(aws configure export-credentials --format env)"
This exports the current credentials as environment variables that the Go SDK can read.
2. Bootstrap problem: nodes need ECR read permissions
When ArgoCD first deployed before CI had run, pods went into ImagePullBackOff — the ECR repo was empty. Even after CI pushed an image, the nodegroup IAM role needed the AmazonEC2ContainerRegistryReadOnly policy:
aws iam attach-role-policy \
--role-name <NodeInstanceRole> \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
3. Layer caching in Docker matters for compile-heavy dependencies
Dependencies that compile C code can be slow to install. Splitting the COPY + RUN uv sync into two steps (deps first, then code) means code changes don't re-trigger the slow compilation step.
The Final Flow
Once everything is set up, the developer workflow is simply:
git push origin main
Everything else is automatic:
- GitHub Actions lints, tests, builds, and pushes the image
- CI commits the new image tag to values.yaml
- ArgoCD detects the Git change and syncs
- Kubernetes rolls out new pods with zero downtime
To verify:
# Check pods
kubectl get pods -n your-service -o wide
# Check ArgoCD sync status
argocd app get your-service
# Test the API
kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health
Summary
| Component | Purpose |
|---|---|
| uv | Fast Python package manager, replaces pip + venv |
| FastAPI | API framework |
| Docker (multi-stage) | Small, secure runtime image |
| ECR | Docker image registry |
| EKS | Managed Kubernetes cluster |
| OIDC federation | Keyless AWS auth from GitHub Actions |
| GitHub Actions | CI: lint → test → build → push → update Helm |
| Helm | Kubernetes manifest templating |
| ArgoCD | GitOps: auto-sync cluster state to Git |
The key insight of this setup is that Git is the source of truth. No one runs kubectl apply or helm upgrade manually. The cluster always reflects what's in the repo.