
From Local Dev to Production: Deploying a Python API to AWS EKS with GitOps

Zhenyu Wen
#AWS #Kubernetes #EKS #ArgoCD #GitOps #FastAPI #Python


I recently went through the full journey of taking a Python API — built with FastAPI — from running locally to a production-grade deployment on AWS EKS, complete with a CI/CD pipeline via GitHub Actions, and GitOps-based deployments via ArgoCD. This post walks through every step, including the mistakes I made along the way.


The Stack

  • API: FastAPI
  • Packaging: Python uv (fast package manager + virtual env)
  • Container: Docker (multi-stage build)
  • Registry: AWS ECR
  • Cluster: AWS EKS (Kubernetes)
  • CI/CD: GitHub Actions
  • GitOps: ArgoCD + Helm

Part 1: Running Locally with uv

The project uses uv as a modern, fast replacement for pip + virtualenv. You don't need to create a virtual environment manually — uv handles it.

# Install dependencies and create .venv automatically
uv sync

# Run the API server
uv run uvicorn main:app --reload

# Run tests
uv run pytest

uv sync reads pyproject.toml and uv.lock, creates a .venv, and installs all dependencies. The uv run prefix ensures commands use that virtual environment without needing to activate it.
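
For reference, the relevant parts of pyproject.toml look roughly like this — an illustrative sketch, not the actual project file (names and groups are assumptions):

[project]
name = "your-service"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "uvicorn[standard]",
]

[dependency-groups]
dev = [
    "pytest",
    "ruff",
]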


Part 2: Docker — Multi-Stage Build

Some dependencies may require compiling C extensions at install time. This means the build environment needs build-essential, but we don't want that in the final image.

# Stage 1: Builder — has gcc/make to compile C extensions
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim AS builder

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project   # deps only (layer cache)

COPY main.py cli.py ./
COPY app/ ./app/
RUN uv sync --frozen --no-dev                         # install project

# Stage 2: Runtime — slim, no build tools
FROM python:3.13-slim-bookworm AS runtime

WORKDIR /app

RUN groupadd --gid 1000 appuser && \
    useradd --uid 1000 --gid 1000 --no-create-home appuser

COPY --from=builder /app/.venv /app/.venv
COPY main.py ./
COPY app/ ./app/

ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1

USER appuser
EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Key decisions:

  • Two-stage build: compiler tools stay in the builder, runtime image stays small and secure
  • Non-root user: runs as appuser (uid 1000), not root
  • Layer caching: dependencies copied and installed before application code, so code changes don't invalidate the deps layer
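
To build and smoke-test the image locally (the image name here is arbitrary):

# Build the image and run it on the same port the Dockerfile exposes
docker build -t your-service:local .
docker run --rm -p 8000:8000 your-service:local

# In another terminal, hit the health endpoint
curl http://localhost:8000/api/v1/health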

Part 3: AWS Infrastructure Setup

EKS Cluster

I used eksctl to create the cluster. eksctl is a purpose-built CLI (not a wrapper around aws CLI — it uses the AWS SDK directly) that handles the full cluster setup: VPC, subnets, security groups, IAM roles, and EC2 nodegroup.

eksctl create cluster \
  --name your-service \
  --region us-east-1 \
  --version 1.29 \
  --nodegroup-name workers \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 4 \
  --managed

eksctl vs kubectl vs aws CLI — these serve very different purposes:

Tool      Talks to                Used for
eksctl    AWS APIs                Create/delete clusters, nodegroups
kubectl   Kubernetes API server   Deploy apps, check pods, port-forward
aws CLI   AWS APIs                IAM, ECR, security groups, everything else

Once the cluster is created, you rarely need eksctl again. Day-to-day work is kubectl and aws CLI.
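
eksctl writes the new cluster into ~/.kube/config. If you ever need to regenerate that entry (on another machine, for example), aws eks can do it, and kubectl confirms the nodes are up:

# Write/refresh the kubeconfig entry for the cluster
aws eks update-kubeconfig --name your-service --region us-east-1

# Verify the worker nodes have joined
kubectl get nodes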

ECR Repository

aws ecr create-repository \
  --repository-name your-service \
  --region us-east-1 \
  --image-scanning-configuration scanOnPush=true

One ECR repository per service is the standard pattern. The repository URI (<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service) goes into helm/your-service/values.yaml.
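
Before CI is wired up, you can push an image manually to confirm the repository works (account ID and the :manual tag are placeholders):

# Authenticate Docker against ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com

# Tag and push the locally built image
docker tag your-service:local <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service:manual
docker push <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service:manual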

OIDC Federation (No Long-Lived AWS Keys)

Instead of storing AWS access keys as GitHub secrets, I used OIDC federation — GitHub Actions gets a short-lived token from AWS via a trust relationship.

# 1. Register GitHub as an identity provider (once per account)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# 2. Create IAM role with trust policy scoped to your repo
aws iam create-role \
  --role-name github-actions-ecr \
  --assume-role-policy-document file://trust-policy.json

# 3. Attach ECR permissions
aws iam attach-role-policy \
  --role-name github-actions-ecr \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser

The trust policy allows only your specific repo to assume the role:

{
  "Condition": {
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
    }
  }
}
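
In full, trust-policy.json (referenced in step 2 above) looks roughly like this — a sketch following the standard GitHub OIDC pattern, with the account ID and repo owner as placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
        }
      }
    }
  ]
}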

The only GitHub secret needed is AWS_ACCOUNT_ID — no access keys.


Part 4: CI/CD with GitHub Actions

The pipeline has four jobs that run in sequence on every push to main:

lint → test → build-and-push → update-helm-values

Job 1 & 2: Lint and Test

- run: uv run ruff check .
- run: uv run ruff format --check .
- run: uv run pytest --cov=app --cov-report=xml
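
For context, each of these commands runs inside a job that first checks out the repo and installs uv — a rough sketch (the setup-uv action version is an assumption; pin whatever is current):

lint:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: astral-sh/setup-uv@v5   # assumed version
    - run: uv sync --frozen
    - run: uv run ruff check .
    - run: uv run ruff format --check .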

Job 3: Build and Push to ECR

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
    aws-region: us-east-1

- name: Compute short image tag
  run: echo "IMAGE_TAG=${GITHUB_SHA::7}" >> "$GITHUB_ENV"

- name: Build and push Docker image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: |
      ${{ env.ECR_URI }}:${{ env.IMAGE_TAG }}
      ${{ env.ECR_URI }}:latest
    cache-from: type=registry,ref=${{ env.ECR_URI }}:latest
    cache-to: type=inline

The image is tagged with the first 7 characters of the Git SHA (e.g., 4565a6a). GitHub Actions expressions can't slice github.sha directly, so the short tag is computed in a shell step and exported via $GITHUB_ENV.
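
One detail the snippet above glosses over: OIDC federation only works if the workflow (or job) grants the id-token permission, e.g.:

permissions:
  id-token: write   # required to request the OIDC token from GitHub
  contents: read

Job 4, which pushes a commit back to the repo, additionally needs contents: write.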

Job 4: Update Helm Values (GitOps Trigger)

This is the GitOps glue. After pushing the image, CI updates values.yaml with the new tag and commits it back to the repo:

- run: |
    sed -i "s/^  tag: .*/  tag: \"${{ needs.build-and-push.outputs.image_tag }}\"/" \
      helm/your-service/values.yaml

- run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add helm/your-service/values.yaml
    git commit -m "ci: update image tag to $IMAGE_TAG"
    git push

This commit is what ArgoCD reacts to: the desired state in Git has changed, so it syncs the cluster to match.


Part 5: Helm Chart

The Helm chart lives at helm/your-service/ and templates:

  • Deployment — runs the API pods
  • Service — ClusterIP on port 80 → 8000
  • HorizontalPodAutoscaler — scales pods 2–10 based on CPU
  • ServiceAccount
  • Secret — placeholder (actual secret created manually via kubectl)

values.yaml is the single knob for configuration:

replicaCount: 2

image:
  repository: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service
  tag: "4565a6a"   # auto-updated by CI

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  API_KEY:
    valueFrom:
      secretKeyRef:
        name: your-service-secret
        key: api-key
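
Inside the chart, the Deployment template consumes these values along roughly these lines — an illustrative excerpt, not the full template:

# helm/your-service/templates/deployment.yaml (excerpt)
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8000
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: your-service-secret
                  key: api-key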

The API key is injected from a Kubernetes Secret created manually:

kubectl create secret generic your-service-secret \
  --from-literal=api-key=<YOUR_API_KEY> \
  -n your-service

Part 6: ArgoCD — GitOps Deployments

ArgoCD is installed on the same EKS cluster and watches the GitHub repo. When values.yaml changes (after CI updates the image tag), ArgoCD detects the drift and runs helm upgrade automatically.

# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: your-service
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/<OWNER>/your-service.git
    targetRevision: main
    path: helm/your-service

  destination:
    server: https://kubernetes.default.svc
    namespace: your-service

  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual changes to match Git

How ArgoCD works

ArgoCD runs inside the cluster as a set of pods. It polls your Git repo every ~3 minutes. When it detects a difference between what's in Git and what's running in the cluster, it syncs — running the equivalent of helm upgrade to bring the cluster in line with Git.

Developer pushes code
       │
       ▼
GitHub Actions: lint → test → build → push to ECR
       │
       ▼
CI commits new image tag to values.yaml
       │
       ▼
ArgoCD detects Git change (polls every 3 min)
       │
       ▼
ArgoCD runs helm upgrade with new values
       │
       ▼
Kubernetes rolls out new pods (rolling update)
       │
       ▼
Old pods terminate after new pods are healthy

ArgoCD vs Helm directly

ArgoCD doesn't replace Helm — it uses Helm under the hood. The difference: Helm is imperative (you run helm upgrade), ArgoCD is declarative (you commit to Git and ArgoCD converges). ArgoCD also adds drift detection, selfHeal, and rollback via the UI.
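
When you do need to step in manually, the argocd CLI covers the same ground as the UI (the rollback ID is whatever the history command reports):

argocd app sync your-service        # force an immediate sync instead of waiting for the poll
argocd app history your-service     # list previously deployed revisions
argocd app rollback your-service 3  # roll back to a specific history ID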


Part 7: Kubernetes Concepts That Came Up

Port Forwarding

kubectl port-forward doesn't connect to the node's public IP. It tunnels through the Kubernetes API server:

Your laptop → EKS API Server (HTTPS:443) → Pod (private IP)

The pod never needs to be publicly exposed. The API server endpoint is what's publicly accessible, and your ~/.kube/config already has the credentials to reach it.

kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health

Node Security

The EKS nodes were assigned public IPs (the VPC subnets have mapPublicIpOnLaunch=true). That sounds alarming, but the node security group only allows:

  • Inbound from other nodes in the same group
  • Inbound from the EKS control plane

There are no 0.0.0.0/0 inbound rules, so the public IPs are effectively unreachable from the internet. For a stricter posture the nodes should live in private subnets; here it's the security group rules that keep them safe in practice.

Pod Autoscaling vs Node Scaling

Two different levels of scaling:

  • HPA (Horizontal Pod Autoscaler) — adds more pod replicas when CPU exceeds 70%. Runs on existing nodes. Already configured.
  • Cluster Autoscaler — adds more EC2 nodes when pods can't be scheduled due to lack of capacity. Requires separate installation.
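
The HPA's current state (target utilization, current replicas) is visible with kubectl:

kubectl get hpa -n your-service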

To manually scale nodes:

aws eks update-nodegroup-config \
  --cluster-name your-service \
  --nodegroup-name workers \
  --scaling-config minSize=2,maxSize=6,desiredSize=3 \
  --region us-east-1

Multiple Nodegroups

You'd create multiple nodegroups for different workload types (e.g., spot instances for batch, GPU nodes for ML). Pods are assigned to nodegroups via nodeSelector labels — not manually per pod:

# On the nodegroup (set at creation time)
--node-labels workload=spot

# On the pod/deployment
nodeSelector:
  workload: spot

For a single API service, one nodegroup is sufficient.


Lessons Learned

1. eksctl has credential issues with certain AWS credential types

If you use SSO or temporary session credentials, eksctl (Go SDK) may fail to refresh tokens. Workaround:

eval "$(aws configure export-credentials --format env)"

This exports the current credentials as environment variables that the Go SDK can read.

2. Bootstrap problem: nodes need ECR read permissions

When ArgoCD first deployed before CI had run, pods went into ImagePullBackOff — the ECR repo was empty. Even after CI pushed an image, the nodegroup IAM role needed the AmazonEC2ContainerRegistryReadOnly policy:

aws iam attach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
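
If you're not sure what the node role is called, the nodegroup describes it (this returns the role ARN; the role name is the last path segment):

aws eks describe-nodegroup \
  --cluster-name your-service \
  --nodegroup-name workers \
  --region us-east-1 \
  --query 'nodegroup.nodeRole' --output text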

3. Layer caching in Docker matters for compile-heavy dependencies

Dependencies that compile C code can be slow to install. Splitting the COPY + RUN uv sync into two steps (deps first, then code) means code changes don't re-trigger the slow compilation step.


The Final Flow

Once everything is set up, the developer workflow is simply:

git push origin main

Everything else is automatic:

  1. GitHub Actions lints, tests, builds, and pushes the image
  2. CI commits the new image tag to values.yaml
  3. ArgoCD detects the Git change and syncs
  4. Kubernetes rolls out new pods with zero downtime

To verify:

# Check pods
kubectl get pods -n your-service -o wide

# Check ArgoCD sync status
argocd app get your-service

# Test the API
kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health

Summary

Component              Purpose
uv                     Fast Python package manager, replaces pip + venv
FastAPI                API framework
Docker (multi-stage)   Small, secure runtime image
ECR                    Docker image registry
EKS                    Managed Kubernetes cluster
OIDC federation        Keyless AWS auth from GitHub Actions
GitHub Actions         CI: lint → test → build → push → update Helm
Helm                   Kubernetes manifest templating
ArgoCD                 GitOps: auto-sync cluster state to Git

The key insight of this setup is that Git is the source of truth. No one runs kubectl apply or helm upgrade manually. The cluster always reflects what's in the repo.