
Deploying a FastAPI Service to EKS with GitOps and Zero Long-Lived AWS Keys

Zhenyu Wen
#AWS #Kubernetes #EKS #ArgoCD #GitOps #FastAPI #Python


Getting a Python service to production on Kubernetes involves more moving parts than it might seem: package management, multi-stage Docker builds, IAM federation, CI/CD, Helm charts, and GitOps wiring. This post walks through the full setup end to end, including the production gotchas that cost me time.


The Stack

  • API: FastAPI
  • Packaging: Python uv (fast package manager + virtual env)
  • Container: Docker (multi-stage build)
  • Registry: AWS ECR
  • Cluster: AWS EKS (Kubernetes)
  • CI/CD: GitHub Actions
  • GitOps: ArgoCD + Helm

Part 1: Running Locally with uv

The project uses uv as a modern, fast replacement for pip + virtualenv. You don't need to create a virtual environment manually — uv handles it.

# Install dependencies and create .venv automatically
uv sync

# Run the API server
uv run uvicorn main:app --reload

# Run tests
uv run pytest

uv sync reads pyproject.toml and uv.lock, creates a .venv, and installs all dependencies. The uv run prefix ensures commands use that virtual environment without needing to activate it.


Part 2: Docker — Multi-Stage Build

Some dependencies require compiling C extensions at install time. The build environment needs build-essential, but that shouldn't be in the final image.

# Stage 1: Builder — has gcc/make to compile C extensions
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim AS builder

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project   # deps only (layer cache)

COPY main.py cli.py ./
COPY app/ ./app/
RUN uv sync --frozen --no-dev                         # install project

# Stage 2: Runtime — slim, no build tools
FROM python:3.13-slim-bookworm AS runtime

WORKDIR /app

RUN groupadd --gid 1000 appuser && \
    useradd --uid 1000 --gid 1000 --no-create-home appuser

COPY --from=builder /app/.venv /app/.venv
COPY main.py ./
COPY app/ ./app/

ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1

USER appuser
EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Three decisions worth calling out:

  • Two-stage build: compiler tools stay in the builder stage, runtime image stays small and secure
  • Non-root user: runs as appuser (uid 1000), not root
  • Layer caching: dependencies are copied and installed before application code — code changes don't invalidate the slow compilation step

Part 3: AWS Infrastructure Setup

EKS Cluster

I used eksctl to create the cluster. It's a purpose-built CLI (not a wrapper around the aws CLI — it uses the AWS SDK directly) that handles VPC, subnets, security groups, IAM roles, and the EC2 nodegroup in one shot.

eksctl create cluster \
  --name your-service \
  --region us-east-1 \
  --version 1.29 \
  --nodegroup-name workers \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 4 \
  --managed

These three tools serve different purposes, and it's worth being clear on which one to reach for:

Tool      Talks to               Used for
eksctl    AWS APIs               Create/delete clusters, nodegroups
kubectl   Kubernetes API server  Deploy apps, check pods, port-forward
aws CLI   AWS APIs               IAM, ECR, security groups, everything else

Once the cluster is created, you rarely need eksctl again. Day-to-day work is kubectl and aws CLI.

ECR Repository

aws ecr create-repository \
  --repository-name your-service \
  --region us-east-1 \
  --image-scanning-configuration scanOnPush=true

One ECR repository per service is the standard pattern. The repository URI goes into helm/your-service/values.yaml.
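ECR repository URIs follow a fixed pattern, so the values.yaml entry can be derived from account, region, and repository name. A small sketch (the account ID is a placeholder):

```python
def ecr_uri(account_id: str, region: str, repo: str) -> str:
    # ECR repository URIs follow <account>.dkr.ecr.<region>.amazonaws.com/<repo>
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}"

# Hypothetical account ID, for illustration only
uri = ecr_uri("123456789012", "us-east-1", "your-service")
print(uri)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/your-service
```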

OIDC Federation (No Long-Lived AWS Keys)

Instead of storing AWS access keys as GitHub secrets, I used OIDC federation — GitHub Actions gets a short-lived token from AWS via a trust relationship.

# 1. Register GitHub as an identity provider (once per account)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# 2. Create IAM role with trust policy scoped to your repo
aws iam create-role \
  --role-name github-actions-ecr \
  --assume-role-policy-document file://trust-policy.json

# 3. Attach ECR permissions
aws iam attach-role-policy \
  --role-name github-actions-ecr \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser

The trust policy allows only your specific repo to assume the role:

{
  "Condition": {
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:<OWNER>/your-service:*"
    }
  }
}

The only GitHub secret needed is AWS_ACCOUNT_ID — no access keys.
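The StringLike condition does shell-style wildcard matching against the token's sub claim, which encodes the repo and ref. A stdlib sketch of those matching semantics (the "acme" owner and claim values are illustrative):

```python
from fnmatch import fnmatch  # shell-style wildcard matching, like StringLike

# Pattern from the trust policy; "acme" stands in for your GitHub owner
pattern = "repo:acme/your-service:*"

# sub claims GitHub encodes into the OIDC token (values illustrative)
push_to_main = "repo:acme/your-service:ref:refs/heads/main"
other_repo   = "repo:acme/another-service:ref:refs/heads/main"

print(fnmatch(push_to_main, pattern))  # True  -> this repo may assume the role
print(fnmatch(other_repo, pattern))    # False -> any other repo is rejected
```

Scoping the pattern tighter (e.g. ending in `:ref:refs/heads/main` instead of `:*`) restricts role assumption to a single branch.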


Part 4: CI/CD with GitHub Actions

The pipeline runs four jobs in sequence on every push to main:

lint → test → build-and-push → update-helm-values

Job 1 & 2: Lint and Test

- run: uv run ruff check .
- run: uv run ruff format --check .
- run: uv run pytest --cov=app --cov-report=xml

Job 3: Build and Push to ECR

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
    aws-region: us-east-1

- name: Compute short image tag
  run: echo "IMAGE_TAG=${GITHUB_SHA::7}" >> "$GITHUB_ENV"

- name: Build and push Docker image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: |
      ${{ env.ECR_URI }}:${{ env.IMAGE_TAG }}
      ${{ env.ECR_URI }}:latest
    cache-from: type=registry,ref=${{ env.ECR_URI }}:latest
    cache-to: type=inline

The image is tagged with the first 7 characters of the Git SHA (e.g., 4565a6a).
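The tag derivation is just string slicing on the commit SHA; each build pushes an immutable SHA tag plus a moving latest tag. Sketched (the SHA and URI are illustrative):

```python
def image_tags(ecr_uri: str, commit_sha: str) -> list[str]:
    # Immutable tag from the first 7 chars of the SHA, plus a moving "latest"
    short = commit_sha[:7]
    return [f"{ecr_uri}:{short}", f"{ecr_uri}:latest"]

# SHA value illustrative
tags = image_tags("example.ecr.uri/your-service",
                  "4565a6a9c0ffee1234567890abcdef1234567890")
print(tags[0])  # example.ecr.uri/your-service:4565a6a
```

The immutable SHA tag is what lands in values.yaml, so every deploy is traceable back to an exact commit; latest exists only to serve as the build-cache reference.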

Job 4: Update Helm Values (GitOps Trigger)

After pushing the image, CI updates values.yaml with the new tag and commits it back to the repo:

- run: |
    sed -i "s/^  tag: .*/  tag: \"${{ needs.build-and-push.outputs.image_tag }}\"/" \
      helm/your-service/values.yaml

- run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add helm/your-service/values.yaml
    git commit -m "ci: update image tag to $IMAGE_TAG"
    git push

This commit is what triggers ArgoCD to detect drift and sync.
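The sed substitution can be sketched in Python to make the matching explicit; the YAML snippet and tag values here are illustrative:

```python
import re

values_yaml = """image:
  repository: example.ecr.uri/your-service
  tag: "oldsha1"   # auto-updated by CI
"""

def bump_tag(text: str, new_tag: str) -> str:
    # Replace only the two-space-indented `tag:` line, like the sed expression
    return re.sub(r'(?m)^  tag: .*$', f'  tag: "{new_tag}"', text)

updated = bump_tag(values_yaml, "4565a6a")
print(updated)
```

Anchoring on the exact indentation keeps the substitution from touching any other `tag:` key elsewhere in the file.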


Part 5: Helm Chart

The Helm chart lives at helm/your-service/ and templates a Deployment, Service, HorizontalPodAutoscaler, ServiceAccount, and a Secret placeholder. values.yaml is the single configuration knob:

replicaCount: 2

image:
  repository: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/your-service
  tag: "4565a6a"   # auto-updated by CI

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  API_KEY:
    valueFrom:
      secretKeyRef:
        name: your-service-secret
        key: api-key

The API key is injected from a Kubernetes Secret created manually:

kubectl create secret generic your-service-secret \
  --from-literal=api-key=<YOUR_API_KEY> \
  -n your-service
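Under the hood, kubectl base64-encodes the literal before storing it in the Secret's data field, and the secretKeyRef injection decodes it back for the pod. The round-trip, sketched with the stdlib (the key value is hypothetical):

```python
import base64

# Hypothetical API key; `kubectl create secret` does this encoding for you
api_key = "s3cr3t-value"

# What ends up under .data["api-key"] in the stored Secret object
encoded = base64.b64encode(api_key.encode()).decode()
print(encoded)  # czNjcjN0LXZhbHVl

# What the container sees after secretKeyRef injection
decoded = base64.b64decode(encoded).decode()
print(decoded == api_key)  # True
```

Worth remembering: base64 is encoding, not encryption, so restrict Secret access with RBAC rather than relying on the encoding.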

Part 6: ArgoCD — GitOps Deployments

ArgoCD is installed on the same EKS cluster and watches the GitHub repo. When values.yaml changes (after CI updates the image tag), ArgoCD detects the drift and automatically syncs, rendering the chart with the new values and applying the result.

# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: your-service
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/<OWNER>/your-service.git
    targetRevision: main
    path: helm/your-service

  destination:
    server: https://kubernetes.default.svc
    namespace: your-service

  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual changes to match Git

The full deployment flow once everything is wired up:

Developer pushes code
       │
       ▼
GitHub Actions: lint → test → build → push to ECR
       │
       ▼
CI commits new image tag to values.yaml
       │
       ▼
ArgoCD detects Git change (polls every 3 min)
       │
       ▼
ArgoCD renders the Helm chart with new values and applies it
       │
       ▼
Kubernetes rolls out new pods (rolling update)
       │
       ▼
Old pods terminate after new pods are healthy

ArgoCD doesn't replace Helm; it uses Helm under the hood, rendering the chart (via helm template) and applying the output rather than creating a Helm release. The difference: Helm is imperative (you run helm upgrade yourself), ArgoCD is declarative (you commit to Git and ArgoCD converges the cluster). On top of that, ArgoCD adds drift detection, selfHeal, and rollback via the UI.
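The declarative model boils down to a reconcile loop: compare desired state (Git) against actual state (cluster) and compute what must change to converge. A toy sketch of the idea, not ArgoCD's implementation:

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Toy GitOps reconciler: actions needed to converge actual to desired."""
    actions = {}
    for name, spec in desired.items():
        if name not in actual:
            actions[name] = "create"
        elif actual[name] != spec:
            actions[name] = "update"   # drift: cluster differs from Git
    for name in actual:
        if name not in desired:
            actions[name] = "prune"    # like syncPolicy.automated.prune
    return actions

desired = {"deployment/your-service": {"image": "your-service:4565a6a"}}
actual  = {"deployment/your-service": {"image": "your-service:old1234"},
           "deployment/leftover":     {"image": "old"}}
print(reconcile(desired, actual))
# {'deployment/your-service': 'update', 'deployment/leftover': 'prune'}
```

selfHeal is the same loop run in the other direction: a manual kubectl edit makes actual drift from desired, and the next reconcile reverts it.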


Lessons Learned

eksctl has credential issues with AWS SSO

If you use SSO or temporary session credentials, eksctl (Go SDK) may fail to refresh tokens. Workaround:

eval "$(aws configure export-credentials --format env)"

This exports the current credentials as environment variables that the Go SDK can read.

Bootstrap problem: nodes need ECR read permissions before ArgoCD can deploy

When ArgoCD first tried to deploy, pods went into ImagePullBackOff — the nodegroup IAM role didn't have permission to pull from ECR. The fix:

aws iam attach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

This needs to be set up before the first deploy, not after pods start failing.

Layer caching matters for compile-heavy dependencies

Splitting the COPY + RUN uv sync into two steps — deps first, then application code — means code changes don't re-trigger the slow C compilation step. A build that takes 4 minutes cold runs in 30 seconds when only the app code changed.
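The caching behavior can be reasoned about as content hashing: Docker reuses a cached layer as long as the files copied into it are byte-identical, so a dependency layer keyed only on the lockfiles survives app-code edits. A stdlib sketch of that keying (file contents illustrative):

```python
import hashlib

def layer_key(*file_contents: bytes) -> str:
    # Simplified model: a COPY layer is invalidated when any copied file changes
    h = hashlib.sha256()
    for content in file_contents:
        h.update(content)
    return h.hexdigest()[:12]

lockfiles = (b"pyproject-v1", b"uvlock-v1")   # illustrative contents
deps_key_before = layer_key(*lockfiles)

# Editing main.py doesn't touch the lockfiles, so the deps layer key is stable
deps_key_after = layer_key(*lockfiles)
print(deps_key_before == deps_key_after)  # True -> no recompilation
```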


The Final Flow

Once everything is set up, the developer workflow is:

git push origin main

Everything else is automatic:

  1. GitHub Actions lints, tests, builds, and pushes the image
  2. CI commits the new image tag to values.yaml
  3. ArgoCD detects the Git change and syncs
  4. Kubernetes rolls out new pods with zero downtime

To verify:

kubectl get pods -n your-service -o wide
argocd app get your-service
kubectl port-forward svc/your-service 9000:80 -n your-service
curl http://localhost:9000/api/v1/health

Summary

Component             Purpose
uv                    Fast Python package manager, replaces pip + venv
FastAPI               API framework
Docker (multi-stage)  Small, secure runtime image
ECR                   Docker image registry
EKS                   Managed Kubernetes cluster
OIDC federation       Keyless AWS auth from GitHub Actions
GitHub Actions        CI: lint → test → build → push → update Helm
Helm                  Kubernetes manifest templating
ArgoCD                GitOps: auto-sync cluster state to Git

The key insight of this setup: Git is the source of truth. No one runs kubectl apply or helm upgrade manually. The cluster always reflects what's in the repo.