Sapari Infrastructure Architecture¶
Context¶
Sapari is an AI-powered video editing platform that automates post-production for content creators: silence removal, false start detection, profanity filtering, asset compositing, and caption generation. Users interact with a browser-based editor (React 19) backed by a FastAPI backend, with heavy processing handled by background workers consuming from RabbitMQ priority queues via TaskIQ.
The infrastructure progresses through versions as load grows. v1.0 is the launch architecture optimized for cost. v2.0 is the endgame with GPU acceleration and Kubernetes autoscaling. Intermediate versions split the workload incrementally.
Current version: v1.0 (launch)
Design Principles¶
- Scale vertically as long as possible. A bigger box is operationally cheaper than a cluster.
- No hard lock-in. Every external service has a migration path. Workers are Docker containers that only need queue access.
- Colocate compute and I/O. Workers, Redis, and temporary video storage live on the same Hetzner machine (or same datacenter at v1.2+) for low latency.
- Minimize ops surface. Plain Docker Compose at v1.x. Move to k3s only when horizontal scaling forces it.
- Defer expensive infrastructure. GPU server (€184/mo) is justified only when transcription costs on the OpenAI API exceed the GPU's amortized cost.
Version Progression¶
| Version | Topology | Cost/mo | Trigger to upgrade |
|---|---|---|---|
| v1.0 | Single CCX23 (everything) + CX33 staging + Cloudflare edge | ~€39 | Launch state |
| v1.1 | Vertical scale -- CCX33 or CCX43 | ~€70-130 | RAM consistently >75% during peak |
| v1.2 | Horizontal split -- API on one box, workers on another, WireGuard between | ~€80-140 | Concurrent renders regularly queueing |
| v1.3 | Managed data plane -- Upstash Redis + CloudAMQP RabbitMQ | ~€100-160 | Self-hosted Redis/RabbitMQ becomes ops burden |
| v2.0 | k3s cluster, GEX44 GPU, KEDA autoscaling, NVENC, local Whisper | ~€225 + variable | Worker box capacity exceeded OR transcription cost > $200/mo |
Each version is a fully working state. You can stay at any version indefinitely.
v1.0: Launch Architecture (current)¶
Overview¶
graph TB
subgraph Cloudflare
CF_PAGES["Cloudflare Pages<br/><i>Frontend (React 19) + Landing (Astro)</i>"]
CF_R2["R2 Storage<br/><i>3 buckets: raw, exports, assets</i>"]
CF_DNS["DNS + CDN"]
CF_WORKER["Workers<br/><i>/api/* proxy to backend</i>"]
CF_ACCESS["Access<br/><i>Staging gate (GitHub OAuth)</i>"]
end
subgraph Neon
POSTGRES_PROD[("PostgreSQL<br/><i>sapari-production project</i>")]
POSTGRES_STG[("PostgreSQL<br/><i>sapari-staging project</i>")]
end
subgraph "Production Server (Hetzner CCX23, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
CADDY_P["Caddy 2 (TLS via DNS-01)"]
API_P["FastAPI Backend"]
WORKERS_P["6 Workers + Scheduler<br/><i>analysis, render, download, proxy, asset-edit, email</i>"]
REDIS_P[("Redis 7<br/><i>cache + sessions + pub/sub</i>")]
RMQ_P[("RabbitMQ 3<br/><i>task broker + management UI</i>")]
end
subgraph "Staging Server (Hetzner CX33, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
CADDY_S["Caddy 2"]
API_S["FastAPI Backend"]
WORKERS_S["6 Workers + Scheduler"]
REDIS_S[("Redis 7")]
RMQ_S[("RabbitMQ 3")]
end
%% User flows -- production
USER["User Browser"] -- "HTTPS" --> CF_DNS
CF_DNS --> CF_PAGES
CF_DNS --> CF_WORKER
CF_WORKER -- "/api/*" --> CADDY_P
CF_WORKER -- "static" --> CF_PAGES
CADDY_P --> API_P
API_P --> POSTGRES_PROD
API_P -- "presigned" --> CF_R2
API_P --> REDIS_P
API_P --> RMQ_P
RMQ_P --> WORKERS_P
WORKERS_P --> POSTGRES_PROD
WORKERS_P --> CF_R2
WORKERS_P --> REDIS_P
%% Staging gated by Access
DEV["Developer"] -- "GitHub login" --> CF_ACCESS
CF_ACCESS --> CADDY_S
CADDY_S --> API_S
Physical Layout¶
| Machine | Spec | Cost | Role |
|---|---|---|---|
| Hetzner CCX23 (production) | 4 vCPU dedicated, 16 GB RAM, 160 GB SSD, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) | €31.99/mo + 20% backups (€6.40) | Backend API + 6 workers + scheduler + Redis + RabbitMQ + Caddy |
| Hetzner CX33 (staging) | 4 vCPU shared, 8 GB RAM, 80 GB SSD, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) | €6.99/mo + 20% backups (€1.40) | Same stack as production |
Managed Services¶
| Service | Provider | Plan | Purpose |
|---|---|---|---|
| Postgres (production) | Neon | Free tier (separate project) | 100 CU-hrs/mo, 0.5 GB storage, 6h restore |
| Postgres (staging) | Neon | Free tier (separate project) | Same as above, isolated |
| Object storage | Cloudflare R2 | Free tier (10 GB) | 3 buckets per env, free egress |
| DNS + CDN + Pages + Workers + Access | Cloudflare | Free tier | Frontend hosting, API proxy, staging gate |
| Payments | Stripe | Pay-as-you-go | Subscriptions, credit billing |
| Email | Postmark | Pay-as-you-go | Transactional email |
| Observability | Logfire | Free tier | Structured logs, OpenTelemetry traces |
Networking¶
graph LR
USER["User Browser"]
subgraph "Cloudflare Edge"
WORKER["Worker proxy"]
PAGES["Pages (static assets)"]
end
subgraph "Hetzner Server (Production)"
CADDY["Caddy<br/>api.sapari.io<br/>(TLS via DNS-01)"]
BACKEND["Backend container"]
end
USER -- "https://app.sapari.io" --> WORKER
WORKER -- "/api/* (HTTPS)" --> CADDY
WORKER -- "static" --> PAGES
CADDY --> BACKEND
Why Cloudflare Worker proxy: Frontend hardcodes relative /api/v1 path (shared/api/client.ts:7) and cookies use SameSite=Strict in production (auth/session/manager.py:18,523). Cross-origin would require code changes + weakening CSRF posture. The Worker makes frontend and backend appear same-origin to the browser. Backend cookies are host-only (no domain= set, verified manager.py:526-544), so they bind naturally to the Worker's domain.
Why DNS-only (gray cloud) on api.sapari.io: Browser never sees this domain. Worker fetches it directly. Adding Cloudflare proxy would double-proxy (CF -> CF -> Caddy -> Backend), adding latency.
TLS strategy: Caddy uses Cloudflare DNS-01 ACME challenge (no port 80 exposed). UFW only allows 443 publicly + 22 from operator IP.
Stack on Each Server¶
Single Docker Compose file (docker-compose.prod.yml), shared between staging and production with different env files:
| Container | Image | Purpose | Memory limit |
|---|---|---|---|
| caddy | caddy:2-alpine | TLS + reverse proxy | 128m |
| backend | ghcr.io/sapari-backend (target: prod) | FastAPI single worker | 512m |
| analysis-worker | same image | Whisper API + LLM | 1g |
| render-worker | same image | FFmpeg libx264 | 3g |
| download-worker | same image | yt-dlp + audio extraction + waveform + thumbnail | 1g |
| proxy-worker | same image | FFmpeg H.264 480p re-encode for non-web-compatible clips | 2g (2 vCPUs) |
| asset-edit-worker | same image | Image operations | 512m |
| email-worker | same image | Postmark API | 512m |
| scheduler | same image | TaskIQ cron | 256m |
| redis | redis:7-alpine | Cache + sessions + pub/sub | 256m |
| rabbitmq | rabbitmq/Dockerfile (rabbitmq:3.13.7-management-alpine + delayed-message-exchange plugin) | Broker + management UI | 512m |
Single image, multiple commands: All backend containers run from the same Docker image. They differ only in the command (fastapi run vs taskiq worker analysis_broker etc.). Trade-off: code changes affect all containers, but build time stays low and operations stay simple. See INFRASTRUCTURE_PROVISIONING_PLAN.md for detail.
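A minimal compose excerpt illustrating the pattern; the image tag is illustrative and the broker module path is abbreviated as above:

```yaml
# docker-compose.prod.yml sketch -- two containers, one image, different commands.
# Image tag and broker module path are illustrative.
services:
  backend:
    image: ghcr.io/sapari-backend:latest
    command: fastapi run
    mem_limit: 512m
  analysis-worker:
    image: ghcr.io/sapari-backend:latest
    command: taskiq worker analysis_broker
    mem_limit: 1g
```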
Deployment¶
GitHub Actions builds + pushes Docker images to GHCR on push to staging or main. Server SSH is firewalled to the tailnet only, so deploy workflows run tailscale/github-action@v3 to join the tailnet before the appleboy/ssh-action step, then execute ./scripts/deployment/deploy.sh on the server. Cloudflare Pages auto-deploys frontend + landing on the same push.
CD branch flow:
- staging branch -> Build -> auto-deploy to staging server (+ staging Pages)
- main branch -> Build only. Production deploys are manual via Actions -> Deploy Production -> Run workflow with typed YES confirmation. Landing Pages still auto-deploys from main.
Production uses a manual trigger (workflow_dispatch + typed confirm) rather than a required-reviewer gate, because private repos without GitHub Team can't use reviewer or wait-timer environment protection rules. The typed confirm is the deliberate-action equivalent.
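A sketch of the production workflow's trigger and gating, assuming hypothetical secret names and action versions (the deploy script path is the real one):

```yaml
# .github/workflows/deploy-production.yml sketch -- secret names and action versions are assumptions
name: Deploy Production
on:
  workflow_dispatch:
    inputs:
      confirm:
        description: "Type YES to deploy to production"
        required: true
jobs:
  deploy:
    # Typed-confirm gate: the job refuses to run without the exact string
    if: ${{ github.event.inputs.confirm == 'YES' }}
    runs-on: ubuntu-latest
    steps:
      - uses: tailscale/github-action@v3      # join the tailnet so the server's SSH is reachable
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:ci
      - uses: appleboy/ssh-action@v1          # run the deploy script on the server
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: ./scripts/deployment/deploy.sh
```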
Deploy = stop-then-start per service. Brief downtime per service (seconds) during docker compose up -d. React Query retries + maintenance screen handle it.
Backups¶
| Data | Mechanism | Cost |
|---|---|---|
| Postgres | Neon built-in 6h time travel | Free |
| R2 buckets | Cloudflare's built-in durability + application-level protection (soft deletes, IntegrityError handling on ClipFile, reconcile cron). R2 doesn't currently support S3-style versioning. | None (built-in) |
| Server data (Redis, RabbitMQ, Caddy, .env files) | Hetzner automated backups (toggle at provisioning) | 20% surcharge per server |
| .env secrets | Stored in 1Password / Bitwarden as you create them | Free |
No backup scripts in v1.0 -- everything is a managed-service toggle.
Cost Summary (v1.0)¶
| Item | Provider | Cost |
|---|---|---|
| CCX23 production server | Hetzner | €31.99 |
| Hetzner backups (production) | Hetzner | €6.40 |
| CX33 staging server | Hetzner | €6.99 |
| Hetzner backups (staging) | Hetzner | €1.40 |
| Postgres (both projects) | Neon | Free |
| R2 storage | Cloudflare | Free |
| Cloudflare Pages + Workers + Access + DNS | Cloudflare | Free |
| Logfire | Logfire | Free |
| Total monthly baseline | | €46.78 / ~$50 |
Variable costs:
- LLM API (DeepSeek + OpenAI Whisper + GPT-5.x for analysis steps) -- per-token
- Postmark email -- $1.25 per 10K emails
- Stripe -- 2.9% + 30¢ per transaction (standard)
- R2 beyond 10 GB storage -- $0.015/GB/mo
- Neon Launch upgrade if 100 CU-hrs/mo exceeded -- ~$20-30/mo per project
v1.1: Vertical Scale¶
Trigger: Production CCX23 RAM consistently >75% during peak hours, or render queue regularly >2 deep.
Change: Resize the Hetzner box in-place (1-2 minutes downtime during reboot). Same architecture, more resources.
| Upgrade | Spec | Cost/mo |
|---|---|---|
| CCX23 -> CCX33 | 8 vCPU dedicated, 32 GB RAM | €62.99 + €12.60 backups |
| CCX23 -> CCX43 | 16 vCPU dedicated, 64 GB RAM | €125.49 + €25.10 backups |
What changes:
- Increase TASKIQ_WORKER_CONCURRENCY from 1 to 2 or 3 in .env.production (more concurrent renders)
- Increase mem_limit on workers in docker-compose.prod.yml (more headroom)
- Optionally raise FFMPEG_THREADS if CPU is consistently underutilized
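A compose excerpt sketching the three tweaks above after a resize; the specific values are illustrative, not recommendations:

```yaml
# docker-compose.prod.yml excerpt after resizing to a CCX33 -- values are illustrative
services:
  render-worker:
    mem_limit: 6g                        # was 3g on the CCX23
    environment:
      TASKIQ_WORKER_CONCURRENCY: "2"     # was 1; allows two concurrent renders
      FFMPEG_THREADS: "4"                # raise only if CPU stays underutilized
```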
What doesn't change: Domain layout, Cloudflare config, services, deployment process. Just a bigger box.
v1.2: Horizontal Split (API + Workers)¶
Trigger: Even at CCX43 (€125/mo), workers and API contend for resources. Worker OOM is starting to affect API responsiveness.
Change: Move workers to a dedicated server. API stays on its own. WireGuard tunnel between them for private Redis + RabbitMQ access.
graph TB
subgraph "API Server (CCX23)"
CADDY_API["Caddy"]
API["FastAPI"]
end
subgraph "Worker Server (CCX33 or larger)"
WORKERS["6 workers + scheduler"]
REDIS[("Redis")]
RMQ[("RabbitMQ")]
end
USER -- "https://app.sapari.io" --> CF["Cloudflare Worker proxy"]
CF --> CADDY_API
CADDY_API --> API
API -- "WireGuard 10.0.0.0/24" --> REDIS
API -- "WireGuard" --> RMQ
WORKERS --> REDIS
WORKERS --> RMQ
API --> NEON[("Neon Postgres")]
WORKERS --> NEON
WORKERS --> R2[("R2")]
API --> R2
Cost example: CCX23 API (€32) + CCX33 workers (€63) + backups (€19) = ~€114/mo + same managed services.
What changes:
- Provision second Hetzner box, set up WireGuard between API box and worker box
- API box .env.production points Redis/RabbitMQ to 10.0.0.2 (worker box's WireGuard IP)
- Workers move to the new box, API stays on the old one
- Backup Redis + RabbitMQ data on the worker box (Hetzner backups still cover both)
- DNS unchanged (api.sapari.io still points to API box)
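As an illustration of the second bullet above, assuming the backend reads hypothetical REDIS_URL / RABBITMQ_URL variables (10.0.0.2 is the worker box's WireGuard IP):

```yaml
# API box docker-compose.prod.yml excerpt -- variable names are assumptions
services:
  backend:
    environment:
      REDIS_URL: redis://10.0.0.2:6379/0
      RABBITMQ_URL: amqp://sapari:${RABBITMQ_PASSWORD}@10.0.0.2:5672
```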
What doesn't change: Cloudflare layer, Neon, R2, deployment scripts (just deploy to two servers instead of one).
v1.3: Managed Data Plane¶
Trigger: Self-hosting Redis and RabbitMQ becomes an operational burden: RabbitMQ broker config drifts, Redis memory pressure causes evictions, and you spend more time tuning them than they are worth.
Change: Replace self-hosted Redis with Upstash, RabbitMQ with CloudAMQP. WireGuard tunnel goes away (back to public TLS for everything).
| Service | Provider | Free tier | Paid tier |
|---|---|---|---|
| Redis | Upstash | 256 MB, 500K commands/day | Pay-per-request beyond |
| RabbitMQ | CloudAMQP | 1M messages/mo | $19/mo for Cluster Standard |
What changes:
- Both servers connect to managed Redis/RabbitMQ over public TLS
- Server budget shrinks (the worker box no longer needs RAM for Redis/RabbitMQ)
- WireGuard goes away
- Backup story simplifies (managed services back themselves up)
What doesn't change: Cloudflare layer, Neon, R2, application code (env vars point at new endpoints).
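A sketch of that env-var swap, again assuming hypothetical REDIS_URL / RABBITMQ_URL variables and placeholder managed endpoints; the notable change is the TLS schemes (rediss://, amqps://) now that traffic leaves the private network:

```yaml
# docker-compose.prod.yml excerpt -- endpoints and variable names are placeholders
services:
  backend:
    environment:
      REDIS_URL: rediss://default:${UPSTASH_PASSWORD}@example-redis.upstash.io:6379
      RABBITMQ_URL: amqps://sapari:${CLOUDAMQP_PASSWORD}@example.rmq.cloudamqp.com/sapari
```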
v2.0: GPU + Kubernetes¶
Trigger: EITHER (a) worker box capacity exceeded even with managed data plane, OR (b) OpenAI Whisper API costs exceed ~$200/mo (~33,000 minutes of audio at $0.006/min).
Change: Full architecture from the original infrastructure plan. GEX44 GPU server runs analysis (local Whisper) + render (NVENC) + asset-edit. CCX13 control plane runs k3s, the remaining workers, and RabbitMQ/Redis (or keep the managed data plane from v1.3). KEDA autoscales render workers per queue depth. Burst CCX13 nodes spin up on demand.
v2.0 Architecture (full)¶
This was the original infrastructure plan. Preserving it as the v2.0 endgame:
Physical Infrastructure¶
All Hetzner resources in HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) datacenter, vSwitch private network (10.0.0.0/24).
| Machine | Spec | Role | Cost | Always on? |
|---|---|---|---|---|
| GEX44 | i5-13500 (14 cores), RTX 4000 20GB GPU, 64GB RAM, 2×1.92TB NVMe | Analysis + render + asset-edit workers, local Whisper | €184/mo + €264 setup | Yes |
| CCX13 #1 | 2 vCPU, 8GB RAM, 80GB SSD | k3s control plane, RabbitMQ, Redis, download + email workers | $13.49/mo | Yes |
| CCX13 #2 | 2 vCPU, 8GB RAM, 80GB SSD | k3s agent, overflow render workers | $13.49/mo | On-demand (autoscaled) |
k3s Cluster¶
graph TB
SERVER["k3s Server (CCX13 #1)<br/>Control plane + etcd"]
AGENT1["k3s Agent (GEX44)<br/>Labels: gpu=true, role=worker"]
AGENT2["k3s Agent (CCX13 #2)<br/>Labels: role=burst-render"]
SERVER --> AGENT1
SERVER --> AGENT2
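A sketch of how those node labels could be applied on the GEX44 agent via the k3s config file (keys map 1:1 to CLI flags; path and values assume a standard k3s install):

```yaml
# /etc/rancher/k3s/config.yaml on the GEX44 agent -- a sketch
node-label:
  - "gpu=true"
  - "role=worker"
```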
KEDA Autoscaling¶
- Render worker: 0-6 replicas, 1 pod per pending task in queue, 5-min cooldown
- Analysis worker: 1-2 replicas (GPU shared via semaphore), scales when queue >3
- Asset-edit worker: 0-3 replicas, 2 tasks per pod, 2-min cooldown
When pending pods exceed GEX44 capacity, cluster-autoscaler provisions a CCX13 burst node (~2-3 minute provisioning + k3s join time).
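A hedged sketch of the render-worker scaler described above, assuming a queue named render and a RABBITMQ_URL env var on the target Deployment (names are illustrative, not from the repo):

```yaml
# KEDA ScaledObject sketch for the render queue
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: render-worker
spec:
  scaleTargetRef:
    name: render-worker        # Deployment running the render TaskIQ worker
  minReplicaCount: 0
  maxReplicaCount: 6
  cooldownPeriod: 300          # 5-minute cooldown before scaling back down
  triggers:
    - type: rabbitmq
      metadata:
        mode: QueueLength
        value: "1"             # one pod per pending task
        queueName: render      # assumed queue name
        hostFromEnv: RABBITMQ_URL
```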
GPU Acceleration¶
The RTX 4000 has independent NVENC/NVDEC and CUDA engines, enabling concurrent Whisper transcription (CUDA) and FFmpeg rendering (NVENC). Render workers detect GPU at startup and use h264_nvenc; CPU-only nodes (burst) fall back to libx264.
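One way this could look in the render worker's Deployment, assuming the gpu=true node label above and an nvidia RuntimeClass registered in k3s; burst pods simply omit the selector and fall back to libx264:

```yaml
# Render worker Deployment fragment (pod template only) -- a sketch
spec:
  template:
    spec:
      nodeSelector:
        gpu: "true"            # pin to the GEX44
      runtimeClassName: nvidia # assumes the NVIDIA container runtime is registered in k3s
```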
Pluggable Transcription¶
TranscriptionRouter selects backends based on availability and cost:
- LocalWhisperBackend (GPU): preferred, ~€0/min amortized, 1 concurrent (semaphore)
- OpenAIWhisperBackend: overflow when GPU busy, $0.006/min, 5 concurrent
- GroqWhisperBackend (future): $0.003/min if added
Networking¶
WireGuard tunnel from API host (still on Hetzner or moved to Render/FastAPI Cloud) to control plane. Internal Hetzner traffic uses vSwitch.
Migration from v1.x to v2.0¶
This is a substantial migration, not a toggle. Major steps:
1. Provision GEX44 + CCX13 #1
2. Install k3s, NVIDIA drivers, container toolkit
3. Deploy backend image + workers as k8s manifests (existing Dockerfile works)
4. Wire KEDA against existing RabbitMQ
5. DNS cutover for api.sapari.io (or split into separate subdomains per worker pool)
6. Decommission v1.x server
Not in scope until v2.0 trigger fires. Documented as the endgame, not the next step.
Observability (all versions)¶
- Logfire (free tier) -- structured logs + OpenTelemetry traces. FastAPI, SQLAlchemy, Redis, Pydantic AI all instrumented.
- System Health admin page -- in-app dashboard for component status, queue depths, server resources. Auto-refreshes every 10s.
- RabbitMQ Management UI (port 15672 via SSH tunnel) -- queue depth, message rates, per-priority breakdown.
- Hetzner Cloud console -- CPU, RAM, disk graphs.
- Sentry (frontend only at v1.0) -- backend Sentry is a follow-up.
Migration Paths Summary¶
| Component | v1.x escape | v2.0 escape |
|---|---|---|
| Server hosting (Hetzner) | Resize box | Add GEX44 as k3s agent |
| Postgres (Neon) | Neon Launch -> Scale | Self-hosted on Hetzner |
| Object storage (R2) | (no escape needed -- free egress) | Same |
| API hosting | Stay self-hosted, OR move to FastAPI Cloud / Render | Same |
| Workers | Single box -> two boxes -> k3s cluster | k3s with KEDA |
| Redis | Self-hosted -> Upstash | Either |
| RabbitMQ | Self-hosted -> CloudAMQP | Either |
Open Questions¶
- When to split API + workers (v1.0 -> v1.2)? Concrete trigger: 2+ concurrent renders queueing for >5 min during peak. Vague "feels slow" doesn't qualify.
- When to add GPU (v1.x -> v2.0)? Math-based: when the monthly OpenAI Whisper bill exceeds the Hetzner GEX44 cost (€184). At $0.006/min, that's ~30,500 minutes of audio per month -- equivalent to ~500 medium-length videos.
- When to move backend off Hetzner self-host? Likely never. FastAPI Cloud might be tempting, but custom domain support and pricing are still maturing.
- NVENC quality vs CPU libx264? NVENC at preset p5 is faster but slightly lower quality. CPU libx264 produces smaller files at the same quality. v2.0 could offer "high quality" exports routed to CPU burst nodes.