Sapari Infrastructure Architecture¶
Context¶
Sapari is an AI-powered video editing platform that automates post-production for content creators: silence removal, false start detection, profanity filtering, asset compositing, and caption generation. Users interact with a browser-based editor (React 19) backed by a FastAPI backend, with heavy processing handled by background workers consuming from RabbitMQ priority queues via TaskIQ.
The infrastructure progresses through versions as load grows. v1.0 is the launch architecture optimized for cost. v2.0 is the endgame with GPU acceleration and Kubernetes autoscaling. Intermediate versions split the workload incrementally.
Current version: v1.0 (launch)
Design Principles¶
- Scale vertically as long as possible. A bigger box is operationally cheaper than a cluster.
- No hard lock-in. Every external service has a migration path. Workers are Docker containers that only need queue access.
- Colocate compute and I/O. Workers, Redis, and temporary video storage live on the same Hetzner machine (or same datacenter at v1.2+) for low latency.
- Minimize ops surface. Plain Docker Compose at v1.x. Move to k3s only when horizontal scaling forces it.
- Defer expensive infrastructure. GPU server (€184/mo) is justified only when transcription costs on the OpenAI API exceed the GPU's amortized cost.
Version Progression¶
| Version | Topology | Cost/mo | Trigger to upgrade |
|---|---|---|---|
| v1.0 | Single CCX23 (everything) + CX33 staging + Cloudflare edge | ~€39 | Launch state |
| v1.1 | Vertical scale -- CCX33 or CCX43 | ~€70-130 | RAM consistently >75% during peak |
| v1.2 | Horizontal split -- API on one box, workers on another, WireGuard between | ~€80-140 | Concurrent renders regularly queueing |
| v1.3 | Managed data plane -- Upstash Redis + CloudAMQP RabbitMQ | ~€100-160 | Self-hosted Redis/RabbitMQ becomes ops burden |
| v2.0 | k3s cluster, GEX44 GPU, KEDA autoscaling, NVENC, local Whisper | ~€225 + variable | Worker box capacity exceeded OR transcription cost > $200/mo |
Each version is a fully working state. You can stay at any version indefinitely.
v1.0: Launch Architecture (current)¶
Overview¶
graph TB
subgraph Cloudflare
CF_PAGES["Cloudflare Pages<br/><i>Frontend (React 19) + Landing (Astro)</i>"]
CF_R2["R2 Storage<br/><i>3 buckets: raw, exports, assets</i>"]
CF_DNS["DNS + CDN"]
CF_WORKER["Workers<br/><i>/api/* proxy to backend</i>"]
CF_ACCESS["Access<br/><i>Staging gate (GitHub OAuth)</i>"]
end
subgraph Neon
POSTGRES_PROD[("PostgreSQL<br/><i>sapari-production project</i>")]
POSTGRES_STG[("PostgreSQL<br/><i>sapari-staging project</i>")]
end
subgraph "Production Server (Hetzner CCX23, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
CADDY_P["Caddy 2 (TLS via DNS-01)"]
API_P["FastAPI Backend"]
WORKERS_P["6 Workers + Scheduler<br/><i>analysis, render, download, proxy, asset-edit, email</i>"]
REDIS_P[("Redis 7<br/><i>cache + sessions + pub/sub</i>")]
RMQ_P[("RabbitMQ 3<br/><i>task broker + management UI</i>")]
end
subgraph "Staging Server (Hetzner CX33, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
CADDY_S["Caddy 2"]
API_S["FastAPI Backend"]
WORKERS_S["6 Workers + Scheduler"]
REDIS_S[("Redis 7")]
RMQ_S[("RabbitMQ 3")]
end
%% User flows -- production
USER["User Browser"] -- "HTTPS" --> CF_DNS
CF_DNS --> CF_PAGES
CF_DNS --> CF_WORKER
CF_WORKER -- "/api/*" --> CADDY_P
CF_WORKER -- "static" --> CF_PAGES
CADDY_P --> API_P
API_P --> POSTGRES_PROD
API_P -- "presigned" --> CF_R2
API_P --> REDIS_P
API_P --> RMQ_P
RMQ_P --> WORKERS_P
WORKERS_P --> POSTGRES_PROD
WORKERS_P --> CF_R2
WORKERS_P --> REDIS_P
%% Staging gated by Access
DEV["Developer"] -- "GitHub login" --> CF_ACCESS
CF_ACCESS --> CADDY_S
CADDY_S --> API_S
Physical Layout¶
| Machine | Spec | Cost | Role |
|---|---|---|---|
| Hetzner CCX23 (production) | 4 vCPU dedicated, 16 GB RAM, 160 GB SSD, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) | €31.99/mo + 20% backups (€6.40) | Backend API + 6 workers + scheduler + Redis + RabbitMQ + Caddy |
| Hetzner CX33 (staging) | 4 vCPU shared, 8 GB RAM, 80 GB SSD, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) | €6.99/mo + 20% backups (€1.40) | Same stack as production |
Managed Services¶
| Service | Provider | Plan | Purpose |
|---|---|---|---|
| Postgres (production) | Neon | Free tier (separate project) | 100 CU-hrs/mo, 0.5 GB storage, 6h restore |
| Postgres (staging) | Neon | Free tier (separate project) | Same as above, isolated |
| Object storage | Cloudflare R2 | Free tier (10 GB) | 3 buckets per env, free egress |
| DNS + CDN + Pages + Workers + Access | Cloudflare | Free tier | Frontend hosting, API proxy, staging gate |
| Payments | Stripe | Pay-as-you-go | Subscriptions, credit billing |
| Email | Postmark | Pay-as-you-go | Transactional email |
| Observability | Logfire | Free tier | Structured logs, OpenTelemetry traces |
Networking¶
graph LR
USER["User Browser"]
subgraph "Cloudflare Edge"
WORKER["Worker proxy"]
PAGES["Pages (static assets)"]
end
subgraph "Hetzner Server (Production)"
CADDY["Caddy<br/>api.sapari.io<br/>(TLS via DNS-01)"]
BACKEND["Backend container"]
end
USER -- "https://app.sapari.io" --> WORKER
WORKER -- "/api/* (HTTPS)" --> CADDY
WORKER -- "static" --> PAGES
CADDY --> BACKEND
Why Cloudflare Worker proxy: Frontend hardcodes relative /api/v1 path (shared/api/client.ts:7) and cookies use SameSite=Strict in production (auth/session/manager.py:18,523). Cross-origin would require code changes + weakening CSRF posture. The Worker makes frontend and backend appear same-origin to the browser. Backend cookies are host-only (no domain= set, verified manager.py:526-544), so they bind naturally to the Worker's domain.
Why DNS-only (gray cloud) on api.sapari.io: Browser never sees this domain. Worker fetches it directly. Adding Cloudflare proxy would double-proxy (CF -> CF -> Caddy -> Backend), adding latency.
TLS strategy: Caddy uses Cloudflare DNS-01 ACME challenge (no port 80 exposed). UFW only allows 443 publicly + 22 from operator IP.
Stack on Each Server¶
Single Docker Compose file (docker-compose.prod.yml), shared between staging and production with different env files:
| Container | Image | Purpose | Memory limit |
|---|---|---|---|
| caddy | caddy:2-alpine | TLS + reverse proxy | 128m |
| backend | ghcr.io/sapari-backend (target: prod) | FastAPI single worker | 512m |
| analysis-worker | same image | Whisper API + LLM | 1g |
| render-worker | same image | FFmpeg libx264 | 3g |
| download-worker | same image | yt-dlp + audio extraction + waveform + thumbnail | 1g |
| proxy-worker | same image | FFmpeg H.264 480p re-encode for non-web-compatible clips | 2g (2 vCPUs) |
| asset-edit-worker | same image | Image operations | 512m |
| email-worker | same image | Postmark API | 512m |
| scheduler | same image | TaskIQ cron | 256m |
| redis | redis:7-alpine | Cache + sessions + pub/sub | 256m |
| rabbitmq | rabbitmq/Dockerfile (rabbitmq:3.13.7-management-alpine + delayed-message-exchange plugin) | Broker + management UI | 512m |
Single image, multiple commands: All backend containers run from the same Docker image. They differ only in the command (fastapi run vs taskiq worker analysis_broker etc.). Trade-off: code changes affect all containers, but build time stays low and operations stay simple. See INFRASTRUCTURE_PROVISIONING_PLAN.md for detail.
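A minimal compose excerpt illustrating the pattern; the image tag is illustrative and the broker module path is abbreviated as above:

```yaml
# docker-compose.prod.yml sketch -- two containers, one image, different commands.
# Image tag and broker module path are illustrative.
services:
  backend:
    image: ghcr.io/sapari-backend:latest
    command: fastapi run
    mem_limit: 512m
  analysis-worker:
    image: ghcr.io/sapari-backend:latest
    command: taskiq worker analysis_broker
    mem_limit: 1g
```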
Deployment¶
GitHub Actions builds + pushes Docker images to GHCR on push to staging or main. Server SSH is firewalled to the tailnet only, so deploy workflows run tailscale/github-action@v3 to join the tailnet before the appleboy/ssh-action step, then execute ./scripts/deployment/deploy.sh on the server. Cloudflare Pages auto-deploys frontend + landing on the same push.
CD branch flow:
- staging branch -> Build -> auto-deploy to staging server (+ staging Pages)
- main branch -> Build only. Production deploys are manual via Actions -> Deploy Production -> Run workflow with typed YES confirmation. Landing Pages still auto-deploys from main.
Production uses a manual trigger (workflow_dispatch + typed confirm) rather than a required-reviewer gate, because private repos without GitHub Team can't use reviewer or wait-timer environment protection rules. The typed confirm is the deliberate-action equivalent.
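A sketch of the production workflow's trigger and gating, assuming hypothetical secret names and action versions (the deploy script path is the real one):

```yaml
# .github/workflows/deploy-production.yml sketch -- secret names and action versions are assumptions
name: Deploy Production
on:
  workflow_dispatch:
    inputs:
      confirm:
        description: "Type YES to deploy to production"
        required: true
jobs:
  deploy:
    # Typed-confirm gate: the job refuses to run without the exact string
    if: ${{ github.event.inputs.confirm == 'YES' }}
    runs-on: ubuntu-latest
    steps:
      - uses: tailscale/github-action@v3      # join the tailnet so the server's SSH is reachable
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:ci
      - uses: appleboy/ssh-action@v1          # run the deploy script on the server
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: ./scripts/deployment/deploy.sh
```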
Deploy = stop-then-start per service. Brief downtime per service (seconds) during docker compose up -d. React Query retries + maintenance screen handle it.
Backups¶
| Data | Mechanism | Cost |
|---|---|---|
| Postgres | Neon built-in 6h time travel | Free |
| R2 buckets | Cloudflare's built-in durability + application-level protection (soft deletes, IntegrityError handling on ClipFile, reconcile cron). R2 doesn't currently support S3-style versioning. | None (built-in) |
| Server data (Redis, RabbitMQ, Caddy, .env files) | Hetzner automated backups (toggle at provisioning) | 20% surcharge per server |
| .env secrets | Stored in 1Password / Bitwarden as you create them | Free |
No backup scripts in v1.0 -- everything is a managed-service toggle.
Cost Summary (v1.0)¶
| Item | Provider | Cost |
|---|---|---|
| CCX23 production server | Hetzner | €31.99 |
| Hetzner backups (production) | Hetzner | €6.40 |
| CX33 staging server | Hetzner | €6.99 |
| Hetzner backups (staging) | Hetzner | €1.40 |
| Postgres (both projects) | Neon | Free |
| R2 storage | Cloudflare | Free |
| Cloudflare Pages + Workers + Access + DNS | Cloudflare | Free |
| Logfire | Logfire | Free |
| Total monthly baseline | | €46.78 / ~$50 |
Variable costs:
- LLM API (DeepSeek + OpenAI Whisper + GPT-5.x for analysis steps) -- per-token
- Postmark email -- $1.25 per 10K emails
- Stripe -- 2.9% + 30¢ per transaction (standard)
- R2 beyond 10 GB storage -- $0.015/GB/mo
- Neon Launch upgrade if 100 CU-hrs/mo exceeded -- ~$20-30/mo per project
v1.1: Vertical Scale¶
Trigger: Production CCX23 RAM consistently >75% during peak hours, or render queue regularly >2 deep.
Change: Resize the Hetzner box in-place (1-2 minutes downtime during reboot). Same architecture, more resources.
| Upgrade | Spec | Cost/mo |
|---|---|---|
| CCX23 -> CCX33 | 8 vCPU dedicated, 32 GB RAM | €62.99 + €12.60 backups |
| CCX23 -> CCX43 | 16 vCPU dedicated, 64 GB RAM | €125.49 + €25.10 backups |
What changes:
- Increase TASKIQ_WORKER_CONCURRENCY from 1 to 2 or 3 in .env.production (more concurrent renders)
- Increase mem_limit on workers in docker-compose.prod.yml (more headroom)
- Optionally raise FFMPEG_THREADS if CPU is consistently underutilized
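A compose excerpt sketching the three tweaks above after a resize; the specific values are illustrative, not recommendations:

```yaml
# docker-compose.prod.yml excerpt after resizing to a CCX33 -- values are illustrative
services:
  render-worker:
    mem_limit: 6g                        # was 3g on the CCX23
    environment:
      TASKIQ_WORKER_CONCURRENCY: "2"     # was 1; allows two concurrent renders
      FFMPEG_THREADS: "4"                # raise only if CPU stays underutilized
```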
What doesn't change: Domain layout, Cloudflare config, services, deployment process. Just a bigger box.
v1.2: Horizontal Split (API + Workers)¶
Trigger: Even at CCX43 (€125/mo), workers and API contend for resources. Worker OOM is starting to affect API responsiveness.
Change: Move workers to a dedicated server. API stays on its own. WireGuard tunnel between them for private Redis + RabbitMQ access.
graph TB
subgraph "API Server (CCX23)"
CADDY_API["Caddy"]
API["FastAPI"]
end
subgraph "Worker Server (CCX33 or larger)"
WORKERS["6 workers + scheduler"]
REDIS[("Redis")]
RMQ[("RabbitMQ")]
end
USER -- "https://app.sapari.io" --> CF["Cloudflare Worker proxy"]
CF --> CADDY_API
CADDY_API --> API
API -- "WireGuard 10.0.0.0/24" --> REDIS
API -- "WireGuard" --> RMQ
WORKERS --> REDIS
WORKERS --> RMQ
API --> NEON[("Neon Postgres")]
WORKERS --> NEON
WORKERS --> R2[("R2")]
API --> R2
Cost example: CCX23 API (€32) + CCX33 workers (€63) + backups (€19) = ~€114/mo + same managed services.
What changes:
- Provision second Hetzner box, set up WireGuard between API box and worker box
- API box .env.production points Redis/RabbitMQ to 10.0.0.2 (worker box's WireGuard IP)
- Workers move to the new box, API stays on the old one
- Backup Redis + RabbitMQ data on the worker box (Hetzner backups still cover both)
- DNS unchanged (api.sapari.io still points to API box)
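As an illustration of the second bullet above, assuming the backend reads hypothetical REDIS_URL / RABBITMQ_URL variables (10.0.0.2 is the worker box's WireGuard IP):

```yaml
# API box docker-compose.prod.yml excerpt -- variable names are assumptions
services:
  backend:
    environment:
      REDIS_URL: redis://10.0.0.2:6379/0
      RABBITMQ_URL: amqp://sapari:${RABBITMQ_PASSWORD}@10.0.0.2:5672
```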
What doesn't change: Cloudflare layer, Neon, R2, deployment scripts (just deploy to two servers instead of one).
v1.3: Managed Data Plane¶
Trigger: Self-hosting Redis and RabbitMQ becomes an operational burden: RabbitMQ broker config drifts, Redis memory pressure causes evictions, and you spend more time tuning them than they are worth.
Change: Replace self-hosted Redis with Upstash, RabbitMQ with CloudAMQP. WireGuard tunnel goes away (back to public TLS for everything).
| Service | Provider | Free tier | Paid tier |
|---|---|---|---|
| Redis | Upstash | 256 MB, 500K commands/day | Pay-per-request beyond |
| RabbitMQ | CloudAMQP | 1M messages/mo | $19/mo for Cluster Standard |
What changes:
- Both servers connect to managed Redis/RabbitMQ over public TLS
- Server budget shrinks (the worker box no longer needs RAM for Redis/RabbitMQ)
- WireGuard goes away
- Backup story simplifies (managed services back themselves up)
What doesn't change: Cloudflare layer, Neon, R2, application code (env vars point at new endpoints).
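A sketch of that env-var swap, again assuming hypothetical REDIS_URL / RABBITMQ_URL variables and placeholder managed endpoints; the notable change is the TLS schemes (rediss://, amqps://) now that traffic leaves the private network:

```yaml
# docker-compose.prod.yml excerpt -- endpoints and variable names are placeholders
services:
  backend:
    environment:
      REDIS_URL: rediss://default:${UPSTASH_PASSWORD}@example-redis.upstash.io:6379
      RABBITMQ_URL: amqps://sapari:${CLOUDAMQP_PASSWORD}@example.rmq.cloudamqp.com/sapari
```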
v2.0: GPU + Kubernetes¶
Trigger: EITHER (a) worker box capacity exceeded even with managed data plane, OR (b) OpenAI Whisper API costs exceed ~$200/mo (~33,000 minutes of audio at $0.006/min).
Change: Full architecture from the original infrastructure plan. GEX44 GPU server runs analysis (local Whisper) + render (NVENC) + asset-edit. CCX13 control plane runs k3s, the remaining workers, and RabbitMQ/Redis (or keep the managed data plane from v1.3). KEDA autoscales render workers per queue depth. Burst CCX13 nodes spin up on demand.
v2.0 Architecture (full)¶
This was the original infrastructure plan. Preserving it as the v2.0 endgame:
Physical Infrastructure¶
All Hetzner resources in HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) datacenter, vSwitch private network (10.0.0.0/24).
| Machine | Spec | Role | Cost | Always on? |
|---|---|---|---|---|
| GEX44 | i5-13500 (14 cores), RTX 4000 20GB GPU, 64GB RAM, 2×1.92TB NVMe | Analysis + render + asset-edit workers, local Whisper | €184/mo + €264 setup | Yes |
| CCX13 #1 | 2 vCPU, 8GB RAM, 80GB SSD | k3s control plane, RabbitMQ, Redis, download + email workers | $13.49/mo | Yes |
| CCX13 #2 | 2 vCPU, 8GB RAM, 80GB SSD | k3s agent, overflow render workers | $13.49/mo | On-demand (autoscaled) |
k3s Cluster¶
graph TB
SERVER["k3s Server (CCX13 #1)<br/>Control plane + etcd"]
AGENT1["k3s Agent (GEX44)<br/>Labels: gpu=true, role=worker"]
AGENT2["k3s Agent (CCX13 #2)<br/>Labels: role=burst-render"]
SERVER --> AGENT1
SERVER --> AGENT2
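A sketch of how those node labels could be applied on the GEX44 agent via the k3s config file (keys map 1:1 to CLI flags; path and values assume a standard k3s install):

```yaml
# /etc/rancher/k3s/config.yaml on the GEX44 agent -- a sketch
node-label:
  - "gpu=true"
  - "role=worker"
```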
KEDA Autoscaling¶
- Render worker: 0-6 replicas, 1 pod per pending task in queue, 5-min cooldown
- Analysis worker: 1-2 replicas (GPU shared via semaphore), scales when queue >3
- Asset-edit worker: 0-3 replicas, 2 tasks per pod, 2-min cooldown
When pending pods exceed GEX44 capacity, cluster-autoscaler provisions a CCX13 burst node (~2-3 minute provisioning + k3s join time).
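A hedged sketch of the render-worker scaler described above, assuming a queue named render and a RABBITMQ_URL env var on the target Deployment (names are illustrative, not from the repo):

```yaml
# KEDA ScaledObject sketch for the render queue
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: render-worker
spec:
  scaleTargetRef:
    name: render-worker        # Deployment running the render TaskIQ worker
  minReplicaCount: 0
  maxReplicaCount: 6
  cooldownPeriod: 300          # 5-minute cooldown before scaling back down
  triggers:
    - type: rabbitmq
      metadata:
        mode: QueueLength
        value: "1"             # one pod per pending task
        queueName: render      # assumed queue name
        hostFromEnv: RABBITMQ_URL
```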
GPU Acceleration¶
The RTX 4000 has independent NVENC/NVDEC and CUDA engines, enabling concurrent Whisper transcription (CUDA) and FFmpeg rendering (NVENC). Render workers detect GPU at startup and use h264_nvenc; CPU-only nodes (burst) fall back to libx264.
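One way this could look in the render worker's Deployment, assuming the gpu=true node label above and an nvidia RuntimeClass registered in k3s; burst pods simply omit the selector and fall back to libx264:

```yaml
# Render worker Deployment fragment (pod template only) -- a sketch
spec:
  template:
    spec:
      nodeSelector:
        gpu: "true"            # pin to the GEX44
      runtimeClassName: nvidia # assumes the NVIDIA container runtime is registered in k3s
```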
Pluggable Transcription¶
TranscriptionRouter selects backends based on availability and cost:
- LocalWhisperBackend (GPU): preferred, ~€0/min amortized, 1 concurrent (semaphore)
- OpenAIWhisperBackend: overflow when GPU busy, $0.006/min, 5 concurrent
- GroqWhisperBackend (future): $0.003/min if added
Networking¶
WireGuard tunnel from API host (still on Hetzner or moved to Render/FastAPI Cloud) to control plane. Internal Hetzner traffic uses vSwitch.
Migration from v1.x to v2.0¶
This is a substantial migration, not a toggle. Major steps:
1. Provision GEX44 + CCX13 #1
2. Install k3s, NVIDIA drivers, container toolkit
3. Deploy backend image + workers as k8s manifests (existing Dockerfile works)
4. Wire KEDA against existing RabbitMQ
5. DNS cutover for api.sapari.io (or split into separate subdomains per worker pool)
6. Decommission v1.x server
Not in scope until v2.0 trigger fires. Documented as the endgame, not the next step.
Observability (all versions)¶
- Logfire (free tier) -- structured logs + OpenTelemetry traces. FastAPI, SQLAlchemy, Redis, Pydantic AI all instrumented.
- System Health admin page -- in-app dashboard for component status, queue depths, server resources. Auto-refreshes every 10s.
- RabbitMQ Management UI (port 15672 via SSH tunnel) -- queue depth, message rates, per-priority breakdown.
- Hetzner Cloud console -- CPU, RAM, disk graphs.
- Sentry (frontend only at v1.0) -- backend Sentry is a follow-up.
Migration Paths Summary¶
| Component | v1.x escape | v2.0 escape |
|---|---|---|
| Server hosting (Hetzner) | Resize box | Add GEX44 as k3s agent |
| Postgres (Neon) | Neon Launch -> Scale | Self-hosted on Hetzner |
| Object storage (R2) | (no escape needed -- free egress) | Same |
| API hosting | Stay self-hosted, OR move to FastAPI Cloud / Render | Same |
| Workers | Single box -> two boxes -> k3s cluster | k3s with KEDA |
| Redis | Self-hosted -> Upstash | Either |
| RabbitMQ | Self-hosted -> CloudAMQP | Either |
Open Questions¶
- When to split API + workers (v1.0 -> v1.2)? Concrete trigger: 2+ concurrent renders queueing for >5 min during peak. Vague "feels slow" doesn't qualify.
- When to add GPU (v1.x -> v2.0)? Math-based: when the monthly OpenAI Whisper bill exceeds the Hetzner GEX44 cost (€184). At $0.006/min, that's ~30,500 minutes of audio per month -- equivalent to ~500 medium-length videos.
- When to move backend off Hetzner self-host? Likely never. FastAPI Cloud might be tempting, but custom domain support and pricing are still maturing.
- NVENC quality vs CPU libx264? NVENC at preset p5 is faster but slightly lower quality. CPU libx264 produces smaller files at the same quality. v2.0 could offer "high quality" exports routed to CPU burst nodes.