Data Flow¶
This page walks through how data moves through Sapari from upload to final export. Understanding this flow helps when debugging issues or adding new features.
The Complete Journey¶
A video goes through several stages in Sapari:
flowchart LR
A[Upload] --> B[Process]
B --> C[Analyze]
C --> D[Review]
D --> E[Render]
E --> F[Download]
style A fill:#ff3300,color:#fff
style F fill:#ff3300,color:#fff
Each stage involves different components, but the pattern is consistent: the API receives a request, queues a background task via RabbitMQ, and publishes events via Redis pub/sub when done.
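The queue-then-publish pattern can be sketched in miniature like this. The broker and pub/sub classes are stand-ins for illustration only (the real system uses RabbitMQ via taskiq and Redis); the task and event names come from the stages below:

```python
import json

class FakeBroker:
    """Stand-in for the RabbitMQ-backed task broker (hypothetical)."""
    def __init__(self):
        self.queue = []

    def enqueue(self, task_name, **kwargs):
        self.queue.append((task_name, kwargs))

class FakePubSub:
    """Stand-in for Redis pub/sub (hypothetical)."""
    def __init__(self):
        self.published = []

    def publish(self, channel, message):
        self.published.append((channel, message))

def api_handler(clip_uuid, broker):
    # Step 1: the API validates the request and queues a background task,
    # returning immediately rather than doing the work inline.
    broker.enqueue("process_clip_artifacts", clip_uuid=clip_uuid)
    return {"clip_uuid": clip_uuid, "status": "pending"}

def worker_task(clip_uuid, project_uuid, pubsub):
    # Step 2: when the worker finishes, it publishes an event on the
    # project's channel so subscribed clients can refresh.
    pubsub.publish(
        f"project:{project_uuid}:events",
        json.dumps({"type": "ClipReadyEvent", "clip_uuid": clip_uuid}),
    )
```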
Stage 1: Upload¶
The upload process uses presigned URLs so clients upload directly to R2 without going through our servers.
sequenceDiagram
participant Client
participant API
participant DB
participant R2
Client->>API: POST /clips/presign
API->>DB: Create Clip + ClipFile records
API->>R2: Generate presigned PUT URL
API->>Client: {upload_url, content_type, clip_uuid}
Client->>R2: PUT file bytes (direct upload)
Client->>API: POST /clips/{uuid}/confirm
API->>R2: HEAD object (read actual Content-Length for quota recheck)
API->>API: Queue process_clip_artifacts
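The quota recheck on confirm might look like the following sketch. `HeadResult`, `confirm_clip`, and the limits are hypothetical names; the real handler talks to R2 and the database:

```python
from dataclasses import dataclass

@dataclass
class HeadResult:
    content_length: int  # actual size reported by R2, not the client's claim

def confirm_clip(head: HeadResult, used_bytes: int, quota_bytes: int) -> int:
    # Recheck the quota against the real uploaded size from the HEAD
    # request; any size the client supplied at presign time is advisory.
    if used_bytes + head.content_length > quota_bytes:
        raise ValueError("quota exceeded")  # real API would 4xx and clean up
    # On success the API queues process_clip_artifacts (omitted here).
    return head.content_length
```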
For YouTube imports, the flow is simpler:
sequenceDiagram
participant Client
participant API
participant Worker
Client->>API: POST /clips/youtube-import {url}
API->>API: fetch_video_info (sync, bounded wait_for) — reject if duration > cap
API->>API: Create Clip + ClipFile records
API->>API: Queue download_youtube_video
API->>Client: {clip_uuid}
Worker->>Worker: Download + process
Worker-->>Client: ClipReadyEvent (SSE)
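The bounded metadata fetch can be sketched with `asyncio.wait_for`. The timeout and duration cap values here are assumptions (the real values live in config), and `fetch_video_info` is stubbed:

```python
import asyncio

INFO_TIMEOUT_S = 10        # assumed bound on the metadata fetch
MAX_DURATION_S = 3 * 3600  # assumed duration cap

async def fetch_video_info(url: str) -> dict:
    """Stub for the real metadata fetch; only the result shape matters."""
    await asyncio.sleep(0)
    return {"url": url, "duration": 120}

async def validate_import(url: str) -> dict:
    # Bounded wait: a slow upstream can't hang the request handler.
    info = await asyncio.wait_for(fetch_video_info(url), timeout=INFO_TIMEOUT_S)
    if info["duration"] > MAX_DURATION_S:
        raise ValueError("video exceeds duration cap")
    return info
```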
Stage 2: Process¶
The download_broker handles video processing. Proxy generation runs on a separate proxy_broker / taskiq-proxy-worker so CPU-heavy re-encodes don't block audio extraction for subsequent imports:
flowchart TB
subgraph DW["Download Worker (download_broker)"]
A[Pick up task] --> B[Download video]
B --> C[Extract audio]
C --> D[Generate waveform]
D --> E{Web compatible?}
E -->|Yes| F[Update ClipFile]
E -->|No| G[Enqueue on proxy_broker]
G --> F
F --> H[Publish ClipReadyEvent]
end
subgraph PW["Proxy Worker (proxy_broker)"]
P1[Pick up generate_clip_proxy] --> P2[Chained FFmpeg: 480p H.264 + 10x20 sprite]
P2 --> P3[Upload proxy.mp4 + sprite.jpg]
P3 --> P4[Set proxy_key, sprite_key, sprite_seconds_per_tile]
end
G -.-> P1
The extracted audio is 16kHz mono MP3, optimized for Whisper. The waveform is an array of ~100 peaks per second for timeline visualization.
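The peak reduction is simple to sketch: chunk the PCM samples into fixed windows and keep the maximum amplitude of each. This is an illustration of the idea, not the production implementation (which reads the FFmpeg-extracted audio and writes waveform.json):

```python
def waveform_peaks(samples, sample_rate=16_000, peaks_per_second=100):
    # Collapse raw PCM samples into one max-amplitude peak per window,
    # yielding ~peaks_per_second values for the timeline to draw.
    window = max(1, sample_rate // peaks_per_second)
    return [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
```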
Stage 3: Analyze¶
Analysis runs on the analysis_broker as a pipeline of steps:
flowchart TB
A[Load Audio] --> B[Transcribe]
B --> C[Detect Silences]
B --> D[Detect False Starts]
B --> P[Detect Profanity]
C --> E[Validate Edits]
D --> E
P --> E
E --> F[Create Edits]
F --> G[Update Project]
G --> H[Publish AnalysisCompleteEvent]
style B fill:#ff3300,color:#fff
style C fill:#ff6633,color:#fff
style D fill:#ff6633,color:#fff
style P fill:#ef4444,color:#fff
Silence, false start, and profanity detection run in parallel since they're independent. The pipeline publishes progress events after each step.
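The fan-out can be expressed with `asyncio.gather`, since each detector only needs the transcript. The detectors below are stubs; the real steps call Whisper output parsing, an LLM, etc.:

```python
import asyncio

# Stub detectors; each consumes the transcript produced upstream.
async def detect_silences(transcript):
    return [{"type": "silence", "start": 1.0, "end": 2.5}]

async def detect_false_starts(transcript):
    return [{"type": "false_start", "start": 3.0, "end": 3.8}]

async def detect_profanity(transcript):
    return [{"type": "profanity", "start": 5.0, "end": 5.2}]

async def run_detection(transcript):
    # The three detectors are independent, so run them concurrently
    # and merge the results for the validation step.
    results = await asyncio.gather(
        detect_silences(transcript),
        detect_false_starts(transcript),
        detect_profanity(transcript),
    )
    return [edit for batch in results for edit in batch]
```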
Stage 4: Review¶
This stage happens in the frontend:
flowchart LR
A[Fetch Edits] --> B[Display Timeline]
B --> C{User Action}
C -->|Toggle| D[PATCH /edits/{id}]
C -->|Adjust| D
C -->|Add Cut| F[POST /edits]
C -->|Save Draft| E[POST /drafts]
D --> B
E --> B
F --> B
Users see the transcript with detected edits highlighted. They can:
- Toggle edits on/off
- Adjust edit boundaries by dragging
- Add manual cuts via the ADD CUT button (shown in cyan)
- Save drafts with different edit configurations
Edit types: silence (detected pauses), false_start (detected repetitions), profanity (detected swear words), manual (user-created cuts).
Edits also have an action field: cut removes video+audio, mute keeps video but silences/bleeps audio (used for profanity).
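A minimal model of the cut/mute distinction, as a sketch (field names beyond type and action are assumptions): only enabled `cut` edits remove video ranges, while `mute` edits leave the picture intact.

```python
from dataclasses import dataclass
from typing import Literal

EditType = Literal["silence", "false_start", "profanity", "manual"]
EditAction = Literal["cut", "mute"]

@dataclass
class Edit:
    type: EditType
    action: EditAction
    start_ms: int
    end_ms: int
    enabled: bool = True

def kept_video_ranges(duration_ms, edits):
    """Video ranges that survive rendering. Only enabled `cut` edits
    remove video; `mute` keeps the picture and silences/bleeps audio.
    Assumes sorted, non-overlapping edits."""
    ranges, cursor = [], 0
    for e in edits:
        if e.enabled and e.action == "cut":
            if e.start_ms > cursor:
                ranges.append((cursor, e.start_ms))
            cursor = e.end_ms
    if cursor < duration_ms:
        ranges.append((cursor, duration_ms))
    return ranges
```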
Stage 5: Render¶
When the user triggers a render, we snapshot the current edit state:
sequenceDiagram
participant Client
participant API
participant DB
participant Worker
participant R2
Client->>API: POST /exports {name, settings}
API->>DB: Create Export with edit_snapshot
API->>API: Queue render_export
API->>Client: {export_uuid, status: pending}
Worker->>DB: Load Export + edit_snapshot
Worker->>R2: Download clips
Worker->>Worker: Apply cuts (FFmpeg)
Worker->>Worker: Audio processing (optional)
Worker->>R2: Upload rendered video
Worker->>DB: Update Export status
Worker-->>Client: ExportCompleteEvent (SSE)
The snapshot means users can keep editing while a render is in progress; the render uses the frozen state.
Audio Processing (when "Audio Clean" is enabled):
1. Noise Reduction - FFT-based removal of background noise (AC, fans, room tone)
2. LUFS Normalization - Adjusts loudness to -14 LUFS (YouTube/Spotify standard)
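The audio-clean pass maps naturally onto FFmpeg filters. The filter choice here is an assumption, not a statement about the production command line: `afftdn` is FFmpeg's FFT-based denoiser, and `loudnorm` implements EBU R128 loudness normalization with an integrated-loudness target.

```python
def audio_clean_args(target_lufs: float = -14.0) -> list:
    # Build the audio-filter arguments for an "Audio Clean" style pass.
    filters = [
        "afftdn",                     # FFT-based noise reduction
        f"loudnorm=I={target_lufs}",  # normalize integrated loudness
    ]
    return ["-af", ",".join(filters)]
```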
Stage 6: Download¶
Completed exports live in R2:
sequenceDiagram
participant Client
participant API
participant R2
Client->>API: GET /exports/{id}/download
API->>R2: Generate presigned GET URL
API->>Client: {url, expires_in: 3600}
Client->>R2: GET (direct download)
Asset Editing¶
Assets (user-uploaded videos/images) can be trimmed or have audio extracted. This uses a fire-and-forget pattern: the API returns immediately while processing happens in the background. Trim bounds (start_ms, end_ms, cuts[i].end_ms) are validated against AssetFile.duration_ms at the API layer before the worker is enqueued; out-of-range values return 400. If the asset is still processing (duration_ms IS NULL), the bounds check is skipped and the worker clamps out-of-range values silently at render time.
sequenceDiagram
participant Client
participant API
participant DB
participant Worker
participant R2
Client->>API: POST /assets/{uuid}/edit
Note right of Client: {cuts: [...], extract_audio, save_mode}
API->>DB: Create pending Asset copy
API->>API: Queue edit_asset task
API->>Client: {new_asset_uuid}
Note right of Client: Returns immediately
Client->>Client: Poll asset list (refetchInterval)
Worker->>R2: Download source asset
Worker->>Worker: FFmpeg multi-cut processing
Worker->>R2: Upload result
Worker->>DB: Update Asset status → uploaded
Client->>API: GET /assets (polling)
API->>Client: Asset now has status: uploaded
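The API-layer bounds check described above might look like this sketch (function and field names are illustrative):

```python
def validate_trim_bounds(cuts, duration_ms):
    # If the asset is still processing (duration_ms is None), skip the
    # check; the worker clamps out-of-range values at render time.
    if duration_ms is None:
        return
    for cut in cuts:
        start, end = cut["start_ms"], cut["end_ms"]
        if not (0 <= start < end <= duration_ms):
            raise ValueError("cut out of range")  # the real API returns 400
```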
Key features:
- Multi-cut support: Multiple regions can be removed in a single operation using FFmpeg filter_complex
- Fire-and-forget: New asset created with status: pending, updated to uploaded when done
- Save modes: copy creates a new asset, replace overwrites the original
- Polling: Frontend polls while any asset has status: pending
Event Flow¶
Events tie everything together:
flowchart LR
subgraph Workers
W1[Download]
W2[Analysis]
W3[Render]
W4[Asset Edit]
end
subgraph Redis
PS[(Pub/Sub)]
end
subgraph API
SSE[SSE Endpoint]
end
subgraph Frontend
ES[EventSource]
RQ[React Query]
end
W1 -->|Publish| PS
W2 -->|Publish| PS
W3 -->|Publish| PS
PS -->|Subscribe| SSE
SSE -->|Stream| ES
ES -->|Invalidate| RQ
Note: Asset Edit uses polling instead of SSE since assets are user-scoped (not project-scoped) and updates are infrequent.
Each project has its own channel: project:{uuid}:events. The frontend subscribes when you open a project and invalidates React Query caches when events arrive.
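As a sketch, the channel naming and the event-to-cache mapping could look like the following; the INVALIDATES table is hypothetical (the actual query keys live in the frontend):

```python
import json

def project_channel(project_uuid: str) -> str:
    # Per-project channel: workers publish here, the SSE endpoint
    # subscribes and streams messages down to the browser.
    return f"project:{project_uuid}:events"

# Hypothetical mapping from event type to the React Query cache keys
# the frontend invalidates when the event arrives over SSE.
INVALIDATES = {
    "ClipReadyEvent": ["clips"],
    "AnalysisCompleteEvent": ["edits", "project"],
    "ExportCompleteEvent": ["exports"],
}

def keys_to_invalidate(raw_message: str) -> list:
    event = json.loads(raw_message)
    return INVALIDATES.get(event["type"], [])
```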
Storage Layout¶
All files live in R2 with a predictable structure:
flowchart TB
subgraph Clips["sapari-raw bucket"]
C1["clips/{prefix}/{uuid}/"]
C1 --> C2[original.mp4]
C1 --> C3[audio.mp3]
C1 --> C4[proxy.mp4]
C1 --> C5[waveform.json]
end
subgraph Exports["sapari-exports bucket"]
E1["exports/{project}/{export}/"]
E1 --> E2[Final Cut v1.mp4]
end
subgraph Assets["sapari-assets bucket"]
A1["assets/{prefix}/{uuid}/"]
A1 --> A2[video.mp4]
A1 --> A3[thumbnail.jpg]
end
The {prefix} is the first 2 characters of the UUID. This helps S3/R2 distribute files across partitions for better performance.
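Key construction is mechanical; a sketch for the clips bucket (the helper name is illustrative):

```python
def clip_key(clip_uuid: str, filename: str) -> str:
    # The 2-character prefix spreads keys across S3/R2 partitions.
    return f"clips/{clip_uuid[:2]}/{clip_uuid}/{filename}"
```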
Debugging Tips¶
When something goes wrong, trace through the stages:
flowchart TB
A[Issue Reported] --> B{Which stage?}
B -->|Upload| C[Check ClipFile status]
B -->|Process| D[Check worker logs]
B -->|Analysis| E[Check Whisper API / LLM logs]
B -->|Render| F[Check Export.error_message]
B -->|Events| G[Check Redis pub/sub]
C --> H[PENDING = upload incomplete]
C --> I[PROCESSING = worker running]
C --> J[FAILED = check error_message]