Render Pipeline¶
The render pipeline takes a project with edits and produces a final video. It downloads the source clips, applies cuts with FFmpeg, and uploads the result.
Pipeline Structure¶
```mermaid
flowchart LR
    A[Download Clips] --> B[Apply Cuts]
    B --> C[Upload Export]
    C --> D[Update Export]
    style A fill:#ff3300,color:#fff
    style B fill:#ff3300,color:#fff
```
Each step is sequential: we need the clips before we can cut them, and the cut video before we can upload it.
Triggering a Render¶
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Redis
    participant Worker
    participant R2
    Client->>API: POST /projects/{uuid}/exports
    API->>DB: Create Export with edit_snapshot
    API->>Redis: Queue render_export
    API->>Client: {export_uuid, status: pending}
    Worker->>Redis: Pick up task
    Worker->>DB: Load Export + edit_snapshot
    Worker->>R2: Download clips
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)
```
This creates an Export record with a snapshot of the current edits, then queues a `render_export` task.
Edit Snapshots¶
When you trigger a render, we freeze the current active edits into `edit_snapshot`:
```mermaid
flowchart TB
    subgraph Project["Project State"]
        E1[Edit 1: active]
        E2[Edit 2: inactive]
        E3[Edit 3: active]
    end
    subgraph Trigger["Render Triggered"]
        S[Snapshot active edits]
    end
    subgraph Export["Export Record"]
        ES["edit_snapshot: [Edit 1, Edit 3]"]
    end
    E1 --> S
    E3 --> S
    S --> ES
    subgraph Later["User keeps editing"]
        E1 -->|Toggle off| E1X[Edit 1: inactive]
        E2 -->|Toggle on| E2X[Edit 2: active]
    end
    ES -.->|Unaffected| R[Render uses frozen snapshot]
```
The stored `edit_snapshot` looks like:

```json
{
  "edits": [
    {"start_ms": 1000, "end_ms": 2500, "type": "SILENCE", "action": "cut"},
    {"start_ms": 5000, "end_ms": 6200, "type": "FALSE_START", "action": "cut"},
    {"start_ms": 8000, "end_ms": 8500, "type": "PROFANITY", "action": "mute"}
  ],
  "settings": {
    "resolution": "1080p",
    "audio_censorship": "bleep"
  }
}
```
This means you can toggle edits after triggering a render without affecting the in-progress export. You can also re-render with different edits by triggering another export.
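The freezing step can be sketched as follows. This is a minimal illustration, not the actual service code; `build_edit_snapshot` and the `active` flag are hypothetical names standing in for whatever the real model uses:

```python
from typing import Any


def build_edit_snapshot(edits: list[dict[str, Any]], settings: dict[str, Any]) -> dict[str, Any]:
    """Freeze the currently active edits into an immutable snapshot.

    Only edits flagged active are captured; later toggles mutate the live
    edit rows, never this snapshot.  ("active" is an assumed field name.)
    """
    frozen = [
        {k: e[k] for k in ("start_ms", "end_ms", "type", "action")}
        for e in edits
        if e.get("active")
    ]
    # Sort by start time so downstream steps can assume ordered input.
    frozen.sort(key=lambda e: e["start_ms"])
    return {"edits": frozen, "settings": dict(settings)}
```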
Steps¶
DownloadClipsStep¶
Downloads all clips for the project and concatenates them if there are multiple.
- Query clips in display order
- Generate presigned download URLs (1 hour expiry)
- Download via httpx streaming (faster than FFmpeg's HTTP input)
- If multiple clips, concatenate with FFmpeg
The output is a single video file ready for cutting.
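If concatenation uses FFmpeg's concat demuxer (`-f concat -i list.txt`) — an assumption; the step could equally use the `concat` filter — the list file it consumes can be generated like this. `build_concat_list` is a hypothetical helper:

```python
def build_concat_list(paths: list[str]) -> str:
    """Build the text fed to FFmpeg's concat demuxer.

    Each clip becomes a `file '...'` line, in display order.  Single quotes
    inside a path are escaped with the demuxer's close-escape-reopen idiom.
    """
    lines = []
    for p in paths:
        escaped = p.replace("'", r"'\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"
```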
ApplyCutsStep¶
Applies the edit snapshot using FFmpeg's `select` filter. Edits are separated by action:

- CUT edits: remove video and audio segments
- MUTE edits: keep video, silence or bleep audio
```mermaid
flowchart TB
    A[Read edit_snapshot] --> B[Separate CUT vs MUTE edits]
    B --> C[Calculate segments to keep from CUTs]
    C --> D[Remap MUTE timestamps for cuts]
    D --> E{Audio censorship mode?}
    E -->|none| F[No audio processing]
    E -->|mute| G[Apply volume=0 filter]
    E -->|bleep| H[Mix with 1kHz sine tone]
    F --> I[FFmpeg render]
    G --> I
    H --> I
    I --> J[Output video]
    style G fill:#f59e0b,color:#fff
    style H fill:#ef4444,color:#fff
```
- Read `edit_snapshot` from the export record
- Separate edits by `action` field (CUT vs MUTE)
- Calculate segments to keep (inverse of CUT edits only)
- Remap MUTE edit timestamps to account for removed content
- Build FFmpeg filter graph based on `audio_censorship` setting
- Run FFmpeg
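The two timing calculations, inverting CUT ranges into keep-segments and remapping MUTE timestamps onto the post-cut timeline, can be sketched as follows (function names are illustrative):

```python
def keep_segments(cuts: list[tuple[int, int]], duration_ms: int) -> list[tuple[int, int]]:
    """Invert non-overlapping CUT ranges into the segments to keep."""
    segments, cursor = [], 0
    for start, end in sorted(cuts):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration_ms:
        segments.append((cursor, duration_ms))
    return segments


def remap_timestamp(t_ms: int, cuts: list[tuple[int, int]]) -> int:
    """Shift a MUTE timestamp earlier by the cut material that precedes it."""
    removed = sum(min(end, t_ms) - start for start, end in cuts if start < t_ms)
    return t_ms - removed
```

With the snapshot's two CUT edits, a MUTE at 8000 ms lands at 5300 ms in the rendered output, because 2700 ms of earlier material was removed.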
Mute Mode - Silences audio during profanity:

```bash
ffmpeg -i input.mp4 \
  -vf "select='...'" \
  -af "aselect='...',volume=enable='between(t,5,7)':volume=0" \
  output.mp4
```
Bleep Mode - Overlays a 1kHz tone during profanity using `filter_complex`:

```bash
ffmpeg -i input.mp4 -filter_complex "
  [0:v]select='...',setpts=...[vout];
  [0:a]aselect='...',volume=enable='between(t,5,7)':volume=0[main];
  sine=frequency=1000:duration=15.5,aformat=channel_layouts=stereo,
  volume='if(between(t,5,7),0.25,0)':eval=frame[bleep];
  [main][bleep]amix=inputs=2:normalize=0[aout]
" -map "[vout]" -map "[aout]" output.mp4
```
The bleep tone is a standard 1kHz sine wave at 25% volume, mixed only during mute regions.
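Assembling the audio half of that `filter_complex` programmatically might look like the sketch below; `bleep_filter` is a hypothetical helper, and the real pipeline builds its graph in the FFmpeg utilities module:

```python
def bleep_filter(mute_ranges: list[tuple[float, float]], duration_s: float) -> str:
    """Build the audio side of the bleep-mode filter_complex (times in seconds):
    silence the main track inside each mute window, generate a 1 kHz sine
    gated to the same windows at 25% volume, then mix the two streams."""
    gate = "+".join(f"between(t,{s},{e})" for s, e in mute_ranges)
    silence = ",".join(
        f"volume=enable='between(t,{s},{e})':volume=0" for s, e in mute_ranges
    )
    return (
        f"[0:a]{silence}[main];"
        f"sine=frequency=1000:duration={duration_s},"
        f"aformat=channel_layouts=stereo,"
        f"volume='if({gate},0.25,0)':eval=frame[bleep];"
        f"[main][bleep]amix=inputs=2:normalize=0[aout]"
    )
```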
Audio Processing (Optional)¶
When the user enables "Audio Clean", two processing filters are applied:
```mermaid
flowchart LR
    A[Source Audio] --> B[Noise Reduction]
    B --> C[LUFS Normalization]
    C --> D[Output Audio]
    style B fill:#3b82f6,color:#fff
    style C fill:#10b981,color:#fff
```
1. Noise Reduction (`afftdn`)
FFT-based spectral analysis removes constant background noise:

- Fan/AC hum
- Room tone
- Computer noise
The noise floor of -25 dB provides moderate reduction without affecting voice quality. Lower values (e.g., -40) are more aggressive but may introduce artifacts.
2. LUFS Loudness Normalization (`loudnorm`)
Adjusts overall loudness to broadcast standards using EBU R128:
| Parameter | Value | Purpose |
|---|---|---|
| `I` (Integrated) | -14 LUFS | Target loudness (YouTube/Spotify standard) |
| `TP` (True Peak) | -1.5 dB | Prevents clipping on lossy codecs |
| `LRA` (Loudness Range) | 11 | Preserves natural dynamics |
Why -14 LUFS?

- YouTube normalizes to -14 LUFS (quieter content gets boosted, louder gets reduced)
- Spotify uses -14 LUFS for podcasts
- Matches typical professional podcast/video loudness
Processing Order
Noise reduction runs BEFORE normalization. This prevents the normalizer from amplifying background noise when boosting quiet audio.
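A minimal sketch of the resulting audio filter chain, assuming the -25 dB noise floor and the loudnorm targets documented above (the helper name is illustrative):

```python
def audio_clean_chain(noise_floor_db: int = -25) -> str:
    """Noise reduction first, then loudness normalization, so loudnorm's
    gain boost does not re-amplify the noise afftdn already removed."""
    return f"afftdn=nf={noise_floor_db},loudnorm=I=-14:TP=-1.5:LRA=11"
```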
Asset Overlay Compositing¶
When asset edits with `visual_mode=overlay` are present, the render pipeline composites them onto the main video using FFmpeg's `overlay` filter. Each overlay goes through:

`loop` (images only) → `trim` → `setpts` (PTS shift) → `scale` → `hflip` → `vflip` → `rotate` → `colorchannelmixer` (opacity) → `overlay`
| Filter | When Applied | Purpose |
|---|---|---|
| `loop=-1:size=1:start=0` | Image inputs (no `asset_duration_ms`) | Loops the single frame so the overlay stream fills the enable window; without this the image would show for one frame only |
| `trim=duration=W` | Always | Caps the overlay stream to the enable window duration |
| `setpts=PTS-STARTPTS+S/TB` | Always | Shifts the overlay's PTS so its frame 0 arrives at main `t=start_ms`; without this a video overlay starts playing at main `t=0`, exhausts before the enable window opens, and freezes on its last frame |
| `scale` | Always | Sizes the overlay to `overlay_size_percent` of the video |
| `hflip` | `overlay_flip_h=true` | Mirrors horizontally |
| `vflip` | `overlay_flip_v=true` | Mirrors vertically |
| `rotate` | `overlay_rotation_deg > 0` | Rotates by N degrees (converted to radians) |
| `colorchannelmixer` | `overlay_opacity_percent < 100` | Applies transparency |
| `overlay` | Always | Positions on the main video with a time-based enable |
Multiple overlays are chained sequentially, each composited onto the result of the previous.
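The per-overlay chain could be assembled roughly as below. `overlay_chain` is an illustrative sketch, not the pipeline's actual builder, and it scales relative to the overlay's own width as a simplification (the real pipeline sizes against the main video):

```python
import math


def overlay_chain(edit: dict, start_s: float, end_s: float, is_image: bool) -> str:
    """Build one overlay's filter chain in the documented order:
    loop (images) -> trim -> setpts -> scale -> flips -> rotate -> opacity."""
    window = end_s - start_s
    parts = []
    if is_image:
        # Repeat the single frame so the stream fills the enable window.
        parts.append("loop=-1:size=1:start=0")
    parts.append(f"trim=duration={window}")
    # Shift PTS so the overlay's frame 0 lands at main t=start_s.
    parts.append(f"setpts=PTS-STARTPTS+{start_s}/TB")
    pct = edit.get("overlay_size_percent", 100)
    parts.append(f"scale=iw*{pct / 100}:-1")  # simplified: relative to own width
    if edit.get("overlay_flip_h"):
        parts.append("hflip")
    if edit.get("overlay_flip_v"):
        parts.append("vflip")
    deg = edit.get("overlay_rotation_deg", 0)
    if deg:
        parts.append(f"rotate={math.radians(deg)}")  # rotate takes radians
    opacity = edit.get("overlay_opacity_percent", 100)
    if opacity < 100:
        # Convert to RGBA first so the alpha multiply has a channel to act on.
        parts.append(f"format=rgba,colorchannelmixer=aa={opacity / 100}")
    return ",".join(parts)
```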
Asset B-roll Compositing (REPLACE + INSERT)¶
When asset edits with `visual_mode=replace` or `insert` are present, the render uses a two-pass pipeline. Pass 1 applies CUT edits to produce a trimmed main video; Pass 2 splits that trimmed video into main/asset segments and concatenates them via FFmpeg's `concat` filter.
`concat` is strict about input uniformity: it rejects the whole filter graph with `Failed to configure output pad on Parsed_concat_N` if any two inputs differ on:

- Video dimensions or sample aspect ratio. Every segment is scaled and padded to the exact `output_width × output_height` computed from the main's aspect, then `setsar=1` forces square pixels. The variable-width `scale=-2:'min(H,ih)'` fallback is only used when output dimensions aren't known; within the B-roll path they always are.
- Audio sample rate, channel layout, or sample format. Every audio sub-chain (real `[0:a]atrim`, real `[N:a]atrim`, and synthesized `anullsrc`) ends with `aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo` so heterogeneous sources (e.g. a 44.1kHz mono screen recording + a 48kHz stereo b-roll) concatenate cleanly.
Image inputs (PNG/JPEG assets in REPLACE or INSERT) have a single frame at PTS=0. Without intervention, `concat` emits one frame for a multi-second window and the user sees the image "blink". Image segments are detected via an absent `asset_duration_ms`, and the pipeline prepends `loop=-1:size=1:start=0,trim=duration=D` to fill the window.
Silent inputs (screen recordings without a microphone, muted b-rolls, PNGs) are probed via `has_audio_stream()` (ffprobe) before the command is built. The main's probe feeds `main_has_audio`; asset probes feed `inputs_without_audio: frozenset[int]`. Any segment whose underlying input has no audio stream synthesizes `anullsrc` instead of referencing `[N:a]`; FFmpeg would otherwise reject the filter graph with `Stream specifier ':a' ... matches no streams`.
REPLACE audio semantics. REPLACE swaps video visually but keeps the main's audio continuous. `VideoReplaceSegment.audio_mode` defaults to `"original_only"` for REPLACE segments, pulling audio from `[0:a]atrim={output_start}:{output_end}` rather than from the asset. The other audio modes (`asset_only`, `mix`, `none`) still fall through to the pre-existing asset-audio or silence branches; `original_only` is the only new branch. INSERT keeps its legacy default of `"mix"` because INSERT adds a new segment of time, and asset audio is the natural choice there.
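One way to sketch the uniform audio sub-chains described above; `audio_segment_chain` is a hypothetical helper, not the pipeline's actual builder:

```python
AUDIO_UNIFORM = "aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo"


def audio_segment_chain(input_index: int, start_s: float, end_s: float, has_audio: bool) -> str:
    """Build one concat audio sub-chain: a real atrim when the input has an
    audio stream, synthesized silence (anullsrc) when it does not.  Both
    variants end in the uniform resample/format tail so concat accepts them."""
    if has_audio:
        head = f"[{input_index}:a]atrim={start_s}:{end_s},asetpts=PTS-STARTPTS"
    else:
        # No [N:a] reference: the stream doesn't exist, so synthesize silence.
        head = f"anullsrc=r=48000:cl=stereo,atrim=0:{end_s - start_s}"
    return f"{head},{AUDIO_UNIFORM}"
```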
Main Video Transforms¶
The main video can be flipped horizontally and/or vertically. These transforms are applied in the FFmpeg filter chain after scaling/padding but before subtitle burn-in:
| Filter | When Applied | Purpose |
|---|---|---|
| `hflip` | `video_flip_h=true` | Mirror video horizontally |
| `vflip` | `video_flip_v=true` | Mirror video vertically |
Both transforms also apply in the live preview via CSS `scale(-1, 1)` / `scale(1, -1)` on the `<video>` element only (overlays and controls remain unaffected).
Main Volume Control¶
The `main_volume_percent` setting (0-100%) adjusts the main video's audio level before mute/normalization processing:
Processing order: volume → mute censorship → noise reduction → LUFS normalization.
UploadExportStep¶
Uploads the rendered video to R2:
- Bucket: `STORAGE_BUCKET_EXPORTS`
- Key: `exports/{project_uuid}/{export_uuid}/(unknown).mp4`
UpdateExportStep¶
Finalizes the export record:
- Sets `status` to `COMPLETED`
- Stores `storage_key`
- Publishes `ExportCompleteEvent`
Timeouts¶
Long videos need time to process:
| Step | Timeout |
|---|---|
| Download clips | 15 minutes |
| Apply cuts | 30 minutes |
| Upload export | 10 minutes |
| Total pipeline | 1 hour |
If a step exceeds its timeout, the export fails and we set status to FAILED with an error message.
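Per-step timeout enforcement under these limits can be sketched as below, assuming the steps are async callables; the actual worker may structure this differently:

```python
import asyncio

# Per-step limits in seconds, matching the table above.
STEP_TIMEOUTS = {
    "download_clips": 15 * 60,
    "apply_cuts": 30 * 60,
    "upload_export": 10 * 60,
}


async def run_step(name: str, coro) -> None:
    """Run one pipeline step under its timeout; the raised TimeoutError
    lets the pipeline mark the export FAILED with a useful message."""
    try:
        await asyncio.wait_for(coro, timeout=STEP_TIMEOUTS[name])
    except asyncio.TimeoutError:
        raise TimeoutError(f"step {name!r} exceeded {STEP_TIMEOUTS[name]}s") from None
```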
Error Handling¶
```mermaid
flowchart TB
    A[Step Executing] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Catch Exception]
    D --> E[Set status = FAILED]
    E --> F[Store error_message]
    F --> G[Publish ExportFailedEvent]
    G --> H[Cleanup temp files]
    H --> I[Frontend shows error]
    I --> J{User action}
    J -->|Retry| A
```
If any step fails:
- The pipeline catches the exception
- Sets `export.status = FAILED`
- Stores the error message in `export.error_message`
- Publishes `ExportFailedEvent`
- Cleans up temporary files
The frontend shows the error to the user so they can retry.
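The failure path can be sketched as one wrapper around the steps. Names like `run_pipeline` and the `publish`/`cleanup` callbacks are illustrative, not the worker's real API:

```python
def run_pipeline(export, steps, publish, cleanup):
    """Run steps in order.  On any failure: record the error on the export,
    emit ExportFailedEvent for the frontend, and always remove temp files."""
    try:
        for step in steps:
            step(export)
        export.status = "COMPLETED"
    except Exception as exc:
        export.status = "FAILED"
        export.error_message = str(exc)
        publish({"type": "ExportFailedEvent", "export_uuid": export.uuid})
    finally:
        cleanup()  # temp files are removed on success and failure alike
```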
Downloading Exports¶
Once complete, users can request a download through the API; this returns a presigned download URL valid for 1 hour.
Key Files¶
| Component | Location |
|---|---|
| Pipeline definition | backend/src/workers/render/pipeline.py |
| Task entry | backend/src/workers/render/tasks.py |
| Pipeline steps | backend/src/workers/render/steps.py |
| FFmpeg utilities | backend/src/workers/render/ffmpeg/ |