Render Pipeline¶
The render pipeline takes a project with edits and produces a final video. It downloads the source clips, applies cuts with FFmpeg, and uploads the result.
Pipeline Structure¶
```mermaid
flowchart LR
    A[Download Clips] --> B[Apply Cuts]
    B --> C[Upload Export]
    C --> D[Update Export]
    style A fill:#ff3300,color:#fff
    style B fill:#ff3300,color:#fff
```
Each step is sequential: we need the clips before we can cut them, and the cut video before we can upload it.
Triggering a Render¶
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Redis
    participant Worker
    participant R2
    Client->>API: POST /projects/{uuid}/exports
    API->>DB: Create Export with edit_snapshot
    API->>Redis: Queue render_export
    API->>Client: {export_uuid, status: pending}
    Worker->>Redis: Pick up task
    Worker->>DB: Load Export + edit_snapshot
    Worker->>R2: Download clips
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)
```
This creates an Export record with a snapshot of the current edits, then queues a `render_export` task.
Edit Snapshots¶
When you trigger a render, we freeze the current active edits into `edit_snapshot`:
```mermaid
flowchart TB
    subgraph Project["Project State"]
        E1[Edit 1: active]
        E2[Edit 2: inactive]
        E3[Edit 3: active]
    end
    subgraph Trigger["Render Triggered"]
        S[Snapshot active edits]
    end
    subgraph Export["Export Record"]
        ES["edit_snapshot: [Edit 1, Edit 3]"]
    end
    E1 --> S
    E3 --> S
    S --> ES
    subgraph Later["User keeps editing"]
        E1 -->|Toggle off| E1X[Edit 1: inactive]
        E2 -->|Toggle on| E2X[Edit 2: active]
    end
    ES -.->|Unaffected| R[Render uses frozen snapshot]
```
The stored `edit_snapshot` looks like:

```json
{
  "edits": [
    {"start_ms": 1000, "end_ms": 2500, "type": "SILENCE", "action": "cut"},
    {"start_ms": 5000, "end_ms": 6200, "type": "FALSE_START", "action": "cut"},
    {"start_ms": 8000, "end_ms": 8500, "type": "PROFANITY", "action": "mute"}
  ],
  "settings": {
    "resolution": "1080p",
    "audio_censorship": "bleep"
  }
}
```
This means you can toggle edits after triggering a render without affecting the in-progress export. You can also re-render with different edits by triggering another export.
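The freezing step can be sketched as follows. This is a minimal illustration, not the actual service code; `build_edit_snapshot` and the `active` flag are hypothetical names standing in for whatever the real model uses:

```python
from typing import Any


def build_edit_snapshot(edits: list[dict[str, Any]], settings: dict[str, Any]) -> dict[str, Any]:
    """Freeze the currently active edits into an immutable snapshot.

    Only edits flagged active are captured; later toggles mutate the live
    edit rows, never this snapshot.  ("active" is an assumed field name.)
    """
    frozen = [
        {k: e[k] for k in ("start_ms", "end_ms", "type", "action")}
        for e in edits
        if e.get("active")
    ]
    # Sort by start time so downstream steps can assume ordered input.
    frozen.sort(key=lambda e: e["start_ms"])
    return {"edits": frozen, "settings": dict(settings)}
```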
Steps¶
DownloadClipsStep¶
Downloads all clips for the project and concatenates them if there are multiple.
- Query clips in display order
- Generate presigned download URLs (1 hour expiry)
- Download via httpx streaming (faster than FFmpeg's HTTP input)
- If multiple clips, concatenate with FFmpeg
The output is a single video file ready for cutting.
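If concatenation uses FFmpeg's concat demuxer (`-f concat -i list.txt`) — an assumption; the step could equally use the `concat` filter — the list file it consumes can be generated like this. `build_concat_list` is a hypothetical helper:

```python
def build_concat_list(paths: list[str]) -> str:
    """Build the text fed to FFmpeg's concat demuxer.

    Each clip becomes a `file '...'` line, in display order.  Single quotes
    inside a path are escaped with the demuxer's close-escape-reopen idiom.
    """
    lines = []
    for p in paths:
        escaped = p.replace("'", r"'\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"
```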
ApplyCutsStep¶
Applies the edit snapshot using FFmpeg's `select` filter. Edits are separated by action:

- CUT edits: remove video and audio segments
- MUTE edits: keep video, silence or bleep audio
```mermaid
flowchart TB
    A[Read edit_snapshot] --> B[Separate CUT vs MUTE edits]
    B --> C[Calculate segments to keep from CUTs]
    C --> D[Remap MUTE timestamps for cuts]
    D --> E{Audio censorship mode?}
    E -->|none| F[No audio processing]
    E -->|mute| G[Apply volume=0 filter]
    E -->|bleep| H[Mix with 1kHz sine tone]
    F --> I[FFmpeg render]
    G --> I
    H --> I
    I --> J[Output video]
    style G fill:#f59e0b,color:#fff
    style H fill:#ef4444,color:#fff
```
- Read `edit_snapshot` from the export record
- Separate edits by `action` field (CUT vs MUTE)
- Calculate segments to keep (inverse of CUT edits only)
- Remap MUTE edit timestamps to account for removed content
- Build FFmpeg filter graph based on `audio_censorship` setting
- Run FFmpeg
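The two timing calculations, inverting CUT ranges into keep-segments and remapping MUTE timestamps onto the post-cut timeline, can be sketched as follows (function names are illustrative):

```python
def keep_segments(cuts: list[tuple[int, int]], duration_ms: int) -> list[tuple[int, int]]:
    """Invert non-overlapping CUT ranges into the segments to keep."""
    segments, cursor = [], 0
    for start, end in sorted(cuts):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration_ms:
        segments.append((cursor, duration_ms))
    return segments


def remap_timestamp(t_ms: int, cuts: list[tuple[int, int]]) -> int:
    """Shift a MUTE timestamp earlier by the cut material that precedes it."""
    removed = sum(min(end, t_ms) - start for start, end in cuts if start < t_ms)
    return t_ms - removed
```

With the snapshot's two CUT edits, a MUTE at 8000 ms lands at 5300 ms in the rendered output, because 2700 ms of earlier material was removed.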
Mute Mode - Silences audio during profanity:

```bash
ffmpeg -i input.mp4 \
  -vf "select='...'" \
  -af "aselect='...',volume=enable='between(t,5,7)':volume=0" \
  output.mp4
```
Bleep Mode - Overlays a 1kHz tone during profanity using `filter_complex`:

```bash
ffmpeg -i input.mp4 -filter_complex "
  [0:v]select='...',setpts=...[vout];
  [0:a]aselect='...',volume=enable='between(t,5,7)':volume=0[main];
  sine=frequency=1000:duration=15.5,aformat=channel_layouts=stereo,
  volume='if(between(t,5,7),0.25,0)':eval=frame[bleep];
  [main][bleep]amix=inputs=2:normalize=0[aout]
" -map "[vout]" -map "[aout]" output.mp4
```
The bleep tone is a standard 1kHz sine wave at 25% volume, mixed only during mute regions.
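Assembling the audio half of that `filter_complex` programmatically might look like the sketch below; `bleep_filter` is a hypothetical helper, and the real pipeline builds its graph in the FFmpeg utilities module:

```python
def bleep_filter(mute_ranges: list[tuple[float, float]], duration_s: float) -> str:
    """Build the audio side of the bleep-mode filter_complex (times in seconds):
    silence the main track inside each mute window, generate a 1 kHz sine
    gated to the same windows at 25% volume, then mix the two streams."""
    gate = "+".join(f"between(t,{s},{e})" for s, e in mute_ranges)
    silence = ",".join(
        f"volume=enable='between(t,{s},{e})':volume=0" for s, e in mute_ranges
    )
    return (
        f"[0:a]{silence}[main];"
        f"sine=frequency=1000:duration={duration_s},"
        f"aformat=channel_layouts=stereo,"
        f"volume='if({gate},0.25,0)':eval=frame[bleep];"
        f"[main][bleep]amix=inputs=2:normalize=0[aout]"
    )
```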
Audio Processing (Optional)¶
When the user enables "Audio Clean", two processing filters are applied:
```mermaid
flowchart LR
    A[Source Audio] --> B[Noise Reduction]
    B --> C[LUFS Normalization]
    C --> D[Output Audio]
    style B fill:#3b82f6,color:#fff
    style C fill:#10b981,color:#fff
```
1. Noise Reduction (`afftdn`)
FFT-based spectral analysis removes constant background noise:

- Fan/AC hum
- Room tone
- Computer noise
The noise floor of -25 dB provides moderate reduction without affecting voice quality. Lower values (e.g., -40) are more aggressive but may introduce artifacts.
2. LUFS Loudness Normalization (`loudnorm`)
Adjusts overall loudness to broadcast standards using EBU R128:
| Parameter | Value | Purpose |
|---|---|---|
| `I` (Integrated) | -14 LUFS | Target loudness (YouTube/Spotify standard) |
| `TP` (True Peak) | -1.5 dB | Prevents clipping on lossy codecs |
| `LRA` (Loudness Range) | 11 | Preserves natural dynamics |
Why -14 LUFS?

- YouTube normalizes to -14 LUFS (quieter content gets boosted, louder gets reduced)
- Spotify uses -14 LUFS for podcasts
- Matches typical professional podcast/video loudness
Processing Order
Noise reduction runs BEFORE normalization. This prevents the normalizer from amplifying background noise when boosting quiet audio.
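A minimal sketch of the resulting audio filter chain, assuming the -25 dB noise floor and the loudnorm targets documented above (the helper name is illustrative):

```python
def audio_clean_chain(noise_floor_db: int = -25) -> str:
    """Noise reduction first, then loudness normalization, so loudnorm's
    gain boost does not re-amplify the noise afftdn already removed."""
    return f"afftdn=nf={noise_floor_db},loudnorm=I=-14:TP=-1.5:LRA=11"
```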
Asset Overlay Compositing¶
When asset edits with `visual_mode=overlay` are present, the render pipeline composites them onto the main video using FFmpeg's `overlay` filter. Each overlay goes through:

`loop` (images only) → `trim` → `setpts` (PTS shift) → `scale` → `hflip` → `vflip` → `rotate` → `colorchannelmixer` (opacity) → `overlay`
| Filter | When Applied | Purpose |
|---|---|---|
| `loop=-1:size=1:start=0` | Image inputs (no `asset_duration_ms`) | Loops the single frame so the overlay stream fills the enable window; without this the image would show for one frame only |
| `trim=duration=W` | Always | Caps the overlay stream to the enable window duration |
| `setpts=PTS-STARTPTS+S/TB` | Always | Shifts the overlay's PTS so its frame 0 arrives at main `t=start_ms`; without this a video overlay starts playing at main `t=0`, exhausts before the enable window opens, and freezes on its last frame |
| `scale` | Always | Sizes the overlay to `overlay_size_percent` of the video |
| `hflip` | `overlay_flip_h=true` | Mirrors horizontally |
| `vflip` | `overlay_flip_v=true` | Mirrors vertically |
| `rotate` | `overlay_rotation_deg > 0` | Rotates by N degrees (converted to radians) |
| `colorchannelmixer` | `overlay_opacity_percent < 100` | Applies transparency |
| `overlay` | Always | Positions on the main video with a time-based enable |
Multiple overlays are chained sequentially, each composited onto the result of the previous.
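The per-overlay chain could be assembled roughly as below. `overlay_chain` is an illustrative sketch, not the pipeline's actual builder, and it scales relative to the overlay's own width as a simplification (the real pipeline sizes against the main video):

```python
import math


def overlay_chain(edit: dict, start_s: float, end_s: float, is_image: bool) -> str:
    """Build one overlay's filter chain in the documented order:
    loop (images) -> trim -> setpts -> scale -> flips -> rotate -> opacity."""
    window = end_s - start_s
    parts = []
    if is_image:
        # Repeat the single frame so the stream fills the enable window.
        parts.append("loop=-1:size=1:start=0")
    parts.append(f"trim=duration={window}")
    # Shift PTS so the overlay's frame 0 lands at main t=start_s.
    parts.append(f"setpts=PTS-STARTPTS+{start_s}/TB")
    pct = edit.get("overlay_size_percent", 100)
    parts.append(f"scale=iw*{pct / 100}:-1")  # simplified: relative to own width
    if edit.get("overlay_flip_h"):
        parts.append("hflip")
    if edit.get("overlay_flip_v"):
        parts.append("vflip")
    deg = edit.get("overlay_rotation_deg", 0)
    if deg:
        parts.append(f"rotate={math.radians(deg)}")  # rotate takes radians
    opacity = edit.get("overlay_opacity_percent", 100)
    if opacity < 100:
        # Convert to RGBA first so the alpha multiply has a channel to act on.
        parts.append(f"format=rgba,colorchannelmixer=aa={opacity / 100}")
    return ",".join(parts)
```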
Asset B-roll Compositing (REPLACE + INSERT)¶
When asset edits with `visual_mode=replace` or `insert` are present, the render uses a two-pass pipeline. Pass 1 applies CUT edits to produce a trimmed main video; Pass 2 splits that trimmed video into main/asset segments and concatenates them via FFmpeg's `concat` filter.
`concat` is strict about input uniformity: it rejects the whole filter graph with `Failed to configure output pad on Parsed_concat_N` if any two inputs differ on:

- Video dimensions or sample aspect ratio. Every segment is scaled and padded to the exact `output_width × output_height` computed from the main's aspect, then `setsar=1` forces square pixels. The variable-width `scale=-2:'min(H,ih)'` fallback is only used when output dimensions aren't known; within the B-roll path they always are.
- Audio sample rate, channel layout, or sample format. Every audio sub-chain (real `[0:a]atrim`, real `[N:a]atrim`, and synthesized `anullsrc`) ends with `aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo` so heterogeneous sources (e.g. a 44.1kHz mono screen recording + a 48kHz stereo b-roll) concatenate cleanly.
Image inputs (PNG/JPEG assets in REPLACE or INSERT) have a single frame at PTS=0. Without intervention, `concat` emits one frame for a multi-second window and the user sees the image "blink". Image segments are detected via an absent `asset_duration_ms`, and the pipeline prepends `loop=-1:size=1:start=0,trim=duration=D` to fill the window.
Silent inputs (screen recordings without a microphone, muted b-rolls, PNGs) are probed via `has_audio_stream()` (ffprobe) before the command is built. The main's probe feeds `main_has_audio`; asset probes feed `inputs_without_audio: frozenset[int]`. Any segment whose underlying input has no audio stream synthesizes `anullsrc` instead of referencing `[N:a]`; FFmpeg would otherwise reject the filter graph with `Stream specifier ':a' ... matches no streams`.
REPLACE audio semantics. REPLACE swaps video visually but keeps the main's audio continuous. `VideoReplaceSegment.audio_mode` defaults to `"original_only"` for REPLACE segments, pulling audio from `[0:a]atrim={output_start}:{output_end}` rather than from the asset. The other audio modes (`asset_only`, `mix`, `none`) still fall through to the pre-existing asset-audio or silence branches; `original_only` is the only new branch. INSERT keeps its legacy default of `"mix"` because INSERT adds a new segment of time, and asset audio is the natural choice there.
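One way to sketch the uniform audio sub-chains described above; `audio_segment_chain` is a hypothetical helper, not the pipeline's actual builder:

```python
AUDIO_UNIFORM = "aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo"


def audio_segment_chain(input_index: int, start_s: float, end_s: float, has_audio: bool) -> str:
    """Build one concat audio sub-chain: a real atrim when the input has an
    audio stream, synthesized silence (anullsrc) when it does not.  Both
    variants end in the uniform resample/format tail so concat accepts them."""
    if has_audio:
        head = f"[{input_index}:a]atrim={start_s}:{end_s},asetpts=PTS-STARTPTS"
    else:
        # No [N:a] reference: the stream doesn't exist, so synthesize silence.
        head = f"anullsrc=r=48000:cl=stereo,atrim=0:{end_s - start_s}"
    return f"{head},{AUDIO_UNIFORM}"
```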
Main Video Transforms¶
The main video can be flipped horizontally and/or vertically. These transforms are applied in the FFmpeg filter chain after scaling/padding but before subtitle burn-in:
| Filter | When Applied | Purpose |
|---|---|---|
| `hflip` | `video_flip_h=true` | Mirror video horizontally |
| `vflip` | `video_flip_v=true` | Mirror video vertically |
Both transforms also apply in the live preview via CSS `scale(-1, 1)` / `scale(1, -1)` on the `<video>` element only (overlays and controls remain unaffected).
Main Volume Control¶
The `main_volume_percent` setting (0-100%) adjusts the main video's audio level before mute/normalization processing:
Processing order: volume → mute censorship → noise reduction → LUFS normalization.
UploadExportStep¶
Uploads the rendered video to R2:
- Bucket: `STORAGE_BUCKET_EXPORTS`
- Key: `exports/{project_uuid}/{export_uuid}/(unknown).mp4`
UpdateExportStep¶
Finalizes the export record:
- Sets `status` to `COMPLETED`
- Stores `storage_key`
- Publishes `ExportCompleteEvent`
Timeouts¶
Long videos need time to process:
| Step | Timeout |
|---|---|
| Download clips | 15 minutes |
| Apply cuts | 30 minutes |
| Upload export | 10 minutes |
| Total pipeline | 1 hour |
If a step exceeds its timeout, the export fails and we set status to FAILED with an error message.
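Per-step timeout enforcement under these limits can be sketched as below, assuming the steps are async callables; the actual worker may structure this differently:

```python
import asyncio

# Per-step limits in seconds, matching the table above.
STEP_TIMEOUTS = {
    "download_clips": 15 * 60,
    "apply_cuts": 30 * 60,
    "upload_export": 10 * 60,
}


async def run_step(name: str, coro) -> None:
    """Run one pipeline step under its timeout; the raised TimeoutError
    lets the pipeline mark the export FAILED with a useful message."""
    try:
        await asyncio.wait_for(coro, timeout=STEP_TIMEOUTS[name])
    except asyncio.TimeoutError:
        raise TimeoutError(f"step {name!r} exceeded {STEP_TIMEOUTS[name]}s") from None
```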
Error Handling¶
```mermaid
flowchart TB
    A[Step Executing] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Catch Exception]
    D --> E[Set status = FAILED]
    E --> F[Store error_message]
    F --> G[Publish ExportFailedEvent]
    G --> H[Cleanup temp files]
    H --> I[Frontend shows error]
    I --> J{User action}
    J -->|Retry| A
```
If any step fails:
- The pipeline catches the exception
- Sets `export.status = FAILED`
- Stores the error message in `export.error_message`
- Publishes `ExportFailedEvent`
- Cleans up temporary files
The frontend shows the error to the user so they can retry.
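The failure path can be sketched as one wrapper around the steps. Names like `run_pipeline` and the `publish`/`cleanup` callbacks are illustrative, not the worker's real API:

```python
def run_pipeline(export, steps, publish, cleanup):
    """Run steps in order.  On any failure: record the error on the export,
    emit ExportFailedEvent for the frontend, and always remove temp files."""
    try:
        for step in steps:
            step(export)
        export.status = "COMPLETED"
    except Exception as exc:
        export.status = "FAILED"
        export.error_message = str(exc)
        publish({"type": "ExportFailedEvent", "export_uuid": export.uuid})
    finally:
        cleanup()  # temp files are removed on success and failure alike
```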
Downloading Exports¶
Once complete, users can request a download through the API; this returns a presigned download URL valid for 1 hour.
Key Files¶
| Component | Location |
|---|---|
| Pipeline definition | backend/src/workers/render/pipeline.py |
| Task entry | backend/src/workers/render/tasks.py |
| Pipeline steps | backend/src/workers/render/steps.py |
| FFmpeg utilities | backend/src/workers/render/ffmpeg/ |