
Data Flow

This page walks through how data moves through Sapari from upload to final export. Understanding this flow helps when debugging issues or adding new features.

The Complete Journey

A video goes through several stages in Sapari:

flowchart LR
    A[Upload] --> B[Process]
    B --> C[Analyze]
    C --> D[Review]
    D --> E[Render]
    E --> F[Download]

    style A fill:#ff3300,color:#fff
    style F fill:#ff3300,color:#fff

Each stage involves different components, but the pattern is consistent: the API receives a request and queues a background task via RabbitMQ; when the task finishes, the worker publishes events via Redis pub/sub.

Stage 1: Upload

The upload process uses presigned URLs so clients upload directly to R2 without going through our servers.

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant R2

    Client->>API: POST /clips/presign
    API->>DB: Create Clip + ClipFile records
    API->>R2: Generate presigned PUT URL
    API->>Client: {upload_url, content_type, clip_uuid}
    Client->>R2: PUT file bytes (direct upload)
    Client->>API: POST /clips/{uuid}/confirm
    API->>R2: HEAD object (read actual Content-Length for quota recheck)
    API->>API: Queue process_clip_artifacts

For YouTube imports, the flow is simpler:

sequenceDiagram
    participant Client
    participant API
    participant Worker

    Client->>API: POST /clips/youtube-import {url}
    API->>API: fetch_video_info (sync, bounded wait_for) — reject if duration > cap
    API->>API: Create Clip + ClipFile records
    API->>API: Queue download_youtube_video
    API->>Client: {clip_uuid}
    Worker->>Worker: Download + process
    Worker-->>Client: ClipReadyEvent (SSE)
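The bounded metadata fetch can be sketched with asyncio.wait_for. The timeout, the duration cap, and the ImportRejected name are assumptions for illustration:

```python
import asyncio

MAX_DURATION_S = 4 * 3600   # hypothetical duration cap
INFO_TIMEOUT_S = 10         # hypothetical bound on the metadata fetch

class ImportRejected(Exception):
    pass

async def validate_youtube_import(url: str, fetch_video_info) -> dict:
    """Fetch video metadata with a bounded wait; reject over-cap videos
    before any Clip records are created or a download is queued."""
    try:
        info = await asyncio.wait_for(
            asyncio.to_thread(fetch_video_info, url),  # sync call off the loop
            timeout=INFO_TIMEOUT_S,
        )
    except asyncio.TimeoutError:
        raise ImportRejected("metadata fetch timed out")
    if info["duration"] > MAX_DURATION_S:
        raise ImportRejected("video exceeds duration cap")
    return info
```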

Stage 2: Process

The download_broker handles video processing. Proxy generation runs on a separate proxy_broker / taskiq-proxy-worker so CPU-heavy re-encodes don't block audio extraction for subsequent imports:

flowchart TB
    subgraph DW["Download Worker (download_broker)"]
        A[Pick up task] --> B[Download video]
        B --> C[Extract audio]
        C --> D[Generate waveform]
        D --> E{Web compatible?}
        E -->|Yes| F[Update ClipFile]
        E -->|No| G[Enqueue on proxy_broker]
        G --> F
        F --> H[Publish ClipReadyEvent]
    end
    subgraph PW["Proxy Worker (proxy_broker)"]
        P1[Pick up generate_clip_proxy] --> P2[Chained FFmpeg: 480p H.264 + 10x20 sprite]
        P2 --> P3[Upload proxy.mp4 + sprite.jpg]
        P3 --> P4[Set proxy_key, sprite_key, sprite_seconds_per_tile]
    end
    G -.-> P1

The extracted audio is 16kHz mono MP3, optimized for Whisper. The waveform is an array of ~100 peaks per second for timeline visualization.
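Peak extraction for the waveform can be sketched as a windowed max over the decoded samples. The window size and lack of normalization here are assumptions, not Sapari's exact algorithm:

```python
# Downsample raw audio samples to a fixed number of peaks per second
# for the timeline UI. Each peak is the max absolute amplitude in its
# window.

def waveform_peaks(samples: list[float], sample_rate: int,
                   peaks_per_second: int = 100) -> list[float]:
    window = max(1, sample_rate // peaks_per_second)
    return [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
```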

Stage 3: Analyze

Analysis runs on the analysis_broker as a pipeline of steps:

flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> P[Detect Profanity]
    C --> E[Validate Edits]
    D --> E
    P --> E
    E --> F[Create Edits]
    F --> G[Update Project]
    G --> H[Publish AnalysisCompleteEvent]

    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff

Silence, false start, and profanity detection run in parallel since they're independent. The pipeline publishes progress events after each step.
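The fan-out of the three independent detectors can be sketched with asyncio.gather. The stub detectors below stand in for the real pipeline steps:

```python
import asyncio

# Stub detectors standing in for the real Whisper/LLM-backed steps.
async def detect_silences(transcript):
    return [{"type": "silence"}]

async def detect_false_starts(transcript):
    return [{"type": "false_start"}]

async def detect_profanity(transcript):
    return [{"type": "profanity"}]

async def run_detectors(transcript) -> list[dict]:
    """Run the independent detection steps concurrently and merge
    their proposed edits for the validation step."""
    results = await asyncio.gather(
        detect_silences(transcript),
        detect_false_starts(transcript),
        detect_profanity(transcript),
    )
    return [edit for group in results for edit in group]
```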

Stage 4: Review

This stage happens in the frontend:

flowchart LR
    A[Fetch Edits] --> B[Display Timeline]
    B --> C{User Action}
    C -->|Toggle| D[PATCH /edits/{id}]
    C -->|Adjust| D
    C -->|Add Cut| F[POST /edits]
    C -->|Save Draft| E[POST /drafts]
    D --> B
    E --> B
    F --> B

Users see the transcript with detected edits highlighted. They can:

- Toggle edits on/off
- Adjust edit boundaries by dragging
- Add manual cuts via the ADD CUT button (shown in cyan)
- Save drafts with different edit configurations

Edit types: silence (detected pauses), false_start (detected repetitions), profanity (detected swear words), manual (user-created cuts).

Edits also have an action field: cut removes video+audio, mute keeps video but silences/bleeps audio (used for profanity).
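A sketch of the edit model as Python enums and a dataclass. The type and action values come from the source; the remaining field names are assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class EditType(str, Enum):
    SILENCE = "silence"          # detected pauses
    FALSE_START = "false_start"  # detected repetitions
    PROFANITY = "profanity"      # detected swear words
    MANUAL = "manual"            # user-created cuts

class EditAction(str, Enum):
    CUT = "cut"    # remove video + audio
    MUTE = "mute"  # keep video, silence/bleep audio (used for profanity)

@dataclass
class Edit:
    start_ms: int
    end_ms: int
    type: EditType
    action: EditAction
    enabled: bool = True  # toggled on/off during review
```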

Stage 5: Render

When the user triggers a render, we snapshot the current edit state:

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /exports {name, settings}
    API->>DB: Create Export with edit_snapshot
    API->>API: Queue render_export
    API->>Client: {export_uuid, status: pending}

    Worker->>DB: Load Export + edit_snapshot
    Worker->>R2: Download clips
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker->>Worker: Audio processing (optional)
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)

The snapshot means users can keep editing while a render is in progress; the render uses the frozen state.

Audio Processing (when "Audio Clean" is enabled):

1. Noise Reduction - FFT-based removal of background noise (AC, fans, room tone)
2. LUFS Normalization - Adjusts loudness to -14 LUFS (YouTube/Spotify standard)
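Both passes map onto real FFmpeg filters: afftdn (FFT-based denoising) and loudnorm (EBU R128 loudness normalization). A hedged sketch of the filter chain; the afftdn strength and the loudnorm TP/LRA values are assumptions, not Sapari's actual settings:

```python
# Build the "Audio Clean" filter chain for FFmpeg's -af option.
# afftdn=nr is the noise reduction amount in dB; loudnorm's I is the
# integrated loudness target, TP the true-peak ceiling, LRA the
# loudness range.

def audio_clean_filters(target_lufs: float = -14.0) -> str:
    return f"afftdn=nr=12,loudnorm=I={target_lufs}:TP=-1.5:LRA=11"

cmd = ["ffmpeg", "-i", "in.mp4", "-af", audio_clean_filters(), "out.mp4"]
```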

Stage 6: Download

Completed exports live in R2:

sequenceDiagram
    participant Client
    participant API
    participant R2

    Client->>API: GET /exports/{id}/download
    API->>R2: Generate presigned GET URL
    API->>Client: {url, expires_in: 3600}
    Client->>R2: GET (direct download)

Asset Editing

Assets (user-uploaded videos/images) can be trimmed or have audio extracted. This uses a fire-and-forget pattern: the API returns immediately while processing happens in the background. Trim bounds (start_ms, end_ms, cuts[i].end_ms) are validated against AssetFile.duration_ms at the API layer before the worker is enqueued; out-of-range values return 400. If the asset is still processing (duration_ms IS NULL), the bounds check is skipped and the worker silently clamps at render time.
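A minimal sketch of that bounds check, with illustrative names:

```python
# Validate trim bounds at the API layer before enqueueing the worker.
# A None duration means the asset is still processing, so the check is
# skipped and the worker clamps at render time.

def validate_cuts(cuts: list[dict], duration_ms: "int | None") -> None:
    if duration_ms is None:
        return  # asset still processing; worker clamps later
    for cut in cuts:
        if not (0 <= cut["start_ms"] < cut["end_ms"] <= duration_ms):
            raise ValueError("cut out of range")  # API maps this to 400
```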

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /assets/{uuid}/edit
    Note right of Client: {cuts: [...], extract_audio, save_mode}
    API->>DB: Create pending Asset copy
    API->>API: Queue edit_asset task
    API->>Client: {new_asset_uuid}
    Note right of Client: Returns immediately

    Client->>Client: Poll asset list (refetchInterval)

    Worker->>R2: Download source asset
    Worker->>Worker: FFmpeg multi-cut processing
    Worker->>R2: Upload result
    Worker->>DB: Update Asset status → uploaded

    Client->>API: GET /assets (polling)
    API->>Client: Asset now has status: uploaded

Key features:

- Multi-cut support: Multiple regions can be removed in a single operation using FFmpeg filter_complex
- Fire-and-forget: New asset created with status: pending, updated to uploaded when done
- Save modes: copy creates a new asset, replace overwrites the original
- Polling: Frontend polls while any asset has status: pending
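The multi-cut operation can be sketched as building a filter_complex that trims the keep-segments around the cut regions and concatenates them. The trim/atrim/setpts/concat filters are real FFmpeg filters; the segment labeling and structure here are illustrative:

```python
# Invert the cut regions into keep-segments, then emit an FFmpeg
# filter_complex string that trims each segment (video + audio) and
# concatenates them in order.

def multicut_filter(cuts: list[tuple[float, float]], duration: float) -> str:
    keep, pos = [], 0.0
    for start, end in sorted(cuts):
        if start > pos:
            keep.append((pos, start))
        pos = max(pos, end)
    if pos < duration:
        keep.append((pos, duration))

    parts = []
    for i, (s, e) in enumerate(keep):
        parts.append(f"[0:v]trim={s}:{e},setpts=PTS-STARTPTS[v{i}];")
        parts.append(f"[0:a]atrim={s}:{e},asetpts=PTS-STARTPTS[a{i}];")
    inputs = "".join(f"[v{i}][a{i}]" for i in range(len(keep)))
    parts.append(f"{inputs}concat=n={len(keep)}:v=1:a=1[v][a]")
    return "".join(parts)
```

The resulting string is passed to FFmpeg via -filter_complex, mapping the final [v] and [a] labels to the output.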

Event Flow

Events tie everything together:

flowchart LR
    subgraph Workers
        W1[Download]
        W2[Analysis]
        W3[Render]
        W4[Asset Edit]
    end

    subgraph Redis
        PS[(Pub/Sub)]
    end

    subgraph API
        SSE[SSE Endpoint]
    end

    subgraph Frontend
        ES[EventSource]
        RQ[React Query]
    end

    W1 -->|Publish| PS
    W2 -->|Publish| PS
    W3 -->|Publish| PS
    PS -->|Subscribe| SSE
    SSE -->|Stream| ES
    ES -->|Invalidate| RQ

Note: Asset Edit uses polling instead of SSE since assets are user-scoped (not project-scoped) and updates are infrequent.

Each project has its own channel: project:{uuid}:events. The frontend subscribes when you open a project and invalidates React Query caches when events arrive.
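A sketch of the channel naming and the SSE frame shape. The channel format comes from the source; the wire-level event name is an assumption, and the `event:`/`data:` framing follows the SSE spec:

```python
import json

def project_channel(project_uuid: str) -> str:
    """Redis pub/sub channel for a project's events."""
    return f"project:{project_uuid}:events"

def clip_ready_event(clip_uuid: str) -> str:
    """Format a ClipReadyEvent as an SSE frame: `event:` and `data:`
    lines terminated by a blank line."""
    payload = json.dumps({"type": "clip_ready", "clip_uuid": clip_uuid})
    return f"event: clip_ready\ndata: {payload}\n\n"
```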

Storage Layout

All files live in R2 with a predictable structure:

flowchart TB
    subgraph Clips["sapari-raw bucket"]
        C1["clips/{prefix}/{uuid}/"]
        C1 --> C2[original.mp4]
        C1 --> C3[audio.mp3]
        C1 --> C4[proxy.mp4]
        C1 --> C5[waveform.json]
    end

    subgraph Exports["sapari-exports bucket"]
        E1["exports/{project}/{export}/"]
        E1 --> E2[Final Cut v1.mp4]
    end

    subgraph Assets["sapari-assets bucket"]
        A1["assets/{prefix}/{uuid}/"]
        A1 --> A2[video.mp4]
        A1 --> A3[thumbnail.jpg]
    end

The {prefix} is the first 2 characters of the UUID. This helps S3/R2 distribute files across partitions for better performance.
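Key construction with the two-character prefix can be sketched as:

```python
def storage_key(root: str, file_uuid: str, filename: str) -> str:
    """Build an R2 key like clips/{prefix}/{uuid}/{filename}, where
    the prefix is the first 2 characters of the UUID."""
    return f"{root}/{file_uuid[:2]}/{file_uuid}/{filename}"
```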

Debugging Tips

When something goes wrong, trace through the stages:

flowchart TB
    A[Issue Reported] --> B{Which stage?}
    B -->|Upload| C[Check ClipFile status]
    B -->|Process| D[Check worker logs]
    B -->|Analysis| E[Check Whisper API / LLM logs]
    B -->|Render| F[Check Export.error_message]
    B -->|Events| G[Check Redis pub/sub]

    C --> H[PENDING = upload incomplete]
    C --> I[PROCESSING = worker running]
    C --> J[FAILED = check error_message]
