
Analysis Pipeline

The analysis pipeline takes a video and produces edits (regions to cut). It transcribes the audio with Whisper, then detects silences from word gaps and uses an LLM to find false starts.

Pipeline Structure

The pipeline is a DAG defined in backend/src/workers/analysis/pipeline.py:

flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> IT[Improve Transcript]
    B --> IA[Insert Fixed Assets]
    IT --> P[Censor Profanity]
    C --> E[Validate Edits]
    D --> E
    E --> F[Create Edits]
    P --> F
    IA --> F
    F --> G[Update Project]

    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff

Silence detection, false start detection, transcript improvement, and fixed asset insertion run in parallel after transcription. Profanity censoring runs after transcript improvement (so the matcher sees natural text). All branches merge at create edits.
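As a rough sketch, the dependency structure in the diagram can be expressed as a DAG and topologically ordered. Step names here are illustrative; the real definition lives in backend/src/workers/analysis/pipeline.py.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, mirroring the diagram.
PIPELINE = {
    "transcribe": {"load_audio"},
    "detect_silences": {"transcribe"},
    "detect_false_starts": {"transcribe"},
    "improve_transcript": {"transcribe"},
    "insert_fixed_assets": {"transcribe"},
    "censor_profanity": {"improve_transcript"},
    "validate_edits": {"detect_silences", "detect_false_starts"},
    "create_edits": {"validate_edits", "censor_profanity", "insert_fixed_assets"},
    "update_project": {"create_edits"},
}

def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid topological order of the pipeline steps."""
    return list(TopologicalSorter(dag).static_order())
```

A runner that respects this ordering is free to execute independent steps (the four post-transcription branches) concurrently.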

Triggering Analysis

Analysis is triggered via the API:

sequenceDiagram
    participant Client
    participant API
    participant Redis
    participant Worker
    participant Whisper as Whisper API
    participant LLM as DeepSeek / GPT-5

    Client->>API: POST /projects/{uuid}/analyze
    API->>Redis: Queue analyze_project
    API->>Client: 202 Accepted

    Worker->>Redis: Pick up task
    Worker->>Worker: Load audio from R2
    Worker->>Whisper: Send audio
    Whisper->>Worker: Transcription + word timing

    par Parallel Detection
        Worker->>Worker: Detect silences
    and
        Worker->>LLM: Detect false starts
        LLM->>Worker: False start regions
    end

    Worker->>Worker: Validate & merge edits
    Worker->>API: Create Edit records
    Worker-->>Client: AnalysisCompleteEvent (SSE)

POST /api/v1/projects/{project_uuid}/analyze
{
    "pacing_level": 50,
    "false_start_sensitivity": 50,
    "language": "en"
}

This queues an analyze_project task on the analysis broker. The task runs the pipeline and publishes events as steps complete.
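A minimal sketch of how the endpoint might assemble the task payload, assuming the documented 0-100 parameter ranges are clamped server-side (the clamping and field names are assumptions, not the actual implementation):

```python
def build_analyze_payload(project_uuid: str,
                          pacing_level: int = 50,
                          false_start_sensitivity: int = 50,
                          language: str = "en") -> dict:
    """Build the analyze_project task payload with parameters
    clamped to their documented 0-100 range."""
    def clamp(v: int) -> int:
        return max(0, min(100, v))

    return {
        "project_uuid": project_uuid,
        "pacing_level": clamp(pacing_level),
        "false_start_sensitivity": clamp(false_start_sensitivity),
        "language": language,
    }
```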

Steps

LoadAudioStep

Downloads the pre-extracted audio from R2. During clip processing, we extract audio at 16kHz mono (Whisper-compatible) and store it alongside the video.

  • Input: project_uuid
  • Output: Path to local audio file

TranscribeStep

Sends the audio to OpenAI's Whisper API.

  • Input: Audio file path
  • Output: TranscriptionResult with words, timestamps, and duration

The result includes word-level timing like:

[
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.8, "end": 1.2},
]
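A verbose transcription response with word-level timestamps can be normalized into the shape above; a minimal sketch (the helper name is illustrative):

```python
def parse_words(response: dict) -> list[dict]:
    """Normalize a verbose transcription response into a flat
    word-timing list with float start/end seconds."""
    return [
        {"word": w["word"], "start": float(w["start"]), "end": float(w["end"])}
        for w in response.get("words", [])
    ]
```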

DetectSilencesStep

Analyzes gaps between words to find pauses worth cutting.

  • Input: Transcription + audio file
  • Output: list[SilenceRegion] with start/end ms and confidence

The pacing_level parameter (0-100) controls sensitivity: higher values mean more aggressive silence detection. A fast-paced video might use 70; a contemplative one might use 30.

We also analyze the audio waveform to confirm silences aren't just transcription gaps but actual quiet periods.
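The gap-based half of this step can be sketched as follows. The mapping from pacing_level to a minimum gap, and the confidence formula, are illustrative assumptions; the real detector also consults the waveform.

```python
def detect_silences(words: list[dict], pacing_level: int = 50) -> list[dict]:
    """Flag gaps between consecutive words as candidate silence regions.
    Higher pacing_level -> shorter gaps are flagged (assumed linear map:
    pacing 0 -> 2.0 s minimum gap, pacing 100 -> 0.2 s)."""
    min_gap_s = 2.0 - (pacing_level / 100) * 1.8
    regions = []
    for prev, cur in zip(words, words[1:]):
        gap = cur["start"] - prev["end"]
        if gap >= min_gap_s:
            regions.append({
                "start_ms": int(prev["end"] * 1000),
                "end_ms": int(cur["start"] * 1000),
                # Longer gaps relative to the threshold score higher.
                "confidence": min(1.0, gap / (2 * min_gap_s)),
            })
    return regions
```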

DetectFalseStartsStep

Uses an LLM to find repeated phrases where someone started a sentence, stopped, and tried again.

  • Input: Transcription text
  • Output: list[FalseStartRegion] with abandoned/completed text

The LLM prompt asks it to find patterns like "I think... I think we should" where the first "I think" should be cut. For long transcripts, words are split into overlapping chunks and processed independently.

The false_start_sensitivity parameter (0-100) controls how aggressive detection is. Higher values mean more detections (lower confidence threshold).
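The overlapping chunking for long transcripts can be sketched like this; chunk size and overlap values are illustrative assumptions, not the production settings.

```python
def chunk_words(words: list[str], size: int = 200, overlap: int = 40) -> list[list[str]]:
    """Split a long transcript into overlapping chunks so each can be
    sent to the LLM independently. Overlap keeps false starts that
    straddle a chunk boundary visible in at least one chunk."""
    if len(words) <= size:
        return [words]
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(words[i:i + size])
        if i + size >= len(words):
            break
    return chunks
```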

ValidateEditsStep

Cleans up the detected edits before saving:

  1. Merges overlapping edits (within 50ms proximity)
  2. Prefers FALSE_START type when merging (more significant)
  3. Optionally uses LLM to judge edit quality
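Steps 1 and 2 above can be sketched as a single merge pass over time-sorted edits (field names and dict shape are assumptions; the real step works on model objects):

```python
def merge_edits(edits: list[dict], proximity_ms: int = 50) -> list[dict]:
    """Merge edits that overlap or sit within proximity_ms of each
    other, preferring FALSE_START as the merged type."""
    if not edits:
        return []
    ordered = sorted(edits, key=lambda e: e["start_ms"])
    merged = [dict(ordered[0])]
    for edit in ordered[1:]:
        last = merged[-1]
        if edit["start_ms"] <= last["end_ms"] + proximity_ms:
            last["end_ms"] = max(last["end_ms"], edit["end_ms"])
            if "FALSE_START" in (last["type"], edit["type"]):
                last["type"] = "FALSE_START"
        else:
            merged.append(dict(edit))
    return merged
```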

CreateEditsStep

Bulk creates Edit records in the database, tagged with the current analysis_run_id. Old runs' edits are preserved (no clearing).

Edit(
    project_id=project.id,
    analysis_run_id=run.uuid,  # Tags edit to this run
    type=EditType.SILENCE,  # or FALSE_START, PROFANITY, ASSET
    start_ms=1000,
    end_ms=2500,
    active=True,
    confidence=0.85,
    reason="Word gap of 1.5s detected between words",
    reason_tag="word_gap",
)

UpdateProjectStep

Marks the project as analyzed and stores metadata:

  • Sets status to ANALYZED
  • Stores the transcript text in transcript (a convenience copy on the project)
  • Stores word timing in transcript_words
  • Generates caption lines tagged with analysis_run_id
  • Publishes AnalysisCompleteEvent

The task then stores transcript + counts on the AnalysisRun record and sets project.active_run_id to the new run.

Edit Types

flowchart LR
    subgraph Detection["Detection Sources"]
        W[Word Gaps] --> S[SILENCE]
        WF[Waveform Analysis] --> S
        LLM[LLM Analysis] --> F[FALSE_START]
    end

    subgraph Output["Edit Records"]
        S --> E[Edit Record]
        F --> E
    end

    E --> V{User Review}
    V -->|Toggle On| R[Include in Render]
    V -->|Toggle Off| X[Exclude from Render]

Type         Source                Action  Description
SILENCE      Word gaps + waveform  CUT     Pauses in speech
FALSE_START  LLM analysis          CUT     Repeated/abandoned phrases
PROFANITY    Dictionary matcher    MUTE    Words to censor (bleep/silence)

Users can toggle edits on/off before rendering. Only active edits are applied during render.
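Filtering down to the edits that actually apply at render time is straightforward; a minimal sketch (the helper name and dict shape are assumptions):

```python
def edits_for_render(edits: list[dict]) -> list[dict]:
    """Return only user-enabled edits, ordered by start time,
    since only active edits are applied during render."""
    return sorted(
        (e for e in edits if e["active"]),
        key=lambda e: e["start_ms"],
    )
```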

Key Files

Component              Location
Pipeline definition    backend/src/workers/analysis/pipeline.py
Task entry             backend/src/workers/analysis/tasks.py
Silence detection      backend/src/workers/analysis/silence/
False start detection  backend/src/workers/analysis/false_starts/
Edit creation          backend/src/workers/analysis/edits/step.py
Transcription          backend/src/workers/analysis/transcription/step.py
