
Analysis Pipeline

The analysis pipeline takes a video and produces edits (regions to cut). It transcribes the audio with Whisper, then detects silences from word gaps and uses an LLM to find false starts.

Pipeline Structure

The pipeline is a DAG defined in backend/src/workers/analysis/pipeline.py:

flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> IT[Improve Transcript]
    B --> IA[Insert Fixed Assets]
    IT --> P[Censor Profanity]
    C --> E[Validate Edits]
    D --> E
    E --> F[Create Edits]
    P --> F
    IA --> F
    F --> G[Update Project]

    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff

Silence detection, false start detection, transcript improvement, and fixed asset insertion run in parallel after transcription. Profanity censoring runs after transcript improvement (so the matcher sees natural text). All branches merge at create edits.
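As a rough sketch, the dependency structure in the diagram can be expressed as a DAG and topologically ordered. Step names here are illustrative; the real definition lives in backend/src/workers/analysis/pipeline.py.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, mirroring the diagram.
PIPELINE = {
    "transcribe": {"load_audio"},
    "detect_silences": {"transcribe"},
    "detect_false_starts": {"transcribe"},
    "improve_transcript": {"transcribe"},
    "insert_fixed_assets": {"transcribe"},
    "censor_profanity": {"improve_transcript"},
    "validate_edits": {"detect_silences", "detect_false_starts"},
    "create_edits": {"validate_edits", "censor_profanity", "insert_fixed_assets"},
    "update_project": {"create_edits"},
}

def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid topological order of the pipeline steps."""
    return list(TopologicalSorter(dag).static_order())
```

A runner that respects this ordering is free to execute independent steps (the four post-transcription branches) concurrently.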

Triggering Analysis

Analysis is triggered via the API:

sequenceDiagram
    participant Client
    participant API
    participant Redis
    participant Worker
    participant Whisper as Whisper API
    participant LLM as DeepSeek / GPT-5

    Client->>API: POST /projects/{uuid}/analyze
    API->>Redis: Queue analyze_project
    API->>Client: 202 Accepted

    Worker->>Redis: Pick up task
    Worker->>Worker: Load audio from R2
    Worker->>Whisper: Send audio
    Whisper->>Worker: Transcription + word timing

    par Parallel Detection
        Worker->>Worker: Detect silences
    and
        Worker->>LLM: Detect false starts
        LLM->>Worker: False start regions
    end

    Worker->>Worker: Validate & merge edits
    Worker->>API: Create Edit records
    Worker-->>Client: AnalysisCompleteEvent (SSE)

POST /api/v1/projects/{project_uuid}/analyze
{
    "pacing_level": 50,
    "false_start_sensitivity": 50,
    "language": "en"
}

This queues an analyze_project task on the analysis broker. The task runs the pipeline and publishes events as steps complete.
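A minimal sketch of how the endpoint might assemble the task payload, assuming the documented 0-100 parameter ranges are clamped server-side (the clamping and field names are assumptions, not the actual implementation):

```python
def build_analyze_payload(project_uuid: str,
                          pacing_level: int = 50,
                          false_start_sensitivity: int = 50,
                          language: str = "en") -> dict:
    """Build the analyze_project task payload with parameters
    clamped to their documented 0-100 range."""
    def clamp(v: int) -> int:
        return max(0, min(100, v))

    return {
        "project_uuid": project_uuid,
        "pacing_level": clamp(pacing_level),
        "false_start_sensitivity": clamp(false_start_sensitivity),
        "language": language,
    }
```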

Steps

LoadAudioStep

Downloads the pre-extracted audio from R2. During clip processing, we extract audio at 16kHz mono (Whisper-compatible) and store it alongside the video.

  • Input: project_uuid
  • Output: Path to local audio file

TranscribeStep

Sends the audio to OpenAI's Whisper API.

  • Input: Audio file path
  • Output: TranscriptionResult with words, timestamps, and duration

The result includes word-level timing like:

[
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.8, "end": 1.2},
]
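A verbose transcription response with word-level timestamps can be normalized into the shape above; a minimal sketch (the helper name is illustrative):

```python
def parse_words(response: dict) -> list[dict]:
    """Normalize a verbose transcription response into a flat
    word-timing list with float start/end seconds."""
    return [
        {"word": w["word"], "start": float(w["start"]), "end": float(w["end"])}
        for w in response.get("words", [])
    ]
```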

DetectSilencesStep

Analyzes gaps between words to find pauses worth cutting.

  • Input: Transcription + audio file
  • Output: list[SilenceRegion] with start/end ms and confidence

The pacing_level parameter (0-100) controls sensitivity: higher values mean more aggressive silence detection. A fast-paced video might use 70; a contemplative one might use 30.

We also analyze the audio waveform to confirm silences aren't just transcription gaps but actual quiet periods.
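The gap-based half of this step can be sketched as follows. The mapping from pacing_level to a minimum gap, and the confidence formula, are illustrative assumptions; the real detector also consults the waveform.

```python
def detect_silences(words: list[dict], pacing_level: int = 50) -> list[dict]:
    """Flag gaps between consecutive words as candidate silence regions.
    Higher pacing_level -> shorter gaps are flagged (assumed linear map:
    pacing 0 -> 2.0 s minimum gap, pacing 100 -> 0.2 s)."""
    min_gap_s = 2.0 - (pacing_level / 100) * 1.8
    regions = []
    for prev, cur in zip(words, words[1:]):
        gap = cur["start"] - prev["end"]
        if gap >= min_gap_s:
            regions.append({
                "start_ms": int(prev["end"] * 1000),
                "end_ms": int(cur["start"] * 1000),
                # Longer gaps relative to the threshold score higher.
                "confidence": min(1.0, gap / (2 * min_gap_s)),
            })
    return regions
```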

DetectFalseStartsStep

Uses an LLM to find repeated phrases where someone started a sentence, stopped, and tried again.

  • Input: Transcription text
  • Output: list[FalseStartRegion] with abandoned/completed text

The LLM prompt asks it to find patterns like "I think... I think we should" where the first "I think" should be cut. For long transcripts, words are split into overlapping chunks and processed independently.

The false_start_sensitivity parameter (0-100) controls how aggressive detection is. Higher values mean more detections (lower confidence threshold).
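The overlapping chunking for long transcripts can be sketched like this; chunk size and overlap values are illustrative assumptions, not the production settings.

```python
def chunk_words(words: list[str], size: int = 200, overlap: int = 40) -> list[list[str]]:
    """Split a long transcript into overlapping chunks so each can be
    sent to the LLM independently. Overlap keeps false starts that
    straddle a chunk boundary visible in at least one chunk."""
    if len(words) <= size:
        return [words]
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(words[i:i + size])
        if i + size >= len(words):
            break
    return chunks
```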

ValidateEditsStep

Cleans up the detected edits before saving:

  1. Merges overlapping edits (within 50ms proximity)
  2. Prefers FALSE_START type when merging (more significant)
  3. Optionally uses LLM to judge edit quality
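Steps 1 and 2 above can be sketched as a single merge pass over time-sorted edits (field names and dict shape are assumptions; the real step works on model objects):

```python
def merge_edits(edits: list[dict], proximity_ms: int = 50) -> list[dict]:
    """Merge edits that overlap or sit within proximity_ms of each
    other, preferring FALSE_START as the merged type."""
    if not edits:
        return []
    ordered = sorted(edits, key=lambda e: e["start_ms"])
    merged = [dict(ordered[0])]
    for edit in ordered[1:]:
        last = merged[-1]
        if edit["start_ms"] <= last["end_ms"] + proximity_ms:
            last["end_ms"] = max(last["end_ms"], edit["end_ms"])
            if "FALSE_START" in (last["type"], edit["type"]):
                last["type"] = "FALSE_START"
        else:
            merged.append(dict(edit))
    return merged
```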

CreateEditsStep

Bulk creates Edit records in the database, tagged with the current analysis_run_id. Old runs' edits are preserved (no clearing).

Edit(
    project_id=project.id,
    analysis_run_id=run.uuid,  # Tags edit to this run
    type=EditType.SILENCE,  # or FALSE_START, PROFANITY, ASSET
    start_ms=1000,
    end_ms=2500,
    active=True,
    confidence=0.85,
    reason="Word gap of 1.5s detected between words",
    reason_tag="word_gap",
)

UpdateProjectStep

Marks the project as analyzed and stores metadata:

  • Sets status to ANALYZED
  • Stores the transcript text in transcript (a convenience copy on the project)
  • Stores word timing in transcript_words
  • Generates caption lines tagged with analysis_run_id
  • Publishes AnalysisCompleteEvent

The task then stores transcript + counts on the AnalysisRun record and sets project.active_run_id to the new run.

Edit Types

flowchart LR
    subgraph Detection["Detection Sources"]
        W[Word Gaps] --> S[SILENCE]
        WF[Waveform Analysis] --> S
        LLM[LLM Analysis] --> F[FALSE_START]
    end

    subgraph Output["Edit Records"]
        S --> E[Edit Record]
        F --> E
    end

    E --> V{User Review}
    V -->|Toggle On| R[Include in Render]
    V -->|Toggle Off| X[Exclude from Render]

Type         Source                Action  Description
SILENCE      Word gaps + waveform  CUT     Pauses in speech
FALSE_START  LLM analysis          CUT     Repeated/abandoned phrases
PROFANITY    Dictionary matcher    MUTE    Words to censor (bleep/silence)

Users can toggle edits on/off before rendering. Only active edits are applied during render.
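Filtering down to the edits that actually apply at render time is straightforward; a minimal sketch (the helper name and dict shape are assumptions):

```python
def edits_for_render(edits: list[dict]) -> list[dict]:
    """Return only user-enabled edits, ordered by start time,
    since only active edits are applied during render."""
    return sorted(
        (e for e in edits if e["active"]),
        key=lambda e: e["start_ms"],
    )
```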

Key Files

Component              Location
Pipeline definition    backend/src/workers/analysis/pipeline.py
Task entry             backend/src/workers/analysis/tasks.py
Silence detection      backend/src/workers/analysis/silence/
False start detection  backend/src/workers/analysis/false_starts/
Edit creation          backend/src/workers/analysis/edits/step.py
Transcription          backend/src/workers/analysis/transcription/step.py
