What It Does
Cadence analyzes spoken audio using WhisperX AI and automatically places a labeled, frame-accurate marker at the start of every spoken word in your After Effects timeline. Select an audio layer, click “Generate word markers,” and your timeline becomes a readable script you can animate against directly.
Placing markers by ear is slow and imprecise, and it pulls you out of the creative process. Cadence removes that step entirely.
Key Features
Smart labeling. Each marker displays the spoken word, so your timeline reads like a transcript. No more guessing which marker corresponds to which word.
WhisperX alignment. Standard speech-to-text tools suffer from timestamp drift. WhisperX avoids this with forced phoneme-level alignment via Wav2Vec2, pinning each marker to the moment a word begins, with no accumulated drift even on long-form audio. Supports 90+ languages.
Multi-layer processing. Select several audio layers and process them all in one operation rather than running each through separately.
Marker sync. Copy generated markers to the current timeline, a parent layer, or a parent timeline with one click, giving you flexibility in how you organize your comp.
Non-destructive placement. Markers go on the audio layer itself, keeping the composition clean.
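To make "frame-accurate" concrete, here is a minimal sketch of how word start times could be snapped to the nearest frame of a composition. It is an illustration, not Cadence's actual implementation: the function name words_to_markers, the marker dictionary shape, and the sample data are hypothetical, and the input is assumed to be a list of entries with "word" and "start" keys, loosely modeled on word-level alignment output. Entries without a start time are skipped.

```python
import math


def words_to_markers(words, fps):
    """Snap each word's start time (seconds) to the nearest frame at `fps`.

    `words` is assumed to be a list of dicts with a "word" key and an
    optional "start" key in seconds (a hypothetical input shape).
    """
    markers = []
    for w in words:
        start = w.get("start")
        if start is None:
            # Some tokens may carry no timestamp; leave them unmarked.
            continue
        # Round half up to the nearest whole frame.
        frame = math.floor(start * fps + 0.5)
        markers.append({
            "label": w["word"].strip(),
            "frame": frame,
            "time": frame / fps,  # marker time in seconds, on a frame boundary
        })
    return markers


# Hypothetical sample data for a 24 fps composition.
sample = [
    {"word": "Hello", "start": 0.50},
    {"word": "world", "start": 1.03},
    {"word": "2024"},  # no timestamp available, so no marker
]

markers = words_to_markers(sample, fps=24)
for m in markers:
    print(f'{m["label"]}: frame {m["frame"]} at {m["time"]:.4f}s')
```

Snapping to the nearest frame keeps the error within half a frame of the detected word start; flooring to the previous frame would be an equally reasonable choice if markers should never land after the word begins.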
Who It’s For
Cadence fits anyone animating to a voiceover: motion designers doing kinetic typography, explainer video editors syncing diagram reveals to narration, character animators working on lip-sync timing, and infographic builders whose stats need to appear exactly when the narrator mentions them. If you spend time rewinding audio to find the right frame, Cadence replaces that process.
Requires an internet connection and a Replicate account (replicate.com) for AI processing.
Pricing
Cadence is a one-time purchase at $29.99, with a free trial available. A separate ongoing cost applies for AI processing through Replicate, billed pay-as-you-go at approximately $0.005 per minute of audio. The minimum deposit is $1, which at that rate covers roughly 200 minutes (about 3.3 hours) of processing.
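For budgeting, the processing cost can be estimated with a short calculation. This is only a sketch: it takes the quoted rate of $0.005 per minute as exact, although that figure is approximate and actual Replicate billing may differ, and the function names are illustrative.

```python
RATE_PER_MINUTE = 0.005  # USD per minute of audio, the approximate quoted rate


def processing_cost(audio_minutes, rate=RATE_PER_MINUTE):
    """Estimated processing cost in USD for a given amount of audio."""
    return audio_minutes * rate


def minutes_covered(deposit, rate=RATE_PER_MINUTE):
    """Minutes of audio that a deposit (USD) would cover at the given rate."""
    return deposit / rate


covered = minutes_covered(1.00)
print(f"$1.00 covers about {covered:.0f} minutes ({covered / 60:.1f} hours)")
print(f"A 10-minute voiceover costs about ${processing_cost(10):.2f}")
```

At exactly $0.005 per minute, a $1 deposit works out to 200 minutes of audio, and a 10-minute voiceover costs about five cents.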