phase — 01100ms latency
capture
Record a phrase. The signal is segmented by Silero VAD, windowed to match the reference length.
audio → vad → window
slaytheaccent is a phoneme-level pronunciation coach for L2 English speakers. Three acoustic models align your recording against a native reference. You see exactly which sounds to fix — and which to leave alone.
[⏎] enter · [⌘K] docsRecord a phrase. The signal is segmented by Silero VAD, windowed to match the reference length.
Three wav2vec2 variants emit per-frame phoneme distributions. A DP alignment against the reference phoneme sequence yields a score per phoneme.
The weakest phonemes surface in the drill queue. You shadow the native, re-record, iterate until the score moves.
We don’t tell you “80%”. We tell you the /ɾ/ in water was realized as /t/, the /ɚ/ lost its rhotacism, and the /w/ was perfect. Actionable. Repeatable.