● LIVE · SR 16 000 hz · 23 dialects · wav2vec2-xlsr
v0.1-beta · node: sta-mtl-01 · ■ rec ready
01 · system · slaytheaccent · 2026-04-17

the mouth
is an instrument.
we tune it.

slaytheaccent is a phoneme-level pronunciation coach for L2 English speakers. Three acoustic models align your recording against a native reference. You see exactly which sounds to fix — and which to leave alone.

initialize session · [⏎] enter · [⌘K] docs
00:00 · “water” · /w ɔ ɾ ɚ/ · 00:00.54
02 · phoneme · matrix · 36/44
consonants: p b t d k g f v θ ð s z ʃ ʒ h m n ŋ l ɫ ɹ w j ɾ
vowels: i ɪ e ɛ æ ʌ ə ɚ u ʊ o ɔ
phonemes · ipa: 44 (trained on mixed corpus)
dialects: 23 (en-us · en-gb · ...)
assess · median: 540ms (per 3-sec utterance)
learners · beta: 1 402 (from 41 L1 backgrounds)
phase — 01 · 100ms latency

capture

Record a phrase. Silero VAD segments the signal, which is then windowed to match the reference length.

audio → vad → window
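
A minimal sketch of the windowing step, under stated assumptions: the VAD output is represented as a list of `(start, end)` sample indices, and "windowed to match the reference length" is read as "trim to the detected speech, then crop or zero-pad around its centre". The function name `window_to_reference` and this padding policy are illustrative only and are not taken from the slaytheaccent pipeline.

```python
def window_to_reference(samples, speech_segments, ref_len):
    """Trim to detected speech, then crop or zero-pad to ref_len samples.

    samples: sequence of audio samples (e.g. mono floats at 16 000 hz).
    speech_segments: list of (start, end) sample indices, standing in
        for whatever the VAD returns (hypothetical representation).
    ref_len: number of samples in the native reference recording.
    """
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    if not speech_segments:
        # No speech detected: return silence of the right length.
        return [0.0] * ref_len

    # Keep everything from the first speech onset to the last offset.
    start = min(s for s, _ in speech_segments)
    end = max(e for _, e in speech_segments)
    speech = list(samples[start:end])

    if len(speech) >= ref_len:
        # Crop symmetrically around the centre of the speech region.
        offset = (len(speech) - ref_len) // 2
        return speech[offset:offset + ref_len]

    # Zero-pad as evenly as possible on both sides.
    pad = ref_len - len(speech)
    left = pad // 2
    return [0.0] * left + speech + [0.0] * (pad - left)
```

At a 16 000 hz sample rate, a 3-second reference corresponds to `ref_len = 48000`.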
phase — 02 · 3-model vote

score

Three wav2vec2 variants emit per-frame phoneme distributions. A DP alignment against the reference phoneme sequence yields a score per phoneme.

frames → align → score
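
The scoring phase can be illustrated with a small dynamic-programming sketch. It rests on several assumptions: each model's output is a list of per-frame `{phoneme: probability}` dicts, the "vote" is a plain average of the three distributions, the alignment is monotonic with at least one frame per reference phoneme, and a phoneme's score is its mean posterior over the frames aligned to it. The names `average_posteriors` and `align_and_score` are hypothetical, and the production scorer may differ on every one of these points.

```python
import math


def average_posteriors(model_outputs):
    """Average per-frame distributions across models (one possible 'vote')."""
    n_models = len(model_outputs)
    averaged = []
    for t in range(len(model_outputs[0])):
        frame = {}
        for output in model_outputs:
            for phoneme, p in output[t].items():
                frame[phoneme] = frame.get(phoneme, 0.0) + p / n_models
        averaged.append(frame)
    return averaged


def align_and_score(frames, reference, floor=1e-6):
    """Monotonic DP alignment of frames to the reference phoneme sequence.

    Returns a list of (phoneme, score) pairs, where score is the mean
    posterior of that phoneme over the frames aligned to it.
    """
    n_frames, n_ref = len(frames), len(reference)
    if n_ref == 0 or n_frames < n_ref:
        raise ValueError("need at least one frame per reference phoneme")

    neg_inf = float("-inf")
    # Log-probability of each reference phoneme at each frame (floored).
    logp = [[math.log(max(f.get(ph, 0.0), floor)) for ph in reference]
            for f in frames]

    best = [[neg_inf] * n_ref for _ in range(n_frames)]
    advanced = [[False] * n_ref for _ in range(n_frames)]
    best[0][0] = logp[0][0]
    for t in range(1, n_frames):
        for j in range(min(t + 1, n_ref)):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best[t][j] = move + logp[t][j]
                advanced[t][j] = True
            else:
                best[t][j] = stay + logp[t][j]

    # Backtrack from the last frame and last phoneme to recover the spans.
    spans = [[] for _ in range(n_ref)]
    j = n_ref - 1
    for t in range(n_frames - 1, -1, -1):
        spans[j].append(t)
        if advanced[t][j]:
            j -= 1

    return [(ph, sum(frames[t].get(ph, 0.0) for t in spans[i]) / len(spans[i]))
            for i, ph in enumerate(reference)]
```

Averaging before aligning keeps a single alignment path for all three models. Aligning each model separately and then combining scores would be an equally plausible reading of "3-model vote".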
phase — 03 · adaptive

drill

The weakest phonemes surface in the drill queue. You shadow the native speaker, re-record, and iterate until the score moves.

delta → queue → repeat
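
One way to picture the drill loop is a queue ordered by score plus a per-phoneme delta between attempts. The sketch below assumes scores in the 0 to 1 range; the `threshold` and `limit` parameters and both function names are hypothetical, chosen only to make the idea concrete.

```python
def build_drill_queue(phoneme_scores, limit=3, threshold=0.8):
    """Return up to `limit` phonemes scoring below `threshold`, weakest first."""
    weak = [(score, ph) for ph, score in phoneme_scores.items()
            if score < threshold]
    return [ph for _, ph in sorted(weak)[:limit]]


def score_deltas(before, after):
    """Per-phoneme score change between two attempts (positive = improved)."""
    return {ph: after[ph] - before[ph] for ph in before if ph in after}
```

With the scores from the sample output on this page, `build_drill_queue({"w": 1.0, "ɾ": 0.12, "ɚ": 0.55})` puts /ɾ/ ahead of /ɚ/ and leaves /w/ out of the queue.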
output · sample

the feedback is
a diagnosis,
not a grade.

We don’t tell you “80%”. We tell you the /ɾ/ in water was realized as /t/, the /ɚ/ lost its rhoticity, and the /w/ was perfect. Actionable. Repeatable.

// POST /v0/assess
{
  "word": "water",
  "assess_score": 0.74,
  "phonemes": [
    { "expected": "w", "observed": "w", "score": 1.0 },
    { "expected": "ɾ", "observed": "t", "score": 0.12 },
    { "expected": "ɚ", "observed": "ə", "score": 0.55 }
  ]
}
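
A short sketch of how a client might turn a response of this shape into readable diagnosis lines. It assumes the payload arrives as valid JSON with the field names shown in the sample; the `diagnose` helper and its wording are hypothetical, not part of the API.

```python
import json


def diagnose(response):
    """Turn an assess response into one human-readable line per phoneme."""
    lines = []
    for p in response["phonemes"]:
        if p["observed"] == p["expected"]:
            lines.append(
                f"/{p['expected']}/ matched the reference (score {p['score']:.2f})")
        else:
            lines.append(
                f"/{p['expected']}/ was realized as /{p['observed']}/ "
                f"(score {p['score']:.2f})")
    return lines


# Sample payload copied from the page, with keys quoted so it parses as JSON.
SAMPLE = json.loads("""
{
  "word": "water",
  "assess_score": 0.74,
  "phonemes": [
    {"expected": "w", "observed": "w", "score": 1.0},
    {"expected": "ɾ", "observed": "t", "score": 0.12},
    {"expected": "ɚ", "observed": "ə", "score": 0.55}
  ]
}
""")

if __name__ == "__main__":
    for line in diagnose(SAMPLE):
        print(line)
```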

boot the coach.

start session