
How AI patients change OET practice

Live-audio AI patients have rewritten OET Speaking practice. What they do well, where they still need a human, and how we close the gaps.

5 min read · By OET Live

A year ago, an AI patient that could hold a five-minute conversation in real time, stay in clinical character, and respond to what you actually said — not what a transcription thought you said — was not technically possible. The latency was wrong. The model behaviour was wrong. The audio path was wrong.

That changed in the last year. A new generation of live, native-audio AI models closed the latency gap, and the conversational quality crossed a threshold where solo OET Speaking practice can be genuinely useful.

This post is the honest version of what AI patients do well, what they still get wrong, and where we think the line is between "AI is enough" and "you need a human coach."

What AI patients do well

1. Tempo

A native-audio model can interject naturally. It pauses where a human would pause. It interrupts when a patient would interrupt. The conversational tempo is preserved.

This matters more than it might sound. Earlier-generation Speaking practice tools used transcription → text-to-text → text-to-speech pipelines with 800–1500 ms of latency. That latency breaks the rhythm of the conversation and makes fluency practice useless — you can't practice tempo against a partner that doesn't keep tempo.
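To see where a figure like that comes from, stack the stages. A back-of-the-envelope comparison in Python, with per-stage timings that are assumptions for illustration rather than measurements:

```python
# Back-of-the-envelope latency budgets, in milliseconds. The per-stage
# numbers are illustrative assumptions, not measurements.

cascaded = {
    "speech-to-text": 350,       # wait for a stable transcript
    "text-to-text reply": 500,   # generate the patient's answer as text
    "text-to-speech": 300,       # synthesise the audio
}
native_audio = {"audio in to audio out": 400}  # one model, one hop

print(f"cascaded pipeline: ~{sum(cascaded.values())} ms")    # ~1150 ms
print(f"native audio:      ~{sum(native_audio.values())} ms")  # ~400 ms
```

The cascaded total lands in that 800–1500 ms band no matter how fast each stage gets, because the stages are serial. Native audio removes the hops rather than speeding them up.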

2. Availability

The single biggest practical advantage. You can do a 5-minute role-play at 11pm on a Tuesday after your shift, or 6am before it. There is no scheduling friction, no calendar negotiation, no time zone problem. For migrating healthcare professionals on shift work, this is the difference between practicing five times a week and practicing twice a month.

3. Patience

An AI patient will let you do twelve role-plays in a row, scored each time, without judgement, without time pressure on its end, without needing a coffee break. You can drill one specific scenario type until it's automatic.

4. Consistency

If you're testing "what happens if I open with [X] phrase vs [Y]", an AI patient can give you something close to two comparable runs. A human role-play partner cannot — they remember the previous attempt and adjust.

5. Scoring at scale

Once you have a role-play transcript, scoring it against the nine OET Speaking criteria is a tractable problem. Examiner-calibrated rubric prompts can grade thousands of role-plays per day. This is what makes per-session feedback economically viable. Human-only scoring at this scale would price the practice loop out of reach for most candidates.
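As a sketch of the shape of that problem: a scoring job is a transcript in, a rubric prompt per criterion, and a structured grade back. Everything below is illustrative, not our production code; the criterion list is a subset of the nine, the score scale is simplified, and call_llm is a stand-in for whatever model endpoint does the grading.

```python
import json

# Illustrative subset of the criteria; the full OET Speaking rubric has nine.
CRITERIA = [
    "intelligibility",
    "fluency",
    "relationship_building",
    "understanding_the_patient_perspective",
]

# Simplified rubric prompt; real prompts carry examiner-calibrated band
# descriptors per criterion, and the score scale here is illustrative.
RUBRIC_PROMPT = """You are an examiner-calibrated OET Speaking rater.
Score the role-play transcript below on the criterion '{criterion}'.
Return JSON: {{"score": <0-6>, "evidence": "<short quote from the transcript>"}}

Transcript:
{transcript}
"""

def score_transcript(transcript: str, call_llm) -> dict:
    """Grade one transcript on every criterion. `call_llm` is a stand-in
    for whatever model endpoint does the grading: prompt string in,
    model text out."""
    results = {}
    for criterion in CRITERIA:
        prompt = RUBRIC_PROMPT.format(criterion=criterion, transcript=transcript)
        results[criterion] = json.loads(call_llm(prompt))
    return results
```

The expensive part is the calibration against human examiners, not the plumbing; the plumbing is what the sketch shows.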

Where AI patients still fall short

1. Reading subtle emotional cues

A skilled human role-play partner will play with their body language — shift discomfort to their shoulders, look down when discussing a sensitive topic, pause longer than the script suggests. An audio-only AI patient cannot deliver those cues.

For most of the OET rubric this doesn't matter — the test is on phone-like audio and the examiner scores from audio. But if you're training for the actual clinical interaction downstream (which is what the test is a proxy for), the missing emotional channel is real.

2. Truly improvised conversation

AI patients work best inside a defined scenario with task cues. They handle planned wobbles well. They handle genuine off-script improvisation less well — if you ask them something completely outside the role-card brief, they'll often default to "I don't know" rather than improvise a coherent patient response.

This is actually fine for OET prep — the exam itself is scripted role-plays — but it limits AI patients for clinical-communication practice more broadly.

3. Cultural and linguistic nuance

A British interlocutor will react differently to "Are you alright with that?" than an Australian one. An AI patient currently averages across a broad training set. We can prompt it to lean toward a specific cultural register, but it won't catch the micro-cues that mark a phrase as out-of-register in a particular country.
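"Lean toward a register" here means something as blunt as a line in the patient prompt. An illustrative fragment, not our production prompt:

```python
# Illustrative register hint for the patient prompt; not production copy.
REGISTER_HINT = (
    "You are playing a patient in an Australian general practice. "
    "Use Australian idiom where it comes naturally, and react the way an "
    "Australian patient would to phrasing that feels overly formal or blunt."
)
```

That gets you the broad strokes. The micro-cues are exactly what a one-line hint can't encode.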

The OET exam is country-agnostic, so this doesn't affect scoring. It does mean AI practice won't fully prepare you for the cultural-fit dimension of working in the destination country.

4. Long-form coaching

A great human OET coach does something AI doesn't yet do: they remember the candidate, notice patterns across sessions, push the candidate's weakest sub-skill specifically, and adjust the curriculum over time.

We're working on the structural version of this — a recommender that targets weak criteria and under-practiced topics — but it's not the same as a coach who notices that you've been holding back on empathy markers ever since your divorce.
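A first cut of that recommender is not mysterious: rank criteria by recent average score, rank topics by how rarely they've been practiced, and schedule the intersection. A minimal sketch with made-up session data; the criterion and topic names are illustrative:

```python
from collections import defaultdict

# Made-up session history: per-criterion scores plus a topic tag per role-play.
sessions = [
    {"topic": "medication_adherence", "scores": {"fluency": 4, "empathy": 2}},
    {"topic": "discharge_planning",   "scores": {"fluency": 5, "empathy": 3}},
    {"topic": "medication_adherence", "scores": {"fluency": 4, "empathy": 2}},
]

def weakest_criterion(sessions):
    """Criterion with the lowest average score across recent sessions."""
    totals, counts = defaultdict(float), defaultdict(int)
    for s in sessions:
        for criterion, score in s["scores"].items():
            totals[criterion] += score
            counts[criterion] += 1
    return min(totals, key=lambda c: totals[c] / counts[c])

def least_practiced_topic(sessions, catalogue):
    """Topic from the catalogue the candidate has seen least often."""
    seen = defaultdict(int)
    for s in sessions:
        seen[s["topic"]] += 1
    return min(catalogue, key=lambda t: seen[t])

catalogue = ["medication_adherence", "discharge_planning", "post_op_care"]
print(weakest_criterion(sessions))                 # -> empathy
print(least_practiced_topic(sessions, catalogue))  # -> post_op_care
```

That's the structural version. The coach's version also knows why you're weak there, which is the part we can't sketch.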

Where the line is

A reasonable default rule:

Use an AI patient for volume practice. Use a human coach for diagnosis and direction.

In a typical 8-week prep cycle:

  • 1–2 sessions with a human coach to diagnose your weakest criteria and set a study plan
  • 20–40 sessions with an AI patient to put in the practice volume
  • 1 final session with a human coach to validate readiness

That's roughly 1/20 the cost of all-human practice while preserving the parts of human coaching that AI doesn't replicate.

What we're doing about the gaps

Three things in progress at OET Live:

  1. Pattern detection across sessions. Identify candidates whose score on understanding the patient's perspective drops specifically when the scenario involves bad-news delivery rather than routine consultation, and surface the pattern (a minimal sketch follows this list).
  2. Per-profession nuance. A nursing role-play has a different register from a veterinary role-play, and the patient script should reflect that. It's currently broadly profession-aware; we're tightening it per scenario.
  3. A handoff path to human coaches. For candidates approaching the test, optional 1:1 sessions with curated human coaches who can review your per-session data and run the high-judgement parts that AI doesn't replicate yet.
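For the first item, here's a minimal sketch of the kind of check involved. The data shape, scenario labels, criterion name, and threshold are all illustrative:

```python
from statistics import mean

def flag_scenario_drop(sessions, criterion, scenario_type, min_gap=0.5):
    """Return the score gap when a candidate's score on `criterion` runs
    lower for one scenario type than their baseline across everything else,
    or None if there's no meaningful gap. Threshold is illustrative."""
    in_type  = [s["scores"][criterion] for s in sessions
                if s["scenario"] == scenario_type]
    baseline = [s["scores"][criterion] for s in sessions
                if s["scenario"] != scenario_type]
    if not in_type or not baseline:
        return None
    gap = mean(baseline) - mean(in_type)
    return gap if gap >= min_gap else None

# Made-up history: patient-perspective score dips on bad-news scenarios.
history = [
    {"scenario": "routine_consultation", "scores": {"patient_perspective": 3}},
    {"scenario": "routine_consultation", "scores": {"patient_perspective": 3}},
    {"scenario": "bad_news",             "scores": {"patient_perspective": 1}},
    {"scenario": "bad_news",             "scores": {"patient_perspective": 2}},
]
print(flag_scenario_drop(history, "patient_perspective", "bad_news"))  # 1.5
```

The statistics are trivial; the product work is collecting consistent per-scenario scores to run them on.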

If those are the practice loops you want, join the waitlist. We'll write more about the human-coach handoff once we've shipped it.
