
What makes OET Speaking hard

OET Speaking is a structured, time-pressured performance scored on 9 criteria. Here is what makes it hard, and why fluent candidates still fail.

May 6, 2026 · 5 min read · By OET Live

If you came to OET Speaking thinking "I speak English fluently, I will be fine" and walked out of your first sitting with a C+, you are in extremely good company.

Speaking is the sub-test where the most candidates lose the most marks, and the gap between expected and actual performance is wider than for any other sub-test. The reason isn't English. It's the cognitive load and the rubric mismatch.

This post breaks down the four things that make OET Speaking specifically hard — even for people whose English is great in every other context.

1. Nine criteria, scored simultaneously, in real time

Most language exams score Speaking on 3–5 criteria. OET scores on 9: four linguistic and five clinical-communication. Each one is scored independently, the linguistic criteria on a 0–6 scale and the clinical-communication criteria on a 0–3 scale.

The cognitive load is real. While you're talking, you have to:

Pronounce intelligibly and maintain fluency and pick appropriate vocabulary and stay grammatically clean
Build rapport and elicit the patient's perspective and structure the consultation and gather information and give information clearly

Over-rotate on grammar accuracy and you lose marks on empathy. Over-rotate on empathy and you forget to cover a task cue. Score well on all four linguistic criteria but skip the patient perspective and your band drops.

The 9-criteria scoring is what makes OET Speaking specifically a clinical-communication test rather than a general English test. It rewards exactly the same skills you'd be assessed on by a clinical educator watching you take a real history — except you have to demonstrate all of them in 5 minutes with a stranger.

For the full rubric breakdown, see how OET Speaking is actually scored.

2. Five minutes is shorter than it sounds

A real consultation is 15–30 minutes. The OET role-play compresses the same communication arc into 5. You have to:

Greet and orient (~15 seconds)
Build rapport and elicit concerns (~1 minute)
Gather information / take history (~1.5 minutes)
Provide information or address the patient's concern (~1.5 minutes)
Wrap up and close (~30 seconds)

If you spend an extra 30 seconds on greeting, you eat into history-taking. If you spend 2 minutes giving information, you skip the wrap-up. Time-budgeting under pressure is a skill in itself, and it's a skill you only develop with timed practice — not with reading about it.
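To see how little slack that budget leaves, here is a minimal Python sketch. The phase durations are the approximate figures from the list above; the names `PLANNED`, `seconds_left`, and `shortfall` are made up for illustration and are not part of any OET material.

```python
# Approximate per-phase budget from the list above, in seconds (illustrative only).
ROLE_PLAY_SECONDS = 5 * 60

PLANNED = {
    "greet and orient": 15,
    "build rapport and elicit concerns": 60,
    "gather information": 90,
    "provide information": 90,
    "wrap up and close": 30,
}


def seconds_left(actual_so_far: dict[str, int]) -> int:
    """Time remaining once the phases in actual_so_far have been performed."""
    return ROLE_PLAY_SECONDS - sum(actual_so_far.values())


def shortfall(actual_so_far: dict[str, int]) -> int:
    """Seconds the remaining planned phases exceed the time left (0 if they fit)."""
    still_planned = sum(
        secs for phase, secs in PLANNED.items() if phase not in actual_so_far
    )
    return max(0, still_planned - seconds_left(actual_so_far))


# The whole plan adds up to 285 s, so the built-in slack is only 15 s.
print(ROLE_PLAY_SECONDS - sum(PLANNED.values()))

# A 45-second greeting (30 s over plan) leaves the later phases 15 s short.
print(shortfall({"greet and orient": 45}))

# Two minutes of information-giving leaves 15 s for a 30 s wrap-up.
print(shortfall({
    "greet and orient": 15,
    "build rapport and elicit concerns": 60,
    "gather information": 90,
    "provide information": 120,
}))
```

With these assumed numbers, a single 30-second overrun in any phase is already more than the whole plan can absorb, which is why the budget has to be practised against a clock.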

3. You read the role-card three minutes before performing

The role-card lists the scenario, the patient's profile, and 4–5 task cues. You get three minutes to read it, take notes, and ask the interlocutor any clarifying questions.

That's it. The card is gone.

What this rewards is fast scenario internalisation: the ability to convert a one-page brief into a five-minute conversation plan in under three minutes, then execute it. Candidates who try to memorise the card verbatim run out of working memory. Candidates who skim and miss a task cue lose the corresponding mark.

The strategy that works: read the card once for setting + patient, then read the tasks. For each task, mentally rehearse a 30-second "opener". Don't memorise — internalise.

4. The patient is unpredictable in scripted ways

The interlocutor is a trained actor following a script. They have planned responses, planned objections, and planned moments of disclosure. They will:

Resist your initial suggestion (because the rubric wants to see you handle resistance)
Disclose a relevant detail late (because the rubric wants to see you follow up on cues)
Express an emotion (because the rubric wants to see empathy markers)

A candidate who's been drilling textbook role-plays without practising live can be thrown by the interlocutor's planned wobble and lose 20–30 seconds finding their footing. That's 7–10% of the role-play time, and it almost always shows up as a fluency knock and a structure knock.
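The percentage is simple arithmetic on a five-minute role-play. A quick sketch, where `share_lost` is a hypothetical helper name:

```python
ROLE_PLAY_SECONDS = 5 * 60  # one role-play, as described above


def share_lost(seconds_lost: float) -> float:
    """Percentage of the role-play consumed by a recovery pause."""
    return 100 * seconds_lost / ROLE_PLAY_SECONDS


for lost in (20, 30):
    print(f"{lost} s lost = {share_lost(lost):.1f}% of the role-play")
```

Twenty seconds works out to roughly 6.7% and thirty seconds to 10%, which is where the rounded 7–10% range comes from.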

Why fluent English speakers still fail

The story we hear most often:

"I've been speaking English at work for ten years. I scored band 8 on IELTS Speaking five years ago when I took it as a backup. I just got C+ on OET Speaking. What happened?"

Three things, usually:

The candidate's English is conversational but not clinical-rubric-aligned. They know how to take a history; they don't know how to take a history while also signposting transitions and explicitly eliciting ideas/concerns/expectations.
The candidate trained alone. Self-practice without a partner who plays the patient delivers roughly half the value. The interlocutor's planned wobble exposes weaknesses that self-practice cannot.
The candidate didn't drill phrase banks for the clinical criteria. Empathic openers, signposting phrases, comprehension checks — these are formulaic in OET Speaking. If you don't have a stock of them ready, you spend cognitive load inventing them mid-role-play.

How to make it less hard

Three high-leverage practices:

Build phrase banks for the 5 clinical criteria. 5–10 phrases per criterion. Drill them until they're automatic. Examples in our Speaking sub-test guide.
Do timed role-plays with a partner who plays the patient. This is non-negotiable. Reading about the format is not preparation; performing it is. If you don't have a partner, this is what OET Live was built for.
Get per-criterion feedback. "You could have shown more empathy" is not actionable. "At minute 2:30 the patient said they were nervous about taking medication and you moved on to explaining the next steps without acknowledging the nervousness" — that's actionable.

If you're stuck at C+ in Speaking and the feedback you've been getting is qualitative, join the OET Live waitlist. The per-criterion + per-task quote-level feedback is the part of practice that's hardest to do alone.