Speaking Medicine: AI-Powered Documentation for Indonesian Healthcare

Project Owner

BPPT, Solusi247, RS Jantung Harapan Kita, LPDP

Research & Analytics

The Problem: Why Cardiologists Are Drowning in Paperwork

Cardiac hospitals have a serious problem that's making doctors' lives miserable and potentially putting patients at risk. Cardiologists are spending far too much time writing reports instead of actually treating patients. It's like having a firefighter who spends most of the day filling out forms about fires instead of putting them out.

Here's What's Actually Happening:

Doctors Are Buried in Paperwork:

Imagine you're a heart surgeon who just finished a complex 6-hour surgery to save someone's life. Instead of resting or seeing other patients, you now have to spend 2-3 hours writing detailed reports about everything that happened during the surgery. Every medication given, every procedure step, every complication, all written by hand or typed manually.

Memory Isn't Perfect:

After a long, stressful procedure, doctors have to rely on their memory to write reports. Think about trying to remember every detail of a complex task you did hours ago when you were under pressure. Sometimes important details get forgotten or mixed up, which could be dangerous for patient care.

Handwriting Problems:

Ever tried to read a doctor's handwriting? It's famously hard to decipher. When nurses or other doctors can't understand handwritten notes, it can lead to medication errors or missed information about a patient's condition.

Doctors Are Getting Burned Out:

When doctors spend more time on paperwork than with patients, they become frustrated and exhausted. Many heart doctors report feeling like clerks rather than healers. This leads to:

Mistakes due to fatigue and frustration
Less time available for patients who need care
Delayed treatment decisions

Patients Suffer:

When documentation is delayed or incomplete:

Other doctors can't quickly understand a patient's condition
Treatment decisions get postponed
Important medical information might be lost
Patient safety is compromised

The Real Impact:

Picture this: A patient comes to the emergency room with chest pain. The heart doctor who treated them yesterday hasn't finished writing the report yet because they were too busy with other patients. The emergency doctor can't access complete information about the patient's recent treatment, potentially leading to duplicate tests, medication conflicts, or missed diagnoses.

It's a Vicious Cycle:

The more time doctors spend on paperwork, the less time they have for patients. This creates a backlog of both documentation and patient care, making the problem worse every day. It's like trying to dig out of a hole that keeps getting deeper.

Why This Matters:

Heart conditions are often life-threatening emergencies where every minute counts. When doctors are bogged down with administrative tasks instead of focusing on patient care, it's not just an inconvenience; it can literally be a matter of life and death.

This is why hospitals desperately need a solution that lets doctors focus on what they do best: saving lives, not writing reports.

How We Built a Smart Voice System for Cardiologists

The Big Idea: Creating Medical-Specific Voice Recognition

We developed a specialized voice-to-text system for Indonesian medical professionals by creating and evaluating the BPPT Medical Speech Corpus, a comprehensive dataset specifically designed for medical speech recognition. This research was a collaboration between BPPT (Agency for the Assessment and Application of Technology), Solusi247, and Harapan Kita National Heart and Vascular Hospital, funded by Indonesia's LPDP (Endowment Fund for Education).

Step 1: Building the Medical Speech Database

Creating the BPPT Medical Speech Corpus:

We recorded 100 speakers (50 males, 50 females) aged 25 and older, all with medical education backgrounds, for a total of 81.68 hours of medical speech data. This wasn't just random recording - we carefully supervised the process with phoneticians because Indonesia's diverse ethnic groups and professional backgrounds create various accents and dialects that could make the data inconsistent.

What Made Our Dataset Special:

600 unique sentences containing medical terminology
Content organized into categories: medicines, illnesses and diseases, medical treatments, vitamins and minerals, and human organs
Sentences compiled from medical websites to reflect how doctors and medical staff actually speak
2,746-word pronunciation dictionary including medical terms

Why This Approach Worked:

Unlike general speech systems, we focused specifically on medical language patterns. The research showed that medical speech has unique characteristics that require specialized training data.

Step 2: Comparing Different Training Approaches

Three Different Models Tested:

We didn't just build one system - we created and compared three different approaches:

BPPT General Speech Corpus: 50 hours of general Indonesian speech from 200 speakers
BPPT Medical Speech Corpus: Our 81.68-hour medical-specific dataset
BPPT Combined Speech Corpus: Both datasets combined (131.68 hours from 300 speakers)
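
As a quick sanity check on the figures above, the combined corpus is simply the sum of the other two. The sketch below is illustrative only: the Corpus class and the combine helper are hypothetical names, and it assumes the two corpora share no speakers, which is consistent with the reported total of 300.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """Hypothetical container for the corpus sizes quoted in the text."""
    name: str
    hours: float
    speakers: int

    def minutes_per_speaker(self) -> float:
        # Average amount of recorded speech per speaker, in minutes.
        return self.hours * 60 / self.speakers


def combine(name: str, first: Corpus, second: Corpus) -> Corpus:
    # Assumption: no speaker appears in both corpora, so counts simply add up.
    return Corpus(name, round(first.hours + second.hours, 2),
                  first.speakers + second.speakers)


general = Corpus("BPPT General Speech Corpus", 50.0, 200)
medical = Corpus("BPPT Medical Speech Corpus", 81.68, 100)
combined = combine("BPPT Combined Speech Corpus", general, medical)

for corpus in (general, medical, combined):
    print(f"{corpus.name}: {corpus.hours} h, {corpus.speakers} speakers, "
          f"{corpus.minutes_per_speaker():.1f} min per speaker")
```

One thing the numbers make visible: the medical corpus has fewer speakers than the general one but considerably more recorded speech per speaker.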

The Training Process:

We trained the models with PyChain, a modern, efficient training framework that's fully parallelized and designed for end-to-end speech recognition. The system used:

MFCC (Mel-Frequency Cepstral Coefficients) for audio feature extraction
Language model built from 16.2 million unique sentences from news articles
Comprehensive lexicon with 232,000 general words and 2,000 medical words
TDNN (time-delay neural network) architecture with 6 convolution layers
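
To give a feel for the "Mel-frequency" part of MFCC, here is a minimal sketch of the mel scale, which spaces filters closely at low frequencies and more widely at high ones, roughly matching human hearing. It uses one common form of the mel formula; the filter count and frequency range are illustrative assumptions, since the exact MFCC settings of our setup are not listed here.

```python
import math


def hz_to_mel(hz: float) -> float:
    # One widely used mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters: int, low_hz: float, high_hz: float) -> list[float]:
    """Center frequencies (in Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * i) for i in range(1, num_filters + 1)]


# Assumed values for illustration only: 10 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(num_filters=10, low_hz=0.0, high_hz=8000.0)
print([round(c) for c in centers])
```

The printed center frequencies bunch together at the low end and spread out toward the top, which is the property that makes mel-based features a good fit for speech.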

Step 3: Real-World Testing

Creating Realistic Test Conditions:

Instead of testing with the same sentences used for training, we created 100 new test sentences by rephrasing the original training data through insertion, deletion, and substitution of words. Five new speakers (2 males, 3 females) recorded these test sentences to simulate real-world usage.
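
The three edit operations can be illustrated mechanically. The sketch below is not the procedure we used to write the test set (real rephrasings need a person to keep the sentences natural); it only shows what insertion, deletion, and substitution of a word each do to a sentence. The function name, the sample sentence, and the vocabulary are all hypothetical.

```python
import random


def make_variant(sentence: str, operation: str, vocabulary: list[str],
                 rng: random.Random) -> str:
    """Apply one word-level edit to a non-empty sentence and return the result."""
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    if operation == "insert":
        # Add a vocabulary word at a random position (including either end).
        words.insert(rng.randrange(len(words) + 1), rng.choice(vocabulary))
    elif operation == "delete":
        # Remove one randomly chosen word.
        del words[rng.randrange(len(words))]
    elif operation == "substitute":
        # Swap one word for a different vocabulary word.
        # rng.choice raises IndexError if the vocabulary offers no alternative.
        position = rng.randrange(len(words))
        candidates = [w for w in vocabulary if w != words[position]]
        words[position] = rng.choice(candidates)
    else:
        raise ValueError(f"unknown operation: {operation}")
    return " ".join(words)


# Hypothetical sentence and vocabulary, for illustration only.
base = "vitamin c membantu menjaga daya tahan tubuh"
vocab = ["folat", "darah", "jantung"]
rng = random.Random(0)  # fixed seed so the run is repeatable
for op in ("insert", "delete", "substitute"):
    print(op, "->", make_variant(base, op, vocab, rng))
```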

Measuring Success:

We used Word Error Rate (WER), the standard measure for speech recognition accuracy. It counts insertions (extra words the system added), deletions (words the system missed), and substitutions (words replaced by a wrong word) relative to the actual transcript, then divides the total by the number of words in that transcript.
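
For readers who want to see the metric concretely, here is a minimal sketch of a WER calculation using word-level edit distance. It is a generic textbook version, not the scoring tool used in our evaluation, and it assumes simple whitespace tokenization with no text normalization. The example sentence is made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: reference word is missing
                current[j - 1] + 1,      # insertion: hypothesis has an extra word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Made-up example: the recognizer added one extra word ("di").
reference = "pasien mengalami nyeri dada"
hypothesis = "pasien mengalami nyeri di dada"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Because the denominator is the reference length, one extra word in a four-word sentence already costs 25%, so short sentences are punished heavily for a single slip.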

Step 4: The Results - What We Discovered

Performance Comparison:

General Speech Model: 9.85% error rate
Medical Speech Model: 8.11% error rate
Combined Model: 6.10% error rate (best performance)
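
It can help to express these gaps as relative reductions. The short sketch below does the arithmetic on the three reported error rates; the helper name is our own, and nothing here goes beyond the numbers listed above.

```python
# Reported word error rates, in percent.
WER_PERCENT = {"general": 9.85, "medical": 8.11, "combined": 6.10}


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which the error rate drops relative to the baseline."""
    return (baseline - improved) / baseline * 100.0


for baseline in ("general", "medical"):
    drop = relative_reduction(WER_PERCENT[baseline], WER_PERCENT["combined"])
    print(f"combined vs {baseline}: {drop:.1f}% fewer errors")
```

In other words, the combined model makes roughly 38% fewer word errors than the general model and roughly 25% fewer than the medical-only model.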

Why the Combined Model Won:

The research revealed something important: while the medical-specific model was better than the general model for medical terms, the combined approach performed best overall. Here's why:

For General Words: The medical-only model struggled with common words like "kanan" (right) and "pembentukan" (establishment), while the combined model handled them perfectly.

For Medical Terms: The general model failed with medical terms like "COVID19," "darah" (blood), and "folat" (folate), but both the medical and combined models succeeded.

The Sweet Spot: The combined model had learned enough general language patterns to handle everyday words while maintaining strong medical vocabulary recognition.

Real Examples from Our Testing

Success Story - Combined Model:

Original: "asam folat berperan dalam memproduksi sel darah merah yang sehat dan nutrisi pertumbuhan otak pada bayi"
Combined Model Result: Perfect transcription
General Model: "alat berat dengan memproduksi sel darah merah yang sehat dan lucy penyembuhan lainnya play" (major errors)
Medical Model: "folat berperan dalam memproduksi sel darah merah yang sehat dan lokasi pertemuan kota kenya way" (some errors)
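
To see exactly where each model went wrong in this example, the transcripts can be compared word by word. The sketch below uses Python's standard difflib module as a convenient stand-in for a proper alignment tool; it is not the analysis method used in the research, and difflib's grouping is not guaranteed to be the minimal edit alignment.

```python
from difflib import SequenceMatcher

REFERENCE = ("asam folat berperan dalam memproduksi sel darah merah "
             "yang sehat dan nutrisi pertumbuhan otak pada bayi")
HYPOTHESES = {
    "general": ("alat berat dengan memproduksi sel darah merah "
                "yang sehat dan lucy penyembuhan lainnya play"),
    "medical": ("folat berperan dalam memproduksi sel darah merah "
                "yang sehat dan lokasi pertemuan kota kenya way"),
}


def word_differences(reference: str, hypothesis: str):
    """Return (operation, reference_words, hypothesis_words) for each mismatch."""
    ref, hyp = reference.split(), hypothesis.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for model, hypothesis in HYPOTHESES.items():
    print(model)
    for tag, ref_words, hyp_words in word_differences(REFERENCE, hypothesis):
        print(f"  {tag}: {' '.join(ref_words)!r} -> {' '.join(hyp_words)!r}")
```

Both models get the middle of the sentence right and lose the ending; the difference is at the start, where the medical model only drops "asam" while the general model misses the whole opening phrase, including "folat".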

What Made This Research Special

Addressing Indonesian Medical Language Complexity:

Our system tackled the unique challenge of Indonesian medical terminology, which mixes:

Traditional Indonesian words
English medical terms
Dutch colonial medical terminology
Regional dialect variations

Scientific Rigor:

Unlike many speech recognition projects, we followed strict scientific methodology:

Controlled speaker demographics
Phonetician supervision during recording
Systematic comparison of different approaches
Standardized testing protocols

Real-World Application Focus:

The research was designed specifically for practical medical record transcription, not just academic achievement.

The Impact and Future

Proven Effectiveness:

With a 6.10% word error rate (roughly 94% of words transcribed correctly), our system reached accuracy levels suitable for medical documentation, above the 78-92% accuracy range typically seen in medical speech recognition systems.

Continuous Improvement Path:

The research identified clear directions for enhancement:

Adding more varied medical sentences
Testing with spontaneous speech (how doctors actually talk)
Enriching the medical language model
Developing error analysis for critical medical terms

Foundation for Indonesian Medical AI:

This work created the first comprehensive Indonesian medical speech corpus, providing a foundation for future medical AI development in Indonesia's healthcare system.

The research demonstrated that specialized, domain-specific training combined with general language understanding creates the most effective medical speech recognition systems, a finding that has implications for medical AI development worldwide.