Whisper v3 Error Rate
On radiology dictation, Whisper v3 Large produces one error every four words — unacceptable for clinical use where accuracy is critical.
MedASR is a Conformer-based medical ASR model with 105M parameters. It achieves 4.6% WER on radiology dictation, more than 5x better than Whisper v3 Large, and is built for clinical documentation, healthcare transcription, and medical AI applications.
Powered by Google Health AI • Open Source • Apache 2.0 License
Generic ASR models like Siri, Google Speech, and even Whisper struggle with complex medical terminology, leading to critical errors in clinical documentation.
Physicians spend an average of 16 minutes on EHR documentation for every patient encounter, contributing to burnout.
General ASR misrecognizes medication names like "Lisinopril" and "Losartan" or confuses "ileum" with "ilium" — potentially dangerous mistakes.
Commercial solutions like Nuance Dragon Medical One cost $99+ per user per month, creating significant barriers for smaller practices.
MedASR is purpose-built for clinical documentation with specialized training on 5,000+ hours of real medical audio data.
Industry-leading 4.6% WER on radiology dictation, outperforming Gemini 2.5 Pro (10.0%) and Whisper v3 Large (25.3%)
Lightweight Conformer architecture runs on consumer GPUs (RTX 4060 8GB) — 15x smaller than Whisper Large
Apache 2.0-licensed code with an HAI-DEF model license. Full transparency, customizable, no vendor lock-in
De-identified physician dictations across radiology, internal medicine, family medicine, and patient conversations
Independent benchmarks on medical dictation datasets demonstrate MedASR's superior accuracy for healthcare transcription.
| Model | RAD-DICT (Radiology) | FM-DICT (Family Medicine) | Parameters | License |
|---|---|---|---|---|
| MedASR + 6-gram LM | 4.6% WER | 5.8% WER | 105M | Open Source |
| MedASR (Greedy) | 6.6% WER | 8.2% WER | 105M | Open Source |
| Gemini 2.5 Pro | 10.0% WER | 14.6% WER | N/A (API) | Proprietary |
| Gemini 2.5 Flash | 12.7% WER | 17.3% WER | N/A (API) | Proprietary |
| Whisper v3 Large | 25.3% WER | 32.5% WER | 1.55B | Open Source |
| Whisper v3 Large Turbo | 28.1% WER | 35.2% WER | 809M | Open Source |
Lower WER (Word Error Rate) is better. Benchmarks from Google Health AI Model Card, December 2025.
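For reference, WER is the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference transcript. A minimal pure-Python sketch of the metric (the function name `word_error_rate` is illustrative, not part of MedASR):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a four-word reference gives 25% WER
print(word_error_rate("no acute cardiopulmonary process",
                      "no acute cardiopulmonary processes"))  # 0.25
```

This also makes the headline numbers concrete: 25.3% WER means roughly one error for every four words spoken.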
From radiology dictation to physician-patient conversations, MedASR integrates seamlessly into healthcare documentation workflows.
MedASR excels at radiology reporting where terminology precision is critical. From chest X-rays to MRI interpretations, capture complex anatomical terms and measurements with clinical-grade accuracy.
🎤 → 📋
Radiology Report Generation
"Impression: No acute cardiopulmonary process. Stable cardiomegaly."
Transform physician-patient conversations into structured clinical notes. MedASR captures medication names, dosages, and clinical findings accurately for seamless EHR integration.
👨‍⚕️ 💬 🤖
Ambient Clinical Intelligence
"Patient reports taking Metformin 500mg twice daily..."
Combine MedASR speech-to-text with MedGemma's clinical understanding to automatically generate structured SOAP notes from voice recordings. End-to-end documentation automation.
🎤 → 📝 → 📄
Voice to SOAP Pipeline
MedASR → MedGemma → Structured Notes
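The pipeline above is two composed calls: transcribe, then structure. A sketch of that composition, where both functions are illustrative stubs rather than real MedASR or MedGemma APIs:

```python
def transcribe(audio_path: str) -> str:
    """Stub for the MedASR step; a real call would run the ASR pipeline."""
    return "Patient reports taking Metformin 500mg twice daily."

def to_soap(transcript: str) -> dict:
    """Stub for the MedGemma step; a real call would prompt the model
    to organize the transcript into SOAP sections."""
    return {
        "Subjective": transcript,
        "Objective": "",
        "Assessment": "",
        "Plan": "",
    }

# Voice recording -> transcript -> structured note
note = to_soap(transcribe("visit_recording.wav"))
print(note["Subjective"])
```

Keeping the two stages separate means either component can be swapped or evaluated independently.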
Get started with medical speech recognition using Python and Hugging Face Transformers. No complex setup required.
# Step 1: Install dependencies
# pip install transformers torch librosa

from transformers import pipeline

# Step 2: Load MedASR model from Hugging Face
medasr = pipeline(
    "automatic-speech-recognition",
    model="google/medasr",
)

# Step 3: Transcribe medical audio (16kHz mono required)
result = medasr("radiology_dictation.wav")

# Output: Accurate medical transcription
print(result["text"])
# "Impression: No acute cardiopulmonary process.
#  Mild cardiomegaly is stable compared to prior examination."
from transformers import AutoModelForCTC, AutoProcessor
import torch
import librosa

# Load MedASR model and processor
processor = AutoProcessor.from_pretrained("google/medasr")
model = AutoModelForCTC.from_pretrained("google/medasr")

# Load and preprocess audio (must be 16kHz mono)
audio, sr = librosa.load("patient_conversation.wav", sr=16000)

# Process input for MedASR
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Run inference
with torch.no_grad():
    logits = model(**inputs).logits

# Decode predictions
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
MedASR uses Google's Conformer architecture, combining convolutional neural networks for local feature extraction with Transformer attention for global context understanding.
Combines convolution for local patterns with self-attention for long-range dependencies in medical terminology.
Pre-trained on 5,000+ hours of physician dictations covering radiology, internal medicine, and family medicine.
Connectionist Temporal Classification enables efficient streaming transcription without explicit alignment.
Optional 6-gram LM reduces WER from 6.6% to 4.6% on radiology dictation tasks.
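The greedy CTC decode mentioned above works by taking the most likely label at each audio frame, then merging repeated labels and dropping blanks. A toy sketch of that collapse step (character labels and the `_` blank symbol are illustrative; the real MedASR vocabulary differs):

```python
BLANK = "_"  # illustrative CTC blank symbol

def ctc_greedy_collapse(frame_labels: list[str]) -> str:
    """Collapse a per-frame argmax sequence: merge repeats, drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# Per-frame argmax output spelling the word "ileum":
frames = ["i", "i", "_", "l", "l", "e", "_", "u", "u", "m", "_"]
print(ctc_greedy_collapse(frames))  # "ileum"
```

Because no alignment between audio frames and output tokens is needed, this decode can run incrementally, which is what enables streaming transcription; the optional 6-gram LM replaces the per-frame argmax with a beam search scored by the language model.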
From local development to enterprise cloud deployment, MedASR supports flexible infrastructure options.
Compare MedASR with leading medical speech recognition solutions across key features.
| Feature | MedASR | Nuance Dragon | Whisper v3 | Amazon Transcribe Medical |
|---|---|---|---|---|
| Radiology WER | 4.6% | ~5-8% | 25.3% | ~10-15% |
| Open Source | ✓ Yes | ✗ No | ✓ Yes | ✗ No |
| Local Deployment | ✓ Yes | Limited | ✓ Yes | ✗ No |
| Medical Training | 5,000+ hrs | Yes | General | Yes |
| Fine-tuning | ✓ Full | Limited | ✓ Full | ✗ No |
| Parameters | 105M | N/A | 1.55B | N/A |
| Cost | Free | $99/mo+ | Free | Per-minute |
Adapt MedASR to your specific medical domain, accent variations, or custom terminology using your own data.
Collect and format your medical audio with transcriptions in 16kHz mono format.
Set hyperparameters and define your custom vocabulary or specialty terms.
Train on your data using Hugging Face Trainer or custom PyTorch loop.
Validate on held-out test set and deploy your specialized model.
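The data-preparation step above usually comes down to pairing audio files with transcripts in a machine-readable manifest. A minimal sketch using a JSON-lines layout (the `audio`/`text` field names and file paths are illustrative; match whatever schema your training loader expects):

```python
import json
from pathlib import Path

def write_manifest(pairs, manifest_path):
    """Write (audio_path, transcript) pairs as a JSON-lines manifest."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, transcript in pairs:
            # One JSON object per line: audio file path plus its transcript
            f.write(json.dumps({"audio": str(audio_path),
                                "text": transcript.strip()}) + "\n")

pairs = [
    ("dictations/rad_0001.wav", "Impression: no acute cardiopulmonary process."),
    ("dictations/rad_0002.wav", "Stable cardiomegaly compared to prior exam."),
]
write_manifest(pairs, "train_manifest.jsonl")
print(Path("train_manifest.jsonl").read_text(encoding="utf-8").splitlines()[0])
```

A manifest like this loads directly with `datasets.load_dataset("json", ...)` for use with the Hugging Face Trainer.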
Deploy MedASR within your secure infrastructure for complete data privacy. Open-source transparency means no black boxes — audit every line of code.
Audio never leaves your network. Full control over patient data.
Business Associate Agreement support for Google Cloud deployments.
Local deployments don't send data to external services.
Connect MedASR to your existing electronic health record systems and clinical workflows via standard APIs.
Transparency about model capabilities ensures appropriate use in clinical settings.
MedASR is currently optimized for English medical terminology. Multi-language support is planned for future releases.
Best results with clear 16kHz mono audio. Background noise and poor microphones may impact accuracy.
Some date/time formats and numerical notations may require post-processing for clinical documentation standards.
MedASR is a research model. Always review transcriptions before clinical documentation. Human oversight required.
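The numeric-notation limitation above can often be handled with a light post-processing pass over the transcript. A hedged sketch of such a pass (the three rules shown are examples only, not a validated clinical normalizer):

```python
import re

# Illustrative spoken-form -> written-form rules; a production
# normalizer would need a much larger, clinically reviewed rule set.
_RULES = [
    (re.compile(r"\bmilligrams?\b", re.IGNORECASE), "mg"),
    (re.compile(r"\bmilliliters?\b", re.IGNORECASE), "mL"),
    (re.compile(r"\btwice daily\b", re.IGNORECASE), "BID"),
]

def normalize(text: str) -> str:
    """Apply each rewrite rule in order to the raw transcript."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text

print(normalize("Metformin 500 milligrams twice daily"))
# "Metformin 500 mg BID"
```

As with the transcripts themselves, any automated normalization should be reviewed by a human before it enters clinical documentation.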
Common questions about Google's open-source medical speech recognition model.
MedASR is an open-source medical speech recognition model developed by Google Health AI, released in December 2025. It uses Conformer architecture with 105 million parameters, specifically trained on over 5,000 hours of de-identified medical audio including physician dictations and patient conversations. MedASR achieves state-of-the-art accuracy on medical transcription tasks, outperforming both commercial and open-source alternatives.
MedASR significantly outperforms Whisper v3 Large on medical transcription tasks. On radiology dictation (RAD-DICT benchmark), MedASR achieves 4.6% WER compared to Whisper's 25.3% WER — a 5x improvement. This is because MedASR was specifically trained on medical terminology, drug names, and clinical vocabulary, while Whisper is a general-purpose ASR model. Additionally, MedASR is 15x smaller (105M vs 1.55B parameters), making it more efficient for deployment.
MedASR can be deployed in HIPAA-compliant configurations. Since it's open source, you can run it entirely on-premise within your secure infrastructure, ensuring patient audio never leaves your network. For Google Cloud deployments via Vertex AI, Business Associate Agreements (BAA) are available. However, HIPAA compliance depends on your overall implementation — the model itself is just one component of a compliant solution.
Yes, MedASR fully supports fine-tuning on custom datasets. You can adapt it for specific specialties (oncology, cardiology, neurology, etc.), accent variations, or custom terminology. The model works with standard Hugging Face Transformers training workflows. You'll need paired audio-transcription data in 16kHz mono format for fine-tuning.
MedASR's lightweight 105M parameter design runs efficiently on consumer-grade hardware. For GPU inference, an RTX 4060 8GB or equivalent is sufficient. For CPU-only inference, modern multi-core processors (Intel i7/AMD Ryzen 7 or better) work well with slightly longer processing times. The model can also be deployed on Google Cloud TPUs for high-throughput production environments.
MedASR requires 16kHz mono audio input in int16 format. Common audio formats like WAV, MP3, and FLAC can be converted using libraries like librosa or ffmpeg. For best results, use high-quality microphones and minimize background noise. The model supports processing long audio files through chunking with configurable stride settings.
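In practice the Hugging Face ASR pipeline handles long files via its `chunk_length_s` and `stride_length_s` parameters; the sketch below just shows the underlying windowing arithmetic, with all parameter values chosen for illustration:

```python
def chunk_indices(n_samples, chunk_s=30.0, stride_s=5.0, sr=16000):
    """Yield (start, end) sample ranges for overlapping chunks.

    Each chunk is chunk_s seconds long and overlaps its neighbors by
    stride_s seconds on each side, so a word cut at one chunk boundary
    is still captured whole in the adjacent chunk.
    """
    chunk = int(chunk_s * sr)
    step = chunk - int(2 * stride_s * sr)  # advance by chunk minus overlap
    start = 0
    while start < n_samples:
        yield start, min(start + chunk, n_samples)
        if start + chunk >= n_samples:
            break
        start += step

# 90 s of 16 kHz audio -> four overlapping 30 s windows
for s, e in chunk_indices(90 * 16000):
    print(s / 16000, e / 16000)
```

The transcripts from overlapping windows are then merged, discarding the strided edges where the model had incomplete context.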
Comprehensive documentation, tutorials, and research papers to help you get the most from MedASR.
Complete API reference, installation guides, and best practices from Google Health AI.
Read Docs →
Model weights, pipeline usage, and community discussions on Hugging Face Hub.
View Model →
Source code, issues, and contribution guidelines. Star us to stay updated!
View Code →
Interactive tutorials for quick start, fine-tuning, and advanced usage.
Open Colab →
Academic publications on Conformer architecture and medical ASR benchmarks.
Read Papers →
One-click deployment on Google Cloud with managed infrastructure.
Deploy Now →
Contribute to the future of medical speech recognition. Report issues, submit PRs, and collaborate with developers worldwide.
No license fees, no per-minute charges. Deploy MedASR on your infrastructure at zero software cost.
Hear from developers and researchers using MedASR in production.
"MedASR cut our radiology transcription errors by 80%. The difference in medical terminology accuracy compared to Whisper is night and day."
"We deployed MedASR on-premise in 2 weeks. The lightweight model runs on our existing GPU servers without any infrastructure changes."
"Finally an open-source medical ASR that actually works. Fine-tuning for our cardiology department took less than a day."
Stay informed about new features, improvements, and releases.
Initial public release with support for radiology dictation, physician-patient conversations, and 6-gram language model integration.
View Release Notes →
Comprehensive documentation including quick start guide, fine-tuning tutorials, and deployment options now available.
Read Documentation →
Help improve medical speech recognition for everyone. We welcome contributions from developers, researchers, and healthcare professionals.
If you use MedASR in your research, please cite the following:
@software{medasr2025,
  title  = {MedASR: Open-Source Medical Speech Recognition},
  author = {{Google Health AI}},
  year   = {2025},
  url    = {https://github.com/google-health/medasr},
  note   = {Conformer-based ASR model for medical dictation}
}

@article{conformer2020,
  title   = {Conformer: Convolution-augmented Transformer for Speech Recognition},
  author  = {Gulati, Anmol and others},
  journal = {arXiv preprint arXiv:2005.08100},
  year    = {2020}
}
Join thousands of developers using Google's open-source medical speech recognition. Free, accurate, and ready for production.