New: MedASR v1.0 Released December 2025

MedASR: Open-Source Medical Speech Recognition by Google Health AI

The Conformer-based medical ASR model with 105M parameters. Achieves 4.6% WER on radiology dictation — outperforming Whisper v3 Large by 5x. Built for clinical documentation, healthcare transcription, and medical AI applications.

4.6%
Word Error Rate
105M
Parameters
5,000+
Training Hours
5x
Better than Whisper
MedASR Live Demo — Medical Speech Recognition

Powered by Google Health AI • Open Source • Apache 2.0 License

The Challenge in Healthcare Transcription

Why General Speech Recognition Fails in Medical Settings

General-purpose speech recognition, from consumer voice assistants like Siri to Google Speech-to-Text and even Whisper, struggles with complex medical terminology, leading to critical errors in clinical documentation.

📉
25.3%

Whisper v3 Error Rate

On radiology dictation, Whisper v3 Large produces one error every four words — unacceptable for clinical use where accuracy is critical.

⏱️
16 min

Documentation Time per Patient

Physicians spend an average of 16 minutes on EHR documentation for every patient encounter, contributing to burnout.

💊
70-80%

Drug Name Errors

General ASR confuses similar-sounding medication names such as "Lisinopril" and "Losartan", or anatomical terms such as "ileum" and "ilium", resulting in potentially dangerous mistakes.

💰
$99/mo

Dragon Medical Cost

Commercial solutions like Nuance Dragon Medical One cost $99+ per user per month, creating significant barriers for smaller practices.

The MedASR Solution

Medical Speech Recognition Built for Healthcare

MedASR is purpose-built for clinical documentation with specialized training on 5,000+ hours of real medical audio data.

🎯
4.6%

Word Error Rate

Industry-leading accuracy on radiology dictation, outperforming Gemini 2.5 Pro (10.0%) and Whisper v3 (25.3%)

105M

Parameters

Lightweight Conformer architecture runs on consumer GPUs (RTX 4060 8GB) — 15x smaller than Whisper Large

🔓
100%

Open Source

Apache 2.0 licensed code, HAI-DEF model license. Full transparency, customizable, no vendor lock-in

🏥
5,000+

Training Hours

De-identified physician dictations in radiology, internal medicine, and family medicine, plus physician-patient conversations

Performance Benchmarks

MedASR vs Whisper vs Gemini: Medical ASR Comparison

Independent benchmarks on medical dictation datasets demonstrate MedASR's superior accuracy for healthcare transcription.

Model | RAD-DICT (Radiology) | FM-DICT (Family Medicine) | Parameters | License
MedASR + 6-gram LM | 4.6% WER | 5.8% WER | 105M | Open Source
MedASR (Greedy) | 6.6% WER | 8.2% WER | 105M | Open Source
Gemini 2.5 Pro | 10.0% WER | 14.6% WER | N/A (API) | Proprietary
Gemini 2.5 Flash | 12.7% WER | 17.3% WER | N/A (API) | Proprietary
Whisper v3 Large | 25.3% WER | 32.5% WER | 1.55B | Open Source
Whisper v3 Large Turbo | 28.1% WER | 35.2% WER | 809M | Open Source

Lower WER (Word Error Rate) is better. Benchmarks from Google Health AI Model Card, December 2025.
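
To sanity-check these numbers against your own recordings, you can score transcripts locally. Below is a minimal sketch using the open-source jiwer package, which is not part of MedASR; the reference and hypothesis strings are purely illustrative.

compute_wer.py
# Minimal WER check with the open-source jiwer package (pip install jiwer).
# The reference/hypothesis strings below are illustrative only.
import jiwer

reference = "impression no acute cardiopulmonary process stable cardiomegaly"
hypothesis = "impression no acute cardio pulmonary process stable cardiomegaly"

# A single mis-split medical term already costs a substitution plus an
# insertion, the typical failure mode of general-purpose ASR on clinical terms.
print(f"WER: {jiwer.wer(reference, hypothesis):.1%}")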

Medical ASR Use Cases

Built for Real Clinical Workflows

From radiology dictation to physician-patient conversations, MedASR integrates seamlessly into healthcare documentation workflows.

🩻

Radiology Dictation Software

MedASR excels at radiology reporting where terminology precision is critical. From chest X-rays to MRI interpretations, capture complex anatomical terms and measurements with clinical-grade accuracy.

4.6%
WER on RAD-DICT
68%
Time Savings
Get Started

🎤 → 📋

Radiology Report Generation

"Impression: No acute cardiopulmonary process. Stable cardiomegaly."

🩺

Clinical Documentation Automation

Transform physician-patient conversations into structured clinical notes. MedASR captures medication names, dosages, and clinical findings accurately for seamless EHR integration.

5.8%
WER on FM-DICT
16 min
Saved per Patient
Learn More

👨‍⚕️ 💬 🤖

Ambient Clinical Intelligence

"Patient reports taking Metformin 500mg twice daily..."

🤖

MedGemma Integration for SOAP Notes

Combine MedASR speech-to-text with MedGemma's clinical understanding to automatically generate structured SOAP notes from voice recordings. End-to-end documentation automation.

E2E
Pipeline Ready
API
Compatible
View Documentation

🎤 → 📝 → 📄

Voice to SOAP Pipeline

MedASR → MedGemma → Structured Notes
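
A minimal sketch of this voice-to-SOAP pipeline is shown below. It chains the MedASR pipeline from the quick start with a Transformers text-generation pipeline; the MedGemma checkpoint name, audio filename, and prompt are assumptions to adapt to your own setup, not an official integration recipe.

voice_to_soap.py
# Sketch of a MedASR -> MedGemma -> SOAP note pipeline.
# The MedGemma model id below is an assumption; substitute the text-capable
# MedGemma checkpoint you actually have access to.
from transformers import pipeline

medasr = pipeline("automatic-speech-recognition", model="google/medasr")
medgemma = pipeline("text-generation", model="google/medgemma-27b-text-it")

# Step 1: transcribe the encounter audio (16kHz mono)
transcript = medasr("patient_visit.wav")["text"]

# Step 2: ask MedGemma to structure the transcript as a SOAP note
messages = [
    {"role": "user",
     "content": "Rewrite the following clinical transcript as a structured "
                "SOAP note:\n\n" + transcript},
]
output = medgemma(messages, max_new_tokens=512)

# Recent Transformers chat pipelines return the full conversation; the last
# message is the generated assistant turn.
print(output[0]["generated_text"][-1]["content"])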

Quick Start Guide

Deploy MedASR in 5 Minutes

Get started with medical speech recognition using Python and Hugging Face Transformers. No complex setup required.

medasr_quickstart.py
# Step 1: Install dependencies
# pip install transformers torch librosa

from transformers import pipeline

# Step 2: Load MedASR model from Hugging Face
medasr = pipeline(
    "automatic-speech-recognition",
    model="google/medasr"
)

# Step 3: Transcribe medical audio (16kHz mono required)
result = medasr("radiology_dictation.wav")

# Output: Accurate medical transcription
print(result["text"])
# "Impression: No acute cardiopulmonary process. 
#  Mild cardiomegaly is stable compared to prior examination."
medasr_advanced.py: Direct Inference with AutoModelForCTC
from transformers import AutoModelForCTC, AutoProcessor
import torch
import librosa

# Load MedASR model and processor
processor = AutoProcessor.from_pretrained("google/medasr")
model = AutoModelForCTC.from_pretrained("google/medasr")

# Load and preprocess audio (must be 16kHz mono)
audio, sr = librosa.load("patient_conversation.wav", sr=16000)

# Process input for MedASR
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Run inference
with torch.no_grad():
    logits = model(**inputs).logits

# Decode predictions
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]

print(transcription)
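
The direct-inference example above handles a single short clip. For long recordings such as full patient encounters, the Transformers ASR pipeline can chunk the audio with overlapping strides (see the audio format FAQ below); a minimal sketch follows, with chunk and stride lengths shown as illustrative starting points rather than official MedASR defaults.

medasr_long_audio.py
# Long-audio transcription via the built-in chunking of the Transformers ASR
# pipeline. Chunk/stride values are illustrative starting points.
from transformers import pipeline

medasr = pipeline("automatic-speech-recognition", model="google/medasr")

# Split the recording into 30-second chunks with overlapping strides so that
# words at chunk boundaries are not cut off, then stitch the results together.
result = medasr(
    "full_patient_encounter.wav",
    chunk_length_s=30,
    stride_length_s=(4, 2),
)
print(result["text"])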
Technical Architecture

Conformer-Based Medical Speech Recognition

MedASR uses Google's Conformer architecture, combining convolutional neural networks for local feature extraction with Transformer attention for global context understanding.

🎤
Audio Input
16kHz Mono
📊
Mel Spectrogram
Feature Extraction
🧠
Conformer Encoder
105M Parameters
🔤
CTC Decoder
Token Output
📝
Medical Text
Transcription
🔄

Conformer Blocks

Combines convolution for local patterns with self-attention for long-range dependencies in medical terminology.

📚

Medical Vocabulary

Pre-trained on 5,000+ hours of physician dictations covering radiology, internal medicine, and family medicine.

CTC Decoding

Connectionist Temporal Classification enables efficient streaming transcription without explicit alignment.

🎯

Language Model Fusion

Optional 6-gram LM reduces WER from 6.6% to 4.6% on radiology dictation tasks.
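
The CTC decoding and language model fusion described above follow a standard recipe: take the per-frame CTC logits and beam-search them against an n-gram LM. The sketch below uses the open-source pyctcdecode package with a KenLM 6-gram; the ARPA file path is a placeholder, and the assumption that the tokenizer exposes a flat token vocabulary (with word-delimiter symbols that may need remapping) should be checked against the actual checkpoint.

medasr_lm_fusion.py
# Sketch of CTC beam-search decoding with a KenLM 6-gram via pyctcdecode
# (pip install pyctcdecode kenlm). The ARPA path is a placeholder.
import librosa
import torch
from pyctcdecode import build_ctcdecoder
from transformers import AutoModelForCTC, AutoProcessor

processor = AutoProcessor.from_pretrained("google/medasr")
model = AutoModelForCTC.from_pretrained("google/medasr")

# Build the label list in vocabulary-index order; depending on the tokenizer,
# special tokens and word delimiters may need remapping (e.g. "|" -> " ").
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

decoder = build_ctcdecoder(labels, kenlm_model_path="medical_6gram.arpa")

audio, _ = librosa.load("radiology_dictation.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0].cpu().numpy()

print(decoder.decode(logits))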

Deployment Options

Deploy MedASR Anywhere

From local development to enterprise cloud deployment, MedASR supports flexible infrastructure options.

💻

Local Deployment

Maximum privacy and control

On-premise data processing
HIPAA compliance ready
No internet required
RTX 4060 8GB minimum
Setup Guide
🔧

Hugging Face

Developer-friendly quick start

One-line integration
Inference Endpoints
Spaces demo ready
Community support
View on HF Hub
Medical ASR Comparison

MedASR vs Dragon Medical vs Whisper

Compare MedASR with leading medical speech recognition solutions across key features.

Feature | MedASR | Nuance Dragon | Whisper v3 | Amazon Transcribe Medical
Radiology WER | 4.6% | ~5-8% | 25.3% | ~10-15%
Open Source | ✓ Yes | ✗ No | ✓ Yes | ✗ No
Local Deployment | ✓ Yes | Limited | ✓ Yes | ✗ No
Medical Training | 5,000+ hrs | Yes | General | Yes
Fine-tuning | ✓ Full | Limited | ✓ Full | ✗ No
Parameters | 105M | N/A | 1.55B | N/A
Cost | Free | $99/mo+ | Free | Per-minute
Customization

Fine-Tune MedASR for Your Specialty

Adapt MedASR to your specific medical domain, accent variations, or custom terminology using your own data.

1

Prepare Dataset

Collect and format your medical audio with transcriptions in 16kHz mono format.

2

Configure Training

Set hyperparameters and define your custom vocabulary or specialty terms.

3

Run Fine-tuning

Train on your data using the Hugging Face Trainer or a custom PyTorch loop (a minimal sketch follows step 4 below).

4

Deploy & Evaluate

Validate on held-out test set and deploy your specialized model.
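
The sketch below walks through those four steps with the standard Hugging Face CTC fine-tuning recipe. It assumes an audiofolder-style dataset with a transcription column and a Wav2Vec2-style processor interface; the feature key names, dataset layout, and hyperparameters are assumptions to adjust for the actual google/medasr checkpoint and your data.

medasr_finetune.py
# Minimal fine-tuning sketch (standard Hugging Face CTC recipe).
# Dataset layout, feature key names, and hyperparameters are assumptions.
from dataclasses import dataclass

import torch
from torch.nn.utils.rnn import pad_sequence
from datasets import Audio, load_dataset
from transformers import AutoModelForCTC, AutoProcessor, Trainer, TrainingArguments

processor = AutoProcessor.from_pretrained("google/medasr")
model = AutoModelForCTC.from_pretrained("google/medasr")

# Step 1: prepare the dataset (audio files + metadata.csv with a "transcription" column)
ds = load_dataset("audiofolder", data_dir="my_specialty_data")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    audio = batch["audio"]
    # "input_values" follows the Wav2Vec2-style interface; adjust if the
    # MedASR processor returns a different key (e.g. "input_features").
    batch["input_values"] = processor(audio["array"], sampling_rate=16000).input_values[0]
    batch["labels"] = processor(text=batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds["train"].column_names)

# Steps 2-3: configure hyperparameters and run training
@dataclass
class CTCCollator:
    def __call__(self, features):
        inputs = [torch.tensor(f["input_values"], dtype=torch.float32) for f in features]
        labels = [torch.tensor(f["labels"], dtype=torch.long) for f in features]
        return {
            "input_values": pad_sequence(inputs, batch_first=True),
            # -100 is ignored by the CTC loss in Transformers CTC models
            "labels": pad_sequence(labels, batch_first=True, padding_value=-100),
        }

args = TrainingArguments(
    output_dir="medasr-finetuned",
    per_device_train_batch_size=4,
    learning_rate=1e-5,
    num_train_epochs=3,
    fp16=torch.cuda.is_available(),
)
trainer = Trainer(model=model, args=args, train_dataset=ds["train"], data_collator=CTCCollator())
trainer.train()

# Step 4: evaluate on a held-out set (e.g. with jiwer) before deploying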

Enterprise Security

HIPAA-Ready Medical Speech Recognition

Deploy MedASR within your secure infrastructure for complete data privacy. Open-source transparency means no black boxes — audit every line of code.

Local Deployment Data Encryption Auditable Code
🔒
On-Premise Processing

Audio never leaves your network. Full control over patient data.

📋
BAA Available

Business Associate Agreement support for Google Cloud deployments.

🔍
No Logging

Local deployments don't send data to external services.

EHR Integration

Seamless Healthcare System Integration

Connect MedASR to your existing electronic health record systems and clinical workflows via standard APIs such as FHIR and HL7 (a FHIR upload sketch follows the list of supported systems below).

🏥
Epic
📊
Cerner
💊
Meditech
☁️
Oracle Health
🔗
FHIR API
📡
HL7
🖼️
PACS
📝
PowerScribe
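
As a concrete illustration of the FHIR route, the sketch below posts a MedASR transcript to an EHR as a FHIR R4 DocumentReference. The endpoint URL, bearer token, patient reference, and LOINC code are placeholders; a production integration would go through your vendor's FHIR API with proper OAuth scopes and error handling.

fhir_upload.py
# Sketch: push a transcript into an EHR as a FHIR R4 DocumentReference.
# Endpoint, token, patient id, and coding are placeholders.
import base64

import requests

transcript = "Impression: No acute cardiopulmonary process. Stable cardiomegaly."

document_reference = {
    "resourceType": "DocumentReference",
    "status": "current",
    "type": {"coding": [{
        "system": "http://loinc.org",
        "code": "18748-4",                   # illustrative: diagnostic imaging study report
        "display": "Diagnostic imaging study",
    }]},
    "subject": {"reference": "Patient/example-patient-id"},   # placeholder
    "content": [{"attachment": {
        "contentType": "text/plain",
        "data": base64.b64encode(transcript.encode("utf-8")).decode("ascii"),
    }}],
}

response = requests.post(
    "https://fhir.example-ehr.com/r4/DocumentReference",      # placeholder endpoint
    json=document_reference,
    headers={
        "Authorization": "Bearer <access-token>",
        "Content-Type": "application/fhir+json",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json().get("id"))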
Important Considerations

MedASR Limitations & Best Practices

Transparency about model capabilities ensures appropriate use in clinical settings.

⚠️ English Only

MedASR is currently optimized for English medical terminology. Multi-language support is planned for future releases.

⚠️ Audio Quality

Best results with clear 16kHz mono audio. Background noise and poor microphones may impact accuracy.

⚠️ Non-Standard Formats

Some date/time formats and numerical notations may require post-processing to meet clinical documentation standards (see the normalization sketch after these notes).

⚠️ Not for Direct Clinical Use

MedASR is a research model. Always review transcriptions before clinical documentation. Human oversight required.
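
For the non-standard formats noted above, a light post-processing pass can normalize transcripts before they enter the record. The sketch below is illustrative only; real normalization rules depend on your documentation standards and should be validated under the same human-oversight policy.

postprocess.py
# Illustrative transcript normalization; rules depend on your documentation
# standards and must be validated before clinical use.
import re

UNIT_ABBREVIATIONS = {
    "milligrams": "mg",
    "milliliters": "mL",
    "centimeters": "cm",
    "millimeters": "mm",
}

def normalize_transcript(text: str) -> str:
    # Abbreviate spelled-out units, e.g. "500 milligrams" -> "500 mg"
    for word, abbrev in UNIT_ABBREVIATIONS.items():
        text = re.sub(rf"\b{word}\b", abbrev, text, flags=re.IGNORECASE)
    # Convert US-style dates (MM/DD/YYYY) to ISO 8601 (YYYY-MM-DD)
    text = re.sub(
        r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b",
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )
    return text

print(normalize_transcript("Metformin 500 milligrams twice daily, follow up 12/18/2025."))
# Metformin 500 mg twice daily, follow up 2025-12-18.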

Frequently Asked Questions

MedASR FAQ

Common questions about Google's open-source medical speech recognition model.

What is MedASR?

MedASR is an open-source medical speech recognition model developed by Google Health AI, released in December 2025. It uses Conformer architecture with 105 million parameters, specifically trained on over 5,000 hours of de-identified medical audio including physician dictations and patient conversations. MedASR achieves state-of-the-art accuracy on medical transcription tasks, outperforming both commercial and open-source alternatives.

How does MedASR compare to Whisper?

MedASR significantly outperforms Whisper v3 Large on medical transcription tasks. On radiology dictation (RAD-DICT benchmark), MedASR achieves 4.6% WER compared to Whisper's 25.3% WER — a 5x improvement. This is because MedASR was specifically trained on medical terminology, drug names, and clinical vocabulary, while Whisper is a general-purpose ASR model. Additionally, MedASR is 15x smaller (105M vs 1.55B parameters), making it more efficient for deployment.

Is MedASR HIPAA compliant?

MedASR can be deployed in HIPAA-compliant configurations. Since it's open source, you can run it entirely on-premise within your secure infrastructure, ensuring patient audio never leaves your network. For Google Cloud deployments via Vertex AI, Business Associate Agreements (BAA) are available. However, HIPAA compliance depends on your overall implementation — the model itself is just one component of a compliant solution.

Can I fine-tune MedASR on my own data?

Yes, MedASR fully supports fine-tuning on custom datasets. You can adapt it for specific specialties (oncology, cardiology, neurology, etc.), accent variations, or custom terminology. The model works with standard Hugging Face Transformers training workflows. You'll need paired audio-transcription data in 16kHz mono format for fine-tuning.

What hardware does MedASR require?

MedASR's lightweight 105M parameter design runs efficiently on consumer-grade hardware. For GPU inference, an RTX 4060 8GB or equivalent is sufficient. For CPU-only inference, modern multi-core processors (Intel i7/AMD Ryzen 7 or better) work well with slightly longer processing times. The model can also be deployed on Google Cloud TPUs for high-throughput production environments.
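
A minimal sketch of picking the device at load time (GPU when available, CPU otherwise) is shown below; the audio filename is illustrative.

select_device.py
# Use the GPU when one is available, otherwise fall back to CPU.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1

medasr = pipeline("automatic-speech-recognition", model="google/medasr", device=device)
print(medasr("radiology_dictation.wav")["text"])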

What audio format does MedASR require?

MedASR requires 16kHz mono audio input in int16 format. Common audio formats like WAV, MP3, and FLAC can be converted using libraries like librosa or ffmpeg. For best results, use high-quality microphones and minimize background noise. The model supports processing long audio files through chunking with configurable stride settings.
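
A minimal conversion sketch is shown below, using librosa and soundfile to resample arbitrary input to 16 kHz mono 16-bit PCM; the filenames are illustrative, and the equivalent ffmpeg command is included as a comment.

convert_audio.py
# Convert arbitrary audio (WAV/MP3/FLAC) to 16 kHz mono 16-bit PCM.
# Equivalent ffmpeg command:
#   ffmpeg -i dictation_raw.mp3 -ac 1 -ar 16000 -acodec pcm_s16le dictation_16k_mono.wav
import librosa
import soundfile as sf

audio, _ = librosa.load("dictation_raw.mp3", sr=16000, mono=True)
sf.write("dictation_16k_mono.wav", audio, 16000, subtype="PCM_16")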

Documentation & Resources

Learn More About MedASR

Comprehensive documentation, tutorials, and research papers to help you get the most from MedASR.

📖

Official Documentation

Complete API reference, installation guides, and best practices from Google Health AI.

Read Docs →
🤗

Hugging Face Model Card

Model weights, pipeline usage, and community discussions on Hugging Face Hub.

View Model →
💻

GitHub Repository

Source code, issues, and contribution guidelines. Star us to stay updated!

View Code →
📓

Colab Notebooks

Interactive tutorials for quick start, fine-tuning, and advanced usage.

Open Colab →
📄

Research Papers

Academic publications on Conformer architecture and medical ASR benchmarks.

Read Papers →
☁️

Vertex AI Model Garden

One-click deployment on Google Cloud with managed infrastructure.

Deploy Now →
Open Source Community

Join the MedASR Community

Contribute to the future of medical speech recognition. Report issues, submit PRs, and collaborate with developers worldwide.

3,500+
GitHub Stars
500+
Contributors
50+
Research Institutions
Pricing

MedASR is Free and Open Source

No license fees, no per-minute charges. Deploy MedASR on your infrastructure at zero software cost.

🆓

Community

Free forever

$0
Full model access
Apache 2.0 code license
Community support
Fine-tuning capability
Get Started Free
🏢

Enterprise

Custom solutions

Contact Us
Custom fine-tuning
Dedicated support
EHR integration help
Training workshops
Contact Sales
What Developers Say

Trusted by Healthcare Innovators

Hear from developers and researchers using MedASR in production.

"MedASR cut our radiology transcription errors by 80%. The difference in medical terminology accuracy compared to Whisper is night and day."

DR
Dr. Rachel Kim
CTO, MedTranscribe AI

"We deployed MedASR on-premise in 2 weeks. The lightweight model runs on our existing GPU servers without any infrastructure changes."

JM
James Martinez
ML Engineer, HealthTech Corp

"Finally an open-source medical ASR that actually works. Fine-tuning for our cardiology department took less than a day."

SL
Dr. Sarah Liu
Research Lead, Stanford Medicine
Changelog

MedASR Latest Updates

Stay informed about new features, improvements, and releases.

NEW December 18, 2025

MedASR v1.0.0 Released

Initial public release with support for radiology dictation, physician-patient conversations, and 6-gram language model integration.

View Release Notes →
DOCS December 18, 2025

Documentation & Tutorials

Comprehensive documentation including quick start guide, fine-tuning tutorials, and deployment options now available.

Read Documentation →
Open Source

Contribute to MedASR

Help improve medical speech recognition for everyone. We welcome contributions from developers, researchers, and healthcare professionals.

Ways to Contribute

  • Report bugs and issues
  • Submit pull requests
  • Improve documentation
  • Share benchmark results
  • Create tutorials and examples
Academic Use

Cite MedASR in Your Research

If you use MedASR in your research, please cite the following:

BibTeX Citation
@software{medasr2025,
  title = {MedASR: Open-Source Medical Speech Recognition},
  author = {Google Health AI},
  year = {2025},
  url = {https://github.com/google-health/medasr},
  note = {Conformer-based ASR model for medical dictation}
}

@article{conformer2020,
  title = {Conformer: Convolution-augmented Transformer for Speech Recognition},
  author = {Gulati, Anmol and others},
  journal = {arXiv preprint arXiv:2005.08100},
  year = {2020}
}

Start Building with MedASR Today

Join thousands of developers using Google's open-source medical speech recognition. Free, accurate, and ready for production.