---
license: gpl-3.0
language:
- en
pipeline_tag: text-generation
tags:
- code
- asr
- inverse text normalization
- transformers
datasets:
- text-normalization-challenge-english-language
---

# ASR Inverse Text Normalization

This repository provides a **fine-tuned BART model** for **ASR Inverse Text Normalization (ITN)**: transforming raw, spoken-form ASR transcripts (e.g., "november twenty three") into properly formatted written text (e.g., "november 23").

---

## Model Overview

**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based model introduced by Facebook AI Research and designed for both text understanding and generation tasks.

- **Architecture**: Encoder–Decoder Transformer with self-attention.
- **Pretraining objective**: Reconstruct the original text from corrupted/noisy versions.
- **Applications**: Summarization, machine translation, question answering, and text normalization.

For this project:

- Base model: `facebook/bart-base`
- Training setup: ITN treated as a **sequence-to-sequence** problem
- Dataset: [Text Normalization Challenge - English Language (Kaggle)](https://www.kaggle.com/competitions/text-normalization-challenge-english-language/data)

---

## Intended Use

The model can be applied directly to **normalize ASR outputs** in speech-to-text pipelines.

---

## Quickstart

```python
from transformers import pipeline

# Load the fine-tuned ITN model from the Hub
generator = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

# Run inference on a raw, spoken-form ASR transcript
result = generator("my c v v for my card is five six seven and it expires on november twenty three")
print(result)
```
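
For more control over decoding (beam width, length limits, batching), the tokenizer and model can also be loaded explicitly. A minimal sketch, assuming the checkpoint follows the standard `transformers` seq2seq interface; the generation parameters below are illustrative, not values from the original training setup:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pavanBuduguppa/asr_inverse_text_normalization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tokenize a raw, spoken-form ASR transcript
text = "my c v v for my card is five six seven and it expires on november twenty three"
inputs = tokenizer(text, return_tensors="pt")

# Generate the written-form text (num_beams / max_length are illustrative defaults)
outputs = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```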
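
---

## Fine-tuning Sketch

This repository does not ship training code; the following is only a minimal sketch of how the sequence-to-sequence setup described above could be reproduced with `Seq2SeqTrainer`. The example pairs are hypothetical, and the real Kaggle data ships as per-token before/after columns that must first be joined into sentence pairs; the hyperparameters are placeholders, not the values used for this checkpoint.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Hypothetical spoken-form -> written-form pairs for illustration only
pairs = [
    {"input": "five six seven", "target": "567"},
    {"input": "november twenty three", "target": "november 23"},
]
dataset = Dataset.from_list(pairs)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

def preprocess(batch):
    # Encode inputs and targets; targets become the decoder labels
    model_inputs = tokenizer(batch["input"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["input", "target"])

# Placeholder hyperparameters
args = Seq2SeqTrainingArguments(
    output_dir="bart-itn",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```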