---
license: gpl-3.0
language:
- en
pipeline_tag: text2text-generation
tags:
- code
- asr
- inverse text normalization
- transformers
datasets:
- text-normalization-challenge-english-language
---
# ASR Inverse Text Normalization
This repository provides a **fine-tuned BART model** for the task of **ASR Inverse Text Normalization (ITN)**.
The goal is to transform raw, spoken-form ASR transcripts into properly formatted written text, e.g. restoring casing, numerals, and the written forms of dates and abbreviations.
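For example, ITN should map a spoken-form transcript like the one below to its written form (the output shown is illustrative of the target formatting; the actual model output may differ):

```
Input:  my c v v for my card is five six seven and it expires on november twenty three
Output: My CVV for my card is 567 and it expires on November 23
```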
---
## Model Overview
**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based model introduced by Facebook AI Research.
It is designed for both text understanding and generation tasks.
- **Architecture**: Encoder–decoder Transformer (bidirectional encoder, autoregressive decoder).
- **Pretraining objective**: Reconstruct original text from corrupted/noisy versions.
- **Applications**: Summarization, machine translation, question answering, and text normalization.
For this project:
- Base model: `facebook/bart-base`
- Training setup: Treated as a **sequence-to-sequence** problem, with spoken-form text as the source sequence and written-form text as the target (see the fine-tuning sketch after this list)
- Dataset: [Text Normalization Challenge - English Language (Kaggle)](https://www.kaggle.com/competitions/text-normalization-challenge-english-language/data)
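Below is a minimal fine-tuning sketch. The column names (`spoken`, `written`), file name, preprocessing, and hyperparameters are assumptions for illustration only; they are not the exact settings used to produce this checkpoint:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

def preprocess(batch):
    # Spoken-form text is the source; written-form text is the target.
    # "spoken"/"written" are assumed column names, not the Kaggle originals.
    inputs = tokenizer(batch["spoken"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["written"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

# Hypothetical: sentence pairs derived from the Kaggle data, e.g.
# dataset = load_dataset("csv", data_files={"train": "itn_pairs.csv"})
# tokenized = dataset.map(preprocess, batched=True, remove_columns=["spoken", "written"])

args = Seq2SeqTrainingArguments(
    output_dir="bart-itn",
    learning_rate=3e-5,               # illustrative hyperparameters
    per_device_train_batch_size=32,
    num_train_epochs=3,
    predict_with_generate=True,
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     train_dataset=tokenized["train"],
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```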
---
## Intended Use
The model can be applied directly to **normalize ASR outputs** in speech-to-text pipelines.
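For instance, the normalizer can be chained after an ASR front end that emits raw, unformatted text. A minimal sketch, assuming a generic CTC ASR model (`facebook/wav2vec2-base-960h`) and a hypothetical audio file `meeting.wav`:

```python
from transformers import pipeline

# Assumption for illustration: a CTC ASR model whose output is unpunctuated,
# unformatted text; any similar ASR front end would work here.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
itn = pipeline("text2text-generation", model="pavanBuduguppa/asr_inverse_text_normalization")

transcript = asr("meeting.wav")["text"].lower()    # raw spoken-form transcript
normalized = itn(transcript)[0]["generated_text"]  # written-form text
print(normalized)
```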
---
## Quickstart
```python
from transformers import pipeline

# Load the model as an explicit text2text-generation (seq2seq) pipeline
generator = pipeline(
    "text2text-generation",
    model="pavanBuduguppa/asr_inverse_text_normalization",
)

# Run inference on a raw, spoken-form ASR transcript
result = generator("my c v v for my card is five six seven and it expires on november twenty three")
print(result)
```
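The pipeline returns a list of dictionaries; the normalized text is available under the `generated_text` key of each result.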