---
license: gpl-3.0
language:
- en
pipeline_tag: text-generation
tags:
- code
- asr
- inverse text normalization
- transformers
datasets:
- text-normalization-challenge-english-language
---
# ASR Inverse Text Normalization

This repository provides a **fine-tuned BART model** for the task of **ASR Inverse Text Normalization (ITN)**.
The goal is to transform raw, unnormalized ASR transcripts (e.g., "five six seven", "november twenty three") into their properly formatted written form.

---
## Model Overview

**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based model introduced by Facebook AI Research.
It is designed for both text understanding and generation tasks.

- **Architecture**: Encoder–decoder Transformer with self-attention.
- **Pretraining objective**: Reconstruct the original text from corrupted/noisy versions.
- **Applications**: Summarization, machine translation, question answering, and text normalization.

For this project:

- Base model: `facebook/bart-base`
- Training setup: Treated as a **sequence-to-sequence** problem (a minimal sketch follows this list)
- Dataset: [Text Normalization Challenge - English Language (Kaggle)](https://www.kaggle.com/competitions/text-normalization-challenge-english-language/data)
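
The fine-tuning code is not included in this card; the following is a minimal sketch of one sequence-to-sequence training step, assuming (spoken-form, written-form) text pairs prepared from the Kaggle data. The example pair is illustrative, and in practice this step would run over the full dataset with a `Seq2SeqTrainer` or a custom loop.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

spoken = "it expires on november twenty three"  # model input (spoken form, illustrative)
written = "it expires on november 23"           # training target (written form, illustrative)

inputs = tokenizer(spoken, return_tensors="pt")
labels = tokenizer(text_target=written, return_tensors="pt").input_ids

# BART computes the cross-entropy loss against the labels internally.
loss = model(**inputs, labels=labels).loss
loss.backward()
```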
---

## Intended Use

The model can be applied directly to **normalize ASR outputs** in speech-to-text pipelines.
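
As an illustration, here is a hedged sketch of such a two-stage pipeline. The upstream ASR checkpoint (`facebook/wav2vec2-base-960h`) and the audio path are arbitrary examples, not part of this project.

```python
from transformers import pipeline

# Stage 1: any ASR model that emits raw, spoken-form text.
# facebook/wav2vec2-base-960h is an arbitrary example checkpoint.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Stage 2: this model rewrites the transcript into written form.
itn = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

transcript = asr("speech.wav")["text"]  # hypothetical local audio file
normalized = itn(transcript.lower())    # wav2vec2-base-960h emits uppercase; lowercase first
print(normalized)
```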
---

## Quickstart

```python
from transformers import pipeline

# Load the pipeline (the task is inferred from the model's configuration on the Hub)
generator = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

# Run inference on a raw, spoken-form ASR transcript
result = generator("my c v v for my card is five six seven and it expires on november twenty three")
print(result)
```
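
If you need explicit control over decoding, the same inference can be run without the pipeline wrapper. The generation parameters below (beam search, length cap) are illustrative, not tuned values.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pavanBuduguppa/asr_inverse_text_normalization")
model = AutoModelForSeq2SeqLM.from_pretrained("pavanBuduguppa/asr_inverse_text_normalization")

text = "my c v v for my card is five six seven and it expires on november twenty three"
inputs = tokenizer(text, return_tensors="pt")

# Beam search tends to give more stable rewrites than greedy decoding.
output_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```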