---
license: gpl-3.0
language:
  - en
pipeline_tag: text2text-generation
tags:
  - code
  - asr
  - inverse text normalization
  - transformers
datasets:
  - text-normalization-challenge-english-language
---

# ASR Inverse Text Normalization

This repository provides a **fine-tuned BART model** for the task of **ASR Inverse Text Normalization (ITN)**.  
The goal is to transform raw, spoken-form ASR transcripts into properly formatted written text, restoring numerals, dates, and similar entities (for example, turning `five six seven` into `567`).

---

## Model Overview

**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based model introduced by Facebook AI Research.  
It is designed for both text understanding and generation tasks.

- **Architecture**: Encoder–Decoder Transformer with self-attention.  
- **Pretraining objective**: Reconstruct original text from corrupted/noisy versions.  
- **Applications**: Summarization, machine translation, question answering, and text normalization.  

For this project:
- Base model: `facebook/bart-base`  
- Training setup: Treated as a **sequence-to-sequence** problem (see the fine-tuning sketch after this list)  
- Dataset: [Text Normalization Challenge - English Language (Kaggle)](https://www.kaggle.com/competitions/text-normalization-challenge-english-language/data)  
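
The snippet below is a minimal sketch of that sequence-to-sequence setup: it scores a single spoken-form/written-form pair with `facebook/bart-base`. The example pair and the bare training step are illustrative assumptions, not the exact recipe used to produce this checkpoint.

```python
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical training pair: spoken-form source, written-form target.
source = "it expires on november twenty three"
target = "it expires on november 23"

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(text_target=target, return_tensors="pt").input_ids

# The forward pass computes the seq2seq cross-entropy loss against the labels;
# fine-tuning repeats this over the Kaggle dataset with an optimizer.
loss = model(**inputs, labels=labels).loss
loss.backward()
```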

---

## Intended Use

The model can be applied directly to **convert spoken-form ASR output into written form** within speech-to-text pipelines.  
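
As a sketch of where the model sits in such a pipeline, the example below chains a generic ASR model into the normalizer. The ASR checkpoint (`facebook/wav2vec2-base-960h`), the audio file name, and the lower-casing step are assumptions for illustration, not part of this repository.

```python
from transformers import pipeline

# Hypothetical two-stage pipeline: speech recognition, then inverse text normalization.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
itn = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

transcript = asr("speech.wav")["text"]  # raw spoken-form transcript (placeholder file)
normalized = itn(transcript.lower())    # wav2vec2 output is uppercase; lower-case it first
print(normalized[0]["generated_text"])
```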

---

## Quickstart

```python
from transformers import pipeline

# Load pipeline
generator = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

# Run inference
result = generator("my c v v for my card is five six seven and it expires on november twenty three")
print(result)
```
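
The pipeline returns a list with one dictionary per input; the normalized transcript is stored under the `generated_text` key.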