---
license: gpl-3.0
language:
- en
pipeline_tag: text-generation
tags:
- code
- asr
- inverse text normalization
- transformers
datasets:
- text-normalization-challenge-english-language
---
# ASR Inverse Text Normalization

This repository provides a **fine-tuned BART model** for the task of **ASR Inverse Text Normalization (ITN)**.
The goal is to transform raw, unnormalized ASR transcripts (e.g., "five six seven", "november twenty three") into their properly formatted written form.

---
## Model Overview

**BART (Bidirectional and Auto-Regressive Transformers)** is a transformer-based model introduced by Facebook AI Research.
It is designed for both text understanding and generation tasks.

- **Architecture**: Encoder–decoder Transformer with self-attention.
- **Pretraining objective**: Reconstruct the original text from corrupted/noisy versions.
- **Applications**: Summarization, machine translation, question answering, and text normalization.

For this project:

- Base model: `facebook/bart-base`
- Training setup: Treated as a **sequence-to-sequence** problem (a minimal sketch follows this list)
- Dataset: [Text Normalization Challenge - English Language (Kaggle)](https://www.kaggle.com/competitions/text-normalization-challenge-english-language/data)
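
The fine-tuning code is not included in this card; the following is a minimal sketch of one sequence-to-sequence training step, assuming (spoken-form, written-form) text pairs prepared from the Kaggle data. The example pair is illustrative, and in practice this step would run over the full dataset with a `Seq2SeqTrainer` or a custom loop.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

spoken = "it expires on november twenty three"  # model input (spoken form, illustrative)
written = "it expires on november 23"           # training target (written form, illustrative)

inputs = tokenizer(spoken, return_tensors="pt")
labels = tokenizer(text_target=written, return_tensors="pt").input_ids

# BART computes the cross-entropy loss against the labels internally.
loss = model(**inputs, labels=labels).loss
loss.backward()
```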
---

## Intended Use

The model can be applied directly to **normalize ASR outputs** in speech-to-text pipelines.
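
As an illustration, here is a hedged sketch of such a two-stage pipeline. The upstream ASR checkpoint (`facebook/wav2vec2-base-960h`) and the audio path are arbitrary examples, not part of this project.

```python
from transformers import pipeline

# Stage 1: any ASR model that emits raw, spoken-form text.
# facebook/wav2vec2-base-960h is an arbitrary example checkpoint.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Stage 2: this model rewrites the transcript into written form.
itn = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

transcript = asr("speech.wav")["text"]  # hypothetical local audio file
normalized = itn(transcript.lower())    # wav2vec2-base-960h emits uppercase; lowercase first
print(normalized)
```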
---

## Quickstart

```python
from transformers import pipeline

# Load the pipeline (the task is inferred from the model's configuration on the Hub)
generator = pipeline(model="pavanBuduguppa/asr_inverse_text_normalization")

# Run inference on a raw, spoken-form ASR transcript
result = generator("my c v v for my card is five six seven and it expires on november twenty three")
print(result)
```
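
If you need explicit control over decoding, the same inference can be run without the pipeline wrapper. The generation parameters below (beam search, length cap) are illustrative, not tuned values.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pavanBuduguppa/asr_inverse_text_normalization")
model = AutoModelForSeq2SeqLM.from_pretrained("pavanBuduguppa/asr_inverse_text_normalization")

text = "my c v v for my card is five six seven and it expires on november twenty three"
inputs = tokenizer(text, return_tensors="pt")

# Beam search tends to give more stable rewrites than greedy decoding.
output_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```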