ASR + Invoice Extraction Server
Standalone packaging of Server_conformer.py to transcribe audio and extract invoice JSON from transcript text. This folder now includes a copy of the trained RNNT checkpoint for convenience.
What’s inside
- Server_conformer.py
- Speech2text.py
- InformationExtractor.py
- chunkformer/code
- chunkformer-model/
- requirements.txt
Prerequisites
- Python 3.9+ and a CUDA GPU (required for Qwen invoice extraction; CPU will be extremely slow)
- Hugging Face token with access to the models you use (HF_TOKEN)
- Chunkformer RNNT checkpoint available at chunkformer-model (copied into this folder). Update CHUNKFORMER_MODEL_PATH if you place it elsewhere.
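If you want to verify these prerequisites up front, a minimal check could look like the sketch below. It uses only the Python standard library; the variable names mirror the .env described later in this README.

```python
# Optional sanity check: confirms the prerequisites above before setup.
import os
import shutil

print("NVIDIA driver visible:", shutil.which("nvidia-smi") is not None)
print("HF_TOKEN set:", bool(os.environ.get("HF_TOKEN")))
print("Checkpoint dir present:",
      os.path.isdir(os.environ.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model")))
```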
Setup
cd Speech2Invoice
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Configure environment
Create a .env (or export env vars) with at least:
PORT=8000
USE_NGROK=false
HF_TOKEN=your_hf_token_here
CHUNKFORMER_MODEL_PATH=chunkformer-model
LOG_LEVEL=DEBUG
DEBUG=true
# Optional ngrok
NGROK_AUTHTOKEN=
NGROK_REGION=ap
# Optional invoice LLM overrides (defaults are fast)
IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
IE_MAX_NEW_TOKENS=256
IE_DO_SAMPLE=false
IE_TEMPERATURE=0.0
IE_TOP_P=0.8
If you move the model elsewhere, set CHUNKFORMER_MODEL_PATH to that directory.
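As a rough illustration of how these values are consumed, the sketch below loads the .env and reads the two settings most likely to change. It assumes python-dotenv is available; Server_conformer.py is expected to load the .env itself, so this is illustrative only.

```python
# Illustrative only: load .env from the current directory and read two values.
import os
from dotenv import load_dotenv

load_dotenv()  # no-op if no .env file is found
port = int(os.environ.get("PORT", "8000"))
model_path = os.environ.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model")
print(f"Port {port}, Chunkformer checkpoint at {model_path}")
```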
Run
python3 Server_conformer.py
Endpoints
- POST /transcribe: multipart/form-data with an audio file (wav, mp3, m4a, ogg, webm). Returns JSON with final_result and full_transcription.
- POST /ticket: JSON body {"full_transcription": "<text>"}. Returns invoice JSON inferred by Qwen.
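A minimal client sketch for both endpoints, assuming the server is reachable on localhost:8000, that a sample.wav file exists, and that the multipart field is named "file" (check Server_conformer.py if the field name differs):

```python
# Call /transcribe with an audio file, then feed the transcript to /ticket.
# The "file" field name and the localhost:8000 base URL are assumptions.
import requests

BASE = "http://localhost:8000"

with open("sample.wav", "rb") as f:
    resp = requests.post(f"{BASE}/transcribe",
                         files={"file": ("sample.wav", f, "audio/wav")})
resp.raise_for_status()
transcript = resp.json()["full_transcription"]

resp = requests.post(f"{BASE}/ticket", json={"full_transcription": transcript})
resp.raise_for_status()
print(resp.json())  # invoice JSON inferred by Qwen
```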
Notes
- The invoice extractor requires a GPU and a Hugging Face download on first run. Use smaller models via IE_LLM_MODEL_ID for speed.
- Model weights for the RNNT checkpoint are included in chunkformer-model/. For large files, consider git-lfs if you plan to push to a remote.
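To avoid the first /ticket request stalling on that download, you can optionally pre-fetch the LLM weights into the local Hugging Face cache. A sketch using huggingface_hub (usually pulled in by transformers, but verify it is installed):

```python
# Optional warm-up: download the invoice LLM ahead of time so the first
# extraction request does not block on the Hugging Face download.
import os
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id=os.environ.get("IE_LLM_MODEL_ID", "Qwen/Qwen1.5-7B-Chat"),
    token=os.environ.get("HF_TOKEN"),
)
```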
Contact
For questions or controlled access requests to Speech2Invoice:
- Duc Dat Pham
- Email: ducdatit2002@gmail.com