Spaces:
Running
Running
# Voxtral ASR Fine-tuning Architecture | |
```mermaid | |
graph TB | |
%% User Interface Layer | |
subgraph "User Interface" | |
UI[Gradio Web Interface<br/>interface.py] | |
REC[Audio Recording<br/>Microphone Input] | |
UP[File Upload<br/>WAV/FLAC files] | |
end | |
%% Data Processing Layer | |
subgraph "Data Processing" | |
DP[Data Processing<br/>Audio resampling<br/>JSONL creation] | |
DS[Dataset Management<br/>NVIDIA Granary<br/>Local datasets] | |
end | |
%% Training Layer | |
subgraph "Training Pipeline" | |
TF[Full Fine-tuning<br/>scripts/train.py] | |
TL[LoRA Fine-tuning<br/>scripts/train_lora.py] | |
TI[Trackio Integration<br/>Experiment Tracking] | |
end | |
%% Model Management Layer | |
subgraph "Model Management" | |
MM[Model Management<br/>Hugging Face Hub<br/>Local storage] | |
MC[Model Card Generation<br/>scripts/generate_model_card.py] | |
end | |
%% Deployment Layer | |
subgraph "Deployment & Demo" | |
DEP[Demo Space Deployment<br/>scripts/deploy_demo_space.py] | |
HF[HF Spaces<br/>Interactive Demo] | |
end | |
%% External Services | |
subgraph "External Services" | |
HFH[Hugging Face Hub<br/>Models & Datasets] | |
GRAN[NVIDIA Granary<br/>Multilingual ASR Dataset] | |
TRACK[Trackio Spaces<br/>Experiment Tracking] | |
end | |
%% Data Flow | |
UI --> DP | |
REC --> DP | |
UP --> DP | |
DP --> DS | |
DS --> TF | |
DS --> TL | |
TF --> TI | |
TL --> TI | |
TF --> MM | |
TL --> MM | |
MM --> MC | |
MM --> DEP | |
DEP --> HF | |
DS -.-> HFH | |
MM -.-> HFH | |
TI -.-> TRACK | |
DS -.-> GRAN | |
%% Styling | |
classDef interface fill:#e1f5fe,stroke:#01579b,stroke-width:2px | |
classDef processing fill:#f3e5f5,stroke:#4a148c,stroke-width:2px | |
classDef training fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px | |
classDef management fill:#fff3e0,stroke:#e65100,stroke-width:2px | |
classDef deployment fill:#fce4ec,stroke:#880e4f,stroke-width:2px | |
classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px | |
class UI,REC,UP interface | |
class DP,DS processing | |
class TF,TL,TI training | |
class MM,MC management | |
class DEP,HF deployment | |
class HFH,GRAN,TRACK external | |
``` | |
## Architecture Overview | |
This diagram shows the high-level architecture of the Voxtral ASR Fine-tuning application. The system is organized into several layers: | |
### 1. User Interface Layer | |
- **Gradio Web Interface**: Main user-facing application built with Gradio | |
- **Audio Recording**: Microphone input for recording speech samples | |
- **File Upload**: Support for uploading existing WAV/FLAC audio files | |
### 2. Data Processing Layer | |
- **Data Processing**: Audio resampling to 16kHz, JSONL dataset creation | |
- **Dataset Management**: Integration with NVIDIA Granary dataset and local dataset handling | |
### 3. Training Layer | |
- **Full Fine-tuning**: Complete model fine-tuning using `scripts/train.py` | |
- **LoRA Fine-tuning**: Parameter-efficient fine-tuning using `scripts/train_lora.py` | |
- **Trackio Integration**: Experiment tracking and logging | |
### 4. Model Management Layer | |
- **Model Management**: Local storage and Hugging Face Hub integration | |
- **Model Card Generation**: Automated model card creation | |
### 5. Deployment Layer | |
- **Demo Space Deployment**: Automated deployment to Hugging Face Spaces | |
- **Interactive Demo**: Live demo interface for testing fine-tuned models | |
### 6. External Services | |
- **Hugging Face Hub**: Model and dataset storage and sharing | |
- **NVIDIA Granary**: High-quality multilingual ASR dataset | |
- **Trackio Spaces**: Experiment tracking and visualization | |
## Key Workflows | |
1. **Dataset Creation**: Users can record audio or upload files → processed into JSONL format | |
2. **Model Training**: Datasets fed into training scripts with experiment tracking | |
3. **Model Publishing**: Trained models pushed to HF Hub with generated model cards | |
4. **Demo Deployment**: Automated deployment of interactive demos to HF Spaces | |
See also: | |
- [Interface Workflow](interface-workflow.md) | |
- [Training Pipeline](training-pipeline.md) | |
- [Deployment Pipeline](deployment-pipeline.md) | |
- [Data Flow](data-flow.md) | |