# Interface Workflow
```mermaid
stateDiagram-v2
    [*] --> LangSetup: User opens interface
    state "Language & Dataset Setup" as LangSetup {
        [*] --> LanguageSelection
        LanguageSelection --> LoadPhrases: Select language
        LoadPhrases --> DisplayPhrases: Load from NVIDIA Granary
        DisplayPhrases --> RecordingInterface: Show phrases & recording UI
        state RecordingInterface {
            [*] --> ShowInitialRows: Display first 10 phrases
            ShowInitialRows --> RecordAudio: User can record audio
            RecordAudio --> AddMoreRows: Optional - add 10 more rows
            AddMoreRows --> RecordAudio
        }
    }
    LangSetup --> DatasetCreation: User finishes recording
    state "Dataset Creation Options" as DatasetCreation {
        [*] --> FromRecordings: Create from recorded audio
        [*] --> FromUploads: Upload existing files
        FromRecordings --> ProcessRecordings: Save WAV files + transcripts
        FromUploads --> ProcessUploads: Process uploaded files + transcripts
        ProcessRecordings --> CreateJSONL: Generate JSONL dataset
        ProcessUploads --> CreateJSONL
        CreateJSONL --> DatasetReady: Dataset saved locally
    }
    DatasetCreation --> TrainingConfiguration: Dataset ready
    state "Training Setup" as TrainingConfiguration {
        [*] --> BasicSettings: Model, LoRA/full, batch size
        [*] --> AdvancedSettings: Learning rate, epochs, LoRA params
        BasicSettings --> ConfigureDeployment: Repo name, push options
        AdvancedSettings --> ConfigureDeployment
        ConfigureDeployment --> StartTraining: All settings configured
    }
    TrainingConfiguration --> TrainingProcess: Start training
    state "Training Process" as TrainingProcess {
        [*] --> InitializeTrackio: Set up experiment tracking
        InitializeTrackio --> RunTrainingScript: Execute train.py or train_lora.py
        RunTrainingScript --> StreamLogs: Show real-time training logs
        StreamLogs --> MonitorProgress: Track metrics & checkpoints
        MonitorProgress --> TrainingComplete: Training finished
        MonitorProgress --> HandleErrors: Training failed
        HandleErrors --> RetryOrExit: User can retry or exit
    }
    TrainingProcess --> PostTraining: Training complete
    state "Post-Training Actions" as PostTraining {
        [*] --> PushToHub: Push model to HF Hub
        [*] --> GenerateModelCard: Create model card
        [*] --> DeployDemoSpace: Deploy interactive demo
        PushToHub --> ModelPublished: Model available on HF Hub
        GenerateModelCard --> ModelDocumented: Model card created
        DeployDemoSpace --> DemoReady: Demo space deployed
    }
    PostTraining --> [*]: Process complete

    %% Alternative paths
    DatasetCreation --> PushDatasetOnly: Skip training, push dataset only
    PushDatasetOnly --> DatasetPublished: Dataset on HF Hub

    %% Error handling
    TrainingProcess --> ErrorRecovery: Handle training errors
    ErrorRecovery --> RetryTraining: Retry with different settings
    RetryTraining --> TrainingConfiguration

    %% Notes
    note right of LangSetup : Users pick a language and record themselves reading authentic phrases from the NVIDIA Granary dataset
    note right of DatasetCreation : JSONL rows pair an audio_path with its text transcript
    note right of TrainingConfiguration : Configure LoRA parameters, learning rate, epochs, etc.
    note right of TrainingProcess : Real-time log streaming with Trackio integration
    note right of PostTraining : Automated deployment pipeline
```
## Interface Workflow Overview
This diagram illustrates the complete user journey through the Voxtral ASR Fine-tuning interface. The workflow is designed to be intuitive and guide users through each step of the fine-tuning process.
### Key Workflow Stages
#### 1. Language & Dataset Setup
- **Language Selection**: Users choose from 25+ European languages supported by NVIDIA Granary
- **Phrase Loading**: System loads authentic, high-quality phrases in the selected language
- **Recording Interface**: Dynamic interface showing phrases with audio recording components
- **Progressive Disclosure**: Users can add more rows as needed (up to 100 recordings)
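The row-reveal logic can be sketched as a small helper. This is an illustrative sketch based on the numbers above (10 rows at a time, capped at 100); the actual `interface.py` implementation may differ:

```python
# Sketch of the progressive row-reveal logic: start with 10 phrase rows
# and add 10 at a time, never exceeding the 100-recording cap.
ROW_STEP = 10
ROW_CAP = 100

def next_row_count(current: int) -> int:
    """Return how many recording rows to show after one 'add rows' click."""
    return min(current + ROW_STEP, ROW_CAP)

count = ROW_STEP              # interface opens with the first 10 phrases
count = next_row_count(count) # user asks for more rows -> 20
```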
#### 2. Dataset Creation
- **From Recordings**: Process microphone recordings into WAV files and JSONL dataset
- **From Uploads**: Handle existing WAV/FLAC files with manual transcripts
- **JSONL Format**: Standard format with `audio_path` and `text` fields
- **Local Storage**: Datasets stored in `datasets/voxtral_user/` directory
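The JSONL-writing step might look like the following minimal sketch. The output filename `dataset.jsonl` is an assumption for illustration; only the directory and the `audio_path`/`text` fields come from the docs above:

```python
import json
from pathlib import Path

def write_jsonl(rows, out_dir="datasets/voxtral_user"):
    """Write one {"audio_path": ..., "text": ...} object per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "dataset.jsonl"  # hypothetical filename
    with path.open("w", encoding="utf-8") as f:
        for audio_path, text in rows:
            f.write(json.dumps({"audio_path": audio_path, "text": text},
                               ensure_ascii=False) + "\n")
    return path

p = write_jsonl([("recordings/sample_000.wav", "hello world")])
```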
#### 3. Training Configuration
- **Basic Settings**: Model selection, LoRA vs full fine-tuning, batch size
- **Advanced Settings**: Learning rate, epochs, gradient accumulation
- **LoRA Parameters**: r, alpha, dropout, audio tower freezing options
- **Repository Setup**: Model naming and Hugging Face Hub integration
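Conceptually, the basic, advanced, and LoRA settings are merged into a single training configuration. The sketch below uses hypothetical field names and default values, not the real CLI flags of `train.py`/`train_lora.py`:

```python
# Illustrative sketch of merging UI settings into one training config;
# all keys and defaults here are assumptions, not the actual flags.
def build_config(basic: dict, advanced: dict, use_lora: bool) -> dict:
    cfg = {
        "batch_size": 2,
        "learning_rate": 5e-5,
        "epochs": 3,
        "grad_accum": 4,
    }
    if use_lora:
        # LoRA hyperparameters surfaced in the UI: r, alpha, dropout,
        # plus the audio-tower freezing toggle.
        cfg.update({"lora_r": 8, "lora_alpha": 16, "lora_dropout": 0.05,
                    "freeze_audio_tower": True})
    cfg.update(basic)
    cfg.update(advanced)  # advanced settings override basics and defaults
    return cfg

cfg = build_config({"batch_size": 4}, {"learning_rate": 1e-4}, use_lora=True)
```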
#### 4. Training Process
- **Trackio Integration**: Automatic experiment tracking setup
- **Script Execution**: Calls the appropriate training script (`train.py` or `train_lora.py`)
- **Log Streaming**: Real-time display of training progress and metrics
- **Error Handling**: Graceful handling of training failures with retry options
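The log-streaming step can be sketched with the standard-library `subprocess` module. This is a simplified stand-in for how the interface might run the training scripts and surface their output line by line:

```python
import subprocess
import sys

def stream_logs(cmd):
    """Run a training command and yield its stdout line by line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:
        yield line.rstrip("\n")
    proc.wait()
    if proc.returncode != 0:
        # A failure here is what sends the user to the retry/exit flow.
        raise RuntimeError(f"training exited with code {proc.returncode}")

# Demo with a trivial child process instead of the real training script:
lines = list(stream_logs(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]))
```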
#### 5. Post-Training Actions
- **Model Publishing**: Automatic push to Hugging Face Hub
- **Model Card Generation**: Automated creation using `generate_model_card.py`
- **Demo Deployment**: One-click deployment of interactive demo spaces
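A toy version of model-card generation in the spirit of `generate_model_card.py` is shown below; the template text and field names are invented for illustration and will differ from the real script:

```python
# Hypothetical model-card template; the real generate_model_card.py
# uses its own layout and metadata fields.
CARD_TEMPLATE = """\
---
language: {language}
base_model: {base_model}
---
# {repo_name}

Voxtral ASR model fine-tuned on {num_samples} user recordings.
"""

def render_card(repo_name, base_model, language, num_samples):
    return CARD_TEMPLATE.format(repo_name=repo_name, base_model=base_model,
                                language=language, num_samples=num_samples)

card = render_card("user/voxtral-asr-demo", "voxtral-mini", "nl", 20)
```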
### Alternative Paths
#### Dataset-Only Workflow
- Users can create and publish datasets without training models
- Useful for dataset curation and sharing
#### Error Recovery
- Training failures trigger error recovery flows
- Users can retry with modified parameters
- Comprehensive error logging and debugging information
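The retry flow can be sketched as a loop that reruns training with adjusted parameters. The halve-the-batch-size heuristic below is an assumption for illustration, not the documented recovery behavior:

```python
# Sketch of retry-with-modified-parameters after a training failure.
def train_with_retries(run, config, max_attempts=3):
    last_err = None
    for _ in range(max_attempts):
        try:
            return run(config)
        except RuntimeError as err:
            last_err = err
            # Hypothetical recovery: shrink the batch before retrying
            # (e.g. after an out-of-memory error).
            config = {**config,
                      "batch_size": max(1, config["batch_size"] // 2)}
    raise RuntimeError(f"training failed after {max_attempts} attempts") from last_err

# Simulate a run that fails twice, then succeeds:
attempts = []
def fake_run(cfg):
    attempts.append(cfg["batch_size"])
    if len(attempts) < 3:
        raise RuntimeError("simulated OOM")
    return "ok"

result = train_with_retries(fake_run, {"batch_size": 8})
```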
### Technical Integration Points
#### External Services
- **NVIDIA Granary**: Source of high-quality multilingual ASR data
- **Hugging Face Hub**: Model and dataset storage and sharing
- **Trackio Spaces**: Experiment tracking and visualization
#### Script Integration
- **interface.py**: Main Gradio application orchestrating the workflow
- **train.py/train_lora.py**: Core training scripts with Trackio integration
- **push_to_huggingface.py**: Model/dataset publishing
- **deploy_demo_space.py**: Automated demo deployment
- **generate_model_card.py**: Model documentation generation
### User Experience Features
#### Progressive Interface Reveal
- Interface components are revealed as users progress through the workflow
- Reduces cognitive load and guides users step-by-step
#### Real-time Feedback
- Live log streaming during training
- Progress indicators and status updates
- Immediate feedback on dataset creation and validation
#### Flexible Input Methods
- Support for both live recording and file uploads
- Multiple language options for diverse user needs
- Scalable recording interface (10-100 samples)
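Validating uploads against the supported WAV/FLAC formats might look like the following sketch; the actual checks in `interface.py` may differ:

```python
from pathlib import Path

# The dataset-creation options accept WAV and FLAC uploads.
ALLOWED_SUFFIXES = {".wav", ".flac"}

def split_uploads(filenames):
    """Split candidate uploads into accepted and rejected lists."""
    accepted, rejected = [], []
    for name in filenames:
        target = accepted if Path(name).suffix.lower() in ALLOWED_SUFFIXES \
                 else rejected
        target.append(name)
    return accepted, rejected

ok, bad = split_uploads(["a.wav", "b.FLAC", "notes.txt"])
```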
See also:
- [Architecture Overview](architecture.md)
- [Training Pipeline](training-pipeline.md)
- [Data Flow](data-flow.md)