Model Details
Model Description
A 17.31M-parameter multilingual linear projector (version 2) trained for automatic speech recognition (ASR) using the SLAM-ASR speechLLM framework. Within this framework, only the linear projector was trained, alongside a frozen speech encoder (Whisper-large-v3-turbo) and a frozen LLM (EuroLLM-1.7B).
- Developed by: SpeechTek Unit at Fondazione Bruno Kessler
- Funded by: This work was partially funded by the European Union’s Horizon 2020 project ELOQUENCE (grant 101070558).
- Model type: Linear projector in a speechLLM framework
- Supported Language(s): English, French, German, Italian, Spanish, Portuguese, Dutch, Polish, Hungarian, Czech, Romanian, Bulgarian, Slovak, Slovene, Serbian, Greek, Danish, Swedish, Finnish, Latvian, Lithuanian, Estonian, Welsh, Maltese, Breton, Irish, Galician, and Basque.
- License: CC-BY-4.0
Uses
This model is trained for Automatic Speech Recognition (ASR) and is version 2 of the mEUltilingual speechLLM projectors collection.
How to Get Started with the Model
This linear projector checkpoint can be downloaded and used for further fine-tuning or decoding with the shell scripts provided in the SLAM-ASR codebase; refer to the instructions there for further details.
Whisper-large-v3-turbo and EuroLLM-1.7B must be downloaded before using this linear projector.
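Both base checkpoints are public on the Hugging Face Hub, so one convenient way to fetch everything is `huggingface_hub`. In the sketch below, the projector repo ID is a placeholder to be replaced with the actual ID of this repository; the two base-model IDs are the public ones.

```python
# Minimal download sketch using huggingface_hub.
from huggingface_hub import snapshot_download

# Placeholder ID: replace with the actual repository ID of this projector.
projector_dir = snapshot_download("YOUR-ORG/multilingual-linear-projector-v2")
encoder_dir = snapshot_download("openai/whisper-large-v3-turbo")  # frozen speech encoder
llm_dir = snapshot_download("utter-project/EuroLLM-1.7B")         # frozen LLM

print(projector_dir, encoder_dir, llm_dir)
```

The resulting local paths can then be plugged into the corresponding path variables of the SLAM-ASR shell scripts.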
Training Details
Training Data
The linear projector was trained on a multilingual dataset covering 28 European languages, built from widely used speech corpora: Common Voice 17.0, FLEURS, and VoxPopuli. As the distribution of data across languages is highly imbalanced, we applied a cap of 100K audio samples per language per dataset, discarding any additional samples beyond this threshold. This strategy reduces data skew while keeping training computationally feasible. To assess the generalizability and robustness of our models on out-of-domain speech, we used the official evaluation set of the INTERSPEECH 2025 MLC-SLM Challenge.
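The capping strategy can be pictured with the sketch below, which keeps at most 100K samples per (dataset, language) pair; the field names are illustrative, not the actual preprocessing schema.

```python
# Illustrative sketch of the per-language, per-dataset cap described above.
from collections import defaultdict

MAX_SAMPLES_PER_LANG_PER_DATASET = 100_000

def cap_samples(samples):
    """Keep at most 100K samples per (dataset, language) pair, in input order."""
    counts = defaultdict(int)
    kept = []
    for sample in samples:  # assumed schema: dict with "dataset" and "language" keys
        key = (sample["dataset"], sample["language"])
        if counts[key] < MAX_SAMPLES_PER_LANG_PER_DATASET:
            counts[key] += 1
            kept.append(sample)
    return kept
```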
Training Procedure
- The model was trained using the codebase provided by the official SLAM-ASR GitHub repository with `torchrun`.
- Only the linear projector was trained.
- The speech encoder (Whisper-large-v3-turbo) and the LLM (EuroLLM-1.7B) were kept frozen, without applying LoRA during training (see the sketch after this list).
- A single English prompt was used for training across all languages: "Transcribe speech to text."
- Training was conducted on a single NVIDIA L40S (Ada Lovelace) GPU.
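The freezing scheme can be illustrated with the PyTorch/Transformers sketch below. It is not the SLAM-ASR training loop itself, only a minimal picture of the setup: both base models keep `requires_grad=False`, so only the projector's parameters reach the optimizer.

```python
# Sketch of the freezing scheme: frozen encoder and LLM, trainable projector only.
import torch
from transformers import AutoModelForCausalLM, WhisperModel

encoder = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo").encoder
llm = AutoModelForCausalLM.from_pretrained("utter-project/EuroLLM-1.7B")

# Freeze both base models; no LoRA adapters are attached.
for frozen in (encoder, llm):
    frozen.eval()
    for p in frozen.parameters():
        p.requires_grad = False

# Stand-in for the linear projector (full sketch after the hyperparameter table);
# only its parameters are passed to the optimizer.
projector = torch.nn.Linear(1280 * 5, 2048)
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)
```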
Training Hyperparameters
Parameter | Value |
---|---|
llm_name | eurollm-1.7b |
llm_dim | 2048 |
context_length | 4096 |
encoder_name | whisper |
encoder_projector_ds_rate | 5 |
encoder_dim | 1280 |
encoder_projector | linear |
input_type | mel |
mel_size | 128 |
epochs | 3 |
freeze_encoder | true |
freeze_llm | true |
warmup_steps | 1000 |
total_steps | 100000 |
lr | 1e-4 |
validation_interval | 1000 |
batch_size_training | 4 |
val_size_training | 4 |
num_workers_dataloader | 2 |
optimizer | AdamW |
enable_fsdp | false |
enable_ddp | true |
use_fp16 | true |
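For reference, here is a minimal projector sketch consistent with the values above (encoder_dim=1280, encoder_projector_ds_rate=5, llm_dim=2048). It approximates the SLAM-ASR linear ("concat") projector rather than reproducing its exact code, but it matches the ~17.31M parameter count quoted in the description.

```python
import torch
import torch.nn as nn

class LinearProjector(nn.Module):
    """Stack every ds_rate encoder frames, then map them to the LLM dimension."""

    def __init__(self, encoder_dim=1280, llm_dim=2048, ds_rate=5, hidden_dim=2048):
        super().__init__()
        self.ds_rate = ds_rate
        self.linear1 = nn.Linear(encoder_dim * ds_rate, hidden_dim)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_dim, llm_dim)

    def forward(self, x):  # x: (batch, frames, encoder_dim)
        b, t, d = x.size()
        x = x[:, : t - t % self.ds_rate, :]        # drop trailing frames
        x = x.reshape(b, -1, d * self.ds_rate)     # concatenate ds_rate frames
        return self.linear2(self.relu(self.linear1(x)))

projector = LinearProjector()
print(sum(p.numel() for p in projector.parameters()))  # 17,305,600 ≈ 17.31M
```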
Evaluation
The model was evaluated using the Word Error Rate (WER) metric from the `evaluate` library.
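For example, with placeholder transcripts:

```python
# WER computation with the Hugging Face `evaluate` library (placeholder strings).
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["transcribe speech to text"],
    references=["transcribed speech to text"],
)
print(f"WER: {100 * wer:.2f}%")
```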
Results
WER (%) per language on Common Voice (CV), FLEURS (FL), and the MLC-SLM evaluation set (MLC):

Language | CV | FL | MLC |
---|---|---|---|
Spanish | 5.22 | 4.09 | 21.86 |
German | 7.11 | 7.79 | 32.77 |
Dutch | 6.83 | 8.65 | - |
Portuguese | 9.39 | 4.86 | 51.75 |
Galician | 12.70 | 9.98 | - |
English | 12.94 | 6.34 | 46.56 |
Polish | 14.19 | 8.68 | - |
Czech | 11.16 | 11.32 | - |
French | 11.24 | 7.83 | 42.05 |
Hungarian | 14.59 | 16.87 | - |
Italian | 6.01 | 3.32 | 36.13 |
Swedish | 15.99 | 10.94 | - |
Romanian | 17.39 | 9.65 | - |
Danish | 18.81 | 14.65 | - |
Basque | 19.96 | - | - |
Bulgarian | 24.26 | 15.20 | - |
Finnish | 22.61 | 15.29 | - |
Latvian | 27.12 | 17.23 | - |
Lithuanian | 28.27 | 24.30 | - |
Greek | 30.06 | 18.35 | - |
Slovak | 35.84 | 9.71 | - |
Slovenian | 34.72 | 19.41 | - |
Estonian | 37.19 | 19.83 | - |
Welsh | 50.40 | 39.96 | - |
Serbian | 56.49 | 27.60 | - |
Maltese | 58.84 | 44.89 | - |
Breton | 95.68 | - | - |
Irish | 82.23 | 88.06 | - |
Acknowledgements
