Commit 76dc1b0
Parent: daf13ec
add readme
    	
README.md CHANGED
@@ -1,6 +1,26 @@
 ---
+language: mt
 tags:
-- 
+- audio
+- automatic-speech-recognition
+- voxpopuli
+datasets:
+- voxpopuli
+license: cc-by-nc-4.0
+inference: false
 ---
 
-
+# Wav2Vec2-base-VoxPopuli-V2
+
+[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only in **mt** on **9.1k** hours of unlabeled data from the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
+
+The model is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
+
+**Note**: This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data in **mt**. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more detailed explanation of how to fine-tune the model.
+
+**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
+Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
+
+**Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*.
+
+For more information, see the [official website](https://github.com/facebookresearch/voxpopuli/).
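
The README asks that speech input match the 16 kHz rate used in pretraining. As an illustration only, the sketch below resamples a mono signal to 16 kHz with plain linear interpolation. The function name `resample_linear`, the constant `TARGET_RATE`, and the 48 kHz test tone are assumptions made for this example and are not part of the commit. Linear interpolation applies no anti-aliasing filter, so a dedicated resampler from an audio library is likely the better choice for real data.

```python
import math

TARGET_RATE = 16_000  # sampling rate the README says the model expects


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (naive: no anti-aliasing)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)
    # Number of output samples that covers the same duration as the input.
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Hypothetical input: one second of a 440 Hz tone recorded at 48 kHz.
tone_48k = [math.sin(2 * math.pi * 440 * t / 48_000) for t in range(48_000)]
tone_16k = resample_linear(tone_48k, 48_000)
print(len(tone_16k))  # 16000
```

One second of audio at 48 kHz becomes 16,000 samples, which is what a 16 kHz model input of the same duration should contain.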