---
title: SonicVerse
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
---

# 🎼 SonicVerse

An interactive demo for SonicVerse, a music captioning model. Users upload an audio clip of up to 10 seconds and receive a natural-language caption
that combines a general description of the music with musical features such as key, instruments, genre, mood/theme, and vocal gender.

---

## 🚀 Demo

Check out the live Space here:  
[![Hugging Face Space](https://img.shields.io/badge/HuggingFace-Space-blue?logo=huggingface)](https://huggingface.co/spaces/amaai-lab/SonicVerse)

---

## 🚀 Samples

Short captions

---

## 📦 Features

✅ Upload a 10-second music clip and get a caption.

✅ Upload a longer music clip (up to 1 minute for a successful demo) to get a long, detailed caption for the whole clip.
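
The two limits above can be checked locally before uploading. The following is a minimal sketch, not part of the SonicVerse code base: it assumes the clip is a PCM WAV file (read with Python's standard `wave` module), and the function names and the `"short"` / `"long"` / `"too_long"` labels are hypothetical, chosen only for illustration.

```python
import wave

# Limits taken from the feature list above.
SHORT_LIMIT_S = 10.0   # short caption: clips of up to 10 seconds
LONG_LIMIT_S = 60.0    # long caption: clips of up to 1 minute in the demo


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def caption_mode(duration_s):
    """Map a clip duration to the demo mode it fits (hypothetical labels)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= SHORT_LIMIT_S:
        return "short"
    if duration_s <= LONG_LIMIT_S:
        return "long"
    return "too_long"
```

For example, `caption_mode(clip_duration_seconds("my_clip.wav"))` reports which feature a local file fits, where `my_clip.wav` is a placeholder path. Other audio formats would need a different reader.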

---

## 🛠️ How to Run Locally

```bash
# Clone the repo
git clone https://github.com/AMAAI-Lab/SonicVerse
cd SonicVerse

# Install dependencies
pip install -r requirements.txt

# Alternatively, set up a conda environment instead of using pip
conda env create -f environment.yml
conda activate sonicverse

# Run the app
python app.py
```

---

<!-- ## 📂 File Structure

```
.
├── app.py               # Web app file
├── requirements.txt     # Python dependencies
├── environment.yml      # Conda environment
├── README.md            # This file
└── src/sonicverse       # Source 
```

--- -->

## 💡 Usage

To use the app:
1. Select an audio clip to upload.
2. Click the **Generate** button.
3. See the model’s output below.

---

## 🧹 Built With

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Gradio](https://gradio.app/)
- [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [MERT 95M](https://huggingface.co/m-a-p/MERT-v1-95M)

---

<!-- ## ✨ Acknowledgements

- [Model authors or papers you built on]
- [Contributors or collaborators]

---

## 📜 License

This project is licensed under the MIT License / Apache 2.0 / Other.
 -->