---
license: cc-by-nc-nd-4.0
language:
  - en
library_name: torch
tags:
  - audio
  - music-generation
  - accompaniment-generation
  - unconditional-audio-generation
  - pytorch
---

## AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck

This is the official Hugging Face model repository for **AnyAccomp**, an accompaniment generation framework from the paper **AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck**.

AnyAccomp addresses two critical challenges in accompaniment generation: **generalization** to in-the-wild singing voices and **versatility** in handling solo instrumental inputs.

The core of our framework is a **quantized melodic bottleneck**, which extracts robust melodic features. A subsequent flow matching model then generates a matching accompaniment based on these features.

For more details, please visit our [GitHub Repository](https://github.com/AmphionTeam/AnyAccomp).

<img src="https://anyaccomp.github.io/data/framework.jpg" alt="framework" width="500">
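
Conceptually, the pipeline runs in three stages: the VQ model compresses the input into discrete melodic tokens, the flow matching model generates accompaniment features conditioned on those tokens, and the vocoder renders the features as a waveform. The sketch below illustrates this flow only; the function and variable names are hypothetical placeholders, not the actual API of this repository.

```python
# Conceptual sketch of the AnyAccomp pipeline. All names here (vq_encoder,
# flow_matching_model, vocoder) are hypothetical placeholders standing in for
# the real modules in the GitHub repository.
import torch


def generate_accompaniment(input_audio: torch.Tensor,
                           vq_encoder,
                           flow_matching_model,
                           vocoder) -> torch.Tensor:
    # 1) Quantized melodic bottleneck: discretize the input into melodic tokens,
    #    discarding timbre and acoustic detail so the model generalizes to
    #    in-the-wild vocals and solo instruments.
    melodic_tokens = vq_encoder(input_audio)

    # 2) Flow matching: generate accompaniment features conditioned on the tokens.
    accompaniment_features = flow_matching_model(melodic_tokens)

    # 3) Vocoder: convert the generated features into an audio waveform.
    return vocoder(accompaniment_features)
```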

## Model Checkpoints

This repository contains the three pretrained components of the AnyAccomp framework:

| Model Name        | Directory                    | Description                                       |
| ----------------- | ---------------------------- | ------------------------------------------------- |
| **VQ**            | `./pretrained/vq`            | Extracts core melodic features from audio.        |
| **Flow Matching** | `./pretrained/flow_matching` | Generates accompaniments from melodic features.   |
| **Vocoder**       | `./pretrained/vocoder`       | Converts generated features into audio waveforms. |

## How to use

To run this model, follow the steps below:

1. Clone the repository.
2. Download the pretrained models.
3. Install the environment.
4. Run the Gradio demo or the inference script.

### 1. Clone the Repository

Clone the repository and enter its root directory:

```bash
git clone https://github.com/AmphionTeam/AnyAccomp.git

# enter the repository directory
cd AnyAccomp
```

### 2. Download the Pretrained Models

We provide a simple Python script to download all the necessary pretrained models from Hugging Face into the correct directory.

Before running the script, make sure you are in the `AnyAccomp` root directory.

Run the following command:

```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='amphion/anyaccomp', local_dir='./pretrained', repo_type='model')"
```
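
After the download finishes, the `./pretrained` directory should contain the three component folders listed in the table above. A minimal check:

```python
# Verify that the three pretrained components listed above were downloaded.
from pathlib import Path

for name in ("vq", "flow_matching", "vocoder"):
    path = Path("pretrained") / name
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```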

If you have trouble connecting to Hugging Face, you can try switching to a mirror endpoint before running the command:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```
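
Alternatively, you can set the endpoint from Python before `huggingface_hub` is imported (the variable is typically read at import time); this is just a convenience variant of the two commands above:

```python
# Point huggingface_hub at the mirror, then download the pretrained models.
# HF_ENDPOINT must be set before huggingface_hub is imported.
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download

snapshot_download(repo_id="amphion/anyaccomp", local_dir="./pretrained", repo_type="model")
```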

### 3. Install the Environment

Before installing, make sure you are in the `AnyAccomp` root directory; if not, `cd` into it first.

```bash
conda create -n anyaccomp python=3.9
conda activate anyaccomp
conda install -c conda-forge ffmpeg=4.0
pip install -r requirements.txt 
```
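
As an optional sanity check of the new environment, you can confirm that PyTorch is importable and `ffmpeg` is on the `PATH` before moving on:

```python
# Optional sanity check for the `anyaccomp` environment.
import shutil

import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```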

### 4. Run the Model

Once the setup is complete, you can run the model using either the Gradio demo or the inference script.

#### Run Gradio 🤗 Playground Locally

You can run the following command to interact with the playground:

```bash
python gradio_app.py
```

#### Inference Script

To run inference on several audio files at once, use the batch inference script, which processes every audio file in a folder:

```bash
python infer_from_folder.py
```

By default, the script loads input audio from `./example/input` and saves the results to `./example/output`. You can customize these paths in the [inference script](./anyaccomp/infer_from_folder.py).
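
For example, you could stage your own recordings into the default input folder and then invoke the script; the source file paths below are placeholders:

```python
# Copy a few recordings into the default input folder, then run batch inference.
# The source paths are placeholders; replace them with your own audio files.
import shutil
import subprocess
from pathlib import Path

input_dir = Path("example/input")
input_dir.mkdir(parents=True, exist_ok=True)

for src in ("/path/to/singing_voice.wav", "/path/to/guitar_solo.wav"):
    shutil.copy(src, input_dir)

subprocess.run(["python", "infer_from_folder.py"], check=True)
print("Done. Results are in", Path("example/output").resolve())
```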

## Citation

If you use AnyAccomp in your research, please cite our paper:

```bibtex
@article{zhang2025anyaccomp,
  title={AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck},
  author={Zhang, Junan and Zhang, Yunjia and Zhang, Xueyao and Wu, Zhizheng},
  journal={arXiv preprint arXiv:2509.14052},
  year={2025}
}
```