---
license: apache-2.0
pipeline_tag: feature-extraction
library_name: mlx
tags:
- audio
- audio-to-audio
- codec
---

# NanoCodec for Apple Silicon

This is an MLX implementation of [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps), a lightweight neural audio codec.

## Model Description

- **Architecture**: A fully convolutional generator network and three discriminators. The generator consists of an encoder, a vector quantization stage, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
- **Sample Rate**: 22.05 kHz
- **Framework**: MLX
- **Parameters**: 105M
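
The upstream model name (`22khz-0.6kbps-12.5fps`) together with the 22.05 kHz sample rate listed above implies a few useful figures. The sketch below is plain arithmetic on those numbers; the constant names are illustrative, and it assumes the name's bitrate and frame rate are exact (600 bit/s and 12.5 frames/s). It says nothing about how the bits are split across codebooks.

```python
# Rates taken from the model card and the upstream model name.
# Assumption: "0.6kbps" means exactly 600 bit/s and "12.5fps" exactly 12.5 frames/s.
SAMPLE_RATE_HZ = 22050
FRAME_RATE_FPS = 12.5
BITRATE_BPS = 600

# Audio samples covered by one codec frame.
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_FPS

# Token budget per frame, in bits.
bits_per_frame = BITRATE_BPS / FRAME_RATE_FPS

# Compression relative to uncompressed 16-bit mono PCM at the same sample rate.
pcm_bitrate_bps = SAMPLE_RATE_HZ * 16
compression_ratio = pcm_bitrate_bps / BITRATE_BPS

print(f"samples per frame: {samples_per_frame:.0f}")
print(f"bits per frame:    {bits_per_frame:.0f}")
print(f"vs 16-bit PCM:     {compression_ratio:.0f}x smaller")
```

Under these assumptions each frame spans 1764 samples (80 ms) and carries 48 bits.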

## Installation

```bash
pip install nanocodec-mlx soundfile
```

## Usage

```python
from nanocodec_mlx.models.audio_codec import AudioCodecModel
import soundfile as sf
import mlx.core as mx
import numpy as np

# Load model from HuggingFace Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")

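# Load audio; the model expects 22050 Hz mono input
audio, sr = sf.read("input.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 22050:
    raise ValueError(f"Expected 22050 Hz audio, got {sr} Hz; resample first")
# Add leading axes so the array has shape (1, 1, num_samples)
audio_mlx = mx.array(audio, dtype=mx.float32)[None, None, :]
audio_len = mx.array([len(audio)], dtype=mx.int32)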

# Encode and decode
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)

# Save output
output = np.array(reconstructed[0, 0, :int(recon_len[0])])
sf.write("output.wav", output, 22050)
```

#### Input
  - **Input Type:** Audio 
  - **Input Format(s):** .wav files
  - **Input Parameters:** One-Dimensional (1D)
  - **Other Properties Related to Input:** 22050 Hz Mono-channel Audio
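
If a recording is not already at 22050 Hz, it needs to be resampled before encoding. The helper below is a minimal sketch using linear interpolation with NumPy only; `resample_linear` is a hypothetical name and not part of `nanocodec_mlx`. Linear interpolation applies no anti-aliasing filter, so a band-limited resampler from a dedicated audio library is likely a better choice when quality matters.

```python
import numpy as np

TARGET_SR = 22050  # sample rate expected by the model, per this model card


def resample_linear(audio: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample mono audio to target_sr with linear interpolation (illustrative only)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected mono audio as a 1D array")
    if sr == target_sr or audio.size == 0:
        return audio

    # Number of output samples covering the same duration.
    n_out = int(round(audio.shape[0] * target_sr / sr))

    # Position of each output sample, measured in input-sample units.
    src_pos = np.arange(n_out, dtype=np.float64) * (sr / target_sr)

    # np.interp clamps positions past the last input sample to the final value.
    resampled = np.interp(src_pos, np.arange(audio.shape[0]), audio)
    return resampled.astype(np.float32)
```

The result can be passed to `mx.array(...)` in place of `audio` in the usage example above.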

#### Output
  - **Output Type:** Audio
  - **Output Format(s):** .wav files
  - **Output Parameters:** One-Dimensional (1D)
  - **Other Properties Related to Output:** 22050 Hz Mono-channel Audio


## License

This code is licensed under the Apache License 2.0.

The original NVIDIA NeMo NanoCodec model weights and architecture were developed by NVIDIA and are licensed under the [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). See [NOTICE](NOTICE) for attribution.

When using this project, you must comply with both licenses.

## Citation

This is an MLX implementation of NVIDIA NeMo NanoCodec. If you use this work, please cite the original model and its license:

- [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps)
- [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)