---
license: apache-2.0
pipeline_tag: feature-extraction
library_name: mlx
tags:
- audio
- audio-to-audio
- codec
---

# NanoCodec for Apple Silicon

This is an MLX implementation of [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps), a lightweight neural audio codec.

## Model Description

- **Architecture**: A fully convolutional generator network and three discriminators. The generator comprises an encoder, a vector quantization stage, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
- **Sample Rate**: 22.05 kHz
- **Framework**: MLX
- **Parameters**: 105M

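The upstream model name (`22khz-0.6kbps-12.5fps`) encodes the codec's operating point. As a rough sketch, assuming those nominal figures are exact and the sample rate is 22050 Hz, the arithmetic below shows how much audio one token frame covers and how many bits it carries. The function names are illustrative and are not part of `nanocodec-mlx`; the ceiling in `expected_frames` assumes the encoder pads a trailing partial frame, which may not match the actual implementation.

```python
import math

SAMPLE_RATE_HZ = 22050   # sample rate stated in this model card
FRAME_RATE_FPS = 12.5    # nominal frame rate, taken from the upstream model name
BITRATE_BPS = 600        # nominal 0.6 kbps, taken from the upstream model name


def samples_per_frame() -> float:
    """Audio samples covered by one token frame."""
    return SAMPLE_RATE_HZ / FRAME_RATE_FPS


def bits_per_frame() -> float:
    """Bits available to represent one token frame at the nominal bitrate."""
    return BITRATE_BPS / FRAME_RATE_FPS


def expected_frames(num_samples: int) -> int:
    """Approximate frame count (assumption: a partial last frame is padded)."""
    return math.ceil(num_samples / samples_per_frame())


print(samples_per_frame())       # samples per frame
print(bits_per_frame())          # bits per frame
print(expected_frames(220500))   # frames for 10 seconds of audio
```

Under these assumptions, each frame spans 1764 samples (80 ms) and carries 48 bits, so ten seconds of audio corresponds to about 125 frames.
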
## Installation

```bash
pip install nanocodec-mlx soundfile
```

## Usage

```python
from nanocodec_mlx.models.audio_codec import AudioCodecModel
import soundfile as sf
import mlx.core as mx
import numpy as np

SAMPLE_RATE = 22050  # the codec expects 22.05 kHz mono audio

# Load model from the Hugging Face Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")

# Load audio and check that it matches the expected input format
audio, sr = sf.read("input.wav")
if sr != SAMPLE_RATE or audio.ndim != 1:
    raise ValueError(f"Expected {SAMPLE_RATE} Hz mono audio, got {sr} Hz with shape {audio.shape}")

# Add batch and channel dimensions: (samples,) -> (1, 1, samples)
audio_mlx = mx.array(audio, dtype=mx.float32)[None, None, :]
audio_len = mx.array([len(audio)], dtype=mx.int32)

# Encode to discrete tokens, then decode back to a waveform
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)

# Trim to the valid length and save output
output = np.array(reconstructed[0, 0, :int(recon_len[0])])
sf.write("output.wav", output, SAMPLE_RATE)
```

#### Input

- **Input Type:** Audio
- **Input Format(s):** .wav files
- **Input Parameters:** One-Dimensional (1D)
- **Other Properties Related to Input:** 22050 Hz Mono-channel Audio

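Recordings that are stereo or use a different sample rate need to be converted before encoding. The sketch below is one minimal way to do that with NumPy alone: it averages channels to mono and resamples by linear interpolation. The helper names (`to_mono`, `resample_linear`, `prepare`) are hypothetical and not part of `nanocodec-mlx`, and linear interpolation applies no anti-aliasing filter, so a dedicated resampling library is likely a better choice when quality matters.

```python
import numpy as np

TARGET_SR = 22050  # sample rate stated in this model card


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average the channels of a (frames, channels) array; pass 1D audio through."""
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=1)


def resample_linear(audio: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    if sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / sr))
    old_t = np.arange(len(audio)) / sr       # timestamps of the input samples
    new_t = np.arange(n_out) / target_sr     # timestamps of the output samples
    return np.interp(new_t, old_t, audio)


def prepare(audio, sr: int) -> np.ndarray:
    """Return 1D float32 audio at TARGET_SR, ready to wrap in an MLX array."""
    mono = to_mono(np.asarray(audio, dtype=np.float32))
    return resample_linear(mono, sr).astype(np.float32)
```

The `(frames, channels)` layout matches what `soundfile.read` returns for multi-channel files, so the result of `prepare(*sf.read("input.wav"))` can be passed on in place of the raw `audio` array.
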
#### Output

- **Output Type:** Audio
- **Output Format(s):** .wav files
- **Output Parameters:** One-Dimensional (1D)
- **Other Properties Related to Output:** 22050 Hz Mono-channel Audio

## License

This code is licensed under the Apache License 2.0.

The original NVIDIA NeMo NanoCodec model weights and architecture were developed by NVIDIA and are licensed under the [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). See [NOTICE](NOTICE) for attribution.

When using this project, you must comply with both licenses.

## Citation

This is an MLX implementation of NVIDIA NeMo NanoCodec. If you use this work, please cite the original:

- [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps)
- [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)