metadata
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
pipeline_tag: feature-extraction
base_model:
- nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps
library_name: mlx
tags:
- audio
- audio-to-audio
- codec
NanoCodec for Apple Silicon
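This is an MLX port of NVIDIA NeMo NanoCodec, a lightweight neural audio codec that encodes 22.05 kHz audio into discrete tokens at 12.5 frames per second (0.6 kbps), packaged to run on Apple Silicon.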
Model Description
- Architecture: a fully convolutional generator network, trained adversarially against three discriminators. The generator comprises an encoder, a vector quantizer, and a HiFi-GAN-based decoder; only the generator is needed to encode and decode audio.
- Sample Rate: 22.05 kHz
- Framework: MLX
- Parameters: 105M
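The nominal rates in the base model name (0.6 kbps at 12.5 frames per second) together with the 22.05 kHz sample rate fix how many samples each token frame covers and how many bits it carries. The sketch below works through that arithmetic; `codec_budget` is a hypothetical helper, and rounding a partial trailing frame up to a full frame is an assumption, so the model's actual `tokens_len` may differ by a frame.

```python
import math

# Nominal figures from the base model name
# (nemo-nano-codec-22khz-0.6kbps-12.5fps) and the card's 22.05 kHz sample rate.
SAMPLE_RATE_HZ = 22050
FRAME_RATE_HZ = 12.5
BITRATE_BPS = 600


def codec_budget(num_samples: int) -> dict:
    """Rough frame and bit budget for a clip of `num_samples` at 22.05 kHz."""
    samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # 1764.0 samples per frame
    bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ        # 48.0 bits per frame
    # Assumption: a partial trailing frame is padded up to a full frame.
    num_frames = math.ceil(num_samples / samples_per_frame)
    return {
        "samples_per_frame": samples_per_frame,
        "bits_per_frame": bits_per_frame,
        "num_frames": num_frames,
        "total_bits": num_frames * bits_per_frame,
    }


# Ten seconds of audio: 125 frames, 6000 bits (750 bytes).
print(codec_budget(10 * SAMPLE_RATE_HZ))
```

Each token frame therefore stands in for roughly 80 ms of audio, which is what makes the token sequence short enough to model with a language model.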
Installation
pip install mlx-nanocodec soundfile
# mlx-nanocodec provides the model; soundfile reads and writes the .wav files
Usage
# Python module names cannot contain hyphens; the import name mlx_nanocodec is assumed here
from mlx_nanocodec.models.audio_codec import AudioCodecModel
import soundfile as sf
import mlx.core as mx
import numpy as np
# Load model from HuggingFace Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")
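# Load audio (the model expects 22050 Hz mono input)
audio, sr = sf.read("input.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multichannel audio to mono
assert sr == 22050, f"expected 22050 Hz audio, got {sr} Hz"
audio_mlx = mx.array(audio, dtype=mx.float32)[None, None, :]
audio_len = mx.array([len(audio)], dtype=mx.int32)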
# Encode and decode
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)
# Save output
# Trim padding using the returned length, then convert the MLX array to NumPy
output = np.array(reconstructed[0, 0, :recon_len[0].item()])
sf.write("output.wav", output, 22050)
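To get a quick numeric check of a round trip, one option is the signal-to-noise ratio between the input and the reconstruction. The sketch below is a hypothetical helper, not part of the package; it trims both signals to their common length because the decoded length can differ slightly from the input. Waveform SNR is a crude measure for a GAN-based codec at 0.6 kbps: the decoder is not required to reproduce the exact waveform, so the SNR may be low even when the output sounds close to the original.

```python
import numpy as np


def reconstruction_snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-noise ratio (dB) of `estimate` measured against `reference`.

    Both inputs are 1-D float waveforms at the same sample rate; they are
    trimmed to their common length before comparison.
    """
    n = min(len(reference), len(estimate))
    ref = np.asarray(reference[:n], dtype=np.float64)
    est = np.asarray(estimate[:n], dtype=np.float64)
    signal_power = np.sum(ref ** 2)
    noise_power = np.sum((ref - est) ** 2)
    if noise_power == 0.0:
        return float("inf")  # identical signals
    return float(10.0 * np.log10(signal_power / noise_power))
```

With the variables from the usage example, `reconstruction_snr_db(audio, output)` compares the decoded clip with the original.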
Input
- Input Type: Audio
- Input Format(s): .wav files
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: 22050 Hz Mono-channel Audio
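Files that are stereo or recorded at another sample rate need converting before they match this input spec. The sketch below is a hypothetical helper (`to_codec_input` is not part of the package) that downmixes to mono and resamples to 22050 Hz by linear interpolation. Linear interpolation applies no anti-aliasing filter, so it is only a stand-in; a dedicated resampler gives better quality, especially when downsampling.

```python
import numpy as np

TARGET_SR = 22050  # input sample rate from the card


def to_codec_input(audio: np.ndarray, sr: int) -> np.ndarray:
    """Return 1-D float32 mono audio at 22050 Hz.

    Assumes `audio` is shaped (frames,) or (frames, channels), the layout
    returned by soundfile.read.
    """
    x = np.asarray(audio, dtype=np.float64)
    if x.ndim == 2:
        x = x.mean(axis=1)  # average channels into one mono track
    elif x.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {x.shape}")
    if sr != TARGET_SR:
        # Crude resampling: evaluate the signal at the new sample times.
        n_out = int(round(len(x) * TARGET_SR / sr))
        t_in = np.arange(len(x)) / sr
        t_out = np.arange(n_out) / TARGET_SR
        x = np.interp(t_out, t_in, x)
    return x.astype(np.float32)
```

Usage: `audio, sr = sf.read("input.wav")` followed by `audio = to_codec_input(audio, sr)` yields an array that can be wrapped with `mx.array(audio)[None, None, :]` as in the usage example.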
Output
- Output Type: Audio
- Output Format(s): .wav files
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: 22050 Hz Mono-channel Audio
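The decoder produces floating-point samples, and a neural decoder can overshoot the [-1, 1] range slightly. When storing the result as 16-bit WAV, clipping explicitly first makes the conversion predictable. The sketch below is a hypothetical helper, not part of the package; it scales symmetrically by 32767, so -1.0 maps to -32767 rather than -32768.

```python
import numpy as np


def to_pcm16(waveform) -> np.ndarray:
    """Clip a float waveform to [-1, 1] and convert it to int16 PCM samples."""
    x = np.clip(np.asarray(waveform, dtype=np.float64), -1.0, 1.0)
    return np.round(x * 32767.0).astype(np.int16)
```

The result can be passed to `sf.write("output.wav", to_pcm16(output), 22050)`, which stores int16 data as 16-bit PCM.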
Notice
Licensed by NVIDIA Corporation under the NVIDIA Open Model License