metadata
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
pipeline_tag: feature-extraction
base_model:
  - nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps
library_name: mlx
tags:
  - audio
  - audio-to-audio
  - codec

NanoCodec for Apple Silicon

This is an MLX implementation of NVIDIA NeMo NanoCodec, a lightweight neural audio codec.

Model Description

  • Architecture: a fully convolutional generator network trained with three discriminators. The generator consists of an encoder, a vector quantization stage, and a HiFi-GAN-based decoder.
  • Sample Rate: 22.05 kHz
  • Framework: MLX
  • Parameters: 105M
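
The base model name (22khz-0.6kbps-12.5fps) together with the sample rate above implies some simple per-frame figures. The sketch below works through that arithmetic. It assumes "0.6kbps" means 600 bits per second and that a trailing partial frame is padded up to a full frame; neither detail is confirmed by this card, and the helper names `estimate_frames` and `estimate_payload_bytes` are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 22_050   # from the model description
FRAME_RATE_FPS = 12.5     # from the base model name ("12.5fps")
BITRATE_BPS = 600         # assumption: "0.6kbps" read as 600 bits per second

# Audio samples covered by one codec frame, and bits spent on each frame.
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_FPS   # 1764.0
bits_per_frame = BITRATE_BPS / FRAME_RATE_FPS         # 48.0


def estimate_frames(num_samples: int) -> int:
    """Rough frame count for a clip (assumes the last partial frame is padded)."""
    return math.ceil(num_samples / samples_per_frame)


def estimate_payload_bytes(num_samples: int) -> float:
    """Rough size of the token stream in bytes, ignoring any container overhead."""
    return estimate_frames(num_samples) * bits_per_frame / 8


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE_HZ
    print(estimate_frames(ten_seconds), estimate_payload_bytes(ten_seconds))
```

Under these assumptions, ten seconds of audio maps to 125 frames and roughly 750 bytes of codes. The actual token tensor returned by the model may be laid out differently.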

Installation

# Install the MLX NanoCodec package, plus soundfile for WAV I/O
pip install mlx-nanocodec soundfile

Usage

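# Python module names cannot contain hyphens; the import name is assumed
# to be the underscore form of the pip package name.
from mlx_nanocodec.models.audio_codec import AudioCodecModel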
import soundfile as sf
import mlx.core as mx
import numpy as np

# Load model from HuggingFace Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")

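# Load audio (expects a 22.05 kHz mono file, so sf.read returns a 1D array)
audio, sr = sf.read("input.wav")
# Add batch and channel axes: shape (1, 1, num_samples)
audio_mlx = mx.array(audio, dtype=mx.float32)[None, None, :]
# Length of each item in the batch, in samples
audio_len = mx.array([len(audio)], dtype=mx.int32)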

# Encode and decode
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)

# Save output
output = np.array(reconstructed[0, 0, :int(recon_len[0])])
sf.write("output.wav", output, 22050)

Input

  • Input Type: Audio
  • Input Format(s): .wav files
  • Input Parameters: One-Dimensional (1D)
  • Other Properties Related to Input: 22050 Hz Mono-channel Audio
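
Since the model expects 22050 Hz mono audio, it can help to check a file's header before encoding it. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and not part of this package, and the `wave` module reads PCM WAV files only, so floating-point WAV files would need a different reader such as soundfile.

```python
import wave

EXPECTED_SAMPLE_RATE = 22_050  # Hz, per the input requirements above
EXPECTED_CHANNELS = 1          # mono


def check_wav_format(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

A file that fails either check would need to be resampled or downmixed before being passed to the model; how to do that is outside the scope of this card.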

Output

  • Output Type: Audio
  • Output Format(s): .wav files
  • Output Parameters: One-Dimensional (1D)
  • Other Properties Related to Output: 22050 Hz Mono-channel Audio

Notice

Licensed by NVIDIA Corporation under the NVIDIA Open Model License.