
---
license: apache-2.0
pipeline_tag: feature-extraction
library_name: mlx
tags:
- audio
- audio-to-audio
- codec
---

# NanoCodec for Apple Silicon

This is an MLX implementation of [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps), a lightweight neural audio codec that compresses 22.05 kHz mono audio into discrete tokens at 12.5 frames per second (0.6 kbps).

## Model Description

- Architecture: a fully convolutional generator network trained with three discriminators. The generator consists of an encoder, a vector quantizer, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
- Sample Rate: 22.05 kHz
- Frame Rate: 12.5 frames per second
- Bitrate: 0.6 kbps
- Framework: MLX
- Parameters: 105M
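The figures in the model name can be cross-checked with a little arithmetic. This sketch uses only the values encoded in the repository name (22.05 kHz, 0.6 kbps, 12.5 frames per second) and compares them against uncompressed 16-bit mono PCM; it does not load or inspect the model.

```python
# Back-of-the-envelope figures derived from the model name
# (22.05 kHz sample rate, 0.6 kbps bitrate, 12.5 token frames per second).
SAMPLE_RATE_HZ = 22050
FRAME_RATE_HZ = 12.5
BITRATE_BPS = 600

samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # audio samples covered by one token frame
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ        # bits spent on each token frame
pcm16_bitrate_bps = SAMPLE_RATE_HZ * 16             # uncompressed 16-bit mono PCM bitrate
compression_ratio = pcm16_bitrate_bps / BITRATE_BPS

print(f"samples per frame: {samples_per_frame:g}")
print(f"bits per frame:    {bits_per_frame:g}")
print(f"compression vs 16-bit PCM: {compression_ratio:g}x")
```

Under these numbers, each token frame summarizes 1764 samples (80 ms) of audio in 48 bits, roughly 588 times smaller than 16-bit PCM at the same sample rate.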

## Installation

```bash
pip install nanocodec-mlx soundfile
```

## Usage

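```python
from nanocodec_mlx.models.audio_codec import AudioCodecModel
import soundfile as sf
import mlx.core as mx
import numpy as np

SAMPLE_RATE = 22050  # NanoCodec expects 22.05 kHz mono audio

# Load model from HuggingFace Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")

# Load audio as float32; soundfile returns shape (frames,) or (frames, channels)
audio, sr = sf.read("input.wav", dtype="float32")
if sr != SAMPLE_RATE:
    raise ValueError(f"Expected {SAMPLE_RATE} Hz audio, got {sr} Hz; resample first")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono

# Add batch and channel dimensions: shape (1, 1, num_samples)
audio_mlx = mx.array(audio)[None, None, :]
audio_len = mx.array([audio.shape[0]], dtype=mx.int32)

# Encode to discrete tokens, then decode back to a waveform
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)

# Trim padding to the reported length and save output
output = np.array(reconstructed[0, 0, : recon_len[0].item()])
sf.write("output.wav", output, SAMPLE_RATE)
```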
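Because `encode` takes an explicit length array alongside the audio, it appears designed to accept zero-padded batches. The helper below is a hypothetical sketch (`pad_batch` is not part of `nanocodec_mlx`) that pads several mono clips to a common length in the `(batch, 1, samples)` layout used by the single-clip example and records each clip's true length. Whether `model.encode` accepts a batch size greater than one is an assumption to verify against the package.

```python
import numpy as np


def pad_batch(clips):
    """Hypothetical helper: zero-pad 1-D mono clips into a (B, 1, T) float32 batch.

    Returns the padded batch and an int32 array holding each clip's true length,
    mirroring the (audio, audio_len) pair passed to model.encode above.
    """
    lengths = np.array([len(clip) for clip in clips], dtype=np.int32)
    batch = np.zeros((len(clips), 1, int(lengths.max())), dtype=np.float32)
    for i, clip in enumerate(clips):
        batch[i, 0, : len(clip)] = clip  # real samples first, zero padding after
    return batch, lengths


# Example: two clips of different lengths
clips = [np.ones(5, dtype=np.float32), np.ones(3, dtype=np.float32)]
batch, lengths = pad_batch(clips)
print(batch.shape, lengths.tolist())  # (2, 1, 5) [5, 3]
```

Wrap the results with `mx.array(batch)` and `mx.array(lengths)` before passing them to the model, and trim each decoded item with its own entry in `recon_len` rather than the batch-wide length.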

### Input
- Input Type: Audio
- Input Format(s): .wav files
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: 22050 Hz mono audio

### Output
- Output Type: Audio
- Output Format(s): .wav files
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: 22050 Hz mono audio
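Audio that is not already 22050 Hz mono needs converting before encoding. The sketch below downmixes by averaging channels and resamples with NumPy linear interpolation (`np.interp`). `to_mono_22k` is a hypothetical helper, and linear interpolation is a deliberately simple stand-in: it applies no anti-aliasing filter, so a proper polyphase or sinc-based resampler will give better quality, especially when downsampling.

```python
import numpy as np

TARGET_SR = 22050  # sample rate NanoCodec expects


def to_mono_22k(audio, sr, target_sr=TARGET_SR):
    """Hypothetical helper: downmix to mono and linearly resample to target_sr.

    `audio` has shape (frames,) or (frames, channels), as returned by soundfile.read.
    Linear interpolation applies no anti-aliasing filter (quality caveat).
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels into a single mono track
    if sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / sr))
    # Sample times (in seconds) of the input and output grids
    t_in = np.arange(len(audio)) / sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Example: one second of 44.1 kHz stereo becomes 22050 mono samples
stereo = np.zeros((44100, 2), dtype=np.float32)
mono = to_mono_22k(stereo, 44100)
print(mono.shape)  # (22050,)
```

The returned array can replace `audio` in the usage example, after which the sample-rate check passes.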


## License

This code is licensed under the Apache License 2.0.

The original NVIDIA NeMo NanoCodec model weights and architecture were developed by NVIDIA and are licensed under the [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). See [NOTICE](NOTICE) for attribution.

When using this project, you must comply with both licenses.

## Citation

This is an MLX implementation of NVIDIA NeMo NanoCodec. If you use this work, please cite the original model:

- [NVIDIA NeMo NanoCodec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps)
- [NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)