AudioSR Models for ComfyUI

Pre-trained AudioSR (Versatile Audio Super Resolution) models for use with the ComfyUI-AudioSR custom node.

Models

audiosr_basic_fp32.safetensors

  • Purpose: General audio super-resolution
  • Best for: Music, sound effects, podcasts, mixed content
  • Format: FP32 SafeTensors
  • Size: ~5.9 GB

audiosr_speech_fp32.safetensors

  • Purpose: Speech/voice optimized super-resolution
  • Best for: Voice recordings, vocals, speech content
  • Format: FP32 SafeTensors
  • Size: ~5.9 GB

Usage

Installation

  1. Install ComfyUI-AudioSR via ComfyUI Manager
  2. Download model(s) from this repository
  3. Place the downloaded file(s) in ComfyUI/models/AudioSR/
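
To confirm that step 3 worked, a small script along these lines can report which model files are present. This is a minimal sketch, not part of ComfyUI-AudioSR: the function name `find_installed_models` is hypothetical, and it assumes the ComfyUI root folder is passed in by the caller.

```python
from pathlib import Path

# File names come from the Models section above.
MODEL_FILES = (
    "audiosr_basic_fp32.safetensors",
    "audiosr_speech_fp32.safetensors",
)


def find_installed_models(comfyui_root):
    """Map each model file name to True if it exists under models/AudioSR."""
    model_dir = Path(comfyui_root) / "models" / "AudioSR"
    return {name: (model_dir / name).is_file() for name in MODEL_FILES}


if __name__ == "__main__":
    # Assumes the script is run from the folder that contains ComfyUI/.
    for name, present in find_installed_models("ComfyUI").items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Only one of the two files is needed; a "missing" line for the model you do not plan to use is expected.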

Quick Start

ComfyUI Workflow:
Load Audio → AudioSR → Preview/Save Audio

Recommended Settings:

  • Steps: 50-100
  • Guidance Scale: 3.5-5.0
  • Model: Use audiosr_speech_fp32.safetensors for voice, audiosr_basic_fp32.safetensors for everything else
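
The recommendations above can be written down as a small helper, for example when preparing batch jobs. This is an illustrative sketch only: `pick_model`, `check_settings`, and the content labels are hypothetical names, not part of the ComfyUI-AudioSR node.

```python
# Illustrative helper; names and labels are assumptions, not the node's API.
SPEECH_MODEL = "audiosr_speech_fp32.safetensors"
BASIC_MODEL = "audiosr_basic_fp32.safetensors"

# Labels taken from the "Best for" line of the speech model.
SPEECH_CONTENT = {"voice", "vocals", "speech"}


def pick_model(content_type):
    """Return the recommended model file name for a content label."""
    label = content_type.strip().lower()
    return SPEECH_MODEL if label in SPEECH_CONTENT else BASIC_MODEL


def check_settings(steps, guidance_scale):
    """Return notes for values that fall outside the recommended ranges."""
    notes = []
    if not 50 <= steps <= 100:
        notes.append(f"steps={steps} is outside the recommended 50-100")
    if not 3.5 <= guidance_scale <= 5.0:
        notes.append(
            f"guidance_scale={guidance_scale} is outside the recommended 3.5-5.0"
        )
    return notes
```

The ranges are recommendations rather than hard limits, so the helper returns notes instead of raising an error.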

What it does

AudioSR upscales low-quality audio to high-quality 48kHz output using latent diffusion. It:

  • Resamples to 48kHz
  • Enhances high frequencies
  • Reduces compression artifacts
  • Adds clarity and detail
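
To see why high frequencies have to be generated rather than just resampled: audio sampled at a given rate can only carry frequencies up to half that rate (the Nyquist limit), so a 48 kHz output has room for content up to 24 kHz that a lower-rate input cannot contain. The sketch below only does this arithmetic. The function name `missing_band_hz` is hypothetical and the code does not reflect how AudioSR is implemented internally.

```python
TARGET_RATE_HZ = 48_000  # output sample rate stated above


def missing_band_hz(input_rate_hz):
    """Return (low, high) in Hz for the band the input cannot contain
    but a 48 kHz output can, or None if there is no such band."""
    if input_rate_hz <= 0:
        raise ValueError("input_rate_hz must be positive")
    low = input_rate_hz / 2    # Nyquist limit of the input
    high = TARGET_RATE_HZ / 2  # Nyquist limit of the output
    return (low, high) if low < high else None
```

For a 16 kHz input, for instance, the band between 8 kHz and 24 kHz is absent from the source and is what a super-resolution model would need to fill in.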

Model Info

Based on AudioSR: Versatile Audio Super-Resolution by Haohe Liu et al.

Original repository: https://github.com/haoheliu/versatile_audio_super_resolution

License: MIT

Hardware Requirements

  • GPU: NVIDIA RTX 3060 or higher (6GB+ VRAM minimum)
  • RAM: 12GB+ recommended
  • Input: works best when the source audio has a sample rate above 8 kHz
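
For WAV files, the input sample rate can be checked before running a workflow with Python's standard `wave` module. This is a minimal sketch: the helper names are hypothetical, and it handles only uncompressed PCM WAV files that `wave` can read.

```python
import wave

MIN_RECOMMENDED_RATE_HZ = 8_000  # threshold from the note above


def wav_sample_rate(path):
    """Read the sample rate (Hz) from a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_recommended_input(path):
    """True if the file's sample rate is above the recommended minimum."""
    return wav_sample_rate(path) > MIN_RECOMMENDED_RATE_HZ
```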

Credits
