vLLM + openai/gpt-oss-20b on 3× RTX 3090 (CUDA 12.8) — FlashAttention Error

#10 by robinhassan - opened

Hi,

Trying to run openai/gpt-oss-20b on vLLM with:

- 3 × RTX 3090 (24 GB each)
- CUDA 12.8, driver 570.xx
- Python 3, PyTorch 2.7.0

I want to split the model across all three GPUs, but I'm hitting a FlashAttention error (it looks like the model path expects FlashAttention 3).
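
For reference, this is roughly the load I'm attempting (a minimal sketch, not verified to run as-is; the parallelism arguments are my assumptions about what should work on 3 cards):

```python
# Sketch of the multi-GPU load I'm attempting (assumptions, not a confirmed config).
# I believe vLLM requires the attention head count to be divisible by
# tensor_parallel_size, so tensor_parallel_size=3 may be rejected; splitting
# layers with pipeline_parallel_size=3 is the alternative I'm considering.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-20b",
    tensor_parallel_size=1,       # assumption: the head count may not divide by 3
    pipeline_parallel_size=3,     # spread the layers across the three 3090s instead
    gpu_memory_utilization=0.90,  # leave a little headroom on each 24 GB card
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```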

Any tips for a multi-GPU vLLM setup and for installing FlashAttention 3 in this environment?
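
Also, would forcing a different attention backend be a reasonable workaround on Ampere? Something like the sketch below, where the `VLLM_ATTENTION_BACKEND` variable and the backend name are my assumptions from the vLLM docs; I haven't confirmed which values my build accepts:

```python
# Sketch: ask vLLM for a non-FlashAttention backend before constructing the engine.
# The backend name below is an assumption on my part; accepted values seem to
# differ between vLLM versions.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"  # or e.g. "FLASHINFER"

from vllm import LLM  # set the env var before building the engine

llm = LLM(model="openai/gpt-oss-20b")
```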

Thanks!
