Does this quantized version support running on machines like V100 and V100S?
#3 · by ShaoShuoHe · opened
I am running this quantized checkpoint with vLLM, but inference fails with the error: RuntimeError: ('Quantization scheme is not supported for ', 'the current GPU. Min capability: 80. ', 'Current capability: 70.'). Would switching to SGLang be feasible, or is inference and generation only possible with the native Transformers library?
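For context, the error is comparing CUDA compute capabilities: the V100/V100S are SM 7.0 (Volta), while the quantization kernels this checkpoint uses appear to require SM 8.0 (Ampere, e.g. A100) or newer. A minimal sketch of the check (the `supports_quant_kernels` helper and the 8.0 minimum are assumptions based on the error text, not vLLM's actual code):

```python
# Sketch: why vLLM rejects this checkpoint on Volta GPUs.
# Assumption: the quantization scheme needs compute capability >= 8.0,
# as the error message ("Min capability: 80") suggests.

def supports_quant_kernels(capability, min_capability=(8, 0)):
    """Return True if the GPU's (major, minor) compute capability
    meets the minimum required by the quantization kernels."""
    return capability >= min_capability

# Known compute capabilities from NVIDIA's public specs:
print(supports_quant_kernels((7, 0)))  # V100 / V100S (Volta) -> False
print(supports_quant_kernels((8, 0)))  # A100 (Ampere) -> True
```

On a live machine you can query the actual value with `torch.cuda.get_device_capability()`, which returns the `(major, minor)` tuple that vLLM is checking against.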