Does this quantized version support running on machines like V100 and V100S?

#3
by ShaoShuoHe - opened

I am using vLLM for inference with this quantized version, but it failed with the error: `RuntimeError: ('Quantization scheme is not supported for ', 'the current GPU. Min capability: 80. ', 'Current capability: 70.')`. Would switching to SGLang be feasible, or is inference and generation only possible with the native Transformers library?
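For context on the error: vLLM encodes a GPU's CUDA compute capability as `major * 10 + minor`, so "Current capability: 70" corresponds to the V100/V100S (Volta, compute capability 7.0), while the quantization kernels here require at least 80 (Ampere-class, e.g. A100). A minimal sketch of that check (the helper name `supports_quant_scheme` is illustrative, not a vLLM API):

```python
def supports_quant_scheme(capability, min_capability=80):
    """Return True if a GPU of the given (major, minor) compute
    capability meets the quantization scheme's minimum.

    vLLM reports capability as major*10 + minor (e.g. 7.0 -> 70).
    min_capability=80 matches the error message in this thread.
    """
    major, minor = capability
    return major * 10 + minor >= min_capability

print(supports_quant_scheme((7, 0)))  # V100/V100S (Volta): False
print(supports_quant_scheme((8, 0)))  # A100 (Ampere): True
```

On hardware where this check fails, the quantized kernels simply are not built for the architecture, so switching the serving framework alone may not help unless it ships a fallback kernel for capability 7.0.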
