Does this quantized version support running on machines like V100 and V100S?
#3 · by ShaoShuoHe · opened
I am running this quantized checkpoint with vLLM, but inference fails with the error: RuntimeError: ('Quantization scheme is not supported for ', 'the current GPU. Min capability: 80. ', 'Current capability: 70.'). Would switching to SGLang be feasible, or is inference and generation only possible with the native Transformers library?
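For context, the error is comparing CUDA compute capabilities: the V100/V100S are SM 7.0 (Volta), while the quantization kernels this checkpoint uses appear to require SM 8.0 (Ampere, e.g. A100) or newer. A minimal sketch of the check (the `supports_quant_kernels` helper and the 8.0 minimum are assumptions based on the error text, not vLLM's actual code):

```python
# Sketch: why vLLM rejects this checkpoint on Volta GPUs.
# Assumption: the quantization scheme needs compute capability >= 8.0,
# as the error message ("Min capability: 80") suggests.

def supports_quant_kernels(capability, min_capability=(8, 0)):
    """Return True if the GPU's (major, minor) compute capability
    meets the minimum required by the quantization kernels."""
    return capability >= min_capability

# Known compute capabilities from NVIDIA's public specs:
print(supports_quant_kernels((7, 0)))  # V100 / V100S (Volta) -> False
print(supports_quant_kernels((8, 0)))  # A100 (Ampere) -> True
```

On a live machine you can query the actual value with `torch.cuda.get_device_capability()`, which returns the `(major, minor)` tuple that vLLM is checking against.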