Which version of ONNX Runtime is needed for GroupQueryAttention?

#2
by vkkhare - opened

❌ Error loading ONNX model: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from ./data/cuda/cuda-int4-kquant-block-32-mixed/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/Add/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"","","model.layers.0.attn.sinks": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node(/model/layers.0/attn/GroupQueryAttention) with schema(com.microsoft::GroupQueryAttention:1) has input size 12 not in range [min=7, max=11].

ONNX Runtime org

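The installed ONNX Runtime build accepts only 7 to 11 inputs for GroupQueryAttention, while this model's node passes 12. You can install a nightly build of ONNX Runtime to get the latest changes.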

ORT nightly GPU package with CUDA 12.X:

# Uninstall any existing ORT packages
$ pip uninstall -y onnxruntime onnxruntime-gpu

# Install ORT nightly GPU package
$ pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
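
After reinstalling, it can help to confirm which ONNX Runtime package is actually present, since a leftover `onnxruntime` install can shadow `onnxruntime-gpu`. The following is a minimal sketch, not an official check: the list of distribution names is an assumption covering common variants, `try_load` is a hypothetical helper name, and the model path is the one from the error message above.

```python
from importlib import metadata

# Distribution names that can provide the `onnxruntime` module.
# Assumption: this covers the common variants; it is not exhaustive.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu", "ort-nightly", "ort-nightly-gpu")


def installed_ort_packages():
    """Return {distribution name: version} for each ORT package found."""
    found = {}
    for name in ORT_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this variant is not installed
    return found


def try_load(model_path):
    """Hypothetical helper: try to create a session, return (ok, message)."""
    import onnxruntime as ort  # imported lazily so the check above works without it

    try:
        ort.InferenceSession(
            model_path,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
    except Exception as exc:  # e.g. INVALID_GRAPH when the runtime is too old
        return False, str(exc)
    return True, f"loaded with onnxruntime {ort.__version__}"


if __name__ == "__main__":
    # More than one entry here usually means conflicting installs.
    print(installed_ort_packages())
    # Path taken from the error message in this thread.
    print(try_load("./data/cuda/cuda-int4-kquant-block-32-mixed/model.onnx"))
```

If `try_load` still reports that the input size is not in range, the older build is probably still the one being imported.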
