Which version of ONNX Runtime is needed for GroupQueryAttention?

by vkkhare - opened Aug 6

Aug 6

❌ Error loading ONNX model: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from ./data/cuda/cuda-int4-kquant-block-32-mixed/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/Add/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"","","model.layers.0.attn.sinks": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node(/model/layers.0/attn/GroupQueryAttention) with schema(com.microsoft::GroupQueryAttention:1) has input size 12 not in range [min=7, max=11].

kvaishnavi

ONNX Runtime org Aug 6

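The error says the GroupQueryAttention schema in your installed ONNX Runtime accepts 7 to 11 inputs, while this model's node passes 12, so you need a newer build. You can install a nightly build of ONNX Runtime to get the latest changes.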

ORT nightly GPU package with CUDA 12.X:

# Uninstall any existing ORT packages
$ pip uninstall -y onnxruntime onnxruntime-gpu

# Install ORT nightly GPU package
$ pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
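After reinstalling, it can help to confirm which ONNX Runtime package the environment actually picks up. The following is a minimal sketch, not an official tool: it uses only the Python standard library, the helper name `installed_ort_packages` is hypothetical, and the list of distribution names is an assumption taken from the pip commands above.

```python
from importlib import metadata

# Assumption: these are the distribution names from the pip commands above.
# Extend the tuple if your environment uses another ONNX Runtime variant.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_packages(names=ORT_DISTRIBUTIONS):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this name; skip it.
            pass
    return found


if __name__ == "__main__":
    packages = installed_ort_packages()
    if not packages:
        print("No ONNX Runtime package found in this environment.")
    for name, version in packages.items():
        print(f"{name}=={version}")
```

If the install worked, this should list only `onnxruntime-gpu`, with a pre-release version string. If both packages show up, rerunning the uninstall step before installing again is probably the safest course.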

kvaishnavi changed discussion status to closed Aug 11
