RuntimeError: split_with_sizes expects split_sizes have only non-negative entries, but got split_sizes=[64, -32]
#1 by nikita-savelyev-cerebras - opened
Hi! Thanks for preparing the model.
I observe the following error when running the sample inference script with Transformers:
/venv/bin/python tmp.py
Loading weights: 100%|██████████| 30/30 [00:00<00:00, 282.85it/s, Materializing param=model.norm.weight]
GlmMoeDsaForCausalLM LOAD REPORT from: tiny-random/glm-5
Key | Status |
----------------------------------------------------------+------------+-
model.layers.2.mlp.gate.e_score_correction_bias | UNEXPECTED |
model.layers.{0, 1, 2}.self_attn.k_norm.weight | UNEXPECTED |
model.layers.2.self_attn.q_b_proj.weight | UNEXPECTED |
model.layers.2.self_attn.q_a_layernorm.weight | UNEXPECTED |
model.layers.2.self_attn.o_proj.weight | UNEXPECTED |
model.layers.2.shared_head.norm.weight | UNEXPECTED |
model.layers.{0, 1, 2}.self_attn.wq_b.weight | UNEXPECTED |
model.layers.2.mlp.shared_experts.gate_proj.weight | UNEXPECTED |
model.layers.{0, 1, 2}.self_attn.weights_proj.weight | UNEXPECTED |
model.layers.2.input_layernorm.weight | UNEXPECTED |
model.layers.2.self_attn.kv_a_proj_with_mqa.weight | UNEXPECTED |
model.layers.2.self_attn.kv_b_proj.weight | UNEXPECTED |
model.layers.2.enorm.weight | UNEXPECTED |
model.layers.{0, 1, 2}.self_attn.wk.weight | UNEXPECTED |
model.layers.2.mlp.experts.down_proj | UNEXPECTED |
model.layers.2.eh_proj.weight | UNEXPECTED |
model.layers.2.mlp.shared_experts.down_proj.weight | UNEXPECTED |
model.layers.2.mlp.gate.weight | UNEXPECTED |
model.layers.2.mlp.experts.gate_up_proj | UNEXPECTED |
model.layers.2.self_attn.q_a_proj.weight | UNEXPECTED |
model.layers.2.hnorm.weight | UNEXPECTED |
model.layers.2.mlp.shared_experts.up_proj.weight | UNEXPECTED |
model.layers.2.post_attention_layernorm.weight | UNEXPECTED |
model.layers.2.self_attn.kv_a_layernorm.weight | UNEXPECTED |
model.layers.{0, 1}.self_attn.indexer.wk.weight | MISSING |
model.layers.{0, 1}.self_attn.indexer.k_norm.bias | MISSING |
model.layers.{0, 1}.self_attn.indexer.weights_proj.weight | MISSING |
model.layers.{0, 1}.self_attn.indexer.k_norm.weight | MISSING |
model.layers.{0, 1}.self_attn.indexer.wq_b.weight | MISSING |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
File "tmp.py", line 14, in <module>
generated_ids = model.generate(input_ids, max_new_tokens=32)
File "/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2668, in generate
result = decoding_method(
self,
...<5 lines>...
**model_kwargs,
)
File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2863, in _sample
outputs = self._prefill(input_ids, generation_config, model_kwargs)
File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 3857, in _prefill
return self(**model_inputs, return_dict=True)
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 841, in wrapper
output = func(self, *args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 864, in forward
outputs: BaseModelOutputWithPast = self.model(
~~~~~~~~~~^
input_ids=input_ids,
^^^^^^^^^^^^^^^^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
)
^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 915, in wrapper
output = func(self, *args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 253, in wrapper
outputs = func(self, *args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 799, in forward
hidden_states = decoder_layer(
hidden_states,
...<6 lines>...
**kwargs,
)
File "/venv/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 614, in forward
hidden_states, _ = self.self_attn(
~~~~~~~~~~~~~~^
hidden_states=hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
)
^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 398, in forward
topk_indices = self.indexer(
hidden_states,
...<3 lines>...
use_cache=past_key_values is not None,
) # [B, S, topk]
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 181, in forward
q_pe, q_nope = torch.split(q, [self.qk_rope_head_dim, self.head_dim - self.qk_rope_head_dim], dim=-1)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/functional.py", line 173, in split
return tensor.split(split_size_or_sections, dim)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.13/site-packages/torch/_tensor.py", line 1066, in split
return torch._VF.split_with_sizes(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
self,
^^^^^
...<2 lines>...
dim,
^^^^
)
^
RuntimeError: split_with_sizes expects split_sizes have only non-negative entries, but got split_sizes=[64, -32]
Process finished with exit code 1
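As an aside, the attention-mask warning in the log appears because, with `pad_token_id == eos_token_id`, Transformers cannot infer the mask from `input_ids` alone. A minimal sketch of the ambiguity (token id 2 is an arbitrary illustrative value, not taken from this model's config):

```python
# If pad_token_id == eos_token_id, a mask inferred as (token != pad_token_id)
# cannot distinguish padding from a genuine end-of-sequence token.
pad_token_id = eos_token_id = 2  # arbitrary illustrative value

input_ids = [5, 7, 2, 2, 2]  # is the first 2 a real EOS or padding?
inferred_mask = [int(tok != pad_token_id) for tok in input_ids]
print(inferred_mask)  # [1, 1, 0, 0, 0] -- a genuine EOS gets masked too
```

Passing an explicit `attention_mask` to `generate` avoids the guess, though that warning is unrelated to the crash itself.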
Environment:
accelerate==1.12.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
anyio==4.12.1
attrs==25.4.0
black==26.1.0
certifi==2026.1.4
charset-normalizer==3.4.4
click==8.3.1
cuda-bindings==12.9.4
cuda-pathfinder==1.3.5
datasets==4.5.0
dill==0.4.0
filelock==3.24.3
frozenlist==1.8.0
fsspec==2025.10.0
h11==0.16.0
hf-xet==1.3.0
httpcore==1.0.9
httpx==0.28.1
huggingface_hub==1.4.1
idna==3.11
iniconfig==2.3.0
Jinja2==3.1.6
markdown-it-py==4.0.0
MarkupSafe==3.0.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.7.1
multiprocess==0.70.18
mypy_extensions==1.1.0
networkx==3.6.1
ninja==1.13.0
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.4.5
nvidia-nvtx-cu12==12.8.90
packaging==26.0
pandas==3.0.1
pathspec==1.0.4
platformdirs==4.9.2
pluggy==1.6.0
propcache==0.4.1
psutil==7.2.2
pyarrow==23.0.1
Pygments==2.19.2
pytest==9.0.2
python-dateutil==2.9.0.post0
pytokens==0.4.1
PyYAML==6.0.3
regex==2026.2.19
requests==2.32.5
rich==14.3.3
ruff==0.14.14
safetensors==0.7.0
setuptools==82.0.0
shellingham==1.5.4
six==1.17.0
sympy==1.14.0
tokenizers==0.22.2
torch==2.10.0
tqdm==4.67.3
transformers==5.2.0
triton==3.6.0
typer==0.24.1
typer-slim==0.24.0
typing_extensions==4.15.0
urllib3==2.6.3
xxhash==3.6.0
yarl==1.22.0
zstandard==0.25.0
Perhaps something is wrong with the config?
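That seems plausible: the failing line splits `q` with sizes `[self.qk_rope_head_dim, self.head_dim - self.qk_rope_head_dim]`, so `split_sizes=[64, -32]` suggests the config ended up with `qk_rope_head_dim = 64` but `head_dim = 32`. The values below are inferred from the error message, not read from the actual config; under that assumption, the failure reproduces in isolation:

```python
import torch

# Hypothetical values inferred from split_sizes=[64, -32] in the traceback.
qk_rope_head_dim = 64
head_dim = 32  # smaller than qk_rope_head_dim -> negative second split size

q = torch.randn(1, 4, head_dim)
try:
    torch.split(q, [qk_rope_head_dim, head_dim - qk_rope_head_dim], dim=-1)
except RuntimeError as e:
    print(e)  # split_with_sizes expects split_sizes have only non-negative entries ...
```

If that reading is right, the fix would be making `head_dim` at least `qk_rope_head_dim` in the config (or lowering `qk_rope_head_dim`).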
Hi Nikita, thanks for your investigation! I've updated the models.
yujiepan changed discussion status to closed