RuntimeError: split_with_sizes expects split_sizes have only non-negative entries, but got split_sizes=[64, -32]

#1
by nikita-savelyev-cerebras - opened

Hi! Thanks for preparing the model.

I observe the following error when running the sample inference script with Transformers:

/venv/bin/python tmp.py 
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [00:00<00:00, 282.85it/s, Materializing param=model.norm.weight]
GlmMoeDsaForCausalLM LOAD REPORT from: tiny-random/glm-5
Key                                                       | Status     | 
----------------------------------------------------------+------------+-
model.layers.2.mlp.gate.e_score_correction_bias           | UNEXPECTED | 
model.layers.{0, 1, 2}.self_attn.k_norm.weight            | UNEXPECTED | 
model.layers.2.self_attn.q_b_proj.weight                  | UNEXPECTED | 
model.layers.2.self_attn.q_a_layernorm.weight             | UNEXPECTED | 
model.layers.2.self_attn.o_proj.weight                    | UNEXPECTED | 
model.layers.2.shared_head.norm.weight                    | UNEXPECTED | 
model.layers.{0, 1, 2}.self_attn.wq_b.weight              | UNEXPECTED | 
model.layers.2.mlp.shared_experts.gate_proj.weight        | UNEXPECTED | 
model.layers.{0, 1, 2}.self_attn.weights_proj.weight      | UNEXPECTED | 
model.layers.2.input_layernorm.weight                     | UNEXPECTED | 
model.layers.2.self_attn.kv_a_proj_with_mqa.weight        | UNEXPECTED | 
model.layers.2.self_attn.kv_b_proj.weight                 | UNEXPECTED | 
model.layers.2.enorm.weight                               | UNEXPECTED | 
model.layers.{0, 1, 2}.self_attn.wk.weight                | UNEXPECTED | 
model.layers.2.mlp.experts.down_proj                      | UNEXPECTED | 
model.layers.2.eh_proj.weight                             | UNEXPECTED | 
model.layers.2.mlp.shared_experts.down_proj.weight        | UNEXPECTED | 
model.layers.2.mlp.gate.weight                            | UNEXPECTED | 
model.layers.2.mlp.experts.gate_up_proj                   | UNEXPECTED | 
model.layers.2.self_attn.q_a_proj.weight                  | UNEXPECTED | 
model.layers.2.hnorm.weight                               | UNEXPECTED | 
model.layers.2.mlp.shared_experts.up_proj.weight          | UNEXPECTED | 
model.layers.2.post_attention_layernorm.weight            | UNEXPECTED | 
model.layers.2.self_attn.kv_a_layernorm.weight            | UNEXPECTED | 
model.layers.{0, 1}.self_attn.indexer.wk.weight           | MISSING    | 
model.layers.{0, 1}.self_attn.indexer.k_norm.bias         | MISSING    | 
model.layers.{0, 1}.self_attn.indexer.weights_proj.weight | MISSING    | 
model.layers.{0, 1}.self_attn.indexer.k_norm.weight       | MISSING    | 
model.layers.{0, 1}.self_attn.indexer.wq_b.weight         | MISSING    | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING: those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
  File "tmp.py", line 14, in <module>
    generated_ids = model.generate(input_ids, max_new_tokens=32)
  File "/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2668, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2863, in _sample
    outputs = self._prefill(input_ids, generation_config, model_kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 3857, in _prefill
    return self(**model_inputs, return_dict=True)
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 841, in wrapper
    output = func(self, *args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 864, in forward
    outputs: BaseModelOutputWithPast = self.model(
                                       ~~~~~~~~~~^
        input_ids=input_ids,
        ^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 915, in wrapper
    output = func(self, *args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 253, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 799, in forward
    hidden_states = decoder_layer(
        hidden_states,
    ...<6 lines>...
        **kwargs,
    )
  File "/venv/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 614, in forward
    hidden_states, _ = self.self_attn(
                       ~~~~~~~~~~~~~~^
        hidden_states=hidden_states,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 398, in forward
    topk_indices = self.indexer(
        hidden_states,
    ...<3 lines>...
        use_cache=past_key_values is not None,
    )  # [B, S, topk]
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/venv/lib/python3.13/site-packages/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 181, in forward
    q_pe, q_nope = torch.split(q, [self.qk_rope_head_dim, self.head_dim - self.qk_rope_head_dim], dim=-1)
                   ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/functional.py", line 173, in split
    return tensor.split(split_size_or_sections, dim)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.13/site-packages/torch/_tensor.py", line 1066, in split
    return torch._VF.split_with_sizes(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self,
        ^^^^^
    ...<2 lines>...
        dim,
        ^^^^
    )
    ^
RuntimeError: split_with_sizes expects split_sizes have only non-negative entries, but got split_sizes=[64, -32]

Process finished with exit code 1

Environment:

accelerate==1.12.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
anyio==4.12.1
attrs==25.4.0
black==26.1.0
certifi==2026.1.4
charset-normalizer==3.4.4
click==8.3.1
cuda-bindings==12.9.4
cuda-pathfinder==1.3.5
datasets==4.5.0
dill==0.4.0
filelock==3.24.3
frozenlist==1.8.0
fsspec==2025.10.0
h11==0.16.0
hf-xet==1.3.0
httpcore==1.0.9
httpx==0.28.1
huggingface_hub==1.4.1
idna==3.11
iniconfig==2.3.0
Jinja2==3.1.6
markdown-it-py==4.0.0
MarkupSafe==3.0.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.7.1
multiprocess==0.70.18
mypy_extensions==1.1.0
networkx==3.6.1
ninja==1.13.0
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.4.5
nvidia-nvtx-cu12==12.8.90
packaging==26.0
pandas==3.0.1
pathspec==1.0.4
platformdirs==4.9.2
pluggy==1.6.0
propcache==0.4.1
psutil==7.2.2
pyarrow==23.0.1
Pygments==2.19.2
pytest==9.0.2
python-dateutil==2.9.0.post0
pytokens==0.4.1
PyYAML==6.0.3
regex==2026.2.19
requests==2.32.5
rich==14.3.3
ruff==0.14.14
safetensors==0.7.0
setuptools==82.0.0
shellingham==1.5.4
six==1.17.0
sympy==1.14.0
tokenizers==0.22.2
torch==2.10.0
tqdm==4.67.3
transformers==5.2.0
triton==3.6.0
typer==0.24.1
typer-slim==0.24.0
typing_extensions==4.15.0
urllib3==2.6.3
xxhash==3.6.0
yarl==1.22.0
zstandard==0.25.0

Perhaps something is wrong with the config?
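For reference, the failing line at modeling_glm_moe_dsa.py:181 computes the second chunk size as `head_dim - qk_rope_head_dim`. The reported `split_sizes=[64, -32]` would imply the tiny config has `qk_rope_head_dim=64` but `head_dim=32`; these exact values are inferred from the error message, not read from the repo. A minimal sketch of the arithmetic, under that assumption:

```python
# Hypothetical config values inferred from the error message
# (split_sizes=[64, -32]); not taken from the actual repo config.
qk_rope_head_dim = 64
head_dim = 32

# Mirrors the failing expression in modeling_glm_moe_dsa.py:
#   torch.split(q, [qk_rope_head_dim, head_dim - qk_rope_head_dim], dim=-1)
split_sizes = [qk_rope_head_dim, head_dim - qk_rope_head_dim]
print(split_sizes)  # [64, -32] -> negative size triggers the RuntimeError

# A consistent config needs head_dim >= qk_rope_head_dim, e.g.:
head_dim = 128
split_sizes = [qk_rope_head_dim, head_dim - qk_rope_head_dim]
print(split_sizes)  # [64, 64]
```

So the fix would be a config where `head_dim` is at least `qk_rope_head_dim`.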

tiny-random org

Hi Nikita, thanks for your investigation! I've updated the models.

yujiepan changed discussion status to closed
