Kkk
Michalea
AI & ML interests
None yet
Recent Activity
- New activity 2 days ago in BLR2/Qwen3.5-9B-Eagle3-ShareGPT: MTP vs Eagle3
- New activity 4 days ago in nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16: Add SWE-Bench Verified evaluation results

Organizations
None yet
MTP vs Eagle3
2
#1 opened 14 days ago by Michalea
Add SWE-Bench Verified evaluation results
2
#18 opened 12 days ago by nielsr
SciBench Results - methodology
#3 opened 4 days ago by Michalea
Evaluation of your quantization
#5 opened 7 days ago by Michalea
math-oai.yaml file for aime eval
#56 opened 7 days ago by Michalea
SGLang and MTP
1
#2 opened 22 days ago by Michalea
RoPE theta 5M instead of 1M
#15 opened about 1 month ago by Michalea
New activity in lmsys/SGLang-EAGLE3-Qwen3-Next-80B-A3B-Instruct-FP8-SpecForge-Meituan about 1 month ago
The comparison with the original MTP
1 reaction
1
#2 opened about 1 month ago by Michalea
Context length / number of generated tokens during training
#4 opened about 1 month ago by Michalea
Datasets used to create this head
#5 opened about 2 months ago by Michalea
A low number of evaluation benchmarks
2
#5 opened about 2 months ago by Michalea
Context length and regeneration
4
#1 opened about 2 months ago by Michalea
MTP quality, 47 layers
3
#7 opened about 2 months ago by Michalea
Efficiency of NVFP4 vs FP16/8
3 reactions
#4 opened about 2 months ago by Michalea
Description of version 3.0
2
#1 opened 3 months ago by jacek2024
Evaluation
4 reactions
#3 opened about 2 months ago by Michalea
Severe looping/repetitive output when using --kv-cache-dtype fp8 with GLM-4.7-Flash-FP8-Dynamic on vLLM
4
#2 opened about 2 months ago by ShelterW
Description inconsistent with the evaluation results
1
#3 opened 3 months ago by Michalea