John Leimgruber III (PRO)
ubergarm · john-leimgruber
256 followers · 60 following
Donate: https://www.paypal.com/donate/?hosted_button_id=HU59345BZVSUA
AI & ML interests: Open LLMs and Astrophotography image processing.
ubergarm's activity
New activity in ubergarm/GLM-4.7-Flash-GGUF (3 minutes ago)
  "Re-cooking imatrix and quants with updated ik/llama.cpp PR" (#1, opened 3 minutes ago by ubergarm)

Updated a model (10 minutes ago)
  ubergarm/GLM-4.7-Flash-GGUF · Text Generation · Updated 12 minutes ago · 460 downloads · 7 likes
New activity in zai-org/GLM-4.7-Flash (about 9 hours ago)
  "Why does the KV cache occupy so much GPU memory?" (4 comments; #21, opened about 18 hours ago by yyg201708)
  "Cannot run vLLM on DGX Spark: ImportError: libcudart.so.12" (1 comment; #18, opened about 24 hours ago by yyg201708)
  "Performance Discussion" (👀 2; 3 comments; #1, opened 1 day ago by IndenScale)
  "Enormous KV-cache size?" (👍 ➕ 4; 16 comments; #3, opened 1 day ago by nephepritou)
New activity in noctrex/GLM-4.7-Flash-MXFP4_MOE-GGUF (about 11 hours ago)
  "Feedback from running in LM Studio 0.39.3 with v1.103.2 of llama.cpp" (6 comments; #1, opened about 19 hours ago by spanspek)
Liked a model (about 13 hours ago)
  noctrex/GLM-4.7-Flash-MXFP4_MOE-GGUF · Text Generation · 30B params · Updated 1 day ago · 1.16k downloads · 9 likes
Published a model (1 day ago)
  ubergarm/GLM-4.7-Flash-GGUF · Text Generation · Updated 12 minutes ago · 460 downloads · 7 likes
Liked 2 models (1 day ago)
  ngxson/GLM-4.7-Flash-GGUF · 30B params · Updated about 20 hours ago · 6.79k downloads · 17 likes
  zai-org/GLM-4.7-Flash · Text Generation · 31B params · Updated about 17 hours ago · 15.2k downloads · 784 likes
New activity in ubergarm/GLM-4.7-GGUF (3 days ago)
  "Stable run on 2x RTX 5090 and 2 Xeon E5 2696 V4 and DDR4 with ik_llama.cpp - 6.1 t/s on IQ4_K and 5.1 t/s on IQ5_K, opencode works with this" (👍 1; 10 comments; #5, opened 25 days ago by martossien)
Liked a model (3 days ago)
  ArtusDev/requests-exl · Updated Oct 13, 2025 · 6 likes
New activity in ArtusDev/requests-exl (3 days ago)
  "[QUANTING UPDATE]" (❤️ 👍 3; 4 comments; #28, opened 5 days ago by ArtusDev)
New activity in ubergarm/Devstral-Small-2-24B-Instruct-2512-GGUF (3 days ago)
  "Mistral 3 large quant" (👍 1; 1 comment; #1, opened 3 days ago by facedwithahug)
New activity in ubergarm/DeepSeek-V3.2-Speciale-GGUF (3 days ago)
  "QuIP - 2 bit quantised as good as 16 bit" (5 comments; #5, opened 8 days ago by infinityai)
New activity in msievers/gemma-3-1b-it-qat-q4_0-gguf (7 days ago)
  "Thanks for sharing your work!" (❤️ 2; 3 comments; #1, opened 7 days ago by ubergarm)
New activity in ubergarm/DeepSeek-V3.2-Speciale-GGUF (7 days ago)
  "Say Whattt?!" (🔥 👍 4; 7 comments; #1, opened 12 days ago by mtcl)
New activity in ubergarm/Devstral-2-123B-Instruct-2512-GGUF (7 days ago)
  "Decent PPL with 100% IQ4_KSS" (🔥 1; 9 comments; #3, opened about 1 month ago by sokann)
New activity in kyutai/pocket-tts (7 days ago)
  "Open access to the model" (2 comments; #1, opened 7 days ago by jujutechnology)