Alex Hant
hardhant
Recent Activity
reacted to ManniX-ITA's post about 8 hours ago
Two releases this week pushing merge methodology forward.
▶ Qwen3.6-27B-Omnimerge-v4-MLP
https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).
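For illustration, a minimal sketch of the plain DARE-TIES step this builds on (random drop-and-rescale of each fine-tune's delta, TIES-style sign election, sum onto the base). It is not the Omnimerge_v2 pipeline itself; OBIM-lite, DAREx-q, and EMR election are separate additions, and the drop rate and weights below are placeholders.

```python
# Minimal sketch of a same-base DARE-TIES merge on one tensor:
# DARE = randomly drop a fraction of each fine-tune's delta and rescale the
# survivors; TIES = elect a majority sign per element and keep only the
# sign-agreeing contributions before summing onto the base weight.
# drop_rate and weights are illustrative placeholders.
import torch

def dare_ties_merge(base, finetunes, drop_rate=0.9, weights=None):
    weights = weights or [1.0] * len(finetunes)
    sparsified = []
    for ft, w in zip(finetunes, weights):
        delta = ft - base
        mask = (torch.rand_like(delta) > drop_rate).to(delta.dtype)
        sparsified.append(w * delta * mask / (1.0 - drop_rate))
    stacked = torch.stack(sparsified)                 # [n_models, *shape]
    elected_sign = torch.sign(stacked.sum(dim=0))     # per-element majority sign
    agree = (torch.sign(stacked) == elected_sign).to(stacked.dtype)
    return base + (stacked * agree).sum(dim=0)

# In practice this runs over every parameter of the checkpoints:
# merged[name] = dare_ties_merge(base_sd[name], [sd[name] for sd in ft_sds])
```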
Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27–52. Fix: MLP-passthrough surgery (copy MLPs verbatim from base, keep merged attn + linear_attn). Leak → 0%.
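A rough sketch of those two steps, per-tensor delta forensics and the MLP-passthrough surgery, assuming the usual Qwen-style parameter naming (model.layers.<i>.mlp.<proj>.weight); the key names and layer range handling here are assumptions, not the exact scripts used.

```python
# Sketch of (1) per-tensor delta forensics against the base checkpoint and
# (2) MLP-passthrough surgery: restore mlp.{gate,up,down}_proj from the base
# for layers 27-52 while keeping the merged attention weights.
# Parameter names assume the usual "model.layers.<i>.mlp.<proj>.weight" layout.
import re
import torch

def delta_report(base_sd, merged_sd, top_k=20):
    """Rank tensors by relative delta norm to localize where the merge drifted."""
    rows = []
    for name, b in base_sd.items():
        if name in merged_sd and b.is_floating_point():
            b32 = b.float()
            rel = (merged_sd[name].float() - b32).norm() / (b32.norm() + 1e-8)
            rows.append((rel.item(), name))
    return sorted(rows, reverse=True)[:top_k]

def mlp_passthrough(base_sd, merged_sd, layers=range(27, 53),
                    projs=("gate_proj", "up_proj", "down_proj")):
    """Copy MLP projections verbatim from the base for the affected layers."""
    pat = re.compile(r"model\.layers\.(\d+)\.mlp\.(\w+)\.weight")
    out = dict(merged_sd)
    for name in merged_sd:
        m = pat.match(name)
        if m and int(m.group(1)) in layers and m.group(2) in projs:
            out[name] = base_sd[name].clone()
    return out
```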
Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% (partial, 192/198) (+15.5 pp vs v2)
▶ Qwen3.5-4B Importance-Signal Study (M1..M5)
Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies.
Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 (best HE) / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 (best MBPP)
Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed
5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt.
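As a toy illustration of what "importance signal driving the sparsification" means in M3/M5: instead of dropping delta entries uniformly at random, keep-probabilities are biased by a per-element importance map (a diagonal Fisher estimate, or an LRP relevance map of the same shape as the tensor). The normalization and density below are illustrative, not the actual OMv2 or mergekit PR #682 code.

```python
# Toy sketch of importance-guided sparsification: instead of dropping delta
# entries uniformly at random, keep-probabilities are biased by a per-element
# importance map (e.g. a diagonal Fisher estimate or an LRP relevance map of
# the same shape as the tensor). Density and normalization are illustrative.
import torch

def importance_guided_sparsify(delta, importance, density=0.1):
    """Keep ~`density` of delta entries, preferring high-importance elements."""
    p = importance.clamp(min=0).float()
    p = (density * p / (p.mean() + 1e-12)).clamp(max=1.0)  # keep-probabilities
    mask = (torch.rand_like(delta) < p).to(delta.dtype)
    # Divide survivors by their keep-probability so the expected delta is preserved.
    return delta * mask / p.clamp(min=1e-12)

# A diagonal Fisher signal for one parameter can be accumulated from squared
# gradients of the loss on a small calibration set:
#   fisher[name] += param.grad.detach() ** 2
```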
All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.
reacted to ajibawa-2023's post 13 days ago
Go-Code-Large
Dataset: https://huggingface.co/datasets/ajibawa-2023/Go-Code-Large
Go-Code-Large is a large-scale corpus of Go (Golang) programming language source code, comprising 316,427 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, cloud-native systems, and modern backend software engineering.
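A small example of pulling the corpus with the Hugging Face datasets library; the column names are not stated above, so the snippet only inspects the schema (verify against the dataset card).

```python
# Minimal way to stream the corpus with the Hugging Face `datasets` library and
# inspect its schema; check ds.features / the dataset card for the real column
# names before wiring it into a pretraining pipeline.
from datasets import load_dataset

ds = load_dataset("ajibawa-2023/Go-Code-Large", split="train", streaming=True)
first = next(iter(ds))
print(first.keys())   # discover the actual fields (e.g. a code/text column)
```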
By offering a focused and curated dataset for Go, this corpus enables experimentation in concurrent programming, distributed systems, and performance-oriented backend services, domains where Go is widely adopted.
Go-Code-Large addresses the relative scarcity of large, language-specific datasets for Go, enabling targeted research into idiomatic Go patterns, concurrency primitives, and scalable system design.