🧠 lastbrain – Darwin V8

A Claude Opus distillation model trained with the Darwin V8 recipe (2B parameters)


📦 Features

  • Base: Qwen3.5-2B (2.3B parameters, hybrid attention)
  • Training: SFT + LoRA (all-linear, rank=16, α=32)
  • Teachers: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
  • Data: 4,451 high-quality reasoning traces (from 4 public datasets)
  • Merged: the LoRA adapter is fully merged into the base weights, so the model runs standalone

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example output:

To find the speed of the train in km/h, we need to convert the given time from minutes to hours.

**Given:**
- Distance = 60 km
- Time = 45 minutes

**Step 1: Convert time to hours**
Since there are 60 minutes in 1 hour:
\text{Time in hours} = \frac{45}{60} = 0.75\ \text{hours}

**Step 2: Calculate speed**
\text{Speed} = \frac{60}{0.75} = 80\ \text{km/h}

**Final Answer:** The speed of the train is **80 km/h**.
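The two conversion steps in the sample answer are easy to check directly; this small snippet (not part of the model card's own test harness) mirrors them:

```python
# Verify the worked example: 60 km covered in 45 minutes -> speed in km/h.
distance_km = 60
time_minutes = 45

time_hours = time_minutes / 60          # Step 1: minutes -> hours
speed_kmh = distance_km / time_hours    # Step 2: distance / time

print(time_hours, speed_kmh)  # 0.75 80.0
```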

🧬 Darwin V8 Training Pipeline

[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: 10× the full fine-tuning LR)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4 pts)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
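The pipeline trains only low-rank adapters on top of the frozen base. To get a feel for why this is cheap, the sketch below counts LoRA parameters for a single hypothetical 2048×2048 linear layer at the recipe's rank=16, α=32 (the layer shape is illustrative, not taken from the Qwen3.5-2B config):

```python
# LoRA adds two small matrices per targeted linear layer: A (r x in) and
# B (out x r). The merged weight update is W + (alpha / r) * B @ A.
in_features, out_features = 2048, 2048   # illustrative layer shape
rank, alpha = 16, 32                     # values from the training recipe

full_params = in_features * out_features
lora_params = rank * in_features + out_features * rank
scaling = alpha / rank                   # effective update scaling, here 2.0

# The adapter holds only a small fraction of the layer's parameters.
print(lora_params, full_params, lora_params / full_params, scaling)
# 65536 4194304 0.015625 2.0
```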

📊 Training Data Composition

| Dataset | Samples | Teacher |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Claude Opus 4.6 |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |

🎯 Design Philosophy (Darwin V8)

  1. LoRA Without Regret – all-linear targets, high LR; small ranks work fine
  2. Response Distillation – cost-efficient distillation from pre-generated Opus traces
  3. Merge-and-Deploy – merge the LoRA adapter, then deploy with no extra dependencies
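Point 3 works because a merged LoRA layer is mathematically identical to running the frozen base layer plus the adapter path. A miniature pure-Python illustration with made-up 2×2 matrices (not the model's actual weights):

```python
# Merge-and-deploy in miniature: (W + s * B @ A) @ x == W @ x + s * (B @ (A @ x)).
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (made up)
A = [[0.5, 0.0]]               # LoRA down-projection, rank 1
B = [[1.0], [2.0]]             # LoRA up-projection
s = 32 / 16                    # alpha / rank = 2.0
x = [1.0, 1.0]                 # input vector

# Adapter path: base output plus the scaled low-rank correction
adapter_out = [w + s * b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged path: fold s * B @ A into W once, then do a single matmul
W_merged = [[W[i][j] + s * B[i][0] * A[0][j] for j in range(2)] for i in range(2)]
merged_out = matvec(W_merged, x)

print(adapter_out, merged_out)  # [4.0, 9.0] [4.0, 9.0]
```

Because the outputs are identical, the merged checkpoint needs no peft dependency at inference time.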

🔁 Reproduction

This model was created by merging the following two components:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1) Load the frozen base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)
# 2) Attach the trained LoRA adapter
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)
# 3) Fold the adapter into the base weights and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
# Save the tokenizer alongside the weights so the checkpoint is self-contained
AutoTokenizer.from_pretrained("Qwen/Qwen3.5-2B").save_pretrained("./lastbrain")

📝 Sample Test Results (4 problems)

| Type | Correct | Answer | Response length |
|---|---|---|---|
| Math (train speed) | ✅ | 80 km/h | 771 chars |
| Logic (height comparison) | ✅ | Carol | 354 chars |
| Code (primality check) | ✅ | Python function | 1,712 chars |
| Korean (minimum wage) | ✅ | 1,577,600 KRW | 142 chars |
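The Code row refers to a primality-check problem. The model's actual 1,712-character answer is not reproduced here, but a minimal function of the kind being graded looks like:

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, by trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True          # 2 and 3 are prime
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:        # only need divisors up to sqrt(n)
        if n % i == 0:
            return False
        i += 2               # skip even candidates
    return True

print([p for p in range(20) if is_prime(p)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```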

Naturally produces structured answers with Markdown, LaTeX, and step-by-step reasoning


⚠️ Limitations

  • Scale: 2.3B parameters (a small model)
  • Numeric accuracy in Korean: occasional arithmetic errors (a small-model limitation)
  • Long context: trained with max_length=4,096
  • <think> tags: rarely used explicitly (reasoning is folded into the answer body)

🪪 License

  • Base model: Apache 2.0 (Qwen)
  • Training data: see each dataset's individual license
  • This model: Apache 2.0

🙏 Credits

  • Base: Qwen team (Alibaba)
  • Teacher: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
  • Data release: nohurry, TeichAI
  • Training & Release: FINAL-Bench / VIDRAFT_LAB

🔗 Related Models


Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB
