philipp-zettl posted an update 2 days ago
I've been cooking something neat over the past weeks 👨‍🍳

We all know that training LLMs requires a lot of resources, especially compute in the form of GPUs; on CPUs it's painfully slow and inefficient.

The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTXs. If you're lucky, you got yourself a 5080 with 16 GB VRAM or something.

To be frank, I don't have that 1.3k of disposable cash lying around ¯\_(ツ)_/¯
But I can write Rust, and I like building ML libraries.

So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?

The answers are wild, trust me!
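For context on the bf16 question: bfloat16 keeps float32's 8 exponent bits and truncates the mantissa from 23 to 7 bits, so conversion is essentially bit surgery. A minimal sketch of that idea (purely illustrative, not the library's actual implementation; NaN edge cases are ignored):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Convert a float32 to bfloat16 bits, rounding to nearest even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add a rounding bias before dropping the low 16 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

Round-tripping a value like 3.14159 lands you at 3.140625, which shows the precision you trade for halving the memory footprint.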

Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8 GB VRAM)

The majority of my time went into the shared memory, but it's stable and I'm very excited!
Here are some debug logs, à la "trust me bro":
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6744 MB / 7805 MB
Data on GPU:    1641 MB
Grads on GPU:   3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6776 MB / 7805 MB
Data on GPU:    1561 MB
Grads on GPU:   3279 MB
CPU Offloaded: 18590 MB
-----------------------------
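The reclaim behavior behind logs like these can be pictured as a budgeted pool that offloads the least-recently-used buffers to host RAM until a requested allocation fits. A toy sketch of that idea (the `OffloadPool` name and structure are my assumption, not the actual library):

```python
from collections import OrderedDict

class OffloadPool:
    """Toy VRAM budget tracker: evict LRU buffers to CPU until an alloc fits."""

    def __init__(self, vram_budget: int):
        self.vram_budget = vram_budget
        self.on_gpu = OrderedDict()  # name -> size, oldest first
        self.on_cpu = {}             # name -> size (offloaded to host RAM)

    def available(self) -> int:
        return self.vram_budget - sum(self.on_gpu.values())

    def alloc(self, name: str, size: int) -> None:
        # Mirrors the "Currently available X, attempting to reclaim Y" lines:
        # offload oldest buffers until the new one fits the budget.
        while self.available() < size and self.on_gpu:
            victim, vsize = self.on_gpu.popitem(last=False)
            self.on_cpu[victim] = vsize
        if self.available() < size:
            raise MemoryError("budget too small even after full offload")
        self.on_gpu[name] = size
```

The real problem is of course much hairier (async transfers, pinned memory, gradient lifetimes), but the accounting loop is the core of it.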


Final models get exported in the safetensors format and are compatible with PyTorch and transformers, for accessibility.
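For the curious, part of why safetensors is so accessible is that the layout is trivial: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then raw tensor bytes. A simplified stdlib-only sketch of that layout (not the official library; real files have a few more rules, e.g. contiguous sorted offsets):

```python
import json
import struct

def save_safetensors(path, tensors):
    """tensors: name -> (dtype_str, shape_list, raw_bytes). Toy writer."""
    header, data, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    blob = json.dumps(header).encode()
    with open(path, "wb") as f:
        # u64 LE header size, JSON header, then the byte buffer.
        f.write(struct.pack("<Q", len(blob)) + blob + data)

def load_safetensors(path):
    """Toy reader: returns name -> raw_bytes."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
        data = f.read()
    return {name: data[meta["data_offsets"][0]:meta["data_offsets"][1]]
            for name, meta in header.items()}
```

No pickle, no executable code in the file: that's the whole safety argument.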

[^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
philipp-zettl posted an update 11 days ago
I'm unemployed, I have a gaming GPU, and I just published a German LLM.

qwen3-0.6b-german - fine-tuned Qwen3-0.6B in ~40 h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).

HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMLU-DE: 0.3600 → 0.2475 🔻 (alignment tax - known trade-off)

Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.

Weights, LoRA adapter, full training script and logs all public.

philipp-zettl/qwen3-0.6b-german

It ain't much, but it's honest work.
philipp-zettl posted an update over 1 year ago
alias rm='rm -i'


Better safe than sorry.
philipp-zettl posted an update over 1 year ago
This is probably a very hot take, but here goes nothing.

We're seeing incredibly accurate LoRAs emerge for high-quality models like FLUX, with services like fal.ai offering training in single-digit minutes, e.g. 2 min per 1000 iterations.

Why the hell are people publishing private LoRAs as public models?!
Take a look at this listing: https://huggingface.co/models?other=base_model:adapter:black-forest-labs%2FFLUX.1-dev&sort=created

I would expect people who hold an HF account to have some kind of forward thinking. Heck, do you really want to give anyone the power to create ultra-realistic images of yourself?!

Didn't we learn anything from social media?
I am puzzled...
philipp-zettl posted an update over 1 year ago
🚀 Finishing up the prototype of my weekend project called ChessPT 🚀

- The game state is now being rendered. This simplifies coming up with your own new moves.
- The model space philipp-zettl/ChessPT was updated to provide an interactive mode.
- The space is currently running v0.4 of philipp-zettl/chessPT
- New updates will come this week.
- Training runs will be logged under https://wandb.ai/philipp-zettl/chessPT/

**Note**: The model is still not performing at the level I want it to. It predicts invalid moves (according to the game state) too frequently. In addition, the post-processing step is a little faulty, so you might end up in a state where the model didn't provide a next move.
philipp-zettl posted an update over 1 year ago
Version 0.2a of ChessPT is currently training.

I decided to hold off on the actual v1.0 until I have a better understanding of where I want to go and have successfully trained the first fine-tune.

I'm playing around with a loss that is highly influenced by the idea of reinforcement learning.

Basically, I'm punishing the model for generating invalid PGN strings.
The current approach bets on simplicity:

-2: wrong characters in output
-1: invalid PGN string, but valid charset
0: valid PGN string, incl. valid moves
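The scoring could look something like this simplified sketch (my reconstruction for illustration, not the actual implementation; the SAN regex is deliberately loose and does not check move legality against a board state):

```python
import re

# Characters allowed in a (simplified) PGN movetext string.
PGN_CHARSET = set("KQRBNOabcdefgh0123456789x+#=-. /*")

# Loose SAN move pattern: castling, piece moves, or pawn moves,
# with optional capture, promotion, check/mate markers.
SAN_MOVE = re.compile(
    r"O-O(-O)?[+#]?"
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8][+#]?"
    r"|[a-h]x?[a-h]?[1-8](=[QRBN])?[+#]?"
)

def pgn_reward(text: str) -> int:
    """-2: wrong characters, -1: valid charset but invalid PGN, 0: valid."""
    if not set(text) <= PGN_CHARSET:
        return -2
    for tok in text.split():
        if tok.endswith("."):                      # move number like "12."
            if not tok[:-1].isdigit():
                return -1
        elif tok in ("1-0", "0-1", "1/2-1/2", "*"):
            continue                               # game result marker
        elif not SAN_MOVE.fullmatch(tok):
            return -1
    return 0
```

A real version would also need to verify legality against the actual position, which is exactly where the invalid-move problem mentioned earlier comes from.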


GPT-4o helped me with the implementation, so I'm expecting some errors in it.

The training should finish in about 14 h; I will upload the new weights then.
But I still need to run extensive tests on this loss before I can happily call it v0.2 ✌️

BTW, I'm also building a space for the model, which will be published tonight after adding descriptions and a nice interface. ♟️

philipp-zettl/chessPT
philipp-zettl/ChessPT
philipp-zettl posted an update over 1 year ago
This is my first post, so I need to start with a bang!

The people over at Lichess published some amazing datasets over the past weeks, including a collection of >1M standard chess games (Lichess/standard-chess-games).

Finally it's time to revive my chess buddy project from back in 2021 🎉

So without any further ado... I'm currently training my first character-level LLM, and to be quite frank, I'm pretty astonished by the quality of my testing samples.
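A character-level setup over PGN strings can be sketched with a tiny tokenizer like this (illustrative only; the helper name is hypothetical, not the actual training code):

```python
def build_char_vocab(games):
    """Build encode/decode functions over the characters seen in the games."""
    chars = sorted(set("".join(games)))
    stoi = {ch: i for i, ch in enumerate(chars)}   # char -> token id
    itos = {i: ch for ch, i in stoi.items()}       # token id -> char
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)
    return encode, decode, len(chars)
```

The charm of character-level modeling on PGN is the tiny vocabulary: a handful of piece letters, files, ranks, and punctuation cover every game.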

I'm using e4 g6, the Modern Defense (https://en.wikipedia.org/wiki/Modern_Defense), as a validation sample.
My model currently predicts mostly d4 Bg7, which are the strongest next moves for white and black.

Now in between I see some results that take lower ranked moves, which makes me very excited.

Once the pre-training is done for the base model, I want to run some fine-tuning on more specific datasets, which are
Lichess/chess-openings
Lichess/chess-puzzles

Here are some intermediate examples

Step 6000: 
1. e4 g6 13. Rb1 d5 14. Bd3 Nxf3 15. Nxf3 Nxe3+ 16. Rxd3 Rxd3 17. Rxd6 Rhe8 18. Nd6 Rxd4 19. Rxd7+ Kxd7 20. Re7 Rxe7 21. Qxe7 1-0

Step 12000:
1. e4 g6 22. Be2 Re8 23. Kg2 1-0
1. d4 d5 2. c4 c6 3. Nf3 e6 4. dxe6 Qe7 5. Bb5+ Be8 6. Bxb7# 1-0
1. d4 d5 2. dxe5 Bd6 3. Nc3 h6 4. e4 Bf5 5. exf5 Nd7 6. exd5 Nxd5 7. Bxc4 Bxe2 8. f4 d4 9. Ng3 Bb4+ 10. Bxd4 Qxd4 11. Nfxe2 O-O-O 12. Ne6 Qf5 13. fxg4 Nxe5

Step 30000:
1. e4 g6 2. d4 Bg7 3. Nf3 d6 4. b3 e6 5. Bb2 f5 6. e5 c5 7. dxc5 dxc5 8. Nbd2 Nf6 9. Nce2 O-O 10. Qe2 c4 11. Na4 Bd6 12. f3 Ng4 13. fxg4 1-0
1. c4 c5 2. a3 Nc6 3. cxd5 Nxd5 4. Bf4 g6 5. Be2 Bg7 6. Nf3 Bg4 7. b4 Nf6 8. h3 Bxf3 9. Bxf3 a6 10. Nc3 O-O 11. Qc2 e

(each line starting with 1. is a set of moves)

You can find a first pre-trained version here:
philipp-zettl/chessPT