- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 13
- ProTIP: Progressive Tool Retrieval Improves Planning
  Paper • 2312.10332 • Published • 8
- Paloma: A Benchmark for Evaluating Language Model Fit
  Paper • 2312.10523 • Published • 13
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 100
daje kang
daje
models 41
daje/whisper-v3-turbo-address
Automatic Speech Recognition • 0.8B • Updated
daje/Qwen2-VL-7B-Instruct-fashion-product-images-small
8B • Updated
daje/Meta-Llama-3.1-8B-Instruct-de-identification
8B • Updated • 1
daje/Qwen2.5-14B-Instruct-tools
Text Generation • 15B • Updated
daje/model_0.0002_alpha-32_r-64
Updated • 22
daje/model_0.0002_alpha-8_r-16
Updated • 19
daje/model_5e-05_alpha-128_r-256
Updated • 438
daje/model_2e-4_alpha-8_r-16
Updated • 416
daje/model_Lora
Updated • 18
daje/model_2e-4
Updated • 399
datasets 19
daje/korean-address-voice-v2
Updated • 3.74k • 9
daje/korean-address-voice
Updated • 118 • 11
daje/synthetic-ko-sql-hard-add-llm-result
Updated • 1.68k • 8
daje/synthetic-ko-sql-hard
Updated • 1.68k • 5 • 1
daje/kotext-to-sql-v1-hard
Updated • 2k • 6
daje/kaggle-image-datasets
Updated • 44.4k • 12
daje/de-identify-chat-ko
Updated • 9.92k • 7
daje/ko-hatefulmemes_train_8500
Updated • 8.2k • 19
daje/ko-hatefulmemes_train_8500_kmhas
Updated • 95.3k • 26
daje/ko-hatefulmemes_train_2000
Updated • 1.91k • 13