---
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# MachineLearningLM

## Model summary

Can LLMs learn from 1,000 in-context examples?

Introducing **MachineLearningLM** 🧪📊: a model obtained through continued pretraining of Qwen2.5-7B-Instruct on millions of synthetic tabular ML tasks, enabling robust many-shot in-context learning.

📈 **Performance scales with the number of in-context examples, from 8 to 1,024**

📈 **~15% improvement** on unseen tabular tasks over o3-mini, GPT-5-mini, and Qwen2.5-7B

🌲 **Random-forest-level robustness**

🧠 **MMLU score: 75.4%**

📄 Read the paper: https://huggingface.co/papers/2509.06806

GitHub: https://github.com/HaoAreYuDong/MachineLearningLM
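
To make "many-shot in-context learning" on tabular data concrete, the sketch below turns the first K labeled rows of a CSV into in-context demonstrations and appends the next row as an unlabeled query. It is a hypothetical illustration only: the function name `build_many_shot_prompt`, the `features => label` line format, and the assumption of a headerless CSV with the label in the last column are ours. It is not the prompt format produced by the repository's `prompt_gen.sh`.

```bash
# Hypothetical illustration of a many-shot tabular prompt; NOT MachineLearningLM's real prompt format.
# Assumes a headerless CSV with the label in the last column.
# Usage: build_many_shot_prompt FILE K
#   Rows 1..K become labeled demonstrations; row K+1 becomes the unlabeled query.
build_many_shot_prompt() {
  local file="$1" k="$2"
  awk -F',' -v k="$k" '
    # Join all columns except the last one (the label) into a comma-separated string.
    function features(   i, s) {
      s = $1
      for (i = 2; i < NF; i++) s = s "," $i
      return s
    }
    NR <= k     { print features() " => " $NF; next }   # labeled demonstration
    NR == k + 1 { print features() " => ?"; exit }      # query row, label withheld
  ' "$file"
}
```

For example, `build_many_shot_prompt data.csv 1024 > prompt.txt` (with a hypothetical `data.csv`) would yield 1,024 demonstrations plus one query, the upper end of the range quoted above.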

## Evaluation and validation

We provide an automated evaluation framework: set the parameters in `evaluate_parameters.sh` and run the pipeline to perform validation and evaluation.

**The code is open source and available in our GitHub repository.**
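
**Quick Start**

```bash
# Run from the root of a clone of the GitHub repository
git clone https://github.com/HaoAreYuDong/MachineLearningLM.git
cd MachineLearningLM
pip install -r requirements.txt

# Predict on the demo input with the released 7B checkpoint
python ./src/evaluation/model_pred/dl_model_pred.py \
  --input_dir ./demo_input.jsonl \
  --output_dir ./demo_output.jsonl \
  --model_name MachineLearningLM/MachineLearningLM-7B-v1
```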
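
After the quick-start command finishes, a simple sanity check is to confirm that the output file holds as many records as the input. The helper below is a sketch that assumes `dl_model_pred.py` writes one JSON line per input line, which this card does not state; adjust it if the script's output layout differs. The function name `check_jsonl_counts` is hypothetical.

```bash
# Hypothetical helper: compare the number of records in two JSONL files.
# ASSUMPTION: the prediction script writes exactly one output line per input line.
check_jsonl_counts() {
  local input="$1" output="$2" n_in n_out
  if [ ! -f "$input" ] || [ ! -f "$output" ]; then
    echo "missing file: $input or $output" >&2
    return 2
  fi
  # Count non-blank lines so a trailing empty line does not skew the result.
  n_in=$(grep -c -v '^[[:space:]]*$' "$input" || true)
  n_out=$(grep -c -v '^[[:space:]]*$' "$output" || true)
  if [ "$n_in" -eq "$n_out" ]; then
    echo "ok: $n_in records"
  else
    echo "mismatch: $n_in input vs $n_out output records" >&2
    return 1
  fi
}

# Example, using the quick-start paths:
# check_jsonl_counts ./demo_input.jsonl ./demo_output.jsonl
```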

**Pipeline**

```bash
# Edit evaluate_parameters.sh first, then load it into the current shell
source evaluate_parameters.sh

# Then run ONE of the following three options.

# Option 1: end-to-end pipeline
./scripts/evaluate_pipeline.sh

# Option 2: parallel processing
./scripts/multi_process/data_prep.sh
./scripts/multi_process/prompt_gen.sh   # for deep learning models only
./scripts/multi_process/model_pred.sh
./scripts/multi_process/evaluation.sh
./scripts/multi_process/report.sh

# Option 3: sequential processing
./scripts/single_process/data_prep.sh
./scripts/single_process/prompt_gen.sh  # for deep learning models only
./scripts/single_process/model_pred.sh
./scripts/single_process/evaluation.sh
./scripts/single_process/report.sh
```
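
Options 2 and 3 run the same five stages in the same order. If you script either option, a small wrapper that stops at the first failing stage keeps later stages from running on incomplete results. The sketch below is not part of the repository: `run_stages`, the `DRY_RUN` switch (print instead of execute), and the `SKIP_PROMPT_GEN` switch (for runs where `prompt_gen.sh`, marked above as deep-learning only, is not needed) are hypothetical names. Call it from the repository root in the same shell session where `evaluate_parameters.sh` was sourced.

```bash
# Hypothetical wrapper around the stage scripts listed above (not shipped with the repository).
# Usage: run_stages multi_process|single_process
#   DRY_RUN=1          print each stage script instead of running it
#   SKIP_PROMPT_GEN=1  skip prompt_gen.sh (marked as deep-learning only)
run_stages() {
  local mode="$1" stage script
  case "$mode" in
    multi_process|single_process) ;;
    *) echo "usage: run_stages multi_process|single_process" >&2; return 2 ;;
  esac
  for stage in data_prep prompt_gen model_pred evaluation report; do
    if [ "$stage" = prompt_gen ] && [ "${SKIP_PROMPT_GEN:-0}" = 1 ]; then
      continue
    fi
    script="./scripts/${mode}/${stage}.sh"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$script"
    elif ! "$script"; then
      # Stop at the first failing stage so later stages never read partial results.
      echo "stage failed: $script" >&2
      return 1
    fi
  done
}

# Example: Option 3 with fail-fast behavior
# source evaluate_parameters.sh && run_stages single_process
```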

**Quantized models (GGUF)**

https://huggingface.co/mradermacher/MachineLearningLM-7B-v1-GGUF
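
To fetch a single quantization level instead of the whole GGUF repository, the Hugging Face Hub CLI can filter files with a glob pattern. This sketch assumes `huggingface_hub` is installed (it provides `huggingface-cli`) and that the quantization name, for example `Q4_K_M`, appears in the GGUF filenames; check the repository's file list before relying on that. The function name `fetch_gguf` and its `DRY_RUN` switch are hypothetical.

```bash
# Hypothetical helper: download one quantization level of the GGUF repository.
# ASSUMPTIONS: `pip install -U huggingface_hub` has been run, and the quant name
# (e.g. Q4_K_M, Q8_0) is part of each GGUF filename in the repository.
fetch_gguf() {
  local quant="${1:-Q4_K_M}"                         # quantization level to fetch
  local dest="${2:-./MachineLearningLM-7B-v1-GGUF}"  # local target directory
  local -a cmd
  cmd=(huggingface-cli download mradermacher/MachineLearningLM-7B-v1-GGUF
       --include "*${quant}*" --local-dir "$dest")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"   # show the command without downloading anything
  else
    "${cmd[@]}"
  fi
}

# Example: fetch_gguf Q4_K_M ./gguf
```

The downloaded `.gguf` file can then be loaded by llama.cpp or another GGUF-compatible runtime.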

For more usage details, please visit our GitHub repository.