demo-social-media
This model was trained using influence-guided dataset selection, a technique that uses influence scores to identify the most impactful training data for specific concepts.
Model Description
- Base Model: distilgpt2
- Training Concepts: sentiment, analysis, on, social, media
- Training Method: Influence-guided data selection
- Compute Budget: 100 steps per condition
- Total Datasets: 4
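A minimal loading sketch using the `transformers` text-generation pipeline. The repository id `durinn/demo-social-media` is inferred from the citation URL, and the card does not say which training condition the published weights correspond to, so treat both as assumptions. The helper name `generate` and the prompt are illustrative.

```python
MODEL_ID = "durinn/demo-social-media"  # assumed from the citation URL


def generate(prompt, max_new_tokens=30):
    """Return the prompt plus a generated continuation.

    Hypothetical helper; downloads the weights on first call.
    """
    # Imported lazily so this file loads even without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]


# Example call (requires network access and the transformers package):
# print(generate("The reaction on social media was"))
```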
Training Approach
This model was trained using three different data selection strategies to validate the effectiveness of influence-guided training:
- Positive Influence: Datasets with the highest positive influence scores (most aligned with the target concepts)
- Random Baseline: Randomly sampled datasets
- Negative Influence: Datasets with the most negative influence scores (least aligned with the target concepts)
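The three conditions can be sketched as a ranking over per-dataset influence scores. This is a minimal illustration, not Dowser's actual implementation: the function name `select_conditions`, the parameter `k`, and the fixed seed are assumptions. The scores are the ones reported under Training Datasets.

```python
import random

# Influence scores as reported in this model card.
INFLUENCE = {
    "google-research-datasets/poem_sentiment": 46.167,
    "patched-codes/static-analysis-eval": -13.218,
    "dvilasuero/ag_news_error_analysis": -16.302,
    "MahdiA/Iran-protests-media": 90.503,
}


def select_conditions(scores, k, seed=0):
    """Return dataset names for the positive, random, and negative conditions.

    Hypothetical helper: `k` (datasets per condition) and `seed` are
    illustrative parameters, not settings documented by the card.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    rng = random.Random(seed)
    return {
        "positive": ranked[:k],                    # top-k influence scores
        "random": rng.sample(sorted(scores), k),   # uniform sample, ignores scores
        "negative": ranked[-k:][::-1],             # bottom-k, most negative first
    }


conditions = select_conditions(INFLUENCE, k=2)
for name, datasets in conditions.items():
    print(name, datasets)
```

With only four candidate datasets, any `k` above 2 makes the positive and negative conditions overlap, so the sketch uses `k=2`.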
Benchmark Results
| Condition | Perplexity ↓ | Train Loss ↓ | Eval Loss ↓ |
|---|---|---|---|
| Positive | 185.75 | 5.1232 | 5.2244 |
| Random | 18.28 | 3.0011 | 2.9060 |
| Negative | 46.11 | 3.9575 | 3.8310 |
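Lower is better for all metrics. In this run the random baseline scored lowest on all three, and the positive-influence condition scored highest.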
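The perplexity column is consistent with the usual definition of perplexity as the exponential of the mean cross-entropy loss, here the eval loss. The check below is a sketch under that assumption; the card does not state how perplexity was computed, and `perplexity_from_loss` is a hypothetical helper name.

```python
import math

# (reported perplexity, eval loss) pairs copied from the benchmark table.
RESULTS = {
    "Positive": (185.75, 5.2244),
    "Random": (18.28, 2.9060),
    "Negative": (46.11, 3.8310),
}


def perplexity_from_loss(loss):
    """Perplexity as exp of a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


for condition, (reported_ppl, eval_loss) in RESULTS.items():
    derived = perplexity_from_loss(eval_loss)
    print(f"{condition}: reported {reported_ppl:.2f}, exp(eval loss) = {derived:.2f}")
```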
Training Datasets
The model was trained on datasets selected through influence scoring:
- google-research-datasets/poem_sentiment (Influence: 46.167)
- patched-codes/static-analysis-eval (Influence: -13.218)
- dvilasuero/ag_news_error_analysis (Influence: -16.302)
- MahdiA/Iran-protests-media (Influence: 90.503)
Intended Use
This model is a demonstration of influence-guided training for:
- Concept-specific language modeling
- Data-efficient training
- Dataset curation research
Limitations
- Trained on a limited compute budget for benchmarking purposes
- May not generalize well outside the target concepts: sentiment, analysis, on, social, media
- Performance depends on the quality of influence score estimation
Citation
If you use this model or the influence-guided training approach, please cite:
@software{influence_guided_training,
title = {Influence-Guided Dataset Selection for Language Models},
author = {Dowser by Durinn},
year = {2025},
url = {https://huggingface.co/durinn/demo-social-media}
}
Model Card Contact
For questions or feedback, visit Durinn
Generated by Dowser - Dataset discovery and training optimization