Christopher Schröder

cschroeder

https://github.com/webis-de/small-text

AI & ML interests

NLP, Active Learning, Text Representations, PyTorch

upvoted a paper 7 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15, 2025 • 9

upvoted 2 papers about 1 year ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7, 2025 • 81

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published Feb 26, 2025 • 38

upvoted a collection over 1 year ago

Models for dataset curation

9 items • Updated Dec 5, 2024 • 17

upvoted a paper over 1 year ago

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Paper • 2406.09206 • Published Jun 13, 2024 • 1

upvoted 2 collections over 1 year ago

OpenCulture

A multilingual dataset of public domain books and newspapers. • 25 items • Updated Mar 2 • 134

EU20-Benchmarks

Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11, 2024 • 9

upvoted an article over 1 year ago

AI Policy @🤗: Open ML Considerations in the EU AI Act

Article • yjernite • Jul 24, 2023 • 2

upvoted a paper over 1 year ago

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Paper • 2408.13233 • Published Aug 23, 2024 • 23

upvoted 4 papers almost 2 years ago

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18, 2024 • 56

RETVec: Resilient and Efficient Text Vectorizer

Paper • 2302.09207 • Published Feb 18, 2023 • 3

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Paper • 2407.03963 • Published Jul 4, 2024 • 18

AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets

Paper • 2404.05623 • Published Apr 8, 2024 • 3

upvoted a collection about 2 years ago

🎧AI Podcasts and Talks!

🤗Cool stuff to listen to at any time! • 10 items • Updated Oct 6, 2023 • 5

upvoted a paper about 2 years ago

Small-Text: Active Learning for Text Classification in Python

Paper • 2107.10314 • Published Jul 21, 2021 • 1