---
title: README
emoji: 🚀
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---

We advance German NLP through transparent, open-source research with three flagship projects:

**[🐑 LLäMmlein](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/)** - A family of German-only transformer models (120M, 1B, and 7B parameters) trained from scratch, with training data and code fully documented for transparency.

**[📊 SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/leaderboard_v1)** - The first comprehensive German benchmark suite, featuring 29 diverse NLP tasks across domains for the systematic evaluation of German language models.

**[🤖 ModernGBERT](https://arxiv.org/abs/2505.13136)** - Transparent encoder models (138M and 1B parameters) based on the ModernBERT architecture, specifically optimized for German language understanding.
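
All of these models are released openly. As a minimal sketch, the snippet below shows how such a checkpoint could be loaded with 🤗 Transformers; the repository id is an assumption for illustration only, so check the organization page for the exact published model names.

```python
# Minimal sketch: loading a released German checkpoint with 🤗 Transformers.
# The repository id below is assumed for illustration; replace it with the
# actual model name listed on the organization's Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LSX-UniWue/LLaMmlein_1B"  # assumed id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short German continuation as a smoke test.
inputs = tokenizer("Die Würzburger Residenz ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```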

Beyond our German NLP ecosystem, we specialize in **LLM-Knowledge Graph Integration** for text mining applications. Our work combines language models with explicit knowledge representations, developing:

- **Character Analysis**: Models like LitBERT for understanding character networks in novels
- **Temporal Text Analysis**: Tracking narrative development through relation detection and scene segmentation
- **Sentiment & Engagement Analysis**: Measuring emotional dynamics in streaming platforms and social media
- **Knowledge Enrichment**: Semantic Web technologies for ontology learning and KG enhancement

🌸 We foster reproducible, collaborative research by open-sourcing our models, datasets, and evaluation frameworks, working to establish German as a first-class language in the global AI ecosystem while advancing the intersection of symbolic knowledge and neural language understanding.