---
library_name: transformers
license: mit
language:
- nl
pipeline_tag: token-classification
widget:
- text: >-
    Vandaag bespreken we Turks Fruit, een meesterwerk van de Nederlandse auteur Jan Wolkers.
    Dit boek, dat oorspronkelijk werd gepubliceerd in 1969, is een van de meest iconische en controversiële werken in de Nederlandse literatuur.
- text: >-
    Gisteren heb ik het boek Nijntje in de dierentuin gelezen. Ik kan niet
    anders zeggen dat dit boek fantastisch was!
metrics:
- f1
tags:
- Literature
- PyTorch
---

# Model Card for Dutch Book Title Extraction

This Named Entity Recognition (NER) model is designed to extract book titles from Dutch texts.

## Model Details

The model was fine-tuned and evaluated on a Dutch dataset of 12,535 book reviews from the Leeuwarder Courant, in which 23,529 book titles are identified. The dataset uses the IO tagging scheme and was split into a training set (70%), a validation set (15%), and a test set (15%). The model was trained with the Majority or Minority loss function and achieves an F1 score of 84.3%, precision of 83.4%, and recall of 85.2% on the test set.
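
To make the IO tagging scheme concrete, the following minimal sketch labels every token inside a book title and marks all other tokens as `O`. The label name `I-TITLE` and the helper `io_tags` are assumptions for illustration only; the label names actually used by this model are not stated in this card.

```python
def io_tags(tokens, title_tokens, label="I-TITLE"):
    """Tag every token inside an occurrence of title_tokens with `label`, all others with "O".

    `label` is a hypothetical name; the model's real label set may differ.
    """
    tags = ["O"] * len(tokens)
    n = len(title_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == title_tokens:
            tags[i:i + n] = [label] * n
    return tags


tokens = "Vandaag bespreken we Turks Fruit van Jan Wolkers".split()
tags = io_tags(tokens, ["Turks", "Fruit"])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because IO tagging has no separate "begin" tag, two titles that directly follow each other cannot be told apart from one longer title.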

## Model Description

- **Model type:** XLM-RoBERTa
- **Language(s):** Dutch
- **Fine-tuned from model:** [FacebookAI/xlm-roberta-large-finetuned-conll03-english](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english)

## Model Flaws

- Struggles with accurately identifying subtitles of book titles.
- When a book title is mentioned multiple times within the same review, the model tends to mark it only once, missing subsequent occurrences.
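
One possible workaround for the missed repeat mentions is a post-processing pass that searches the review for every title the model did find. The sketch below is an assumption, not part of the model or its training setup, and the helper name `propagate_titles` is hypothetical. It uses exact string matching, so short or generic titles may produce false positives.

```python
import re


def propagate_titles(text, detected_titles):
    """Return sorted (start, end, title) spans for every exact occurrence of each title."""
    spans = []
    # Handle longer titles first so a shorter title cannot claim part of a longer one.
    for title in sorted(set(detected_titles), key=len, reverse=True):
        for match in re.finditer(re.escape(title), text):
            overlaps = any(match.start() < end and start < match.end()
                           for start, end, _ in spans)
            if not overlaps:
                spans.append((match.start(), match.end(), title))
    return sorted(spans)


review = "Turks Fruit verscheen in 1969. Later werd Turks Fruit ook verfilmd."
spans = propagate_titles(review, ["Turks Fruit"])
print(spans)
```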

## Uses

This model is intended for extracting book titles from Dutch texts and is particularly useful for text analysis applications in the literary domain.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("nielsaxe/BookTitleNERDutch")
model = AutoModelForTokenClassification.from_pretrained("nielsaxe/BookTitleNERDutch")

# Create a NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Example usage
text = "Gisteren heb ik het boek Nijntje in de dierentuin gelezen. Ik kan niet anders zeggen dat dit boek fantastisch was!"
entities = nlp(text)
print(entities)
```
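
Without an aggregation setting, the `ner` pipeline returns one dictionary per predicted token (with keys such as `entity`, `start`, and `end`), so a multi-word title arrives in pieces. The sketch below shows one way to merge adjacent token predictions back into full title strings. The helper `merge_title_tokens`, the label `I-TITLE`, and the hand-built `entities` list are assumptions for illustration and are not real model output. The pipeline's own `aggregation_strategy` argument may be a simpler alternative.

```python
import re


def merge_title_tokens(text, entities, max_gap=1):
    """Merge token predictions whose character spans are at most `max_gap` apart."""
    spans = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        if spans and ent["start"] <= spans[-1][1] + max_gap:
            # Token continues the previous span (adjacent subword or next word).
            spans[-1][1] = max(spans[-1][1], ent["end"])
        else:
            spans.append([ent["start"], ent["end"]])
    return [(start, end, text[start:end]) for start, end in spans]


text = "Ik las Nijntje in de dierentuin."
title = "Nijntje in de dierentuin"
offset = text.index(title)

# Hand-built stand-in for pipeline output: one entry per word of the title.
entities = [
    {"entity": "I-TITLE", "start": offset + m.start(), "end": offset + m.end()}
    for m in re.finditer(r"\S+", title)
]

merged = merge_title_tokens(text, entities)
print(merged)
```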