---
license: apache-2.0
language:
- pl
base_model:
- sdadas/polish-roberta-large-v2
pipeline_tag: text-classification
library_name: transformers
tags:
- news
---
|
|
|
|
|
### Description

`polarity3c` is a three-class classification model specialized in determining the polarity (positive, negative, or ambivalent) of texts from news portals. It was trained mostly on Polish texts.
|
|
|
<center><img src="https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/v6pz2sBwc3GCPL1Il8wVP.png" width=20%></center> |
|
|
|
Annotations from plWordNet were used as the basis for the data. A model pre-trained on these annotations then served as the human-in-the-loop model, supporting the annotation of the training data for the final model. The final model was trained on web content that was manually collected and annotated.

The `sdadas/polish-roberta-large-v2` model with a classification head was used as the base. More details about the model's construction can be found on our [blog](https://radlab.dev/2025/06/01/polaryzacja-3c-model-z-plg-na-hf/).
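For illustration, this is roughly how such a head is attached to the base model when fine-tuning with `transformers` (a minimal sketch; the actual training setup and hyperparameters are described on the blog, not reproduced here):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the base model with a freshly initialized 3-class classification head;
# the head's weights are random at this point and have to be trained.
tokenizer = AutoTokenizer.from_pretrained("sdadas/polish-roberta-large-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "sdadas/polish-roberta-large-v2",
    num_labels=3,
)
```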
|
|
|
### Architecture

```
RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(128001, 1024, padding_idx=1)
      (position_embeddings): Embedding(514, 1024, padding_idx=1)
      (token_type_embeddings): Embedding(1, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-23): 24 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=1024, out_features=4096, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=4096, out_features=1024, bias=True)
            (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (classifier): RobertaClassificationHead(
    (dense): Linear(in_features=1024, out_features=1024, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    (out_proj): Linear(in_features=1024, out_features=3, bias=True)
  )
)
```
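The 3-unit `out_proj` layer matches the three polarity labels, and the 514-slot position embedding table is the usual RoBERTa layout for a 512-token input limit. Both can be read back from the published config (a minimal sketch using the standard `transformers` auto classes):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("radlab/polarity-3c")
print(config.num_labels)               # 3, matching the out_proj layer above
print(config.id2label)                 # index-to-label mapping stored in the config
print(config.max_position_embeddings)  # 514 in the RoBERTa convention
```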
|
|
|
### Usage

Example of use with the transformers pipeline:

```python
from transformers import pipeline

classifier = pipeline(model="radlab/polarity-3c", task="text-classification")

classifier("Text to classify")
```
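Note that long news articles can exceed the model's 512-token input limit. The text-classification pipeline forwards tokenizer keyword arguments such as `truncation`, so overly long inputs can be clipped to the maximum length instead of causing an error (a sketch, assuming a reasonably recent `transformers` version):

```python
# Truncate inputs longer than the model's maximum length
# rather than failing on them.
classifier("A very long news article ...", truncation=True)
```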
|
|
|
With sample data (a Polish news snippet about treasure hunting in post-Assad Syria) and `top_k=3`:

```python
classifier("""
Po upadku reżimu Asada w Syrii, mieszkańcy, borykający się z ubóstwem,
zaczęli tłumnie poszukiwać skarbów, zachęceni legendami o zakopanych
bogactwach i dostępnością wykrywaczy metali, które stały się popularnym
towarem. Mimo, że działalność ta jest nielegalna, rząd przymyka oko,
a sprzedawcy oferują urządzenia nawet dla dzieci. Poszukiwacze skupiają
się na obszarach historycznych, wierząc w legendy o skarbach ukrytych
przez starożytne cywilizacje i wojska osmańskie, choć eksperci ostrzegają
przed fałszywymi monetami i kradzieżą artefaktów z muzeów.""",
    top_k=3
)
```
|
The output is:

```
[{'label': 'ambivalent', 'score': 0.9995126724243164},
 {'label': 'negative', 'score': 0.00024663121439516544},
 {'label': 'positive', 'score': 0.00024063512682914734}]
```
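If you prefer not to use the pipeline, the same scores can be obtained by loading the model and tokenizer directly; the sketch below mirrors what the pipeline does internally (standard `transformers` auto classes, softmax over the three classes):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polarity-3c")
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
model.eval()

inputs = tokenizer("Text to classify", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the three polarity classes, then map indices to label names.
probs = torch.softmax(logits, dim=-1)[0].tolist()
for idx, prob in sorted(enumerate(probs), key=lambda x: x[1], reverse=True):
    print(model.config.id2label[idx], prob)
```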
|
|
|
|