---
license: apache-2.0
language:
- pl
base_model:
- sdadas/polish-roberta-large-v2
pipeline_tag: text-classification
library_name: transformers
tags:
- news
---
### Description
`polarity3c` is a classification model specialized in determining the polarity of texts from news portals. It was trained mostly on Polish texts.
<center><img src="https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/v6pz2sBwc3GCPL1Il8wVP.png" width=20%></center>
Annotations from plWordNet were used as the basis for the data. A model pre-trained on these annotations served as the human-in-the-loop component,
supporting the annotation of the data used to train the final model. The final model was trained on web content that was manually collected and annotated.
The `sdadas/polish-roberta-large-v2` model with a classification head was used as the base. More about the model's construction can be found on our [blog](https://radlab.dev/2025/06/01/polaryzacja-3c-model-z-plg-na-hf/).
### Architecture
```
RobertaForSequenceClassification(
(roberta): RobertaModel(
(embeddings): RobertaEmbeddings(
(word_embeddings): Embedding(128001, 1024, padding_idx=1)
(position_embeddings): Embedding(514, 1024, padding_idx=1)
(token_type_embeddings): Embedding(1, 1024)
(LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): RobertaEncoder(
(layer): ModuleList(
(0-23): 24 x RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSdpaSelfAttention(
(query): Linear(in_features=1024, out_features=1024, bias=True)
(key): Linear(in_features=1024, out_features=1024, bias=True)
(value): Linear(in_features=1024, out_features=1024, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=1024, out_features=1024, bias=True)
(LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=1024, out_features=4096, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): RobertaOutput(
(dense): Linear(in_features=4096, out_features=1024, bias=True)
(LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(classifier): RobertaClassificationHead(
(dense): Linear(in_features=1024, out_features=1024, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(out_proj): Linear(in_features=1024, out_features=3, bias=True)
)
)
```
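This printout can be reproduced by loading the checkpoint and printing the module tree; a minimal sketch, assuming only the `radlab/polarity-3c` repository id used in the usage example below:
```python
from transformers import AutoModelForSequenceClassification

# Loading the checkpoint with the standard auto class; printing the model
# object emits the architecture listing shown above.
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
print(model)
```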
### Usage
Example of use with the `transformers` pipeline:
```python
from transformers import pipeline

classifier = pipeline(model="radlab/polarity-3c", task="text-classification")
classifier("Text to classify")
```
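The pipeline can also be bypassed; below is a minimal sketch of direct inference with the tokenizer and model, where the explicit softmax and `id2label` lookup reproduce the label/score pairs the pipeline returns (the 512-token truncation follows from the 514-position embedding shown in the architecture above):
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polarity-3c")
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
model.eval()

# Truncate to the encoder's maximum input length.
inputs = tokenizer(
    "Text to classify", truncation=True, max_length=512, return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the three polarity classes; id2label maps class indices
# to the label names used by this checkpoint.
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs):
    print(model.config.id2label[idx], float(p))
```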
with sample data and `top_k=3`:
```python
classifier("""
Po upadku reżimu Asada w Syrii, mieszkańcy, borykający się z ubóstwem,
zaczęli tłumnie poszukiwać skarbów, zachęceni legendami o zakopanych
bogactwach i dostępnością wykrywaczy metali, które stały się popularnym
towarem. Mimo, że działalność ta jest nielegalna, rząd przymyka oko,
a sprzedawcy oferują urządzenia nawet dla dzieci. Poszukiwacze skupiają
się na obszarach historycznych, wierząc w legendy o skarbach ukrytych
przez starożytne cywilizacje i wojska osmańskie, choć eksperci ostrzegają
przed fałszywymi monetami i kradzieżą artefaktów z muzeów.""",
top_k=3
)
```
the output is:
```
[{'label': 'ambivalent', 'score': 0.9995126724243164},
{'label': 'negative', 'score': 0.00024663121439516544},
{'label': 'positive', 'score': 0.00024063512682914734}]
```
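For larger jobs the same pipeline accepts a list of texts and batches them internally; a small sketch with hypothetical example inputs (the `batch_size` value is an arbitrary choice):
```python
# Hypothetical example inputs; any list of Polish news texts works.
texts = [
    "Pierwszy tekst do klasyfikacji.",
    "Drugi tekst do klasyfikacji.",
]

# truncation=True guards against inputs longer than the 512-token limit.
results = classifier(texts, batch_size=8, truncation=True)
for text, result in zip(texts, results):
    print(result["label"], round(result["score"], 3), "-", text)
```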