---
license: apache-2.0
language:
- pl
base_model:
- sdadas/polish-roberta-large-v2
pipeline_tag: text-classification
library_name: transformers
tags:
- news
---
|
|
|
|
|
### Description

`polarity3c` is a three-class classification model specialized in determining the polarity (positive, negative, or ambivalent) of texts from news portals. It was trained mostly on Polish texts.
|
|
|
<center><img src="https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/v6pz2sBwc3GCPL1Il8wVP.png" width=20%></center> |
|
|
|
Annotations from plWordNet were used as the basis for the data. A model pre-trained on these annotations then served as the human-in-the-loop model, supporting the annotation of the training data for the final model. The final model was trained on web content that was manually collected and annotated.

The `sdadas/polish-roberta-large-v2` model with a classification head was used as the base. More details about the model's construction can be found on our [blog](https://radlab.dev/2025/06/01/polaryzacja-3c-model-z-plg-na-hf/).
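For illustration, this is roughly how such a head is attached to the base model when fine-tuning with `transformers` (a minimal sketch; the actual training setup and hyperparameters are described on the blog, not reproduced here):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the base model with a freshly initialized 3-class classification head;
# the head's weights are random at this point and have to be trained.
tokenizer = AutoTokenizer.from_pretrained("sdadas/polish-roberta-large-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "sdadas/polish-roberta-large-v2",
    num_labels=3,
)
```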
|
|
|
### Architecture

```
RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(128001, 1024, padding_idx=1)
      (position_embeddings): Embedding(514, 1024, padding_idx=1)
      (token_type_embeddings): Embedding(1, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-23): 24 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=1024, out_features=4096, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=4096, out_features=1024, bias=True)
            (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (classifier): RobertaClassificationHead(
    (dense): Linear(in_features=1024, out_features=1024, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    (out_proj): Linear(in_features=1024, out_features=3, bias=True)
  )
)
```
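The 3-unit `out_proj` layer matches the three polarity labels, and the 514-slot position embedding table is the usual RoBERTa layout for a 512-token input limit. Both can be read back from the published config (a minimal sketch using the standard `transformers` auto classes):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("radlab/polarity-3c")
print(config.num_labels)               # 3, matching the out_proj layer above
print(config.id2label)                 # index-to-label mapping stored in the config
print(config.max_position_embeddings)  # 514 in the RoBERTa convention
```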
|
|
|
### Usage

Example of use with the transformers pipeline:

```python
from transformers import pipeline

classifier = pipeline(model="radlab/polarity-3c", task="text-classification")

classifier("Text to classify")
```
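Note that long news articles can exceed the model's 512-token input limit. The text-classification pipeline forwards tokenizer keyword arguments such as `truncation`, so overly long inputs can be clipped to the maximum length instead of causing an error (a sketch, assuming a reasonably recent `transformers` version):

```python
# Truncate inputs longer than the model's maximum length
# rather than failing on them.
classifier("A very long news article ...", truncation=True)
```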
|
|
|
With sample data (a Polish news snippet about treasure hunting in post-Assad Syria) and `top_k=3`:

```python
classifier("""
Po upadku reżimu Asada w Syrii, mieszkańcy, borykający się z ubóstwem,
zaczęli tłumnie poszukiwać skarbów, zachęceni legendami o zakopanych
bogactwach i dostępnością wykrywaczy metali, które stały się popularnym
towarem. Mimo, że działalność ta jest nielegalna, rząd przymyka oko,
a sprzedawcy oferują urządzenia nawet dla dzieci. Poszukiwacze skupiają
się na obszarach historycznych, wierząc w legendy o skarbach ukrytych
przez starożytne cywilizacje i wojska osmańskie, choć eksperci ostrzegają
przed fałszywymi monetami i kradzieżą artefaktów z muzeów.""",
    top_k=3
)
```
|
The output is:

```
[{'label': 'ambivalent', 'score': 0.9995126724243164},
 {'label': 'negative', 'score': 0.00024663121439516544},
 {'label': 'positive', 'score': 0.00024063512682914734}]
```
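If you prefer not to use the pipeline, the same scores can be obtained by loading the model and tokenizer directly; the sketch below mirrors what the pipeline does internally (standard `transformers` auto classes, softmax over the three classes):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polarity-3c")
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
model.eval()

inputs = tokenizer("Text to classify", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the three polarity classes, then map indices to label names.
probs = torch.softmax(logits, dim=-1)[0].tolist()
for idx, prob in sorted(enumerate(probs), key=lambda x: x[1], reverse=True):
    print(model.config.id2label[idx], prob)
```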
|
|
|
|