---
language:
- multilingual
- af
- sq
- ar
- an
- hy
- ast
- az
- ba
- eu
- bar
- be
- bn
- inc
- bs
- br
- bg
- my
- ca
- ceb
- ce
- zh
- cv
- hr
- cs
- da
- nl
- en
- et
- fi
- fr
- gl
- ka
- de
- el
- gu
- ht
- he
- hi
- hu
- is
- io
- id
- ga
- it
- ja
- jv
- kn
- kk
- ky
- ko
- la
- lv
- lt
- roa
- nds
- lm
- mk
- mg
- ms
- ml
- mr
- mn
- min
- ne
- new
- nb
- nn
- oc
- fa
- pms
- pl
- pt
- pa
- ro
- ru
- sco
- sr
- scn
- sk
- sl
- aze
- es
- su
- sw
- sv
- tl
- tg
- th
- ta
- tt
- te
- tr
- uk
- ud
- uz
- vi
- vo
- war
- cy
- fry
- pnb
- yo
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-classification
---
## Username Classification Model 👤🔍
This is a machine learning model that classifies usernames into two categories: spam and non-spam. It is based on the bert-base-multilingual-cased model. The input to the model is a string representing a username, and the output is a probability distribution over the two categories.
## Dataset 📊
The model was trained on a dataset of usernames that were manually labeled as spam or non-spam. The dataset contains approximately 50,000 usernames, with a roughly equal number of examples in each category.
## Performance 🏆
The model achieved an accuracy of 82% on the test set, and has been shown to generalize well to new data. However, as with any machine learning model, its performance may vary depending on the specific characteristics of the data.
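For reference, accuracy here is simply the fraction of test usernames whose predicted label matches the manually assigned label. A minimal sketch (the labels below are illustrative, not drawn from the actual test set):

```python
# Illustrative labels only: 1 = spam, 0 = non-spam.
y_true = [1, 0, 1, 1, 0]  # manual labels
y_pred = [1, 0, 0, 1, 0]  # model predictions

# Accuracy = correct predictions / total predictions
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.0%}")  # → 80%
```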
## Usage 🚀
To use this model, you can load it from Hugging Face using the Transformers library. Here is an example of how to do this:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lokas/spam-usernames-classifier")
model = AutoModelForSequenceClassification.from_pretrained("lokas/spam-usernames-classifier")

# Example usernames
usernames = ["Yousef10166", "توفيق الشارني", "Eng.salman1", "Moulay nadjem ALLOUAOUI", "Mmaarwa111", "Abdouflih99", "loka"]

# Tokenize the usernames
inputs = tokenizer(usernames, return_tensors="pt", padding=True, truncation=True)

# Get the model's predictions (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# The predictions are logits, so apply softmax to convert them to probabilities
probs = outputs.logits.softmax(dim=-1)

# Print the probabilities
print(probs)
```
The example assigns each username in the list above a probability of being spam or non-spam.
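To turn the probability tensor into readable labels, take the argmax of each row. The label order below is an assumption for illustration; in practice it should be read from `model.config.id2label`, since the checkpoint defines which index corresponds to spam:

```python
import torch

# Probabilities as produced by the softmax step above (illustrative values)
probs = torch.tensor([[0.93, 0.07], [0.12, 0.88]])

# Assumed index-to-label order; read model.config.id2label for the real mapping
labels = ["non-spam", "spam"]

# Pick the most likely class for each username
preds = [labels[i] for i in probs.argmax(dim=-1).tolist()]
print(preds)  # → ['non-spam', 'spam']
```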
## License 📝
This project is licensed under the Apache License 2.0, as declared in the model card metadata above.