---
language: en
tags:
- token-classification
- named-entity-recognition
- bert
- transformers
license: mit
datasets:
- conll2003
---

# Token Classification Model

## Description

This project develops a machine learning model for token classification, specifically Named Entity Recognition (NER). Using a BERT model fine-tuned with the Hugging Face transformers library, the system classifies each token in a text into predefined entity categories such as persons, organizations, and locations.

The model is trained on a dataset annotated with entity labels so that it can classify each token accurately. Such a system is useful for information extraction, document processing, and conversational AI applications.

## Technologies Used

### Dataset

- **Source:** The CoNLL-2003 dataset (conll2003), obtained via Kaggle
- **Purpose:** Contains text data with annotated entities for token classification.

### Model

- **Base Model:** BERT (bert-base-uncased)
- **Library:** Hugging Face transformers
- **Task:** Token Classification (Named Entity Recognition); see the loading sketch below
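
As a rough sketch of this setup, the base checkpoint can be loaded for token classification as follows. The label list is the standard CoNLL-2003 IOB2 tag set; the exact label order used during training is an assumption.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Standard CoNLL-2003 entity tags in the IOB2 scheme (order is assumed).
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id,
)
```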

### Approach

#### Preprocessing:
- Load and preprocess the dataset.
- Tokenize the text data and align the entity labels with the resulting subword tokens (see the sketch below).
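
A minimal preprocessing sketch, assuming the data is loaded with the datasets library (the card lists conll2003; a Kaggle copy has the same columns). Special tokens and all subword pieces after the first one of each word receive the label -100 so that the loss ignores them.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("conll2003")  # columns include "tokens" and "ner_tags"

def tokenize_and_align_labels(examples):
    # The dataset is already split into words, hence is_split_into_words=True.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, word_labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels, previous_word_id = [], None
        for word_id in word_ids:
            if word_id is None:
                labels.append(-100)                  # special tokens ([CLS], [SEP])
            elif word_id != previous_word_id:
                labels.append(word_labels[word_id])  # first subword keeps the word's label
            else:
                labels.append(-100)                  # remaining subwords are ignored
            previous_word_id = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align_labels, batched=True)
```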

#### Fine-Tuning:
- Fine-tune the BERT model on the token classification dataset.

#### Training:
- Train the model to classify each token into predefined entity labels (a combined fine-tuning/training sketch follows).
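
Continuing from the loading and preprocessing sketches above, fine-tuning can be run with the transformers Trainer. The hyperparameters and output directory below are illustrative assumptions, not the exact values used for the released model.

```python
from transformers import DataCollatorForTokenClassification, TrainingArguments, Trainer

# Pads input IDs and label sequences to a common length within each batch.
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

training_args = TrainingArguments(
    output_dir="bert-ner-conll2003",   # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)

trainer.train()
trainer.save_model("bert-ner-conll2003")
```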

#### Inference:
- Use the trained model to predict entity labels for new text inputs (see the sketch below).
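
A simple way to run inference is the token-classification pipeline. The checkpoint path below is the illustrative output directory from the training sketch; any fine-tuned checkpoint or Hub model ID can be used instead.

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="bert-ner-conll2003",        # illustrative fine-tuned checkpoint
    aggregation_strategy="simple",     # merge subword pieces into whole entities
)

print(ner("Barack Obama visited Paris last week."))
# Expected shape of the output (scores will vary):
# [{'entity_group': 'PER', 'word': 'barack obama', ...},
#  {'entity_group': 'LOC', 'word': 'paris', ...}]
```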

### Key Technologies

- **Deep Learning (BERT):** For advanced token classification and contextual understanding.
- **Natural Language Processing (NLP):** For text preprocessing, tokenization, and entity recognition.
- **Machine Learning Algorithms:** For model training and prediction tasks.

## Streamlit App

You can view and interact with the Streamlit app for token classification [here](https://huggingface.co/spaces/AdilHayat173/token_classifcation).
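
The following is not the code of the hosted Space, just a minimal sketch of how a Streamlit front end can wrap the inference pipeline; the checkpoint path is again an illustrative assumption.

```python
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once and reuse it across reruns
def load_ner_pipeline():
    return pipeline(
        "token-classification",
        model="bert-ner-conll2003",    # illustrative fine-tuned checkpoint
        aggregation_strategy="simple",
    )

st.title("Token Classification (NER)")
text = st.text_area("Enter text", "Barack Obama visited Paris last week.")

if st.button("Classify"):
    for entity in load_ner_pipeline()(text):
        st.write(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")
```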

## Examples

Here are some examples of outputs from the model:

*(Example output screenshots.)*

## Google Colab Notebook

You can view and run the Google Colab notebook for this project [here](https://colab.research.google.com/drive/1GYVlIToQ_lnT8XEjGrR2WFkUQWpWXgQi#scrollTo=ZlyX1Lgn8gjj).

## Acknowledgements

- Hugging Face for the transformer models and libraries.
- Streamlit for the interactive web interface.
- The creators of the CoNLL-2003 dataset used for token classification.

## Author

- AdilHayat
- [Hugging Face Profile](https://huggingface.co/AdilHayat173)
- [GitHub Profile](https://github.com/AdilHayat21173)

## Feedback

If you have any feedback, please reach out at hayatadil300@gmail.com.
|