---
title: Fake News Detection
emoji: 📰
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.1
app_file: app.py
pinned: false
---
# Hybrid Fake News Detection Model
A hybrid deep learning model for fake news detection using BERT and BiLSTM with attention mechanism. This project was developed as part of the Data Mining Laboratory course under the guidance of Dr. Kirti Kumari.
## Project Overview
This project implements a fake news detection system that combines BERT (Bidirectional Encoder Representations from Transformers) with a BiLSTM (Bidirectional Long Short-Term Memory) network and an attention mechanism. The model identifies fake news articles by analyzing their textual content and linguistic patterns.
## Data and Model Files

The project uses the following datasets and model files:
### Datasets

- Raw and processed datasets are available at: Data Files
- Includes the LIAR and Kaggle Fake News datasets
- Preprocessed versions are ready for training
### Model Files

- Trained model checkpoints are available at: Model Files
- Contains saved model weights
- Includes best-model checkpoints
- Includes model evaluation results
## Project Structure

```
.
├── data/
│   ├── raw/            # Raw datasets
│   └── processed/      # Processed data
├── models/
│   ├── saved/          # Saved model checkpoints
│   └── checkpoints/    # Training checkpoints
├── src/
│   ├── config/         # Configuration files
│   ├── data/           # Data processing modules
│   ├── models/         # Model architecture
│   ├── utils/          # Utility functions
│   └── visualization/  # Visualization modules
├── tests/              # Unit tests
├── notebooks/          # Jupyter notebooks
└── visualizations/     # Generated plots and graphs
```
## Features
- Hybrid architecture combining BERT and BiLSTM
- Attention mechanism for better interpretability
- Comprehensive text preprocessing pipeline
- Support for multiple feature extraction methods
- Early stopping and model checkpointing
- Detailed evaluation metrics
- Interactive visualizations of model performance
- Support for multiple datasets (LIAR, Kaggle Fake News)
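The early-stopping feature listed above can be sketched as a small framework-agnostic utility that halts training when the validation loss stops improving. This is an illustrative sketch, not the project's actual implementation; the `patience` and `min_delta` parameter names are assumptions:

```python
class EarlyStopping:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs to wait after the last improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: remember it and reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop
```

In a training loop this would typically be paired with checkpointing: save the model whenever `step()` resets the counter, and break out of the loop when it returns `True`.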
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/fake-news-detection.git
   cd fake-news-detection
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Download the required files:
   - Download datasets from the Data Files link
   - Download pre-trained models from the Model Files link
   - Place the files in their respective directories as shown in the project structure

2. Prepare your dataset:
   - Place your dataset in the `data/raw` directory
   - The dataset should have at least two columns: 'text' and 'label'
   - Supported formats: CSV, TSV

3. Train the model:

   ```bash
   python src/train.py
   ```

4. Model evaluation metrics and visualizations will be generated in the `visualizations` directory.
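The dataset requirements above (a CSV/TSV file with at least 'text' and 'label' columns) can be checked before training with a small stdlib snippet. The function name and error message here are illustrative, not part of the project's code:

```python
import csv
from pathlib import Path


def validate_dataset(path):
    """Check that a CSV/TSV file has 'text' and 'label' columns; return the row count."""
    # TSV files use a tab delimiter, everything else is treated as comma-separated
    delimiter = "\t" if Path(path).suffix.lower() == ".tsv" else ","
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        missing = {"text", "label"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"dataset is missing required columns: {sorted(missing)}")
        return sum(1 for _ in reader)  # number of data rows (header excluded)
```

Running this on a file in `data/raw` before launching `src/train.py` gives an early, readable failure instead of a mid-training crash.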
## Model Architecture
The model combines:
- BERT for contextual embeddings
- BiLSTM for sequence modeling
- Attention mechanism for focusing on important parts
- Classification head for final prediction
### Key Components

- **BERT Layer**: Extracts contextual word embeddings
- **BiLSTM Layer**: Captures sequential patterns
- **Attention Layer**: Identifies important text segments
- **Classification Head**: Makes the final prediction
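The attention layer's role can be illustrated with a minimal additive-attention sketch over per-token hidden states: score each token, softmax the scores, and take a weighted sum. This plain-Python version with a fixed scoring vector is a simplified stand-in for the model's learned attention, not the project's actual implementation:

```python
import math


def attention_pool(hidden_states, weights):
    """Pool per-token vectors (e.g. BiLSTM outputs) into one sentence vector.

    hidden_states: list of per-token vectors, all the same dimensionality
    weights: scoring vector of that same dimensionality (learned in practice)
    """
    # 1. Score each token vector with a dot product against the scoring vector
    scores = [sum(w * h for w, h in zip(weights, state)) for state in hidden_states]
    # 2. Softmax the scores into attention probabilities (shifted for stability)
    shifted = [math.exp(s - max(scores)) for s in scores]
    alphas = [e / sum(shifted) for e in shifted]
    # 3. Weighted sum of the hidden states -> one fixed-size sentence vector
    dim = len(hidden_states[0])
    return [sum(a * state[i] for a, state in zip(alphas, hidden_states))
            for i in range(dim)]
```

The attention probabilities (`alphas`) are also what makes the model interpretable: they show which tokens the classifier weighted most heavily.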
## Configuration

Key parameters can be modified in `src/config/config.py`:
- Model hyperparameters
- Training parameters
- Data processing settings
- Feature extraction options
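A centralized config module like `src/config/config.py` typically groups these settings in one place. The field names and values below are illustrative guesses at what such a module might contain, not the project's real configuration:

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Model hyperparameters (illustrative values, not the project's actual ones)
    bert_model_name: str = "bert-base-uncased"
    lstm_hidden_size: int = 128
    dropout: float = 0.3
    # Training parameters
    batch_size: int = 16
    learning_rate: float = 2e-5
    num_epochs: int = 5
    early_stopping_patience: int = 3
    # Data processing settings
    max_seq_length: int = 256
    train_split: float = 0.8
```

Keeping every tunable in one dataclass means experiments only ever touch this file, and individual settings can be overridden at construction time (e.g. `Config(num_epochs=10)`).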
## Performance Metrics
The model is evaluated using:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
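For binary fake/real classification, the first four metrics above all derive from the confusion-matrix counts. A small pure-Python reference implementation, for illustration (positive class = fake):

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                          # fraction classified correctly
    precision = tp / (tp + fp) if (tp + fp) else 0.0      # of predicted fake, how many were fake
    recall = tp / (tp + fn) if (tp + fn) else 0.0         # of actual fake, how many were caught
    f1 = (2 * precision * recall / (precision + recall)   # harmonic mean of the two
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with 40 true positives, 10 false positives, 20 false negatives and 30 true negatives, accuracy is 0.70 while F1 is about 0.73, which is why the project reports both.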
## Future Improvements
- Add support for image/video metadata
- Implement real-time detection
- Add social graph analysis
- Improve model interpretability
- Add API endpoints for inference
- Support for multilingual fake news detection
- Integration with fact-checking databases
## Acknowledgments

I would like to express my sincere gratitude to Dr. Kirti Kumari for her invaluable guidance and support throughout the development of this project. Her expertise in data mining and machine learning has been instrumental in shaping this work.
Special thanks to:
- Open-source community for their excellent tools and libraries
- Dataset providers (LIAR, Kaggle)
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contact
For any queries or suggestions, please feel free to reach out to me.