---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: AI resume screener built with Streamlit
---
# 🤖 AI Resume Screener
A Streamlit application that automatically ranks candidate resumes against a job description using a multi-stage AI pipeline.
## 🚀 Features
### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using Qwen LLM
5. **Final Ranking**: Weighted combination of all scores
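The first two stages can be sketched with `sentence-transformers` and `faiss`. The snippet below is a minimal, self-contained illustration, not the app's exact code; it assumes `job_description` and `resumes` are already plain strings and uses the model names listed in the next section:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

job_description = "Senior Python developer with NLP and search experience"  # example input
resumes = ["First resume text ...", "Second resume text ..."]               # example input

# Stage 1: FAISS recall with BGE embeddings (keep up to the top 50 candidates)
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
resume_vecs = embedder.encode(resumes, normalize_embeddings=True)
job_vec = embedder.encode([job_description], normalize_embeddings=True)

index = faiss.IndexFlatIP(resume_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(resume_vecs, dtype="float32"))
_, top_ids = index.search(np.asarray(job_vec, dtype="float32"), min(50, len(resumes)))
recalled = [resumes[i] for i in top_ids[0]]

# Stage 2: cross-encoder reranking (keep the top 20)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
ce_scores = reranker.predict([(job_description, r) for r in recalled])
reranked = sorted(zip(recalled, ce_scores), key=lambda x: x[1], reverse=True)[:20]
```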
### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis
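For the intent-analysis stage, the Qwen model can be loaded in 4-bit via `bitsandbytes` (this requires a CUDA GPU and the `bitsandbytes` package). The sketch below is illustrative only; the `Qwen/Qwen2-1.5B-Instruct` checkpoint and the prompt wording are assumptions, not the app's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2-1.5B-Instruct"  # assumed checkpoint; adjust to the one the app uses
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

job_description = "Machine learning engineer; Python and PyTorch required"  # example input
resume_text = "Five years building NLP models in Python and PyTorch ..."    # example input

# Hypothetical prompt: ask the LLM whether the candidate's experience aligns with the job
prompt = (
    f"Job description:\n{job_description}\n\nResume:\n{resume_text}\n\n"
    "Does this candidate's experience align with the job? Answer Yes, Maybe, or No."
)
inputs = tokenizer(prompt, return_tensors="pt").to(llm.device)
output = llm.generate(**inputs, max_new_tokens=5)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Map the answer to the intent score used later: Yes -> 0.9, Maybe -> 0.5, No -> 0.1
text = answer.strip().lower()
intent_score = 0.9 if text.startswith("yes") else 0.5 if text.startswith("maybe") else 0.1
```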
### Multiple Input Methods
- **File Upload**: PDF, DOCX, TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets
### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
## 📋 Requirements
### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models
### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd resumescreener_v2
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run the application**:
```bash
streamlit run src/streamlit_app.py
```
## 📖 Usage Guide
### Step 1: Model Loading
- Models are automatically loaded when the app starts
- First run may take 5-10 minutes to download models
- Check the sidebar for model loading status
### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results
### Step 3: Load Resumes
Choose from three options:
#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction
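Text extraction relies on the parsing libraries from `requirements.txt`. A minimal sketch of the idea (the app's own extraction code may differ):

```python
import pdfplumber
from docx import Document

def extract_text(path: str) -> str:
    """Extract plain text from a PDF, DOCX, or TXT file (illustrative sketch)."""
    lower = path.lower()
    if lower.endswith(".pdf"):
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if lower.endswith(".docx"):
        return "\n".join(p.text for p in Document(path).paragraphs)
    with open(path, encoding="utf-8", errors="ignore") as f:  # plain-text fallback
        return f.read()
```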
#### Option B: CSV Upload
- Upload CSV with resume texts
- Select text and name columns
- Bulk processing capability
#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify dataset name and columns
- Limited to 100 resumes for performance
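For example, with the `datasets` library (the dataset and column names below are placeholders; substitute your own):

```python
from datasets import load_dataset

# Placeholder dataset/column names: replace with the dataset and text column you actually use.
ds = load_dataset("your-username/your-resume-dataset", split="train")
resumes = ds["resume_text"][:100]  # the app caps processing at 100 resumes for performance
```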
### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through 5 stages
- Results appear in three tabs
### Step 5: Analyze Results
#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option
#### Detailed Analysis Tab
- Individual candidate breakdowns
- Score components explanation
- Skills and keywords analysis
- Resume excerpts
#### Visualizations Tab
- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics
## 🧮 Scoring Formula
**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
### Score Components
1. **Cross-Encoder Score (50%)**
- Deep semantic matching between job and resume
- Considers context and meaning
- Range: 0-1 (normalized)
2. **BM25 Score (30%)**
- Traditional keyword-based relevance
- Term frequency and document frequency
- Range: 0-1 (normalized)
3. **Intent Score (20%)**
- AI-assessed candidate interest level
- Based on experience-job alignment
- Categories: Yes (0.9), Maybe (0.5), No (0.1)
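The sketch below shows one way these components can be combined, using hypothetical inputs; min-max normalization is a reasonable choice here but may differ from the app's exact normalization:

```python
import numpy as np
from rank_bm25 import BM25Okapi

def minmax(scores):
    """Scale scores to the 0-1 range (constant inputs map to all zeros)."""
    scores = np.asarray(scores, dtype=float)
    spread = scores.max() - scores.min()
    return (scores - scores.min()) / spread if spread > 0 else np.zeros_like(scores)

candidates = ["python developer with nlp experience", "java backend engineer"]  # example resumes
job_tokens = "senior python nlp engineer".split()

# BM25 keyword relevance over whitespace-tokenized resumes
bm25 = BM25Okapi([c.split() for c in candidates])
bm25_scores = minmax(bm25.get_scores(job_tokens))

ce_scores = minmax([2.3, -1.1])        # hypothetical raw cross-encoder scores
intent_scores = np.array([0.9, 0.5])   # Yes -> 0.9, Maybe -> 0.5, No -> 0.1

final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
ranking = np.argsort(final_scores)[::-1]  # candidate indices, best first
```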
## 🎯 Best Practices
### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills
### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use cleanup controls for large batches
3. **Model Caching**: Models are cached after first load
4. **Batch Processing**: Process resumes in smaller batches if memory limited
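The memory-management tips above amount to releasing Python and GPU memory between large batches. A small helper sketch (safe to call on CPU-only machines):

```python
import gc
import torch

def free_memory() -> None:
    """Release unreferenced Python objects and, if a GPU is present, cached CUDA memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```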
## 🔧 Configuration
### Model Configuration
Models can be customized by modifying the `load_models()` function:
- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping
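As a hedged illustration only (not the app's actual implementation), a cached `load_models()` typically looks something like this; adapt the names to the real function before changing anything:

```python
import streamlit as st
from sentence_transformers import SentenceTransformer, CrossEncoder

EMBEDDING_MODEL = "BAAI/bge-large-en-v1.5"                   # swap for a different embedding model
CROSS_ENCODER_MODEL = "cross-encoder/ms-marco-MiniLM-L6-v2"  # swap for a different reranker

@st.cache_resource  # cache so models load once per session, not on every Streamlit rerun
def load_models():
    embedder = SentenceTransformer(EMBEDDING_MODEL)
    reranker = CrossEncoder(CROSS_ENCODER_MODEL)
    # The quantized LLM (see the intent-analysis sketch above) would be loaded here as well,
    # which is where quantization settings and device mapping are adjusted.
    return embedder, reranker
```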
### Scoring Weights
Adjust weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```
### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
'python', 'java', 'javascript',
# Add your specific skills
]
```
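Extraction itself can be as simple as case-insensitive matching against this list; a minimal sketch (the app may use a more involved approach):

```python
def extract_skills(resume_text: str, skills_list: list[str]) -> list[str]:
    """Return the predefined skills that appear in the resume (case-insensitive match)."""
    text = resume_text.lower()
    return [skill for skill in skills_list if skill in text]

extract_skills("Built APIs in Python and JavaScript", ["python", "java", "javascript"])
# -> ['python', 'java', 'javascript']  (naive substring matching also hits 'java' inside 'javascript')
```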
## 🐛 Troubleshooting
### Common Issues
1. **Model Loading Errors**
- Check internet connection for model downloads
- Ensure sufficient disk space
- Verify CUDA compatibility
2. **Memory Issues**
- Reduce batch size
- Use CPU-only mode
- Clear cache between runs
3. **File Processing Errors**
- Check file formats (PDF, DOCX, TXT)
- Ensure files are not corrupted
- Verify text extraction quality
4. **Performance Issues**
- Enable GPU acceleration
- Process smaller batches
- Use model quantization
### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install missing dependencies
- **"CUDA out of memory"**: Reduce batch size or use CPU
## 📊 Sample Data
Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- **BAAI** for the BGE embedding model
- **Microsoft** for the MS-Marco cross-encoder
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and transformers library
## 📞 Support
For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility
---
**Built with ❤️ using Streamlit and state-of-the-art AI models**