---
title: Resumescreener V2
emoji: 🤖
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---
# 🤖 AI Resume Screener
A Streamlit application that automatically ranks candidate resumes against a job description using a multi-stage AI pipeline.
## Features
### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (keeps the top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using the MS-Marco model (keeps the top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered assessment of candidate interest using the Qwen LLM
5. **Final Ranking**: Weighted combination of all scores (stages 1-3 are sketched after this list)
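The snippet below is a minimal sketch of stages 1-3 using the libraries from `requirements.txt` (faiss, sentence-transformers, rank-bm25). The function and variable names are illustrative, not the app's actual code.

```python
# Minimal sketch of stages 1-3; names like `recall_and_rerank` are illustrative.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

def recall_and_rerank(job_text, resume_texts):
    # Stage 1: FAISS recall -- cosine similarity via normalized inner product.
    vecs = embedder.encode(resume_texts, normalize_embeddings=True).astype("float32")
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    query = embedder.encode([job_text], normalize_embeddings=True).astype("float32")
    _, top50 = index.search(query, k=min(50, len(resume_texts)))
    candidates = [resume_texts[i] for i in top50[0]]

    # Stage 2: cross-encoder reranking of the recalled candidates (keep the top 20).
    ce_scores = reranker.predict([(job_text, r) for r in candidates])
    shortlist = sorted(zip(candidates, ce_scores), key=lambda x: x[1], reverse=True)[:20]

    # Stage 3: BM25 keyword relevance over the shortlist.
    bm25 = BM25Okapi([text.lower().split() for text, _ in shortlist])
    bm25_scores = bm25.get_scores(job_text.lower().split())
    return shortlist, bm25_scores
```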
### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis (see the loading sketch below)
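As a rough illustration, a 4-bit load of the Qwen model via `transformers` could look like the sketch below. The exact checkpoint ID and quantization settings are assumptions, and this particular configuration additionally needs the `bitsandbytes` package and a CUDA GPU, which `requirements.txt` alone does not imply.

```python
# Assumed checkpoint ID and 4-bit settings -- adjust to match load_models().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2-1.5B-Instruct"  # hypothetical; the README only says "Qwen2-1.5B"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the model layers
)
```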
### Multiple Input Methods
- **File Upload**: PDF, DOCX, and TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets
### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords (see the sketch after this list)
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
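For illustration, skills extraction can be as simple as whole-word matching against the predefined skills list (the real list lives in the `ResumeScreener` class; see Configuration below). The entries shown here are examples only.

```python
# Illustrative whole-word skill matching; SKILLS entries are examples only.
import re

SKILLS = ["python", "java", "javascript", "sql", "aws", "docker"]

def extract_skills(resume_text, skills=SKILLS):
    text = resume_text.lower()
    # \b keeps "java" from matching inside "javascript".
    return [s for s in skills if re.search(rf"\b{re.escape(s)}\b", text)]

print(extract_skills("Built ETL pipelines in Python and SQL on AWS."))
# -> ['python', 'sql', 'aws']
```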
## Requirements
### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models
### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd resumescreener_v2
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run the application**:
```bash
streamlit run src/streamlit_app.py
```
## Usage Guide
### Step 1: Model Loading
- Models are loaded automatically when the app starts (a caching sketch follows this step)
- The first run may take 5-10 minutes while the models download
- Check the sidebar for model loading status
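A minimal sketch of how model loading is typically kept resident across Streamlit reruns with `st.cache_resource`; the function body here is abbreviated.

```python
# Sketch of caching model loads so the 5-10 minute download happens only once.
import streamlit as st
from sentence_transformers import CrossEncoder, SentenceTransformer

@st.cache_resource(show_spinner="Loading models...")
def load_models():
    embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
    return embedder, reranker  # abbreviated: the app's load_models() also handles the Qwen LLM

embedder, reranker = load_models()
st.sidebar.success("Models loaded")
```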
### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results
### Step 3: Load Resumes
Choose from three options:
#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction (sketched below)
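A rough sketch of per-format extraction using `pdfplumber` and `python-docx` from the dependency list; the helper name and fallback order are assumptions, not the app's exact code.

```python
# Assumed helper for extracting text from Streamlit's UploadedFile objects.
import io
import pdfplumber
from docx import Document

def extract_text(uploaded_file):
    name = uploaded_file.name.lower()
    data = uploaded_file.read()
    if name.endswith(".pdf"):
        with pdfplumber.open(io.BytesIO(data)) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if name.endswith(".docx"):
        return "\n".join(p.text for p in Document(io.BytesIO(data)).paragraphs)
    return data.decode("utf-8", errors="ignore")  # plain .txt fallback
```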
#### Option B: CSV Upload
- Upload a CSV containing resume texts
- Select the text and name columns
- Bulk processing capability
#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify the dataset name and columns
- Limited to 100 resumes for performance (see the loading sketch below)
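Loading from a public dataset boils down to `datasets.load_dataset`; the dataset ID and column name below are placeholders for values you supply in the UI.

```python
# Placeholder dataset ID and column name -- set these in the app's UI.
from datasets import load_dataset

ds = load_dataset("your-org/your-resume-dataset", split="train")
resume_texts = ds["resume_text"][:100]  # capped at 100 resumes for performance
```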
### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through the 5 stages
- Results appear in three tabs
### Step 5: Analyze Results
#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option (see the sketch below)
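A sketch of the standard Streamlit CSV-export pattern; the data frame here is toy data and the app's real column names may differ.

```python
# Toy results frame; the app's real columns and scores will differ.
import pandas as pd
import streamlit as st

results = pd.DataFrame({"candidate": ["A", "B"], "final_score": [0.82, 0.74]})
st.download_button(
    label="Download results as CSV",
    data=results.to_csv(index=False).encode("utf-8"),
    file_name="ranking_results.csv",
    mime="text/csv",
)
```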
#### Detailed Analysis Tab
- Individual candidate breakdowns
- Explanation of each score component
- Skills and keywords analysis
- Resume excerpts
#### Visualizations Tab
- Score distribution charts (see the sketch below)
- Comparative analysis
- Intent distribution
- Average metrics
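As an example of the kind of chart shown in this tab, a Plotly bar chart of final scores built from toy data:

```python
# Toy data; the app builds similar charts from the real pipeline output.
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.DataFrame({"candidate": ["A", "B", "C"], "final_score": [0.82, 0.74, 0.61]})
fig = px.bar(df, x="candidate", y="final_score", title="Final score by candidate")
st.plotly_chart(fig, use_container_width=True)
```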
## 🧮 Scoring Formula
**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
### Score Components
1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between job and resume
   - Considers context and meaning
   - Range: 0-1 (normalized)
2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and document frequency
   - Range: 0-1 (normalized)
3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)

A worked example of the weighted combination follows.
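With made-up component scores of 0.80 (cross-encoder), 0.60 (BM25), and an intent of "Yes" (0.9), the weights above combine like this:

```python
# Made-up, already-normalized component scores.
ce, bm25, intent = 0.80, 0.60, 0.90  # intent "Yes" maps to 0.9
final = 0.5 * ce + 0.3 * bm25 + 0.2 * intent
print(round(final, 2))  # 0.40 + 0.18 + 0.18 = 0.76
```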
## 🎯 Best Practices
### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes at a time for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills
### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use the cleanup controls between large batches (see the sketch after this list)
3. **Model Caching**: Models are cached after the first load
4. **Batch Processing**: Process resumes in smaller batches if memory is limited
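A minimal sketch of the kind of cleanup worth running between large batches; the app's own cleanup controls may do more than this.

```python
# Release Python garbage and PyTorch's cached GPU memory between batches.
import gc
import torch

def free_memory():
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```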
## 🔧 Configuration
### Model Configuration
Models can be customized by modifying the `load_models()` function (an example substitution follows this list):
- Change model names to use different embeddings
- Adjust the quantization settings
- Modify the device mapping
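For example, swapping in a lighter embedding model inside `load_models()` might look like the sketch below; the smaller checkpoint and CPU device are just one option, not a recommendation from the app itself.

```python
# One possible substitution inside load_models(); bge-base is smaller than bge-large.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer(
    "BAAI/bge-base-en-v1.5",  # lighter embedding model, lower memory footprint
    device="cpu",             # or "cuda" when a compatible GPU is available
)
```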
### Scoring Weights
Adjust the weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```
### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```
## Troubleshooting
### Common Issues
1. **Model Loading Errors**
   - Check the internet connection for model downloads
   - Ensure sufficient disk space
   - Verify CUDA compatibility
2. **Memory Issues**
   - Reduce the batch size
   - Use CPU-only mode (see the fallback sketch after this list)
   - Clear the cache between runs
3. **File Processing Errors**
   - Check file formats (PDF, DOCX, TXT)
   - Ensure files are not corrupted
   - Verify text extraction quality
4. **Performance Issues**
   - Enable GPU acceleration
   - Process smaller batches
   - Use model quantization
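A simple CPU fallback, assuming the model constructors accept an explicit device (as the sentence-transformers models above do):

```python
# Pick a device once and pass it to the model constructors.
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5", device=device)
```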
### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install the missing dependencies
- **"CUDA out of memory"**: Reduce the batch size or use CPU
## Sample Data
Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **BAAI** for the BGE embedding model
- **Microsoft** for the MS MARCO dataset behind the cross-encoder
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and the transformers library
## Support
For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility
---
**Built with ❤️ using Streamlit and state-of-the-art AI models**