---
title: Resumescreener V2
emoji: 🤖
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---
# 🤖 AI Resume Screener
A Streamlit application that automatically ranks candidate resumes against a job description using a multi-stage AI pipeline.
## Features
### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (keeps the top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using the MS-Marco model (keeps the top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered assessment of candidate interest using the Qwen LLM
5. **Final Ranking**: Weighted combination of all scores (stages 1-3 are sketched after this list)
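The snippet below is a minimal sketch of stages 1-3 using the libraries from `requirements.txt` (faiss, sentence-transformers, rank-bm25). The function and variable names are illustrative, not the app's actual code.

```python
# Minimal sketch of stages 1-3; names like `recall_and_rerank` are illustrative.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

def recall_and_rerank(job_text, resume_texts):
    # Stage 1: FAISS recall -- cosine similarity via normalized inner product.
    vecs = embedder.encode(resume_texts, normalize_embeddings=True).astype("float32")
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    query = embedder.encode([job_text], normalize_embeddings=True).astype("float32")
    _, top50 = index.search(query, k=min(50, len(resume_texts)))
    candidates = [resume_texts[i] for i in top50[0]]

    # Stage 2: cross-encoder reranking of the recalled candidates (keep the top 20).
    ce_scores = reranker.predict([(job_text, r) for r in candidates])
    shortlist = sorted(zip(candidates, ce_scores), key=lambda x: x[1], reverse=True)[:20]

    # Stage 3: BM25 keyword relevance over the shortlist.
    bm25 = BM25Okapi([text.lower().split() for text, _ in shortlist])
    bm25_scores = bm25.get_scores(job_text.lower().split())
    return shortlist, bm25_scores
```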
### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis (see the loading sketch below)
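As a rough illustration, a 4-bit load of the Qwen model via `transformers` could look like the sketch below. The exact checkpoint ID and quantization settings are assumptions, and this particular configuration additionally needs the `bitsandbytes` package and a CUDA GPU, which `requirements.txt` alone does not imply.

```python
# Assumed checkpoint ID and 4-bit settings -- adjust to match load_models().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2-1.5B-Instruct"  # hypothetical; the README only says "Qwen2-1.5B"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the model layers
)
```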
### Multiple Input Methods
- **File Upload**: PDF, DOCX, and TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets
### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords (see the sketch after this list)
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
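For illustration, skills extraction can be as simple as whole-word matching against the predefined skills list (the real list lives in the `ResumeScreener` class; see Configuration below). The entries shown here are examples only.

```python
# Illustrative whole-word skill matching; SKILLS entries are examples only.
import re

SKILLS = ["python", "java", "javascript", "sql", "aws", "docker"]

def extract_skills(resume_text, skills=SKILLS):
    text = resume_text.lower()
    # \b keeps "java" from matching inside "javascript".
    return [s for s in skills if re.search(rf"\b{re.escape(s)}\b", text)]

print(extract_skills("Built ETL pipelines in Python and SQL on AWS."))
# -> ['python', 'sql', 'aws']
```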
## Requirements
### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models
### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd resumescreener_v2
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run the application**:
```bash
streamlit run src/streamlit_app.py
```
## Usage Guide
### Step 1: Model Loading
- Models are loaded automatically when the app starts (a caching sketch follows this step)
- The first run may take 5-10 minutes while the models download
- Check the sidebar for model loading status
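A minimal sketch of how model loading is typically kept resident across Streamlit reruns with `st.cache_resource`; the function body here is abbreviated.

```python
# Sketch of caching model loads so the 5-10 minute download happens only once.
import streamlit as st
from sentence_transformers import CrossEncoder, SentenceTransformer

@st.cache_resource(show_spinner="Loading models...")
def load_models():
    embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
    return embedder, reranker  # abbreviated: the app's load_models() also handles the Qwen LLM

embedder, reranker = load_models()
st.sidebar.success("Models loaded")
```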
### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results
### Step 3: Load Resumes
Choose from three options:
#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction (sketched below)
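A rough sketch of per-format extraction using `pdfplumber` and `python-docx` from the dependency list; the helper name and fallback order are assumptions, not the app's exact code.

```python
# Assumed helper for extracting text from Streamlit's UploadedFile objects.
import io
import pdfplumber
from docx import Document

def extract_text(uploaded_file):
    name = uploaded_file.name.lower()
    data = uploaded_file.read()
    if name.endswith(".pdf"):
        with pdfplumber.open(io.BytesIO(data)) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if name.endswith(".docx"):
        return "\n".join(p.text for p in Document(io.BytesIO(data)).paragraphs)
    return data.decode("utf-8", errors="ignore")  # plain .txt fallback
```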
#### Option B: CSV Upload
- Upload a CSV containing resume texts
- Select the text and name columns
- Bulk processing capability
#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify the dataset name and columns
- Limited to 100 resumes for performance (see the loading sketch below)
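Loading from a public dataset boils down to `datasets.load_dataset`; the dataset ID and column name below are placeholders for values you supply in the UI.

```python
# Placeholder dataset ID and column name -- set these in the app's UI.
from datasets import load_dataset

ds = load_dataset("your-org/your-resume-dataset", split="train")
resume_texts = ds["resume_text"][:100]  # capped at 100 resumes for performance
```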
### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through the 5 stages
- Results appear in three tabs
### Step 5: Analyze Results
#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option (see the sketch below)
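A sketch of the standard Streamlit CSV-export pattern; the data frame here is toy data and the app's real column names may differ.

```python
# Toy results frame; the app's real columns and scores will differ.
import pandas as pd
import streamlit as st

results = pd.DataFrame({"candidate": ["A", "B"], "final_score": [0.82, 0.74]})
st.download_button(
    label="Download results as CSV",
    data=results.to_csv(index=False).encode("utf-8"),
    file_name="ranking_results.csv",
    mime="text/csv",
)
```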
#### Detailed Analysis Tab
- Individual candidate breakdowns
- Explanation of each score component
- Skills and keywords analysis
- Resume excerpts
#### Visualizations Tab
- Score distribution charts (see the sketch below)
- Comparative analysis
- Intent distribution
- Average metrics
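As an example of the kind of chart shown in this tab, a Plotly bar chart of final scores built from toy data:

```python
# Toy data; the app builds similar charts from the real pipeline output.
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.DataFrame({"candidate": ["A", "B", "C"], "final_score": [0.82, 0.74, 0.61]})
fig = px.bar(df, x="candidate", y="final_score", title="Final score by candidate")
st.plotly_chart(fig, use_container_width=True)
```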
## 🧮 Scoring Formula
**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
### Score Components
1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between job and resume
   - Considers context and meaning
   - Range: 0-1 (normalized)
2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and document frequency
   - Range: 0-1 (normalized)
3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)

A worked example of the weighted combination follows.
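With made-up component scores of 0.80 (cross-encoder), 0.60 (BM25), and an intent of "Yes" (0.9), the weights above combine like this:

```python
# Made-up, already-normalized component scores.
ce, bm25, intent = 0.80, 0.60, 0.90  # intent "Yes" maps to 0.9
final = 0.5 * ce + 0.3 * bm25 + 0.2 * intent
print(round(final, 2))  # 0.40 + 0.18 + 0.18 = 0.76
```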
## 🎯 Best Practices
### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes at a time for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills
### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use the cleanup controls between large batches (see the sketch after this list)
3. **Model Caching**: Models are cached after the first load
4. **Batch Processing**: Process resumes in smaller batches if memory is limited
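A minimal sketch of the kind of cleanup worth running between large batches; the app's own cleanup controls may do more than this.

```python
# Release Python garbage and PyTorch's cached GPU memory between batches.
import gc
import torch

def free_memory():
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```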
## 🔧 Configuration
### Model Configuration
Models can be customized by modifying the `load_models()` function (an example substitution follows this list):
- Change model names to use different embeddings
- Adjust the quantization settings
- Modify the device mapping
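For example, swapping in a lighter embedding model inside `load_models()` might look like the sketch below; the smaller checkpoint and CPU device are just one option, not a recommendation from the app itself.

```python
# One possible substitution inside load_models(); bge-base is smaller than bge-large.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer(
    "BAAI/bge-base-en-v1.5",  # lighter embedding model, lower memory footprint
    device="cpu",             # or "cuda" when a compatible GPU is available
)
```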
### Scoring Weights
Adjust the weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```
### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```
## Troubleshooting
### Common Issues
1. **Model Loading Errors**
   - Check the internet connection for model downloads
   - Ensure sufficient disk space
   - Verify CUDA compatibility
2. **Memory Issues**
   - Reduce the batch size
   - Use CPU-only mode (see the fallback sketch after this list)
   - Clear the cache between runs
3. **File Processing Errors**
   - Check file formats (PDF, DOCX, TXT)
   - Ensure files are not corrupted
   - Verify text extraction quality
4. **Performance Issues**
   - Enable GPU acceleration
   - Process smaller batches
   - Use model quantization
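A simple CPU fallback, assuming the model constructors accept an explicit device (as the sentence-transformers models above do):

```python
# Pick a device once and pass it to the model constructors.
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5", device=device)
```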
### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install the missing dependencies
- **"CUDA out of memory"**: Reduce the batch size or use CPU
## Sample Data
Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **BAAI** for the BGE embedding model
- **Microsoft** for the MS MARCO dataset behind the cross-encoder
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and the transformers library
## Support
For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility
---
**Built with ❤️ using Streamlit and state-of-the-art AI models**