---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# AI-driven Candidate Matcher

A resume screening application that ranks resumes against a job description using a 5-stage pipeline. It combines embedding-based semantic search, cross-encoder re-ranking, BM25 keyword matching, and LLM intent analysis to produce a final candidate ranking.

## Features

- **5-Stage Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Fast vector similarity search over large resume collections
- **AI Intent Analysis**: The Qwen3-1.7B model analyzes candidate job-seeking intent
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes at once
- **Export Results**: Download the detailed analysis as CSV

## How It Works

### 5-Stage Advanced Pipeline

1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for final ranking

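The recall-then-re-rank funnel in stages 1 and 2 can be sketched in plain Python. This is only an illustration of the control flow, not the application's code: brute-force cosine similarity stands in for the FAISS index, and `rerank_score` is a hypothetical callable standing in for the cross-encoder. The function name `recall_then_rerank` and its parameters are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_then_rerank(
    query_vec: Sequence[float],
    resume_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stand-in for the cross-encoder
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (resume index, re-rank score) pairs, best first."""
    # Stage 1 (stand-in for FAISS): rank every resume by cosine similarity
    # to the query and keep only the recall_k closest.
    recalled = sorted(
        range(len(resume_vecs)),
        key=lambda i: cosine(query_vec, resume_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2: apply the more expensive scorer to the recalled candidates only.
    rescored = [(i, rerank_score(i)) for i in recalled]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:rerank_k]
```

The point of the funnel is cost: the cheap vector search narrows the pool so that the slower pairwise scorer runs on at most `recall_k` resumes rather than the whole collection.
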
### Scoring Formula

**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**

Each range is that component's contribution to the final score, so the maximum final score is 1.0.

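A minimal sketch of how the formula above could be computed. It assumes each component has already been normalized to the range 0 to 1 and is then mapped linearly onto its stated range; the normalization step and the names `scale` and `final_score` are assumptions for illustration, not taken from the application.

```python
def scale(value: float, low: float, high: float) -> float:
    """Clamp value to [0, 1], then map it linearly onto [low, high]."""
    clamped = min(max(value, 0.0), 1.0)
    return low + clamped * (high - low)


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Combine three normalized scores using the ranges from the formula."""
    return (
        scale(cross_encoder, 0.0, 0.7)  # semantic relevance carries the most weight
        + scale(bm25, 0.1, 0.2)         # keyword match, with the 0.1 floor from the formula
        + scale(intent, 0.0, 0.1)       # LLM intent signal
    )
```

Under these assumptions, a candidate with no keyword overlap still receives the 0.1 BM25 floor, and a candidate with perfect scores on all three components reaches 1.0.
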
### Input & Output

- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations

## Technical Details

### Models Used

- **BAAI/bge-large-en-v1.5**: Embedding model for semantic similarity
- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
- **Qwen3-1.7B**: Large language model for intent analysis and explanations

### Key Libraries

- **FAISS**: Facebook AI Similarity Search for efficient vector operations
- **Sentence Transformers**: For embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework

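To make the keyword-matching stage concrete, here is a small pure-Python sketch of Okapi BM25. It is not the rank_bm25 library and may differ from it in details: it uses the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`, the common defaults `k1 = 1.5` and `b = 0.75`, and assumes documents are already tokenized. The function name `bm25_scores` is made up for this example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score each tokenized document against the query tokens with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Longer-than-average documents are penalized through b.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

Rare query terms get a higher IDF, so a resume that matches an uncommon skill keyword is rewarded more than one that only repeats a keyword most resumes share.
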
## Configuration Options

The sidebar provides several customization options:

- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components

## Getting Started

### Online Usage

1. Visit the application
2. Enter a comprehensive job description
3. Upload resume files or a CSV dataset
4. Click "Advanced Pipeline Analysis"
5. Review the ranked candidates with detailed insights

### Local Installation

```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

### Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended

## Supported File Formats

- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with resume text columns

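For the CSV case, the sketch below shows one way resume text could be pulled from a named column using only the standard library. The column name `resume_text` and the function `resumes_from_csv` are hypothetical; the README does not say which column names the application expects.

```python
import csv
import io
from typing import List


def resumes_from_csv(csv_text: str, text_column: str = "resume_text") -> List[str]:
    """Return the non-empty values of text_column from CSV content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {text_column!r}")
    resumes = []
    for row in reader:
        # Short rows yield None for missing cells, so fall back to "".
        text = (row.get(text_column) or "").strip()
        if text:
            resumes.append(text)
    return resumes
```

Rows with an empty resume cell are skipped rather than passed on, so later stages never score a blank document.
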
## Privacy & Security

### Data Privacy Statement

**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**

#### Data Handling

- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis happens locally within the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties

#### Security Measures

- **Temporary Files**: Uploaded files are processed in secure temporary locations and deleted immediately afterwards
- **Memory Management**: Automatic cleanup of resume data from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments

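The delete-after-processing behavior described under Temporary Files can be illustrated with a `try`/`finally` pattern. This is a sketch under assumptions, not the application's code: `process_upload`, `extract_text`, and `read_text` are hypothetical names, and `extract_text` stands for whichever format-specific extractor is used.

```python
import os
import tempfile
from typing import Callable


def read_text(path: str) -> str:
    """Example extractor for plain-text files."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def process_upload(data: bytes, suffix: str, extract_text: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, extract text, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return extract_text(path)
    finally:
        # Runs on success and on error, so the file does not outlive this call.
        if os.path.exists(path):
            os.remove(path)
```

Putting the removal in `finally` means a failed extraction, for example on a corrupt PDF, still cleans up the uploaded file.
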
#### User Control

- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Complete control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed

**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**

## Performance Metrics

- **Accuracy**: The multi-stage pipeline ranks candidates on combined semantic, keyword, and intent signals rather than on any single score
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models keep the pipeline running if a primary model is unavailable

## Future Enhancements

- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable importance of different scoring components
- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening

## License

MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

---

*Built with ❤️ using Streamlit, Transformers, and FAISS*

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference