---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# AI-driven Candidate Matcher
An AI-powered resume screening application that matches resumes against job descriptions through a 5-stage pipeline combining semantic similarity, keyword matching, and LLM intent analysis, then ranks candidates by a combined score.
## 🚀 Features
- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Lightning-fast similarity search for large resume collections
- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes simultaneously
- **Export Results**: Download detailed analysis as CSV
## 🔧 How It Works
### 5-Stage Advanced Pipeline
1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for final ranking
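
The recall-then-re-rank funnel in stages 1 and 2 can be sketched in plain Python. This is only an illustration under stated assumptions: brute-force cosine similarity stands in for the FAISS index, and `rerank_score` is a hypothetical callable standing in for the Cross-Encoder. The function name and parameters are not taken from `app.py`.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stand-in for the Cross-Encoder
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    # Stage 1 stand-in: score every resume embedding against the job embedding
    # and keep the recall_k most similar (FAISS would do this with an index).
    sims = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    shortlist = [i for i, _ in sims[:recall_k]]

    # Stage 2 stand-in: apply the more expensive scorer only to the shortlist
    # and keep the rerank_k best.
    reranked = [(i, rerank_score(i)) for i in shortlist]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:rerank_k]
```

The point of the funnel is cost: the cheap vector comparison runs over every resume, while the slower pairwise model only sees the shortlist.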
### Scoring Formula
**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
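
A minimal sketch of how the three ranges could add up to a final score. It assumes each component has already been normalized to [0, 1] and is then mapped linearly onto its stated range. That normalization, and the names `_scale` and `final_score`, are assumptions for illustration and may differ from what `app.py` does.

```python
def _scale(value: float, low: float, high: float) -> float:
    """Clamp value to [0, 1], then map it linearly onto [low, high]."""
    clamped = min(1.0, max(0.0, value))
    return low + clamped * (high - low)


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three components after mapping each onto its range.

    Inputs are assumed to be normalized to [0, 1] (an assumption, not
    something the formula above specifies).
    """
    return (
        _scale(cross_encoder, 0.0, 0.7)  # Cross-Encoder contributes 0 to 0.7
        + _scale(bm25, 0.1, 0.2)         # BM25 contributes 0.1 to 0.2
        + _scale(intent, 0.0, 0.1)       # Intent contributes 0 to 0.1
    )
```

Under these assumptions the final score falls between 0.1 and 1.0, because the BM25 term never drops below 0.1.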
### Input & Output
- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations
## 🤖 Technical Details
### Models Used
- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
- **Qwen3-1.7B**: Large language model for intent analysis and explanations
### Key Libraries
- **FAISS**: Facebook AI Similarity Search for efficient vector operations
- **Sentence Transformers**: For embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework
## 📊 Configuration Options
The sidebar provides several customization options:
- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components
## 🚀 Getting Started
### Online Usage
1. Visit the application
2. Enter a comprehensive job description
3. Upload resume files or CSV dataset
4. Click "Advanced Pipeline Analysis"
5. Review ranked candidates with detailed insights
### Local Installation
```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```
### Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended
## 📋 Supported File Formats
- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with resume text columns
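
The "pdfplumber with PyPDF2 fallback" behaviour for PDFs might look roughly like the sketch below. The helper names are hypothetical and not taken from `app.py`. It assumes `pdfplumber` and PyPDF2 2.x or later (which provides `PdfReader`) are installed; both are imported lazily so the fallback logic can be exercised without them.

```python
from typing import Callable, Sequence


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # lazy import: only needed when this extractor runs

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # lazy import, assumes PyPDF2 >= 2.0

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_pdf_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = (
        extract_with_pdfplumber,
        extract_with_pypdf2,
    ),
) -> str:
    """Try each extractor in order; return the first non-empty result."""
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            # Any failure (missing library, corrupt file) moves on to the next
            # extractor. Catching broadly is a simplification for this sketch.
            continue
        if text.strip():
            return text
    return ""  # nothing could be extracted
```

Treating an empty result the same as a failure means scanned or image-only pages also get a second attempt with the other library.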
## 🔒 Privacy & Security
### Data Privacy Statement
**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**
#### Data Handling
- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis happens locally within the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties
#### Security Measures
- **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
- **Memory Management**: Automatic cleanup of resume data from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
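
One way the "temporary file, immediately deleted" handling could be implemented with the standard library is sketched here. `process_upload` and its `handler` parameter are hypothetical names for illustration, not the application's actual code.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, suffix: str, handler: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, run handler(path), then delete the file.

    The file is removed in the finally block, so it is cleaned up even if
    the handler raises.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A caller would pass the uploaded bytes and a text extraction function as `handler`; only the returned text outlives the call.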
#### User Control
- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Complete control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed
**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
## 📈 Performance Metrics
- **Accuracy**: Advanced multi-stage pipeline ensures high-quality candidate ranking
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models ensure consistent operation
## 🔮 Future Enhancements
- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable importance of different scoring components
- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening
## 📄 License
MIT License - See LICENSE file for details
## 🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
---
*Built with ❤️ using Streamlit, Transformers, and FAISS*
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference