title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
AI-driven Candidate Matcher
A resume screening application that ranks resumes against a job description using a 5-stage pipeline. It combines embedding-based semantic search, cross-encoder re-ranking, BM25 keyword matching, and LLM intent analysis to produce a final candidate ranking.
Features
- 5-Stage Advanced Pipeline: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- State-of-the-Art Models: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- FAISS Integration: Lightning-fast similarity search for large resume collections
- AI Intent Analysis: Qwen3-1.7B model analyzes candidate job-seeking intent
- Multi-format Support: Processes PDF, DOCX, TXT, and CSV files
- Interactive Visualizations: Comprehensive score breakdowns and comparative analysis
- Batch Processing: Upload and analyze multiple resumes simultaneously
- Export Results: Download detailed analysis as CSV
How It Works
5-Stage Advanced Pipeline
- FAISS Recall (Top 50): Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
- Cross-Encoder Re-ranking (Top 20): Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
- BM25 Keyword Matching: Traditional keyword-based scoring for skill alignment
- LLM Intent Analysis: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
- Combined Scoring: Weighted combination of all scores for final ranking
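The staged narrowing described above can be sketched in plain Python. This is an illustration only, not the application's code: `recall_score` and `rerank_score` are hypothetical stand-ins for the bge embedding similarity and the cross-encoder, implemented here as simple token-overlap measures so the example runs without any models. Only the default cutoffs (50 and 20) come from the stage list.

```python
from typing import Callable, List, Tuple


def tokens(text: str) -> set:
    """Lowercased whitespace tokens; a toy substitute for real model input."""
    return set(text.lower().split())


def recall_score(query: str, doc: str) -> float:
    """Hypothetical cheap first-stage score (Jaccard overlap)."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Hypothetical second-stage score (fraction of query terms covered)."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q) if q else 0.0


def top_k(
    query: str, docs: List[str], scorer: Callable[[str, str], float], k: int
) -> List[Tuple[str, float]]:
    """Score every document and keep the k highest-scoring ones."""
    scored = [(doc, scorer(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def funnel(
    query: str, resumes: List[str], recall_k: int = 50, rerank_k: int = 20
) -> List[Tuple[str, float]]:
    """Stage 1 keeps recall_k candidates; stage 2 re-scores only those."""
    recalled = [doc for doc, _ in top_k(query, resumes, recall_score, recall_k)]
    return top_k(query, recalled, rerank_score, rerank_k)


job = "python machine learning engineer"
resumes = [
    "senior python engineer with machine learning experience",
    "graphic designer skilled in illustration",
    "python developer",
]
# Small cutoffs so the narrowing is visible on three documents.
ranked = funnel(job, resumes, recall_k=2, rerank_k=1)
print(ranked)
```

The point of the funnel is cost: the cheap scorer sees every resume, while the expensive one only sees the survivors of the previous stage.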
Scoring Formula
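Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)
Each range in parentheses is the contribution that component can make, so the final score lies between 0.1 and 1.0.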
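As a worked example of the formula, the sketch below sums three component contributions after clamping each to its stated range. It assumes the components have already been scaled into those ranges; how the application maps raw model outputs to them is not described here, and the names `clamp` and `final_score` are illustrative.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three contributions, each limited to its documented range."""
    return (
        clamp(cross_encoder, 0.0, 0.7)   # Cross-Encoder: 0-0.7
        + clamp(bm25, 0.1, 0.2)          # BM25: 0.1-0.2
        + clamp(intent, 0.0, 0.1)        # Intent: 0-0.1
    )


example = final_score(cross_encoder=0.56, bm25=0.15, intent=0.05)
print(round(example, 2))  # roughly 0.76
```

With these ranges the cross-encoder accounts for up to 70% of the maximum score, so it dominates the ranking while BM25 and intent act as smaller adjustments.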
Input & Output
- Input: Job description + Resume files (PDF/DOCX/TXT/CSV)
- Output: Ranked candidates with detailed score breakdowns and AI explanations
Technical Details
Models Used
- BAAI/bge-large-en-v1.5: Advanced embedding model for semantic similarity
- Cross-Encoder/ms-marco-MiniLM-L6-v2: Deep re-ranking for relevance scoring
- Qwen3-1.7B: Large language model for intent analysis and explanations
Key Libraries
- FAISS: Facebook AI Similarity Search for efficient vector operations
- Sentence Transformers: For embedding generation and cross-encoding
- rank_bm25: BM25 algorithm implementation for keyword matching
- Streamlit: Interactive web interface
- PyTorch: Deep learning framework
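To make the keyword-matching stage concrete, here is a from-scratch BM25 scorer in plain Python. It is a sketch of the general Okapi BM25 formula, not the rank_bm25 implementation the application uses: the IDF variant shown (with `1 +` inside the logarithm, which keeps it non-negative) and the `k1` and `b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], corpus: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Return one BM25 score per tokenized document in corpus."""
    if not corpus:
        return []
    n_docs = len(corpus)
    # Average document length; fall back to 1 if every document is empty.
    avg_len = sum(len(doc) for doc in corpus) / n_docs or 1.0

    # Number of documents containing each term.
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


corpus = [
    ["python", "sql", "docker"],
    ["java", "spring"],
    ["python", "pandas", "numpy", "sql"],
]
scores = bm25_scores(["python", "sql"], corpus)
print(scores)
```

In this toy corpus the first and third documents both contain the two query terms, but the first is shorter, so length normalization ranks it higher; the second document matches nothing and scores zero.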
Configuration Options
The sidebar provides several customization options:
- Results Count: Choose how many top candidates to display (1-5)
- Pipeline Visualization: Real-time progress through the 5-stage pipeline
- Score Breakdown: Detailed view of individual scoring components
Getting Started
Online Usage
- Visit the application
- Enter a comprehensive job description
- Upload resume files or a CSV dataset
- Click "Advanced Pipeline Analysis"
- Review ranked candidates with detailed insights
Local Installation
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended
Supported File Formats
- PDF: Extracted using pdfplumber with PyPDF2 fallback
- DOCX: Microsoft Word documents
- TXT: Plain text files
- CSV: Structured datasets with resume text columns
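One way to route the formats listed above is a small dispatcher keyed on file extension. The sketch below is an assumption about structure, not the application's code. The PDF branch follows the stated pdfplumber-then-PyPDF2 order; the DOCX branch assumes the python-docx package, since no DOCX library is named here; and the CSV column name `resume_text` is hypothetical. Third-party imports are deferred so the TXT and CSV paths run with the standard library alone.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def read_pdf(path: Path) -> str:
    """Try pdfplumber first; fall back to PyPDF2 if it fails for any reason."""
    try:
        import pdfplumber  # third-party, imported lazily

        with pdfplumber.open(str(path)) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    except Exception:
        from PyPDF2 import PdfReader  # third-party fallback

        reader = PdfReader(str(path))
        return "\n".join(page.extract_text() or "" for page in reader.pages)


def read_docx(path: Path) -> str:
    """Assumes python-docx; the DOCX library is not specified in this README."""
    import docx

    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)


def read_csv(path: Path, text_column: str = "resume_text") -> List[str]:
    """One resume per row; the column name is a hypothetical default."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [row[text_column] for row in csv.DictReader(handle)]


def extract_text(path: Path) -> List[str]:
    """Return a list of resume texts (CSV files may contain several)."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        return [read_pdf(path)]
    if suffix == ".docx":
        return [read_docx(path)]
    if suffix == ".txt":
        return [path.read_text(encoding="utf-8", errors="replace")]
    if suffix == ".csv":
        return read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix}")


# Demonstrate the two standard-library paths with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    txt_path = Path(tmp) / "resume.txt"
    txt_path.write_text("Python developer", encoding="utf-8")
    csv_path = Path(tmp) / "resumes.csv"
    csv_path.write_text("resume_text\nData analyst\nML engineer\n", encoding="utf-8")
    txt_result = extract_text(txt_path)
    csv_result = extract_text(csv_path)

print(txt_result, csv_result)
```

Returning a list from every branch lets the caller treat a single uploaded resume and a CSV of many resumes the same way.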
Privacy & Security
Data Privacy Statement
Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.
Data Handling
- No Data Storage: Resume content is processed in memory only and never stored permanently
- Session-Based: All data is cleared when you close the browser or reset the application
- Local Processing: All AI analysis happens locally within the application environment
- No External Transmission: Resume data is never sent to external services or third parties
Security Measures
- Temporary Files: Uploaded files are processed in secure temporary locations and immediately deleted
- Memory Management: Automatic cleanup of resume data from system memory
- No Logging: Resume content is never logged or cached
- Secure Processing: All text extraction and analysis occurs within isolated processing environments
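The temporary-file behavior described above can be approximated with the standard `tempfile` module. This is a minimal sketch under the assumption that each upload arrives as bytes; `process_upload` and `read_plain_text` are hypothetical names, and the application's actual cleanup logic may differ.

```python
import os
import tempfile
from typing import Callable, List


def process_upload(data: bytes, suffix: str, extractor: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, extract text, and always delete the file."""
    handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        handle.write(data)
        handle.close()  # close before reopening by name (required on Windows)
        return extractor(handle.name)
    finally:
        handle.close()  # harmless if already closed
        if os.path.exists(handle.name):
            os.remove(handle.name)  # runs even if the extractor raised


seen_paths: List[str] = []


def read_plain_text(path: str) -> str:
    """Example extractor that records the path it was given."""
    seen_paths.append(path)
    with open(path, encoding="utf-8") as source:
        return source.read()


text = process_upload(b"Python developer", ".txt", read_plain_text)
print(text, os.path.exists(seen_paths[0]))
```

Putting the deletion in a `finally` block means the file is removed whether extraction succeeds or fails, so a malformed upload does not leave resume content on disk.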
User Control
- Clear Data Options: Multiple options to clear resume data and free memory
- Session Management: Complete control over when and how data is processed
- Transparent Processing: Full visibility into what data is being analyzed
We recommend reviewing your organization's data handling policies before uploading sensitive resume information.
Performance Metrics
- Accuracy: Candidates are ranked by a multi-stage pipeline that combines semantic, keyword, and intent signals
- Speed: FAISS indexing enables sub-second search across thousands of resumes
- Scalability: Efficient memory management for large resume datasets
- Reliability: Fallback models ensure consistent operation
Future Enhancements
- Multi-language Support: Extend to non-English resumes and job descriptions
- Custom Scoring Weights: User-configurable importance of different scoring components
- Advanced Skill Extraction: Enhanced NLP for technical skill identification
- Integration APIs: Connect with ATS and HR management systems
- Batch Job Processing: Queue-based processing for large-scale screening
License
MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
Built with ❤️ using Streamlit, Transformers, and FAISS
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference