metadata

title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit

AI-driven Candidate Matcher

An AI-powered resume screening application that matches resumes against a job description using a 5-stage pipeline: embedding-based recall, cross-encoder re-ranking, BM25 keyword matching, LLM intent analysis, and combined scoring. The final ranking comes with a per-component score breakdown for each candidate.

🚀 Features

5-Stage Advanced Pipeline: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
State-of-the-Art Models: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
FAISS Integration: Lightning-fast similarity search for large resume collections
AI Intent Analysis: Qwen3-1.7B model analyzes candidate job-seeking intent
Multi-format Support: Processes PDF, DOCX, TXT, and CSV files
Interactive Visualizations: Comprehensive score breakdowns and comparative analysis
Batch Processing: Upload and analyze multiple resumes simultaneously
Export Results: Download detailed analysis as CSV

🔧 How It Works

5-Stage Advanced Pipeline

FAISS Recall (Top 50): Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
Cross-Encoder Re-ranking (Top 20): Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
BM25 Keyword Matching: Traditional keyword-based scoring for skill alignment
LLM Intent Analysis: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
Combined Scoring: Weighted combination of all scores for final ranking
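
The funnel shape of the first two stages can be sketched without loading any models. This is a minimal, dependency-free illustration and not the app's code: brute-force cosine similarity stands in for the FAISS index, and `rerank_score` is a hypothetical placeholder for the cross-encoder. The function names and toy vectors are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_then_rerank(
    query_vec: Sequence[float],
    resume_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (resume_index, rerank_score) pairs, best first."""
    # Stage 1: keep the recall_k resumes closest to the job description.
    shortlist = sorted(
        range(len(resume_vecs)),
        key=lambda i: cosine(query_vec, resume_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2: rescore only the shortlist with the slower, more precise scorer.
    rescored = [(i, rerank_score(i)) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:rerank_k]


# Toy data: three 2-d "embeddings" and made-up re-ranking scores.
resumes = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
fake_scores = {0: 0.2, 1: 0.9, 2: 0.5}
top = recall_then_rerank(
    [1.0, 0.1], resumes, lambda i: fake_scores[i], recall_k=2, rerank_k=1
)
print(top)
```

The cheap similarity pass narrows the pool so that the expensive scorer only runs on the shortlist, which is the reason for ordering the stages this way.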

Scoring Formula

Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)
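
As a worked illustration of the formula, the sketch below clamps each component to the range stated above and adds them up. The function and parameter names are hypothetical, and how the app maps raw model outputs onto these ranges is not shown here; clamping is an assumption made for the example.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three components after clamping each to its stated range."""
    return (
        clamp(cross_encoder, 0.0, 0.7)  # Cross-Encoder: 0-0.7
        + clamp(bm25, 0.1, 0.2)         # BM25: 0.1-0.2
        + clamp(intent, 0.0, 0.1)       # Intent: 0-0.1
    )


print(round(final_score(0.55, 0.15, 0.1), 2))
```

Taking the ranges at face value, the total falls between 0.1 and 1.0, with the cross-encoder contributing up to 70% of the maximum.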

Input & Output

Input: Job description + Resume files (PDF/DOCX/TXT/CSV)
Output: Ranked candidates with detailed score breakdowns and AI explanations

🤖 Technical Details

Models Used

BAAI/bge-large-en-v1.5: Advanced embedding model for semantic similarity
cross-encoder/ms-marco-MiniLM-L6-v2: Deep re-ranking for relevance scoring
Qwen3-1.7B: Large language model for intent analysis and explanations

Key Libraries

FAISS: Facebook AI Similarity Search for efficient vector operations
Sentence Transformers: For embedding generation and cross-encoding
rank_bm25: BM25 algorithm implementation for keyword matching
Streamlit: Interactive web interface
PyTorch: Deep learning framework
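
For readers unfamiliar with BM25, here is a dependency-free sketch of the Okapi BM25 scoring formula. It is for illustration only: the app relies on rank_bm25, whose IDF handling may differ in detail, and the whitespace tokenizer and the `k1` and `b` values below are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps it positive for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes for document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "python developer with pytorch and faiss experience",
    "sales manager with retail experience",
    "python data engineer",
]
scores = bm25_scores("python pytorch", docs)
print([round(s, 3) for s in scores])
```

In this toy corpus the first resume matches both query terms and scores highest, the third matches one, and the second matches none and scores zero.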

📊 Configuration Options

The sidebar provides several customization options:

Results Count: Choose how many top candidates to display (1-5)
Pipeline Visualization: Real-time progress through the 5-stage pipeline
Score Breakdown: Detailed view of individual scoring components

🚀 Getting Started

Online Usage

Visit the application
Enter a comprehensive job description
Upload resume files or a CSV dataset
Click "Advanced Pipeline Analysis"
Review ranked candidates with detailed insights

Local Installation

git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py

Requirements

Python 3.8+
CUDA-compatible GPU (optional, for faster processing)
At least 8 GB RAM recommended

📋 Supported File Formats

PDF: Extracted using pdfplumber with PyPDF2 fallback
DOCX: Microsoft Word documents
TXT: Plain text files
CSV: Structured datasets with resume text columns
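
For the CSV case, a minimal sketch of pulling resume text out of a dataset with the standard library. The column name `resume_text` and the function name are assumptions for illustration; the app may detect or let you pick the text column differently.

```python
import csv
import io
from typing import List


def resumes_from_csv(csv_text: str, text_column: str = "resume_text") -> List[str]:
    """Return the non-empty resume texts found in the given CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {text_column!r}")
    resumes = []
    for row in reader:
        # Short rows yield None for missing cells, so fall back to "".
        text = (row.get(text_column) or "").strip()
        if text:
            resumes.append(text)
    return resumes


sample = "name,resume_text\nAda,Python and FAISS\nBob,\nCy,Streamlit dashboards\n"
print(resumes_from_csv(sample))
```

Rows with an empty text cell are skipped rather than ranked, and a missing column fails early with a clear error.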

🔒 Privacy & Security

Data Privacy Statement

Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.

Data Handling

No Data Storage: Resume content is processed in memory only and never stored permanently
Session-Based: All data is cleared when you close the browser or reset the application
Local Processing: All AI analysis happens locally within the application environment
No External Transmission: Resume data is never sent to external services or third parties

Security Measures

Temporary Files: Uploaded files are written to temporary locations and deleted immediately after processing
Memory Management: Automatic cleanup of resume data from system memory
No Logging: Resume content is never logged or cached
Secure Processing: All text extraction and analysis occurs within isolated processing environments
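
One way to get the temporary-file behavior described above, sketched with the standard library. `process_upload` and the `extract` callback are hypothetical names, and the app's actual cleanup code may differ.

```python
import os
import tempfile
from typing import Callable, List


def process_upload(data: bytes, suffix: str, extract: Callable[[str], str]) -> str:
    """Write an upload to a temp file, extract its text, then always delete it."""
    # delete=False lets the extractor reopen the file by path on any platform.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()
        return extract(tmp.name)
    finally:
        # Runs even if extraction raises, so no resume file is left on disk.
        tmp.close()
        if os.path.exists(tmp.name):
            os.remove(tmp.name)


seen_paths: List[str] = []


def read_text(path: str) -> str:
    """Toy extractor: remember the path and read the file as UTF-8 text."""
    seen_paths.append(path)
    with open(path, encoding="utf-8") as handle:
        return handle.read()


text = process_upload(b"Senior Python engineer", ".txt", read_text)
print(text, os.path.exists(seen_paths[0]))
```

Putting the removal in a `finally` block means the file is deleted on both the success and the failure path.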

User Control

Clear Data Options: Multiple options to clear resume data and free memory
Session Management: Complete control over when and how data is processed
Transparent Processing: Full visibility into what data is being analyzed

We recommend reviewing your organization's data handling policies before uploading sensitive resume information.

📈 Performance Metrics

Accuracy: The multi-stage pipeline combines semantic, keyword, and intent signals to rank candidates
Speed: FAISS indexing enables sub-second search across thousands of resumes
Scalability: Efficient memory management for large resume datasets
Reliability: Fallback models ensure consistent operation

🔮 Future Enhancements

Multi-language Support: Extend to non-English resumes and job descriptions
Custom Scoring Weights: User-configurable importance of different scoring components
Advanced Skill Extraction: Enhanced NLP for technical skill identification
Integration APIs: Connect with ATS and HR management systems
Batch Job Processing: Queue-based processing for large-scale screening

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

Built with ❤️ using Streamlit, Transformers, and FAISS

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference