---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

AI-driven Candidate Matcher

An AI-powered resume screening application that ranks resumes against a job description with a 5-stage pipeline. The pipeline runs dense semantic retrieval, cross-encoder re-ranking, BM25 keyword matching, and LLM-based intent analysis, then merges the results into one weighted score.

🚀 Features

  • 5-Stage Advanced Pipeline: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
  • State-of-the-Art Models: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
  • FAISS Integration: Fast vector similarity search that scales to large resume collections
  • AI Intent Analysis: Qwen3-1.7B model analyzes candidate job-seeking intent
  • Multi-format Support: Processes PDFs, DOCX, TXT, and CSV files
  • Interactive Visualizations: Comprehensive score breakdowns and comparative analysis
  • Batch Processing: Upload and analyze multiple resumes simultaneously
  • Export Results: Download detailed analysis as CSV

🔧 How It Works

5-Stage Advanced Pipeline

  1. FAISS Recall (Top 50): Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
  2. Cross-Encoder Re-ranking (Top 20): Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
  3. BM25 Keyword Matching: Traditional keyword-based scoring for skill alignment
  4. LLM Intent Analysis: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
  5. Combined Scoring: Weighted combination of all scores for final ranking
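The sketch below shows the recall-then-re-rank funnel of stages 1 and 2 without any dependencies. It is not the app's code. It assumes resumes and the job description are already embedded (for example by BAAI/bge-large-en-v1.5), uses brute-force cosine similarity in place of the FAISS index, and takes the re-ranker as a plain callable. `recall_then_rerank` and the toy scores are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def recall_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],
    recall_k: int = 50,
    rerank_k: int = 20,
) -> list[tuple[int, float]]:
    """Return (doc_index, rerank_score) pairs for the best rerank_k documents."""
    q = normalize(query_vec)
    # Stage 1 (recall): exact inner product over unit vectors. faiss.IndexFlatIP
    # over the same unit vectors yields this ranking using optimized native code.
    sims = [(i, sum(a * b for a, b in zip(q, normalize(d)))) for i, d in enumerate(doc_vecs)]
    recalled = sorted(sims, key=lambda t: t[1], reverse=True)[:recall_k]
    # Stage 2 (re-rank): run the expensive scorer only on recalled candidates.
    reranked = [(i, rerank_score(i)) for i, _ in recalled]
    reranked.sort(key=lambda t: t[1], reverse=True)
    return reranked[:rerank_k]


# Toy example: doc 2 has the best re-rank score but is filtered out at recall.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fake_scores = {0: 0.2, 1: 0.9, 2: 0.5}  # hypothetical stand-in for cross-encoder outputs
print(recall_then_rerank([1.0, 0.0], docs, fake_scores.__getitem__, recall_k=2, rerank_k=2))
# → [(1, 0.9), (0, 0.2)]
```

In a real pipeline, `rerank_score` would score the (job description, resume) pair with a cross-encoder. In sentence-transformers that is `CrossEncoder(model_name).predict(pairs)`, which returns one relevance score per pair. Scoring all recalled pairs in one batch is usually faster than calling the model once per candidate.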

Scoring Formula

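Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)

Each parenthesized range is that component's contribution to the final score, so final scores fall between 0.1 and 1.0.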
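One way to read the formula is as three components, each normalized to [0, 1] and then scaled into the range shown. The sketch below follows that reading. The normalization choices are assumptions made for illustration and are not taken from the app's source:

- a sigmoid on raw cross-encoder logits,
- min-max scaling of BM25 scores across the current candidates,
- an intent score that is already in [0, 1].

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping a raw logit into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def min_max(values: list[float]) -> list[float]:
    """Rescale scores to [0, 1] relative to the current candidate set."""
    lo, hi = min(values), max(values)
    if hi == lo:  # no spread: treat every candidate as equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_score(ce_norm: float, bm25_norm: float, intent: float) -> float:
    """Combine components already in [0, 1] using the ranges above:
    cross-encoder 0-0.7, BM25 0.1-0.2, intent 0-0.1."""
    return 0.7 * ce_norm + (0.1 + 0.1 * bm25_norm) + 0.1 * intent


bm25 = min_max([12.0, 4.0, 8.0])  # → [1.0, 0.0, 0.5]
print(round(final_score(sigmoid(2.0), bm25[0], intent=1.0), 3))  # → 0.917
```

The sketch applies min-max scaling to BM25 because raw BM25 scores have no fixed upper bound. A cross-encoder logit, by contrast, can be squashed on its own without reference to other candidates.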

Input & Output

  • Input: Job description + Resume files (PDF/DOCX/TXT/CSV)
  • Output: Ranked candidates with detailed score breakdowns and AI explanations

🤖 Technical Details

Models Used

  • BAAI/bge-large-en-v1.5: Advanced embedding model for semantic similarity
  • cross-encoder/ms-marco-MiniLM-L6-v2: Cross-encoder for deep re-ranking and relevance scoring
  • Qwen3-1.7B: Large language model for intent analysis and explanations

Key Libraries

  • FAISS: Facebook AI Similarity Search for efficient vector operations
  • Sentence Transformers: For embedding generation and cross-encoding
  • rank_bm25: BM25 algorithm implementation for keyword matching
  • Streamlit: Interactive web interface
  • PyTorch: Deep learning framework
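The sketch below is a from-scratch implementation of Okapi BM25, the scoring that rank_bm25 provides, using the common defaults k1 = 1.5 and b = 0.75. A few points differ from the library or are simplifications:

- It uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
- rank_bm25's `BM25Okapi` computes IDF slightly differently (it floors negative IDF values using an epsilon), so absolute scores will not match that library exactly.
- Lowercase whitespace tokenization is a simplifying assumption.

```python
from __future__ import annotations

import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (whitespace tokens)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # guard all-empty corpus
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = 1 - b + b * len(toks) / avgdl
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF: terms that appear in fewer documents weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, penalizing longer-than-average documents.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


resumes = [
    "python developer with pandas",
    "java backend engineer",
    "senior python python engineer",
]
print(bm25_scores("python engineer", resumes))  # third resume scores highest
```

With the rank_bm25 package, the library equivalent is `BM25Okapi(tokenized_corpus).get_scores(tokenized_query)`.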

📊 Configuration Options

The sidebar provides several customization options:

  • Results Count: Choose how many top candidates to display (1-5)
  • Pipeline Visualization: Real-time progress through the 5-stage pipeline
  • Score Breakdown: Detailed view of individual scoring components

🚀 Getting Started

Online Usage

  1. Visit the application
  2. Enter a comprehensive job description
  3. Upload resume files or CSV dataset
  4. Click "Advanced Pipeline Analysis"
  5. Review ranked candidates with detailed insights

Local Installation

git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py

Requirements

  • Python 3.8+
  • CUDA-compatible GPU (optional, for faster processing)
  • At least 8GB RAM recommended

📋 Supported File Formats

  • PDF: Extracted using pdfplumber with PyPDF2 fallback
  • DOCX: Microsoft Word documents
  • TXT: Plain text files
  • CSV: Structured datasets with resume text columns
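The sketch below shows one way to extract text per format. Its assumptions:

- PDFs use the libraries named above: pdfplumber, with PyPDF2 as the fallback.
- DOCX uses python-docx, which the README does not name.
- The function name `extract_text` and the CSV column name `resume_text` are hypothetical.

Heavy imports are deferred, so TXT and CSV handling works even when the PDF and DOCX libraries are not installed.

```python
from __future__ import annotations

import csv
from pathlib import Path


def extract_text(path: str, csv_column: str = "resume_text") -> list[str]:
    """Return resume text(s) from a file, dispatching on its extension.
    CSV files yield one entry per non-empty row; other formats yield one entry."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return [Path(path).read_text(encoding="utf-8", errors="ignore")]
    if suffix == ".csv":
        # utf-8-sig also accepts the byte-order mark that Excel often writes.
        with open(path, newline="", encoding="utf-8-sig") as f:
            reader = csv.DictReader(f)
            if csv_column not in (reader.fieldnames or []):
                raise ValueError(f"CSV has no column named {csv_column!r}")
            return [row[csv_column] for row in reader if row[csv_column]]
    if suffix == ".docx":
        import docx  # python-docx
        return ["\n".join(p.text for p in docx.Document(path).paragraphs)]
    if suffix == ".pdf":
        try:
            import pdfplumber
            with pdfplumber.open(path) as pdf:
                return ["\n".join(page.extract_text() or "" for page in pdf.pages)]
        except Exception:
            # Fallback when pdfplumber is missing or fails on this file.
            from PyPDF2 import PdfReader
            reader = PdfReader(path)
            return ["\n".join(page.extract_text() or "" for page in reader.pages)]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Returning a list for every format lets single-resume files and multi-row CSV datasets flow through the same ranking code.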

🔒 Privacy & Security

Data Privacy Statement

Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.

Data Handling

  • No Data Storage: Resume content is processed in memory only and never stored permanently
  • Session-Based: All data is cleared when you close the browser or reset the application
  • Local Processing: All AI analysis happens locally within the application environment
  • No External Transmission: Resume data is never sent to external services or third parties

Security Measures

  • Temporary Files: Uploaded files are processed in secure temporary locations and immediately deleted
  • Memory Management: Automatic cleanup of resume data from system memory
  • No Logging: Resume content is never logged or cached
  • Secure Processing: All text extraction and analysis occurs within isolated processing environments
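The temporary-file handling described above can be sketched with the standard library. This is an assumed pattern, not the app's code. `process_upload` and its `handler` callback are hypothetical names, and in a Streamlit app the bytes would come from an `UploadedFile` via `getvalue()`.

```python
from __future__ import annotations

import contextlib
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload(data: bytes, suffix: str, handler: Callable[[str], T]) -> T:
    """Write uploaded bytes to a private temp file, pass its path to handler,
    and delete the file afterwards, even if handler raises."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The file is closed here, so libraries can reopen it by path (also on Windows).
        return handler(path)
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


size = process_upload(b"Jane Doe, Python developer", ".txt", os.path.getsize)
print(size)  # → 26
```

The sketch uses `mkstemp` with an explicit close rather than `NamedTemporaryFile(delete=True)`. On Windows, a file opened by `NamedTemporaryFile` generally cannot be reopened by a second library while it is still open.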

User Control

  • Clear Data Options: Multiple options to clear resume data and free memory
  • Session Management: Complete control over when and how data is processed
  • Transparent Processing: Full visibility into what data is being analyzed

We recommend reviewing your organization's data handling policies before uploading sensitive resume information.

📈 Performance Metrics

  • Accuracy: The multi-stage pipeline combines complementary semantic, keyword, and intent signals to improve candidate ranking
  • Speed: FAISS indexing enables sub-second search across thousands of resumes
  • Scalability: Efficient memory management for large resume datasets
  • Reliability: Fallback models ensure consistent operation

🔮 Future Enhancements

  • Multi-language Support: Extend to non-English resumes and job descriptions
  • Custom Scoring Weights: User-configurable importance of different scoring components
  • Advanced Skill Extraction: Enhanced NLP for technical skill identification
  • Integration APIs: Connect with ATS and HR management systems
  • Batch Job Processing: Queue-based processing for large-scale screening

📄 License

MIT License - See LICENSE file for details

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.


Built with ❤️ using Streamlit, Transformers, and FAISS

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference