	---
	title: AI-driven Candidate Matcher
	emoji: 🎯
	colorFrom: blue
	colorTo: green
	sdk: streamlit
	sdk_version: 1.31.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# AI-driven Candidate Matcher

	A resume screening application that ranks resumes against a job description using a 5-stage pipeline: embedding-based recall, cross-encoder re-ranking, BM25 keyword matching, LLM intent analysis, and weighted score combination.

	## 🚀 Features

	- 5-Stage Advanced Pipeline: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
	- State-of-the-Art Models: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
	- FAISS Integration: Lightning-fast similarity search for large resume collections
	- AI Intent Analysis: Qwen3-1.7B model analyzes candidate job-seeking intent
	- Multi-format Support: Processes PDF, DOCX, TXT, and CSV files
	- Interactive Visualizations: Comprehensive score breakdowns and comparative analysis
	- Batch Processing: Upload and analyze multiple resumes simultaneously
	- Export Results: Download detailed analysis as CSV

	## 🔧 How It Works

	### 5-Stage Advanced Pipeline

	1. FAISS Recall (Top 50): Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
	2. Cross-Encoder Re-ranking (Top 20): Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
	3. BM25 Keyword Matching: Traditional keyword-based scoring for skill alignment
	4. LLM Intent Analysis: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
	5. Combined Scoring: Weighted combination of all scores for final ranking
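
The recall-then-rerank funnel in stages 1 and 2 can be sketched in plain Python. This is a minimal illustration, not the app's code: brute-force cosine similarity stands in for the FAISS index, the `score_pair` callback stands in for the cross-encoder, and the function names and toy vectors are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_top_k(query_vec: Sequence[float],
                 resume_vecs: List[Sequence[float]],
                 k: int = 50) -> List[int]:
    """Stage 1 stand-in: indices of the k resumes most similar to the query."""
    ranked = sorted(range(len(resume_vecs)),
                    key=lambda i: cosine(query_vec, resume_vecs[i]),
                    reverse=True)
    return ranked[:k]


def rerank_top_k(candidates: List[int],
                 score_pair: Callable[[int], float],
                 k: int = 20) -> List[Tuple[int, float]]:
    """Stage 2 stand-in: re-score the recalled candidates and keep the best k."""
    scored = [(i, score_pair(i)) for i in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" (hypothetical data, for illustration only).
query = [1.0, 0.0]
resumes = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [1.0, 0.0]]

recalled = recall_top_k(query, resumes, k=3)
# Hypothetical re-ranker: a real one would score the (job, resume) text pair.
reranked = rerank_top_k(recalled, score_pair=lambda i: 1.0 / (1 + i), k=2)
```

The design point is that the cheap similarity search narrows the pool before the more expensive pairwise scorer runs, so the second stage only sees the recalled candidates.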

	### Scoring Formula
	Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1), so the final score falls between 0.1 and 1.0
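
One way to read the formula is as a linear rescaling of three component scores into the stated ranges. The sketch below assumes each input has already been normalized to the interval from 0 to 1; that normalization and the function names are assumptions, since the README does not say how raw model outputs are mapped.

```python
def clamp01(x: float) -> float:
    """Clamp a value into the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Combine normalized component scores using the README's ranges.

    Assumption: every argument is a normalized score in [0, 1].
    """
    ce_part = 0.7 * clamp01(cross_encoder)      # contributes 0.0 to 0.7
    bm25_part = 0.1 + 0.1 * clamp01(bm25)       # contributes 0.1 to 0.2
    intent_part = 0.1 * clamp01(intent)         # contributes 0.0 to 0.1
    return ce_part + bm25_part + intent_part


example = final_score(cross_encoder=0.5, bm25=0.5, intent=0.5)
```

Under this reading the cross-encoder dominates the ranking, while BM25 and intent act as small adjustments.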

	### Input & Output
	- Input: Job description + Resume files (PDF/DOCX/TXT/CSV)
	- Output: Ranked candidates with detailed score breakdowns and AI explanations
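
The ranked output with score breakdowns maps naturally onto CSV rows. The following sketch uses only the standard library; the column names and the `results_to_csv` helper are hypothetical and may not match what the app actually exports.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout for the exported file.
FIELDNAMES = ["rank", "candidate", "cross_encoder", "bm25", "intent", "final_score"]


def results_to_csv(rows: List[Dict[str, object]]) -> str:
    """Sort candidates by final score (best first) and serialize them as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    ordered = sorted(rows, key=lambda r: r["final_score"], reverse=True)
    for rank, row in enumerate(ordered, start=1):
        writer.writerow({"rank": rank, **row})
    return buffer.getvalue()


# Hypothetical scores, for illustration only.
sample = [
    {"candidate": "A", "cross_encoder": 0.40, "bm25": 0.15, "intent": 0.05, "final_score": 0.60},
    {"candidate": "B", "cross_encoder": 0.60, "bm25": 0.12, "intent": 0.08, "final_score": 0.80},
]
csv_text = results_to_csv(sample)
```

In a Streamlit app, text like `csv_text` is what would typically be handed to a download button.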

	## 🤖 Technical Details

	### Models Used
	- BAAI/bge-large-en-v1.5: Advanced embedding model for semantic similarity
	- cross-encoder/ms-marco-MiniLM-L6-v2: Deep re-ranking for relevance scoring
	- Qwen3-1.7B: Compact language model (1.7B parameters) for intent analysis and explanations

	### Key Libraries
	- FAISS: Facebook AI Similarity Search for efficient vector operations
	- Sentence Transformers: For embedding generation and cross-encoding
	- rank_bm25: BM25 algorithm implementation for keyword matching
	- Streamlit: Interactive web interface
	- PyTorch: Deep learning framework
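
To show what the BM25 keyword stage computes, here is a small from-scratch Okapi BM25 scorer. It is a sketch for illustration, not the `rank_bm25` implementation: it uses the always-positive IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`, and the `k1` and `b` defaults are commonly used values assumed here rather than taken from the app.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical tokenized resumes and job keywords.
resume_tokens = [
    ["python", "sql", "pandas"],
    ["java", "spring"],
    ["python", "python", "docker"],
]
keyword_scores = bm25_scores(["python", "docker"], resume_tokens)
```

Raw BM25 scores are unbounded, so a pipeline like this one would need to rescale them before mixing them with other scores.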

	## 📊 Configuration Options

	The sidebar provides several customization options:
	- Results Count: Choose how many top candidates to display (1-5)
	- Pipeline Visualization: Real-time progress through the 5-stage pipeline
	- Score Breakdown: Detailed view of individual scoring components

	## 🚀 Getting Started

	### Online Usage
	1. Visit the application
	2. Enter a comprehensive job description
	3. Upload resume files or a CSV dataset
	4. Click "Advanced Pipeline Analysis"
	5. Review ranked candidates with detailed insights

	### Local Installation

	```bash
	git clone <repository-url>
	cd Resume_Screener_and_Skill_Extractor
	pip install -r requirements.txt
	streamlit run app.py
	```

	### Requirements
	- Python 3.8+
	- CUDA-compatible GPU (optional, for faster processing)
	- 8 GB RAM or more recommended

	## 📋 Supported File Formats

	- PDF: Extracted using pdfplumber with PyPDF2 fallback
	- DOCX: Microsoft Word documents
	- TXT: Plain text files
	- CSV: Structured datasets with resume text columns
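
Handling several formats usually starts with dispatching on the file extension. The sketch below covers that step and the CSV case with the standard library only; PDF and DOCX extraction need third-party libraries and are left out. The helper names and the `resume_text` column name are hypothetical.

```python
import csv
import io
from pathlib import Path
from typing import List

SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".csv"}


def detect_format(filename: str) -> str:
    """Return the lowercase format name, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return suffix.lstrip(".")


def resumes_from_csv(csv_text: str, text_column: str = "resume_text") -> List[str]:
    """Collect non-empty resume texts from one column of a CSV document.

    Assumption: the column name is hypothetical; a real dataset may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"Column {text_column!r} not found")
    texts = []
    for row in reader:
        value = (row[text_column] or "").strip()
        if value:
            texts.append(value)
    return texts


fmt = detect_format("Jane_Doe_CV.PDF")
texts = resumes_from_csv("name,resume_text\nA,Python developer\nB,\n")
```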

	## 🔒 Privacy & Security

	### Data Privacy Statement

	Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.

	#### Data Handling
	- No Data Storage: Resume content is processed in memory only and never stored permanently
	- Session-Based: All data is cleared when you close the browser or reset the application
	- Local Processing: All AI analysis happens locally within the application environment
	- No External Transmission: Resume data is never sent to external services or third parties

	#### Security Measures
	- Temporary Files: Uploaded files are processed in secure temporary locations and immediately deleted
	- Memory Management: Automatic cleanup of resume data from system memory
	- No Logging: Resume content is never logged or cached
	- Secure Processing: All text extraction and analysis occurs within isolated processing environments
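
The temporary-file behavior described above can be approximated with `tempfile.TemporaryDirectory`, which removes the directory and its contents when the `with` block exits, even if extraction raises. This is a sketch of the general pattern, not the app's implementation; `process_upload` and the `extract` callback are hypothetical names.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, filename: str,
                   extract: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temporary file, extract text, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # basename() drops any directory components from the client-supplied name.
        safe_name = os.path.basename(filename) or "upload"
        path = os.path.join(tmp_dir, safe_name)
        with open(path, "wb") as handle:
            handle.write(data)
        return extract(path)
    # Leaving the with-block deletes tmp_dir and the file inside it.


def read_text(path: str) -> str:
    """Hypothetical extractor for plain-text files."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


seen_paths = []


def tracking_extract(path: str) -> str:
    """Record the temporary path so its deletion can be checked afterwards."""
    seen_paths.append(path)
    return read_text(path)


text = process_upload(b"hello", "cv.txt", tracking_extract)
```

Note that this only covers files on disk; clearing data held in session state or model caches is a separate concern.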

	#### User Control
	- Clear Data Options: Multiple options to clear resume data and free memory
	- Session Management: Complete control over when and how data is processed
	- Transparent Processing: Full visibility into what data is being analyzed

	We recommend reviewing your organization's data handling policies before uploading sensitive resume information.

	## 📈 Performance Metrics

	- Accuracy: Ranking combines semantic similarity, keyword overlap, and intent signals rather than relying on a single score
	- Speed: FAISS indexing enables sub-second search across thousands of resumes
	- Scalability: Efficient memory management for large resume datasets
	- Reliability: Fallback models ensure consistent operation

	## 🔮 Future Enhancements

	- Multi-language Support: Extend to non-English resumes and job descriptions
	- Custom Scoring Weights: User-configurable importance of different scoring components
	- Advanced Skill Extraction: Enhanced NLP for technical skill identification
	- Integration APIs: Connect with ATS and HR management systems
	- Batch Job Processing: Queue-based processing for large-scale screening

	## 📄 License

	MIT License - See LICENSE file for details

	## 🤝 Contributing

	Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

	---

	Built with ❤️ using Streamlit, Transformers, and FAISS

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference