oncall-guide-ai / evaluation

Commit History

Merge branch 'Merged20250805' into Merged20250811
4ad2c7c

YanBoChen committed

Merge pull request #14 from YanBoChen0928/Jeff
b4a9ac6

Yan-Bo Chen committed

Enhance evaluation framework with comprehensive metrics and improved query complexity analysis; temporary bug fix for metrics 7-8
6577369

YanBoChen committed

Update query file references for full evaluation and improve user prompts in evaluation scripts (before optimized_general_pipeline)
5fb5e09

YanBoChen committed

Refactor evaluation modules and add hospital chart generation
71b7de3

VanKee committed

Add RAG vs Direct Latency Comparison Chart Generator for performance analysis
2f35ee2

YanBoChen committed

Enhance direct LLM evaluation with retry mechanism for 504 timeouts and improved guidance format
3edd46d

YanBoChen committed

Add comprehensive evaluation reports and execution time breakdown for Hospital Customization System
24f6a16

YanBoChen committed

Update query file references for full evaluation and correct typo in pre_user_query_evaluate.txt for pre-test.
e84171b

YanBoChen committed

Merge branch 'newbranchYB-newest' into Merged20250805
abbc1cd

YanBoChen committed

Add adaptive relevance thresholds for query complexity in PrecisionMRRAnalyzer; fix typo in condition mapping for postpartum hemorrhage
7620d26

YanBoChen committed

Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization.
5d4792a

YanBoChen committed

Refactor relevance calculation and update thresholds in latency evaluator; enhance precision and MRR analyzer with angular distance metrics; increase timeout for primary generation in fallback configuration.
b0f56ec

YanBoChen committed

Enhance Direct LLM Evaluator and Judge Evaluator
40d39ed

YanBoChen committed

feat(evaluation): add visualization generators for generating png files
6ccdca1

VanKee committed

feat(evaluation): add comprehensive hospital customization evaluation system
550df1b

VanKee committed

Add multi-system evaluation support for clinical actionability and evidence quality metrics
16a2990

YanBoChen committed

Before running the 1st evaluation: Add Precision & MRR Chart Generator and a sample test query
a2aaea2

YanBoChen committed

feat: Add Extraction, LLM Judge, and Relevance Chart Generators
17613c8

YanBoChen committed

Add extraction and relevance evaluators for condition extraction and retrieval relevance analysis
88e76fd

YanBoChen committed

Add latency and relevance evaluators for medical query analysis (evaluation)
3e2ffcb

YanBoChen committed

feat(evaluation): add seventh evaluation metric for multi-level fallback efficiency and early interception rate
9e4c1bc

YanBoChen committed

fix(evaluation): improve evaluation instructions and add structured assessment phases
5f9dffa

YanBoChen committed

fix(mild bug): enhance user query prompts (more robust handling of .txt or .json input) and add postpartum hemorrhage condition mapping
253609b

YanBoChen committed

Add evaluation instructions and user query prompts for clinical model assessment
16ee1e5

YanBoChen committed