ybchen928/oncall-guide-ai / evaluation

Commit History

Merge branch 'Merged20250805' into Merged20250811

4ad2c7c

YanBoChen committed 21 days ago

Merge pull request #14 from YanBoChen0928/Jeff

b4a9ac6

Yan-Bo Chen committed 21 days ago

Enhance evaluation framework with comprehensive metrics and improved query complexity analysis; temporary bug fix for metrics 7-8

6577369

YanBoChen committed 23 days ago

Update query file references for full evaluation and improve user prompts in evaluation scripts (before optimized_general_pipeline)

5fb5e09

YanBoChen committed 23 days ago

Refactor evaluation modules and add hospital chart generation

71b7de3

VanKee committed 26 days ago

Add RAG vs Direct Latency Comparison Chart Generator for performance analysis

2f35ee2

YanBoChen committed 27 days ago

Enhance direct LLM evaluation with retry mechanism for 504 timeouts and improved guidance format

3edd46d

YanBoChen committed 27 days ago

Add comprehensive evaluation reports and execution time breakdown for Hospital Customization System

24f6a16

YanBoChen committed 27 days ago

Update query file references for full evaluation and correct typo in pre_user_query_evaluate.txt for pre-test.

e84171b

YanBoChen committed 27 days ago

Merge branch 'newbranchYB-newest' into Merged20250805

abbc1cd

YanBoChen committed 27 days ago

Add adaptive relevance thresholds for query complexity in PrecisionMRRAnalyzer; fix typo in condition mapping for postpartum hemorrhage

7620d26

YanBoChen committed 27 days ago

Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization.

5d4792a

YanBoChen committed 27 days ago

Refactor relevance calculation and update thresholds in latency evaluator; enhance precision and MRR analyzer with angular distance metrics; increase timeout for primary generation in fallback configuration.

b0f56ec

YanBoChen committed 27 days ago

Enhance Direct LLM Evaluator and Judge Evaluator:

40d39ed

YanBoChen committed 27 days ago

feat(evaluation): add visualization generators for generating png files

6ccdca1

VanKee committed 28 days ago

feat(evaluation): add comprehensive hospital customization evaluation system

550df1b

VanKee committed 28 days ago

Add multi-system evaluation support for clinical actionability and evidence quality metrics

16a2990

YanBoChen committed 28 days ago

Before running the 1st evaluation: Add Precision & MRR Chart Generator and a sample test query

a2aaea2

YanBoChen committed 28 days ago

feat: Add Extraction, LLM Judge, and Relevance Chart Generators

17613c8

YanBoChen committed 28 days ago

Add extraction and relevance evaluators for condition extraction and retrieval relevance analysis

88e76fd

YanBoChen committed 28 days ago

Add latency and relevance evaluators for medical query analysis (evaluation)

3e2ffcb

YanBoChen committed 28 days ago

feat(evaluation): add seventh evaluation metric for multi-level fallback efficiency and early interception rate

9e4c1bc

YanBoChen committed 28 days ago

fix(evaluation): improve evaluation instructions and add structured assessment phases

5f9dffa

YanBoChen committed 28 days ago

fix(mild bug): enhance user query prompts (more robust handling of .txt or .json input) and add postpartum hemorrhage condition mapping

253609b

YanBoChen committed 28 days ago

Add evaluation instructions and user query prompts for clinical model assessment

16ee1e5

YanBoChen committed 28 days ago