Spaces: ybchen928/oncall-guide-ai

oncall-guide-ai / evaluation

661 kB

6 contributors

History: 25 commits

YanBoChen: Merge branch 'Merged20250805' into Merged20250811 (4ad2c7c, 12 months ago)

| Name | Size | Last commit message | Updated |
| --- | --- | --- | --- |
| modules | | Refactor evaluation modules and add hospital chart generation | 12 months ago |
| old | | Before Run the 1st Evalation: Add Precision & MRR Chart Generator and a sample test query | 12 months ago |
| results | | Merge pull request #14 from YanBoChen0928/Jeff | 12 months ago |
| README_HOSPITAL_CUSTOMIZATION.md | 10.2 kB | feat(evaluation): add comprehensive hospital customization evaluation system | 12 months ago |
| TEMP_MRR_complexity_fix.md | 4.84 kB | Enhance evaluation framework with comprehensive metrics and improved query complexity analysis, temp bug fixing about metric 7-8 | 12 months ago |
| direct_llm_evaluator.py | 22.2 kB | Update query file references for full evaluation and improve user prompts in evaluation scripts (before optimized_general_pipeline) | 12 months ago |
| fixed_judge_evaluator.py | 17.7 kB | Enhance evaluation framework with comprehensive metrics and improved query complexity analysis, temp bug fixing about metric 7-8 | 12 months ago |
| generate_combined_comparison_chart.py | 8.56 kB | feat(evaluation): add visualization generators for generating png files | 12 months ago |
| generate_comparison_report.py | 18.8 kB | feat(evaluation): add comprehensive hospital customization evaluation system | 12 months ago |
| generate_execution_time_table.py | 7.6 kB | feat(evaluation): add visualization generators for generating png files | 12 months ago |
| generate_hospital_charts.py | 7.84 kB | Refactor evaluation modules and add hospital chart generation | 12 months ago |
| generate_individual_analysis_charts.py | 17.4 kB | Refactor evaluation modules and add hospital chart generation | 12 months ago |
| generate_individual_rag_vs_direct_charts.py | 12.9 kB | feat(evaluation): add visualization generators for generating png files | 12 months ago |
| hospital_customization_evaluator.py | 26.5 kB | feat(evaluation): add comprehensive hospital customization evaluation system | 12 months ago |
| latency_evaluator.py | 41.5 kB | Update query file references for full evaluation and improve user prompts in evaluation scripts (before optimized_general_pipeline) | 12 months ago |
| metric1_latency_chart_generator.py | 13.6 kB | Before Run the 1st Evalation: Add Precision & MRR Chart Generator and a sample test query | 12 months ago |
| metric2_extraction_chart_generator.py | 8.63 kB | Before Run the 1st Evalation: Add Precision & MRR Chart Generator and a sample test query | 12 months ago |
| metric3_relevance_chart_generator.py | 9.93 kB | Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization. | 12 months ago |
| metric4_coverage_chart_generator.py | 9.32 kB | Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization. | 12 months ago |
| metric5_6_judge_evaluator_manual.md | 9.86 kB | Add multi-system evaluation support for clinical actionability and evidence quality metrics | 12 months ago |
| metric5_6_llm_judge_chart_generator.py | 19.9 kB | Enhance evaluation framework with comprehensive metrics and improved query complexity analysis, temp bug fixing about metric 7-8 | 12 months ago |
| metric5_6_llm_judge_evaluator.py | 30.3 kB | Enhance Direct LLM Evaluator and Judge Evaluator: | 12 months ago |
| metric7_8_precision_MRR.py | 19 kB | Enhance evaluation framework with comprehensive metrics and improved query complexity analysis, temp bug fixing about metric 7-8 | 12 months ago |
| metric7_8_precision_mrr_chart_generator.py | 23.8 kB | Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization. | 12 months ago |
| pre_user_query_evaluate.txt | 330 Bytes | Update query file references for full evaluation and correct typo in pre_user_query_evaluate.txt for pre-test. | 12 months ago |
| rag_vs_direct_latency_chart_generator.py | 14.7 kB | Add RAG vs Direct Latency Comparison Chart Generator for performance analysis | 12 months ago |
| run_hospital_evaluation.py | 3.58 kB | feat(evaluation): add comprehensive hospital customization evaluation system | 12 months ago |
| run_rag_vs_direct_comparison.py | 17.4 kB | Refactor evaluation modules and add hospital chart generation | 12 months ago |
| single_test_query.txt | 127 Bytes | Add comprehensive evaluation reports and execution time breakdown for Hospital Customization System | 12 months ago |
| user_query.txt | 1.52 kB | Update query file references for full evaluation and improve user prompts in evaluation scripts (before optimized_general_pipeline) | 12 months ago |
| validate_expected_results.py | 9.24 kB | Refactor evaluation modules and add hospital chart generation | 12 months ago |