EmpathRAG / eval

Commit History

Strict cleanup: remove internal handoff docs, neutralize teammate/sprint naming in public docs
091059d

MukulRay committed on

Diversity probe sweep + V1->V4 narrative README + HF Spaces entry + DV/incomplete fixes
97e19ad

MukulRay committed on

V4.3: prompt-injection audit, input length cap, per-layer ablation, unguarded baseline, limitations restored
847587d

MukulRay committed on

V4.2 part 3: sycophancy guard, F-1 flag decay, incomplete-message handler
d808a62

MukulRay committed on

V4.2 part 1: fix broken ISSS URL, ship URL audit + Karthik data brief
8fdff5c

MukulRay committed on

V4: streaming, controlled paraphrasing, support plan, voice, sweeps
655c300

MukulRay committed on

Add Eval B safety supplement
433900d

MukulRay committed on

Ingest Core dataset and harden router policy
f046303

MukulRay committed on

Prepare Core dataset intake and resource registry
e143b4a

MukulRay committed on

Polish peer helper and scope handling
ea1618f

MukulRay committed on

Add Core safety metadata and eval summaries
d50d1e1

MukulRay committed on

Implement EmpathRAG Core hybrid router
b2f5c42

MukulRay committed on

Add Karthik eval harness and safety patches
a246513

MukulRay committed on

Start V2.5 support navigator hardening
79a6369

MukulRay committed on

Checkpoint V2 curated support navigator
15594c0

MukulRay committed on

Add curated corpus integration scaffold
fadd796

MukulRay committed on

Start v2 safety hardening
81deeef

MukulRay committed on

Clean repo: fix README with verified metrics, pin requirements, update gitignore, remove log from tracking
404da58

Mukul Rayana committed on

Add conversation memory, fix Gradio stack, improve SYSTEM_PROMPT, log human eval turns
15920d0

Mukul Rayana committed on

Fix Condition C ablation - pure FAISS order, no safety score bias - mean=0.50
4e44b55

Mukul Rayana committed on

Add ablation Condition C eval - mean alignment 0.40 vs D=0.88
d64bbe6

Mukul Rayana committed on

fix: MistralJudge schema detection for truths/claims/verdicts - matches DeepEval 3-call pattern (Day 15)
ce15608

Mukul Rayana committed on

fix: MistralJudge uses from_json_schema grammar - correct structure for claim extraction and verification (Day 15)
5405e38

Mukul Rayana committed on

fix: GBNF grammar-constrained JSON sampling in MistralJudge - eliminates DeepEval JSON parse errors (Day 15)
c9853ac

Mukul Rayana committed on

fix: MistralJudge JSON extraction for DeepEval faithfulness, commit Wilcoxon results (Day 15)
2668471

Mukul Rayana committed on

fix: wilcoxon sys.path for condition_a import, zero_method=pratt for binary scores (Day 15)
7fc5654

Mukul Rayana committed on

fix: Wilcoxon uses stub guardrail - tests retrieval quality on all 50 prompts not 13 (Day 15)
68b0d74

Mukul Rayana committed on

feat: DeepEval FaithfulnessMetric with Mistral judge, async_mode=False (replaces RAGAS, Day 15)
d02b074

Mukul Rayana committed on

fix: guardrail dual-import path, bertscore key names, ragas reuse pipeline.llm (Day 14)
9bce0e0

Mukul Rayana committed on

eval: save adversarial results at both t=0.50 and t=0.85, reset production threshold to 0.50 (Day 14)
9f77f5f

Mukul Rayana committed on

fix: threshold sweep - calibrate DeBERTa guardrail threshold, patch eval imports (Day 14)
d5f8958

Mukul Rayana committed on

feat: eval scripts - BM25 baseline, adversarial probes, BERTScore, RAGAS, Wilcoxon (Day 14)
78fc1e6

Mukul Rayana committed on

feat: Gradio demo app, DeBERTa Colab notebook, updated smoke test results (Day 13)
d471138

Mukul Rayana committed on

Add eval suite: test prompts, adversarial probes, BERTScore references (Day 12)
5c84477

Mukul Rayana committed on

Add pipeline orchestrator + smoke test - 4/5 emotion predictions correct (Day 12)
8b1f355

Mukul Rayana committed on

Day 1: data pipeline, session tracker, query router, adversarial probes, Colab training notebooks
bc3ba9e

Mukul Rayana committed on