EmpathRAG / eval

Commit History

Strict cleanup: remove internal handoff docs, neutralize teammate/sprint naming in public docs

091059d

MukulRay committed 13 days ago

Diversity probe sweep + V1->V4 narrative README + HF Spaces entry + DV/incomplete fixes

97e19ad

MukulRay committed 13 days ago

V4.3: prompt-injection audit, input length cap, per-layer ablation, unguarded baseline, limitations restored

847587d

MukulRay committed 14 days ago

V4.2 part 3: sycophancy guard, F-1 flag decay, incomplete-message handler

d808a62

MukulRay committed 14 days ago

V4.2 part 1: fix broken ISSS URL, ship URL audit + Karthik data brief

8fdff5c

MukulRay committed 14 days ago

V4: streaming, controlled paraphrasing, support plan, voice, sweeps

655c300

MukulRay committed 16 days ago

Add Eval B safety supplement

433900d

MukulRay committed 20 days ago

Ingest Core dataset and harden router policy

f046303

MukulRay committed 20 days ago

Prepare Core dataset intake and resource registry

e143b4a

MukulRay committed 24 days ago

Polish peer helper and scope handling

ea1618f

MukulRay committed 24 days ago

Add Core safety metadata and eval summaries

d50d1e1

MukulRay committed 24 days ago

Implement EmpathRAG Core hybrid router

b2f5c42

MukulRay committed 24 days ago

Add Karthik eval harness and safety patches

a246513

MukulRay committed 24 days ago

Start V2.5 support navigator hardening

79a6369

MukulRay committed 24 days ago

Checkpoint V2 curated support navigator

15594c0

MukulRay committed 24 days ago

Add curated corpus integration scaffold

fadd796

MukulRay committed 28 days ago

Start v2 safety hardening

81deeef

MukulRay committed 28 days ago

Clean repo: fix README with verified metrics, pin requirements, update gitignore, remove log from tracking

404da58

Mukul Rayana committed on Apr 11

Add conversation memory, fix Gradio stack, improve SYSTEM_PROMPT, log human eval turns

15920d0

Mukul Rayana committed on Apr 11

Fix Condition C ablation - pure FAISS order, no safety score bias - mean=0.50

4e44b55

Mukul Rayana committed on Apr 10

Add ablation Condition C eval - mean alignment 0.40 vs D=0.88

d64bbe6

Mukul Rayana committed on Apr 10

fix: MistralJudge schema detection for truths/claims/verdicts — matches DeepEval 3-call pattern (Day 15)

ce15608

Mukul Rayana committed on Apr 9

fix: MistralJudge uses from_json_schema grammar — correct structure for claim extraction and verification (Day 15)

5405e38

Mukul Rayana committed on Apr 9

fix: GBNF grammar-constrained JSON sampling in MistralJudge — eliminates DeepEval JSON parse errors (Day 15)

c9853ac

Mukul Rayana committed on Apr 9

fix: MistralJudge JSON extraction for DeepEval faithfulness, commit Wilcoxon results (Day 15)

2668471

Mukul Rayana committed on Apr 9

fix: wilcoxon sys.path for condition_a import, zero_method=pratt for binary scores (Day 15)

7fc5654

Mukul Rayana committed on Apr 9

fix: Wilcoxon uses stub guardrail — tests retrieval quality on all 50 prompts not 13 (Day 15)

68b0d74

Mukul Rayana committed on Apr 9

feat: DeepEval FaithfulnessMetric with Mistral judge, async_mode=False (replaces RAGAS, Day 15)

d02b074

Mukul Rayana committed on Apr 9

fix: guardrail dual-import path, bertscore key names, ragas reuse pipeline.llm (Day 14)

9bce0e0

Mukul Rayana committed on Apr 8

eval: save adversarial results at both t=0.50 and t=0.85, reset production threshold to 0.50 (Day 14)

9f77f5f

Mukul Rayana committed on Apr 8

fix: threshold sweep — calibrate DeBERTa guardrail threshold, patch eval imports (Day 14)

d5f8958

Mukul Rayana committed on Apr 8

feat: eval scripts — BM25 baseline, adversarial probes, BERTScore, RAGAS, Wilcoxon (Day 14)

78fc1e6

Mukul Rayana committed on Apr 8

feat: Gradio demo app, DeBERTa Colab notebook, updated smoke test results (Day 13)

d471138

Mukul Rayana committed on Apr 8

Add eval suite: test prompts, adversarial probes, BERTScore references (Day 12)

5c84477

Mukul Rayana committed on Apr 7

Add pipeline orchestrator + smoke test — 4/5 emotion predictions correct (Day 12)

8b1f355

Mukul Rayana committed on Apr 7

Day 1: data pipeline, session tracker, query router, adversarial probes, Colab training notebooks

bc3ba9e

Mukul Rayana committed on Apr 6