Fix: Correct attribute references for error detection/correction metrics 005cdbf RGB Evaluation committed 16 days ago
fix: Information Integration evaluation - handle multiple answer variants with pipe-separated format 5253a83 RGB Evaluation committed 16 days ago
feat: Show all 9 LLM models in app dropdown, add comprehensive code review and metric analysis documentation b1ccc5d RGB Evaluation committed 17 days ago
feat: Add noise level percentage to Noise Robustness cards ca20603 RGB Evaluation committed 25 days ago
feat: Add Moonshot Kimi K2 Instruct model to DEFAULT_MODELS 1b8d19d RGB Evaluation committed 25 days ago
Merge main: accept local changes with new grid layout and model updates bdba13c RGB Evaluation committed 25 days ago
feat: Add separate grid layout for 4 RAG abilities in Streamlit UI af25c62 RGB Evaluation committed 25 days ago
Fix results file pattern matching - search for results_*.json instead of evaluation_*.json a757c21 RGB Deployment Bot committed about 1 month ago
Update all LLMs to new model list with 4 default models 3d27cc5 RGB Deployment Bot committed about 1 month ago
Fix debug function - replace st.debug() with st.write() in expander e7842fe RGB Deployment Bot committed about 1 month ago
Add debug info to past results viewer to diagnose missing results 8796cb2 RGB Deployment Bot committed about 1 month ago
Replace decommissioned llama-3.1-70b-versatile with llama2-70b-4096 800ecaf RGB Deployment Bot committed about 1 month ago
Reduce RPM limit to 25 (safe margin below 30) and increase request interval to 2.5s 4649c35 RGB Deployment Bot committed about 1 month ago
Implement RPM rate limiting (30 requests/minute) with sliding window 68627ba RGB Deployment Bot committed about 1 month ago
Add deepseek-r1-distill-llama-70b free chat model b8b9c59 RGB Deployment Bot committed about 1 month ago
Add persistent results storage with past results viewer 2397c87 RGB Deployment Bot committed about 1 month ago
Replace decommissioned mixtral-8x7b-32768 with llama-3.1-70b-versatile dd2d90a RGB Deployment Bot committed about 1 month ago
Fix continuous page refresh during background evaluation 0deb4ed RGB Deployment Bot committed about 1 month ago
Add 4th model: meta-llama/llama-4-maverick-17b-128e-instruct d68feb2 RGB Deployment Bot committed about 1 month ago
Replace deprecated gemma2-9b-it with mixtral-8x7b-32768 8268ab9 RGB Deployment Bot committed about 1 month ago
Add background evaluation and downloadable reports (CSV, JSON, PDF) 81c6a30 RGB Deployment Bot committed about 1 month ago
Deploy RGB Metrics dashboard - 2026-01-04 22:05:37 3f89944 RGB Deployment Bot committed about 1 month ago
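
Note on 68627ba and 4649c35: the log gives only the parameters of the rate limiter (a sliding window, 25 requests per minute, 2.5s between requests), not its code. The sketch below is one plausible way such a limiter could look, not the repository's implementation. The class name `SlidingWindowRateLimiter`, its methods, and the injectable `clock`/`sleep` arguments are all hypothetical and exist only for illustration.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Hypothetical sketch: allow at most `max_requests` calls in any
    `window_seconds` span and keep at least `min_interval` seconds
    between consecutive calls."""

    def __init__(self, max_requests=25, window_seconds=60.0, min_interval=2.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.min_interval = min_interval
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._timestamps = deque()   # times of recent requests, oldest first

    def wait_time(self):
        """Return how many seconds the caller must wait before sending."""
        now = self._clock()
        # Drop requests that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        wait = 0.0
        if self._timestamps:
            # Enforce the minimum gap since the most recent request.
            wait = max(wait, self._timestamps[-1] + self.min_interval - now)
        if len(self._timestamps) >= self.max_requests:
            # Window is full: wait until the oldest request expires.
            wait = max(wait, self._timestamps[0] + self.window_seconds - now)
        return wait

    def acquire(self):
        """Block until a request is allowed, record it, return the wait."""
        wait = self.wait_time()
        if wait > 0:
            self._sleep(wait)
        self._timestamps.append(self._clock())
        return wait
```

A caller would invoke `limiter.acquire()` immediately before each model API request. With the defaults above, the 2.5s interval alone caps throughput at 24 requests per minute, so the 25-request window acts as a second safeguard rather than the binding limit.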