Commit History
Default sample size to 10 85f05a2 verified
Route cogtest/expert synthesis to GPT-OSS 20B when main model is Llama 3.1 8B Instant 0380dce verified
Route cogtest/expert synthesis to GPT-OSS 20B when main model is Llama 3.1 8B Instant 3667f74 verified
Update Anthropic model to Sonnet 4.6 and add reproducible-sample option 23287f3 verified
Update Anthropic model to Sonnet 4.6 and add reproducible-sample option 5e5c325 verified
Increase run limit to 15 for Haonan Sun 8463209 verified
Increase run limit to 15 for Haonan Sun bb8ee77 verified
Add student tokens CSV for assignment mode fafa9a8 verified
Update cogtest prompts: checklist respondent, label-free analyst, merge-by-issue synthesis 3a7eac5 verified
Update token usage: load from CSV, use HF_USAGE_REPO/TOKEN secrets, limit=8, show remaining runs 054b0d1 verified
Add huggingface_hub dependency for assignment usage persistence 365b39e verified
Add assignment token mode for MY456 students (per-token run limits, persistent usage tracking) 3cfee2d verified
Clean up model dropdown labels: remove (Free) and (On-Demand) tags 09c2f84 verified
Remove GPT-OSS 20B model 51b27ff verified
Add Groq Llama 3.3 70B (On-Demand) model 3c312a7 verified
Update models: Claude, OpenAI, Groq (Llama 4 Scout, GPT-OSS 20B, Llama 3.1 8B). Remove Winston/vLLM/local. c36d092 verified
Hide individual responses in Response Generation tab (keep for Question Testing only) f9c782b verified
QAS: summary ordered by rater consensus 0c0f867
Patrick Sturgis committed on
QAS: 5 independent raters with aggregated vote counts 0b392bb
Patrick Sturgis committed on
Fix QAS parser: item_id casing mismatch d82ecef
Patrick Sturgis committed on
Add QAS-99 (Question Appraisal System) evaluation 2cdd413
Patrick Sturgis committed on
Update Expert Review info text 94d592a
Patrick Sturgis committed on
Add Expert Review pipeline (3 experts + synthesis) 1a52391
Patrick Sturgis committed on
Add Question Appraisal System placeholder option 079575b
Patrick Sturgis committed on
Replace CogBot prompt style with Cognitive Interview default + Question testing expert placeholder 260bc84
Patrick Sturgis committed on
Fix Groq cogtest failure: sequential execution + error visibility 3e07d9c
Patrick Sturgis committed on
Fix over-diagnosis: reframe analyst taxonomy as reference categories, not checklist a37d8a4
Patrick Sturgis committed on
Add impact score (severity × prevalence), sort problems by impact 4ca8fa2
Patrick Sturgis committed on
Replace severity labels with B&C-aligned severity scores and prevalence a539994
Patrick Sturgis committed on
Fix confidence rating extraction to handle varied LLM output formats 9adf0d6
Patrick Sturgis committed on
Fix cogtest pipeline: remove traffic lights, align synthesis with coding, add confidence stats 8ba5d5e
Patrick Sturgis committed on
Add Cognitive Interview pipeline and Claude/OpenAI model support 04f658e
Patrick Sturgis committed on