AI & ML interests
LLM evaluation, prompt sensitivity, local LLMs, benchmark datasets, reproducible evaluation
signaldepth's models
None public yet