Peter PRO

pworth1971

AI & ML interests

Language Models

Organizations

upvoted a paper 8 months ago

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Paper • 2511.01144 • Published Nov 3, 2025 • 4

upvoted an article 8 months ago

Article

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

r34p3r1321, csahana95, liyueam10, cynikolai, dwjsong, simonwan, fa7pdn, is-eqv, yaohway, dhavalkapil, dmolnar, spencerwmeta, jdsaxe, vontimitta, carljparker, clefourrier • May 24, 2024 • 22

upvoted a collection about 1 year ago

REAL-MM-RAG-Bench

Collection

REAL-MM-RAG-Bench is a benchmark designed to evaluate multi-modal retrieval models under realistic and challenging conditions. • 4 items • Updated Mar 13, 2025 • 11
