PatronusAI/llada_2.1_world_model_v3
Text Generation • 16B • Updated • 24
LLM Evaluation
Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments