InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation Paper • 2407.06423 • Published Jul 8, 2024 • 1
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning Paper • 2305.18169 • Published May 29, 2023 • 2
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes Paper • 2404.03022 • Published Apr 3, 2024 • 1
StarFlow: Generating Structured Workflow Outputs From Sketch Images Paper • 2503.21889 • Published Mar 27, 2025 • 3
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs Paper • 2606.06622 • Published 5 days ago • 15
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published 12 days ago • 104
DRBench: A Realistic Benchmark for Enterprise Deep Research Paper • 2510.00172 • Published Sep 30, 2025 • 2
Fara-7B: An Efficient Agentic Model for Computer Use Paper • 2511.19663 • Published Nov 24, 2025 • 20
Qwen2.5 Collection Qwen2.5 language models, both pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 43 items • Updated Mar 2 • 725
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14