CMU-LTI

university

Recent Activity

Xuhui updated a collection about 2 months ago

lwaekfjlk submitted a paper about 2 months ago

Building Social World Models with Large Language Models

Xuhui updated a collection about 2 months ago

Papers

Benchmark Test-Time Scaling of General LLM Agents

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

updated a collection about 2 months ago

ODYSSIM

ODYSSIM • 8 items • Updated Jun 18 • 1

submitted a paper to Daily Papers about 2 months ago

Building Social World Models with Large Language Models

Paper • 2606.11482 • Published Jun 9 • 3

posted an update about 2 months ago

🚀 VQAScore now supports text-to-video evaluation!

VQAScore scores how well a generated image or video matches a prompt by asking a VLM "does this show {prompt}?" and using P(Yes). It became a go-to evaluation metric and reward model for image generation (2M+ downloads), and we just added text-to-video support across 20+ VLMs (GPT, Gemini, Qwen). Free and open-source, and it keeps improving as VLMs improve.

💻 Code: https://github.com/linzhiqiu/t2v_metrics
📄 Paper: https://arxiv.org/abs/2404.01291
🧵 Launch thread + demo video: https://x.com/ZhiqiuLin/status/2064316582461841499
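
The scoring idea described in the post (ask a VLM whether the image or video shows the prompt, then read off P(Yes)) can be illustrated with a small, library-free sketch. This is not the `t2v_metrics` API: the helper names `build_question` and `p_yes`, the exact question wording, and the toy logits are all hypothetical, and a real VLM would supply the answer-token logits after seeing the image or video.

```python
import math
from typing import Dict


def build_question(prompt: str) -> str:
    """Turn a generation prompt into a yes/no question (wording is an assumption)."""
    return f'Does this show "{prompt}"?'


def p_yes(answer_logits: Dict[str, float]) -> float:
    """Softmax over candidate answer tokens; return the probability of "Yes".

    `answer_logits` stands in for the logits a VLM would assign to its
    first answer token. The values used below are made up for illustration.
    """
    peak = max(answer_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in answer_logits.items()}
    return exps["Yes"] / sum(exps.values())


question = build_question("a dog chasing a ball")
toy_logits = {"Yes": 2.0, "No": 0.0, "Maybe": -1.0}  # hypothetical VLM output
score = p_yes(toy_logits)
print(question)
print(f"VQAScore-style score: {score:.3f}")
```

Because the score is a probability rather than a hard yes/no answer, it can rank several generations for the same prompt, which is what makes it usable as a reward signal.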

updated 2 datasets about 2 months ago

cmu-lti/osim-mid-training

Viewer • Updated Jun 9 • 21.2M • 1.79k • 4

cmu-lti/osim-post-training

Viewer • Updated Jun 9 • 33.5k • 1.13k • 1

published 2 datasets about 2 months ago

cmu-lti/osim-post-training

Viewer • Updated Jun 9 • 33.5k • 1.13k • 1

cmu-lti/osim-mid-training

Viewer • Updated Jun 9 • 21.2M • 1.79k • 4

authored a paper about 2 months ago

AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery

Paper • 2603.07300 • Published Mar 7 • 18

authored 2 papers 2 months ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Paper • 2605.26457 • Published May 26 • 7

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published Jun 1 • 59

updated a collection 2 months ago

ODYSSIM

ODYSSIM • 8 items • Updated Jun 18 • 1

updated a model 2 months ago

cmu-lti/osim-4b-mid

Text Generation • 4B • Updated Jun 5 • 2.03k

published a model 2 months ago

cmu-lti/osim-4b-mid

Text Generation • 4B • Updated Jun 5 • 2.03k

updated a model 2 months ago

cmu-lti/osim-4b-inst

Image-Text-to-Text • 5B • Updated Jun 5 • 7

published a model 2 months ago

cmu-lti/osim-4b-inst

Image-Text-to-Text • 5B • Updated Jun 5 • 7

updated a model 2 months ago

cmu-lti/osim-4b

Text Generation • 4B • Updated Jun 5 • 3.07k