OmniEvalKit

community

authored 2 papers 2 months ago

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

Paper • 2601.01373 • Published Jan 4 • 1

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Paper • 2604.27393 • Published Apr 30 • 81

published and updated a dataset 4 months ago

OmniEvalKit/omnievalkit-dataset

Viewer • Updated Mar 27 • 318k • 978

authored a paper 9 months ago

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16, 2025 • 63

authored a paper over 1 year ago

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14, 2024 • 29

authored a paper almost 2 years ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 96

authored a paper about 2 years ago

GUICourse: From General Vision Language Models to Versatile GUI Agents

Paper • 2406.11317 • Published Jun 17, 2024 • 2

posted an update about 2 years ago

Introducing GUICourse! 🎉
By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUI agents.
📄 Paper: GUICourse: From General Vision Language Models to Versatile GUI Agents (2406.11317)
🌐 GitHub Repo: https://github.com/yiye3/GUICourse
📖 Dataset: yiye2023/GUIAct / yiye2023/GUIChat / yiye2023/GUIEnv
🎯 Model: RhapsodyAI/minicpm-guidance / RhapsodyAI/qwen_vl_guidance
