Minseo Kim

MinseoKim-03

4 12

AI & ML interests

None yet

Recent Activity

reacted to ginigen-ai's post with 🔥 about 23 hours ago

🍳 The RoboCasa Kitchen Leaderboard

What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action), and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched): head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols: self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop: a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 https://huggingface.co/spaces/ginigen-ai/robocasa-kitchen-leaderboard

liked a Space about 23 hours ago

ginigen-ai/robocasa-kitchen-leaderboard

reacted to SeaWolf-AI's post with 🔥 1 day ago

🐯 Chitos: The Security Scanner That Actually Proves It

Most security scanners hand you a suspect list and walk away. That gap between detection and proof is where attackers live, and it's exactly the gap that Chitos was built to close.

Chitos is the successor to Mythos, a static analyzer built for quick code health checks. Mythos was good at pattern matching: spotting dangerous sinks, mapping CWEs, producing readable reports. But static analysis has a structural ceiling. A rule that sees eval(user_input) can tell you that it looks dangerous. It cannot tell you whether the input is reachable, whether sanitization three layers up covers this path, or whether there's a live exploit chain for your exact framework version. Chitos was built to answer those questions.

🔍 Phase 1 applies 50 language-agnostic rules across Python, JavaScript, Go, Java, C/C++, Rust, PHP, YAML and more, covering injection sinks, deserialization gadgets, credential leakage, broken crypto, and prototype pollution. Every candidate is re-verified before reaching the report. Findings that can't be substantiated are excluded, not handed to you as noise.

🔬 Phase 2 dispatches an autonomous web-search agent to hunt live CVE databases, exploit advisories, and public PoC repositories. It formulates hypotheses, verifies them, and synthesizes a structured threat narrative. This phase needs a user-supplied Claude API key; Phases 1 and 3 run entirely free.

🎯 Phase 3 is where Chitos diverges from everything else. Against targets you own or are authorized to test, it fires real payloads (XSS, SQLi, path traversal, command injection), mutates on block, captures hard evidence, and connects every proven finding into a kill-chain showing which vulnerabilities to remediate first.

No installation. No account. No code sent to third-party APIs.

Article: https://huggingface.co/blog/FINAL-Bench/chitos
Try it now 👉 https://chitos.vidraft.net

Organizations

None yet

liked a Space about 23 hours ago

RoboCasa Kitchen Leaderboard

🍳

Neutral aggregation of VLA success rates on RoboCasa Kitchen

liked a Space 3 days ago

VKAE

🚀

Explore model performance with VKAE acceleration

liked a model 16 days ago

FINAL-Bench/Darwin-398B-JGOS

Text Generation • 403B • Updated 5 days ago • 374 • 29

liked a Space 17 days ago

FINAL-Bench Quantum Leaderboard

⚛

Neutral quantum-method benchmark — QEC decoders & more

liked a model 23 days ago

JGOS-Model/JGOS-31B-Citizen

Image-Text-to-Text • 31B • Updated 23 days ago • 320 • 22

liked 4 models about 1 month ago

liked 3 models 2 months ago

FINAL-Bench/Darwin-28B-Opus

Text Generation • 28B • Updated 18 days ago • 161 • 34

FINAL-Bench/Darwin-9B-NEG

Text Generation • 10B • Updated 28 days ago • 147 • 50

FINAL-Bench/Darwin-36B-Opus

Text Generation • 35B • Updated 28 days ago • 450 • 76