Gradio-Blocks-Party

company

Recent Activity

akhaliq submitted a paper about 1 month ago

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

akhaliq submitted a paper about 1 month ago

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

akhaliq submitted a paper about 1 month ago

optimize_anything: A Universal API for Optimizing any Text Parameter

fcakyon

authored a paper 21 days ago

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Paper • 2606.03748 • Published 29 days ago • 15

innovation64

authored a paper 22 days ago

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

Paper • 2606.05557 • Published 27 days ago • 1

innovation64

submitted a paper to Daily Papers 25 days ago

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

Paper • 2606.05557 • Published 27 days ago • 1

gigant

authored a paper about 1 month ago

Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation

Paper • 2604.27263 • Published May 14 • 11

gigant

submitted a paper to Daily Papers about 1 month ago

Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation

Paper • 2604.27263 • Published May 14 • 11

gigant

authored a paper about 2 months ago

Efficient Pre-Training with Token Superposition

Paper • 2605.06546 • Published May 7 • 47

fcakyon

authored a paper about 2 months ago

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

Paper • 2604.08819 • Published Apr 9 • 1

fcakyon

posted an update about 2 months ago

Let me introduce you to our CVPR 2026 paper!

Today's content moderation systems give you a label: safe or unsafe. They don't tell you what triggered the decision, who is involved, or where in the image it happens. That opacity hurts auditing, breaks adaptation across platforms, and frustrates the human review that responsible deployment demands.

We built SenBen to fix this. It is the first large-scale scene graph benchmark designed specifically for sensitive content moderation:

- 13,999 annotated frames from 157 movies
- Visual Genome-style scene graphs with bounding boxes, attributes, and predicates
- Affective state attributes (pain, fear, aggression, distress) so the model captures not just what is in the frame, but what it means
- 16 safety tags across 5 categories, the broadest taxonomy of any dataset of this kind

A small model that beats much bigger ones:

We distilled a frontier VLM into a compact 241M-parameter student built on Florence-2.

On grounded scene graph metrics, the 241M student beats every evaluated VLM except Gemini, and every commercial safety API. It also wins on object detection and captioning across the entire model zoo. It runs at 733 ms per frame on 1.2 GB VRAM, which is 7.6 times faster than the next-best local VLM at zero per-frame cost. The whole benchmark, from dataset creation through all baseline evaluations, is reproducible for under $350.
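As a quick sanity check on the figures above, the per-frame latency can be converted into throughput, and the stated 7.6x speedup implies a latency for the next-best local VLM. This is a minimal sketch that assumes "7.6 times faster" means a ratio of per-frame latencies; the implied baseline value is derived here and is not reported in the post.

```python
# Figures quoted in the post above.
student_latency_ms = 733.0  # per-frame latency of the 241M student
speedup = 7.6               # stated speedup over the next-best local VLM

# Throughput is the reciprocal of latency.
student_fps = 1000.0 / student_latency_ms

# Implied latency of the next-best local VLM.
# Assumption: the speedup is a ratio of per-frame latencies.
implied_baseline_ms = student_latency_ms * speedup

print(f"student throughput: {student_fps:.2f} frames/s")
print(f"implied next-best latency: {implied_baseline_ms / 1000:.1f} s/frame")
```

Under that assumption, the student processes a little over one frame per second, while the next-best local VLM would need several seconds per frame.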

Project: https://senben.kim/
Paper: SenBen: Sensitive Scene Graphs for Explainable Content Moderation (2604.08819)
Dataset: fcakyon/senben
Code (soon): https://github.com/fcakyon/senben

innovation64

authored a paper 5 months ago

BMAM: Brain-inspired Multi-Agent Memory Framework

Paper • 2601.20465 • Published Jan 28 • 6

innovation64

submitted a paper to Daily Papers 5 months ago

BMAM: Brain-inspired Multi-Agent Memory Framework

Paper • 2601.20465 • Published Jan 28 • 6

Paper99

authored 2 papers 7 months ago

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 248

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

Paper • 2511.22677 • Published Nov 27, 2025 • 36

hysts

in Gradio-Blocks/ViTPose 7 months ago

runtime error

#10 opened 7 months ago by liangnanying

hysts

updated a Space 7 months ago

ViTPose

📊

Detect and visualize human poses in images

nouamanetazi

posted an update 8 months ago

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.
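To put the all-reduce numbers above in perspective, the short sketch below computes how much of the single-node bandwidth survives at 16 nodes. It uses only the figures quoted in the post; the variable names are illustrative and are not taken from the playbook.

```python
# Figures quoted in the post above (GB/s).
single_node_gb_s = 480.0
multi_node_gb_s = (320.0, 350.0)  # reported range across 16 nodes

# Cluster size used for the collective benchmarks.
nodes, gpus_per_node = 16, 8
total_gpus = nodes * gpus_per_node

# Fraction of single-node all-reduce bandwidth retained at scale.
retained = [bw / single_node_gb_s for bw in multi_node_gb_s]

print(f"GPUs: {total_gpus}")
print(f"retained bandwidth: {retained[0]:.0%} to {retained[1]:.0%}")
```

In other words, the multi-node all-reduce keeps roughly two thirds to three quarters of the single-node bandwidth.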

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the Hugging Face team