ZeroGPU Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

tianweiy authored a paper 10 days ago

One-step Diffusion with Distribution Matching Distillation

wenbopan authored a paper 20 days ago

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

wenbopan authored a paper 23 days ago

The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions

View all activity

yuntian-deng

submitted a paper to Daily Papers 2 days ago

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Paper • 2607.02512 • Published 4 days ago • 76

eienmojiki

posted an update 10 days ago

Post

141

Hi everyone,

I've created a Gradio space for embedding and extracting invisible watermarks in images:
👉 eienmojiki/blind-watermark-studio

It supports hiding text, images, and bit arrays using the DWT-DCT-SVD algorithm.

Credits:
- Original library: https://github.com/guofei9987/blind_watermark
- Author: Guo Fei

:).

yuntian-deng

authored 3 papers 27 days ago

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

Paper • 2511.05705 • Published Nov 7, 2025 • 10

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

Paper • 2606.04205 • Published Jun 2

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Paper • 2606.06492 • Published Jun 4 • 95

gagan3012

authored a paper about 1 month ago

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Paper • 2606.02255 • Published Jun 1

gagan3012

submitted a paper to Daily Papers about 1 month ago

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Paper • 2606.02255 • Published Jun 1

tianbaoxiexxx

authored 6 papers about 1 month ago

OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?

Paper • 2507.19132 • Published Jul 25, 2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 32

OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents

Paper • 2510.24563 • Published Oct 28, 2025 • 23

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 164

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Paper • 2602.02488 • Published Feb 2 • 36

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 34

Locutusque

posted an update about 1 month ago

Post

339

🚀 Introducing Esmeralda-Llama-3.1-8B-control
The first release in the Esmeralda model family by Locutusque.

This model is intentionally small and experimental — a control/baseline proof-of-concept designed to answer one question:

«“How strong is my new "Locutusque/esmeralda-agentic" dataset before scaling to larger runs?”»

Training Details

- Base: Llama 3.1 8B
- Training precision: bf16 mixed precision
- Chat template: modified ChatML
- Dataset size: ~37k examples
- Examples actually used for this run: ~5k

The dataset includes:

- multi-turn agentic traces
- reasoning traces
- structured assistant behavior
- generalist instruction data

Benchmark Results

Compared against:

- Llama 3.1 8B Instruct
- Hermes-3-Llama-3.1-8B

HumanEval

57.3 — Esmeralda
56.1 — Llama 3.1 Instruct
52.4 — Hermes-3

MBPP

53.2 — Esmeralda
56.8 — Llama 3.1 Instruct
48.2 — Hermes-3

GPQA Diamond

15.7 — Esmeralda
15.7 — Llama 3.1 Instruct
18.2 — Hermes-3

EQ-Bench

59.2 — Esmeralda
61.1 — Llama 3.1 Instruct
63.1 — Hermes-3

EQ-Bench Parseable (Syntax Stability)

🔥 100.0% — Esmeralda
92.4% — Llama 3.1 Instruct
91.2% — Hermes-3

Here Be Dragons 🐉

I also experimented with a new TruthfulQA free-generation evaluation setup.

- Responses were judged by Gemma 4 26B A4B
- The judge compared generations directly against ground-truth answers
- Models were evaluated in 8-bit quantized form to speed up inference

TruthfulQA (LLM Judge)

0.682 — Esmeralda-Llama-3.1-8B-control
0.587 — Hermes-3-Llama-3.1-8B (reported MC2 score; methodology differs)

For a lightweight control run trained on only a fraction of the dataset, I’m pretty encouraged by the results.

The model is released under the standard Llama 3.1 license, and I’d genuinely love feedback from people testing it in real workflows.

Model: Locutusque/Esmeralda-Llama-3.1-8B-control

Dataset: Locutusque/esmeralda-agentic

alvarobartt

posted an update about 1 month ago

Post

443

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

alvarobartt

posted an update about 2 months ago

Post

3346

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL; DR MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem

blanchon

posted an update about 2 months ago

Post

2765

I'm releasing OpenCS2 a 11TB dataset of around 5000 hours of counter strike gameplay recording.
- HD resolution - 1280×720 · 32 fps
- For each frame keyboard and mouse + world state (player position, velocity, weapon ...)
- HD Stereo audio
- All 10 players perspective

https://huggingface.co/collections/blanchon/opencs2

1 reply

seyedhamidreza

authored a paper about 2 months ago

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models

Paper • 2605.08513 • Published May 8 • 16

seyedhamidreza

submitted a paper to Daily Papers about 2 months ago

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models

Paper • 2605.08513 • Published May 8 • 16

anakin87

posted an update 2 months ago

Post

3405

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

AI & ML interests

Recent Activity

Team members 748

zero-gpu-explorers's activity