Link to X thread/post: https://x.com/JulienBlanchon/status/2054519347574350115
Julien BLANCHON PRO
blanchon
Recent Activity
- published a dataset about 6 hours ago: blanchon/opencs2_dataset_frames_wds
- published a dataset about 6 hours ago: blanchon/opencs2_dataset_preview_wds
- liked a Space about 6 hours ago: fffiloni/spectrogram-to-music
- replied to their post about 8 hours ago
- reacted to spillai's post with 🔥 about 8 hours ago
mm-ctx – fast, multimodal context for agents.
LLM-based agents handle text incredibly well, but images, videos, or PDFs with visual content are hard to interpret. mm-ctx gives your CLI agent multi-modal skills.
Try it interactively in Spaces: vlm-run/mm-ctx
Readme: https://vlm-run.github.io/mm/
PyPI: https://pypi.org/project/mm-ctx
SKILL.md: https://github.com/vlm-run/skills/blob/main/skills/mm-cli-skill/SKILL.md
mm-ctx is meant to feel familiar: the UNIX tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI.
- mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches
- mm cat <document>.pdf returns a metadata description of the file
- mm cat <photo>.jpg returns a caption of the photo
- mm cat <video>.mp4 returns a caption of the video
A few things we obsessed over:
⚡ Speed: Rust core for the hot paths
🏠 Local-first, BYO model: works with any OpenAI-compatible endpoint (Ollama, vLLM/SGLang, LM Studio) and any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V).
🔗 Composable: stdin + structured outputs
🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw.
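To make the "stdin + structured outputs" point concrete, here is a sketch of the agent-side half. The JSON-lines shape below is purely hypothetical (this post doesn't show mm-ctx's actual output schema), but it illustrates how line-numbered matches from a grep-style multimodal search compose into agent context:

```python
import json

# Hypothetical structured output from a grep-style multimodal search:
# one JSON object per match with file, line number, and matched text.
# This is an illustrative format, NOT mm-ctx's documented schema.
raw = (
    '{"file": "inv_2024.pdf", "line": 12, "text": "invoice #1234 due 2024-03-01"}\n'
    '{"file": "scan.pdf", "line": 3, "text": "re: invoice #1234"}'
)

def to_context(stream: str) -> str:
    """Fold one-JSON-object-per-line matches into a grep-style context block."""
    matches = [json.loads(line) for line in stream.splitlines()]
    return "\n".join(f'{m["file"]}:{m["line"]}: {m["text"]}' for m in matches)

print(to_context(raw))
```

The structured-per-line shape is what makes the tool pipeline-friendly: each match can be filtered, counted, or folded into a prompt with ordinary line-oriented UNIX tools.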
We’d love to hear your feedback! Especially on the CLI and what file types and workflows you would like to see next.
posted an update about 8 hours ago
I'm releasing OpenCS2, an 11 TB dataset of around 5,000 hours of Counter-Strike gameplay recordings.
- HD resolution: 1280×720 at 32 fps
- Per-frame keyboard and mouse input plus world state (player position, velocity, weapon, ...)
- HD stereo audio
- All 10 players' perspectives
https://huggingface.co/collections/blanchon/opencs2
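Back-of-the-envelope scale from the figures above, assuming the 5,000 hours and 11 TB cover the same total footage:

```python
# Rough scale arithmetic for OpenCS2, from the post's own figures.
# Assumption: the 5,000 hours and 11 TB describe the same footage.
hours = 5_000
fps = 32
frames = hours * 3600 * fps
print(f"{frames:,} total frames")         # 576,000,000 total frames
bytes_per_frame = 11e12 / frames          # 11 TB (decimal) spread evenly
print(f"~{bytes_per_frame / 1e3:.0f} KB per frame, audio and state included")
```

About 576 million frames at roughly 19 KB each, which is consistent with compressed 720p video plus per-frame state.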
reacted to projectlosangeles's post with 🔥❤️ 4 months ago
Check out Orpheus Karaoke! Turn any MIDI into a unique Karaoke MIDI!
projectlosangeles/Orpheus-Karaoke
Hey @Tonic, I'm absolutely not affiliated with the Liquid AI team, but happy to chat anytime (you can PM me on X, maybe)!
reacted to vikhyatk's post with 🔥 6 months ago
Announcing RefCOCO-M, a refreshed RefCOCO with pixel-accurate masks and the problematic prompts removed.
moondream/refcoco-m
reacted to Kseniase's post with 🔥 6 months ago
11 Fascinating new Policy Optimization techniques
Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:
1. BAlanced Policy Optimization (BAPO) → BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (2510.18927)
Dynamically adjusting the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse
2. Training-Free GRPO → Training-Free Group Relative Policy Optimization (2510.08191)
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062)
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively
6. Information Gain-based Policy Optimization (IGPO) → Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents (2510.14967)
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning
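For readers new to the clipping machinery several of these methods modify, here is a minimal numpy sketch of the PPO-style clipped surrogate with asymmetric bounds. The bounds are fixed toy values for illustration; the point of adaptive-clipping methods like BAPO is precisely that they adjust such bounds dynamically during training:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style per-token objective with asymmetric clip bounds.

    eps_low/eps_high are fixed illustrative values; adaptive-clipping
    methods (e.g. BAPO) adjust them on the fly to balance positive vs.
    negative gradient contributions and prevent entropy collapse.
    """
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Pessimistic min over raw and clipped terms, as in standard PPO.
    return np.minimum(ratio * advantage, clipped * advantage)

r = np.array([1.5, 0.5, 1.0])   # importance sampling ratios
a = np.array([1.0, -1.0, 2.0])  # advantages
print(clipped_surrogate(r, a))  # [1.28, -0.8, 2.0]
```

Note how the upper bound caps the positive-advantage term at 1.28·A while the lower bound floors the negative-advantage term at 0.8·A; the asymmetry is what methods like ASPO and BAPO manipulate per token.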
Read further below ⬇️
If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe
Amazing! Any Spaces to try this out quickly?
reacted to piercus's post with 🔥👍 6 months ago
Starts erasing! 🎉 🎉 🎉
This is made with a one-step SD1.5 LBM [1] eraser!
Data is open. Data pipeline is open. Training code is open.
On our LBM fork: https://github.com/finegrain-ai/LBM
[1] LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
replied to mrfakename's post 7 months ago
LAION data is all you need xd
reacted to mrfakename's post with 🔥 7 months ago
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.
Still very early, and it does have an issue with hallucinating, but the results seem pretty good so far given how early it is in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo
(Turn 🔊 on to hear audio samples)
reacted to AdinaY's post with 🔥 7 months ago
At the close of the National Holiday 🇨🇳, Ant Group drops a new SoTA model.
Ling-1T 🔥 the trillion-parameter flagship of the Ling 2.0 series.
inclusionAI/Ling-1T
✨1T total / 50B active params per token
✨20T+ reasoning-dense tokens (Evo-CoT)
✨128K context via YaRN
✨FP8 training: 15%+ faster, same precision as BF16
✨Hybrid Syntax-Function-Aesthetics reward for front-end & visual generation
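The sparsity implied by the headline numbers, assuming "active params per token" means the routed-expert subset (standard for MoE models):

```python
# What "1T total / 50B active" implies: an MoE model touching only a
# small fraction of its weights per token. Assumes "active params per
# token" means the routed-expert subset, as is standard for MoE.
total_params = 1_000_000_000_000
active_params = 50_000_000_000
fraction = active_params / total_params
print(fraction)  # 0.05 → 5% of parameters active per token
```

That 5% active fraction is what lets a trillion-parameter model run with the per-token compute of a ~50B dense model.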
reacted to AdinaY's post with 🔥 10 months ago
Qwen is on fire this week 🔥
They just released Qwen3-MT 🌍, a translation model supporting 92 languages.
Demo is available on the hub.
Qwen/Qwen3-MT-Demo
✨ Highly Customizable: Supports custom terms, domain prompts, and translation memory for accurate, context-aware results.
✨ Fast and affordable: $0.5 per million tokens.
reacted to AdinaY's post with 🚀 10 months ago
The Chinese Open Source Heatmap is live 🔥
You can now track the companies/ research labs/ communities powering China’s open source AI movement.
zh-ai-community/model-release-heatmap-zh
Some highlights:
✨Giant tech companies are investing more in open source.
-Alibaba: Full stack open ecosystem
-Tencent: Hunyuan image/video/3D
-Bytedance: Catching up fast in 2025
-Baidu: New player in open LLM
✨New players emerging post–DeepSeek moment.
-Xiaomi
-Red Note
-Bilibili
-MiniMax
-Moonshot AI
✨Startup list is shifting fast! Those who find a direction aligned with their strengths are the ones who endure.
-DeepSeek
-MiniMax
-StepFun
-Moonshot AI
-Zhipu AI
-OpenBMB
✨Research labs & communities are making key contributions.
-BAAI
-Shanghai AI Lab
-OpenMOSS
-MAP
reacted to ArturoNereu's post with 🔥❤️ about 1 year ago
I’ve been learning AI for several years (coming from the games industry), and along the way, I curated a list of the tools, courses, books, papers, and models that actually helped me understand things.
I turned this into a GitHub repo:
https://github.com/ArturoNereu/AI-Study-Group
If you’re just getting started, I recommend:
📘 Deep Learning – A Visual Approach: https://www.glassner.com/portfolio/deep-learning-a-visual-approach
🎥 Dive into LLMs with Andrej Karpathy: https://youtu.be/7xTGNNLPyMI?si=aUTq_qUzyUx36BsT
🧠 The 🤗 Agents course: https://huggingface.co/learn/agents-course/
The repo has grown with help from the community (Reddit, Discord, etc.) and I’ll keep updating it.
If you have any favorite resources, I’d love to include them.
replied to dylanebert's post over 1 year ago
I really like the style of your 1-minute video. I still remember the one you did for 3DGS a long time ago.
reacted to dylanebert's post with 🔥 over 1 year ago
I made a 1-minute video explaining the DeepSeek situation
R1: deepseek-ai/DeepSeek-R1
Janus Pro: deepseek-ai/Janus-Pro-7B