Link to X thread/post: https://x.com/JulienBlanchon/status/2054519347574350115
Julien BLANCHON PRO
blanchon
Recent Activity
- published a dataset about 6 hours ago: blanchon/opencs2_dataset_frames_wds
- published a dataset about 6 hours ago: blanchon/opencs2_dataset_preview_wds
- liked a Space about 6 hours ago: fffiloni/spectrogram-to-music
- replied to their post about 8 hours ago
- reacted to spillai's post with 🔥 about 8 hours ago
mm-ctx – fast, multimodal context for agents.
LLM-based agents handle text incredibly well, but images, videos, or PDFs with visual content are hard to interpret. mm-ctx gives your CLI agent multi-modal skills.
Try it interactively in Spaces: vlm-run/mm-ctx
Readme: https://vlm-run.github.io/mm/
PyPI: https://pypi.org/project/mm-ctx
SKILL.md: https://github.com/vlm-run/skills/blob/main/skills/mm-cli-skill/SKILL.md
mm-ctx is meant to feel familiar: the UNIX tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI.
- mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches
- mm cat <document>.pdf returns a metadata description of the file
- mm cat <photo>.jpg returns a caption of the photo
- mm cat <video>.mp4 returns a caption of the video
A few things we obsessed over:
⚡ Speed: Rust core for the hot paths
🏠 Local-first, BYO model: works with any OpenAI-compatible endpoint (Ollama, vLLM/SGLang, LM Studio) and any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V).
🔗 Composable: stdin + structured outputs
🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw.
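To make the "stdin + structured outputs" point concrete, here is a sketch of the agent-side half. The JSON-lines shape below is purely hypothetical (this post doesn't show mm-ctx's actual output schema), but it illustrates how line-numbered matches from a grep-style multimodal search compose into agent context:

```python
import json

# Hypothetical structured output from a grep-style multimodal search:
# one JSON object per match with file, line number, and matched text.
# This is an illustrative format, NOT mm-ctx's documented schema.
raw = (
    '{"file": "inv_2024.pdf", "line": 12, "text": "invoice #1234 due 2024-03-01"}\n'
    '{"file": "scan.pdf", "line": 3, "text": "re: invoice #1234"}'
)

def to_context(stream: str) -> str:
    """Fold one-JSON-object-per-line matches into a grep-style context block."""
    matches = [json.loads(line) for line in stream.splitlines()]
    return "\n".join(f'{m["file"]}:{m["line"]}: {m["text"]}' for m in matches)

print(to_context(raw))
```

The structured-per-line shape is what makes the tool pipeline-friendly: each match can be filtered, counted, or folded into a prompt with ordinary line-oriented UNIX tools.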
We’d love to hear your feedback! Especially on the CLI and what file types and workflows you would like to see next.
posted an update about 8 hours ago
I'm releasing OpenCS2, an 11 TB dataset of around 5,000 hours of Counter-Strike gameplay recordings.
- HD resolution: 1280×720 at 32 fps
- Per-frame keyboard and mouse input plus world state (player position, velocity, weapon, ...)
- HD stereo audio
- All 10 players' perspectives
https://huggingface.co/collections/blanchon/opencs2
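Back-of-the-envelope scale from the figures above, assuming the 5,000 hours and 11 TB cover the same total footage:

```python
# Rough scale arithmetic for OpenCS2, from the post's own figures.
# Assumption: the 5,000 hours and 11 TB describe the same footage.
hours = 5_000
fps = 32
frames = hours * 3600 * fps
print(f"{frames:,} total frames")         # 576,000,000 total frames
bytes_per_frame = 11e12 / frames          # 11 TB (decimal) spread evenly
print(f"~{bytes_per_frame / 1e3:.0f} KB per frame, audio and state included")
```

About 576 million frames at roughly 19 KB each, which is consistent with compressed 720p video plus per-frame state.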
reacted to projectlosangeles's post with 🔥❤️ 4 months ago
Check out Orpheus Karaoke! Turn any MIDI into a unique Karaoke MIDI!
projectlosangeles/Orpheus-Karaoke
Hey @Tonic, I'm absolutely not affiliated with the Liquid AI team, but happy to chat anytime (you can PM me on X, maybe)!
reacted to vikhyatk's post with 🔥 6 months ago
Announcing RefCOCO-M, a refreshed RefCOCO with pixel-accurate masks and the problematic prompts removed.
moondream/refcoco-m
reacted to Kseniase's post with 🔥 6 months ago
11 Fascinating new Policy Optimization techniques
Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:
1. BAlanced Policy Optimization (BAPO) → BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (2510.18927)
Dynamically adjusting the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse
2. Training-Free GRPO → Training-Free Group Relative Policy Optimization (2510.08191)
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062)
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively
6. Information Gain-based Policy Optimization (IGPO) → Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents (2510.14967)
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning
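For readers new to the clipping machinery several of these methods modify, here is a minimal numpy sketch of the PPO-style clipped surrogate with asymmetric bounds. The bounds are fixed toy values for illustration; the point of adaptive-clipping methods like BAPO is precisely that they adjust such bounds dynamically during training:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style per-token objective with asymmetric clip bounds.

    eps_low/eps_high are fixed illustrative values; adaptive-clipping
    methods (e.g. BAPO) adjust them on the fly to balance positive vs.
    negative gradient contributions and prevent entropy collapse.
    """
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Pessimistic min over raw and clipped terms, as in standard PPO.
    return np.minimum(ratio * advantage, clipped * advantage)

r = np.array([1.5, 0.5, 1.0])   # importance sampling ratios
a = np.array([1.0, -1.0, 2.0])  # advantages
print(clipped_surrogate(r, a))  # [1.28, -0.8, 2.0]
```

Note how the upper bound caps the positive-advantage term at 1.28·A while the lower bound floors the negative-advantage term at 0.8·A; the asymmetry is what methods like ASPO and BAPO manipulate per token.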
Read further below ⬇️
If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe
Amazing! Any Spaces to try this out quickly?
reacted to piercus's post with 🔥👍 6 months ago
Starts erasing! 🎉 🎉 🎉
This is made with a one-step SD1.5 LBM [1] eraser!
Data is open. Data pipeline is open. Training code is open.
On our LBM fork: https://github.com/finegrain-ai/LBM
[1] LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
replied to mrfakename's post 7 months ago
LAION data is all you need xd
reacted to mrfakename's post with 🔥 7 months ago
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.
Still very early, and it does have an issue with hallucinating, but the results seem pretty good so far given how early it is in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo
(Turn 🔊 on to hear audio samples)
reacted to AdinaY's post with 🔥 7 months ago
At the close of the National Holiday 🇨🇳, Ant Group drops a new SoTA model.
Ling-1T 🔥 the trillion-parameter flagship of the Ling 2.0 series.
inclusionAI/Ling-1T
✨1T total / 50B active params per token
✨20T+ reasoning-dense tokens (Evo-CoT)
✨128K context via YaRN
✨FP8 training: 15%+ faster, same precision as BF16
✨Hybrid Syntax-Function-Aesthetics reward for front-end & visual generation
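The sparsity implied by the headline numbers, assuming "active params per token" means the routed-expert subset (standard for MoE models):

```python
# What "1T total / 50B active" implies: an MoE model touching only a
# small fraction of its weights per token. Assumes "active params per
# token" means the routed-expert subset, as is standard for MoE.
total_params = 1_000_000_000_000
active_params = 50_000_000_000
fraction = active_params / total_params
print(fraction)  # 0.05 → 5% of parameters active per token
```

That 5% active fraction is what lets a trillion-parameter model run with the per-token compute of a ~50B dense model.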
reacted to AdinaY's post with 🔥 10 months ago
Qwen is on fire this week 🔥
They just released Qwen3-MT 🌍, a translation model supporting 92 languages.
Demo is available on the hub.
Qwen/Qwen3-MT-Demo
✨ Highly Customizable: Supports custom terms, domain prompts, and translation memory for accurate, context-aware results.
✨ Fast and affordable: $0.5 per million tokens.
reacted to AdinaY's post with 🚀 10 months ago
The Chinese Open Source Heatmap is live 🔥
You can now track the companies/ research labs/ communities powering China’s open source AI movement.
zh-ai-community/model-release-heatmap-zh
Some highlights:
✨Giant tech companies are investing more in open source.
-Alibaba: Full stack open ecosystem
-Tencent: Hunyuan image/video/3D
-Bytedance: Catching up fast in 2025
-Baidu: New player in open LLM
✨New players emerging post–DeepSeek moment.
-Xiaomi
-Red Note
-Bilibili
-MiniMax
-Moonshot AI
✨Startup list is shifting fast! Those who find a direction aligned with their strengths are the ones who endure.
-DeepSeek
-MiniMax
-StepFun
-Moonshot AI
-Zhipu AI
-OpenBMB
✨Research labs & communities are making key contributions.
-BAAI
-Shanghai AI Lab
-OpenMOSS
-MAP
reacted to ArturoNereu's post with 🔥❤️ about 1 year ago
I’ve been learning AI for several years (coming from the games industry), and along the way, I curated a list of the tools, courses, books, papers, and models that actually helped me understand things.
I turned this into a GitHub repo:
https://github.com/ArturoNereu/AI-Study-Group
If you’re just getting started, I recommend:
📘 Deep Learning – A Visual Approach: https://www.glassner.com/portfolio/deep-learning-a-visual-approach
🎥 Dive into LLMs with Andrej Karpathy: https://youtu.be/7xTGNNLPyMI?si=aUTq_qUzyUx36BsT
🧠 The 🤗 Agents course: https://huggingface.co/learn/agents-course/
The repo has grown with help from the community (Reddit, Discord, etc.) and I’ll keep updating it.
If you have any favorite resources, I’d love to include them.
replied to dylanebert's post over 1 year ago
I really like the style of your 1-minute video. I still remember the one you did for 3DGS a long time ago.
reacted to dylanebert's post with 🔥 over 1 year ago
I made a 1-minute video explaining the DeepSeek situation
R1: deepseek-ai/DeepSeek-R1
Janus Pro: deepseek-ai/Janus-Pro-7B