Activity Feed

AI & ML interests

None defined yet.

Recent Activity

eienmojikiย 
posted an update 10 days ago
Locutusqueย 
posted an update about 1 month ago
view post
Post
339
๐Ÿš€ Introducing Esmeralda-Llama-3.1-8B-control
The first release in the Esmeralda model family by Locutusque.

This model is intentionally small and experimental โ€” a control/baseline proof-of-concept designed to answer one question:

ยซโ€œHow strong is my new "Locutusque/esmeralda-agentic" dataset before scaling to larger runs?โ€ยป

Training Details

- Base: Llama 3.1 8B
- Training precision: bf16 mixed precision
- Chat template: modified ChatML
- Dataset size: ~37k examples
- Examples actually used for this run: ~5k

The dataset includes:

- multi-turn agentic traces
- reasoning traces
- structured assistant behavior
- generalist instruction data

Benchmark Results

Compared against:

- Llama 3.1 8B Instruct
- Hermes-3-Llama-3.1-8B

HumanEval

57.3 โ€” Esmeralda
56.1 โ€” Llama 3.1 Instruct
52.4 โ€” Hermes-3

MBPP

53.2 โ€” Esmeralda
56.8 โ€” Llama 3.1 Instruct
48.2 โ€” Hermes-3

GPQA Diamond

15.7 โ€” Esmeralda
15.7 โ€” Llama 3.1 Instruct
18.2 โ€” Hermes-3

EQ-Bench

59.2 โ€” Esmeralda
61.1 โ€” Llama 3.1 Instruct
63.1 โ€” Hermes-3

EQ-Bench Parseable (Syntax Stability)

๐Ÿ”ฅ 100.0% โ€” Esmeralda
92.4% โ€” Llama 3.1 Instruct
91.2% โ€” Hermes-3

Here Be Dragons ๐Ÿ‰

I also experimented with a new TruthfulQA free-generation evaluation setup.

- Responses were judged by Gemma 4 26B A4B
- The judge compared generations directly against ground-truth answers
- Models were evaluated in 8-bit quantized form to speed up inference

TruthfulQA (LLM Judge)

0.682 โ€” Esmeralda-Llama-3.1-8B-control
0.587 โ€” Hermes-3-Llama-3.1-8B (reported MC2 score; methodology differs)

For a lightweight control run trained on only a fraction of the dataset, Iโ€™m pretty encouraged by the results.

The model is released under the standard Llama 3.1 license, and Iโ€™d genuinely love feedback from people testing it in real workflows.

Model: Locutusque/Esmeralda-Llama-3.1-8B-control

Dataset: Locutusque/esmeralda-agentic

alvarobarttย 
posted an update about 1 month ago
view post
Post
443
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
alvarobarttย 
posted an update about 2 months ago
view post
Post
3346
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL; DR MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

๐Ÿง  hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
๐Ÿ—๏ธ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
โšก Active params isn't the same as memory footprint, especially for sparse architectures
๐Ÿ“ฆ Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
๐Ÿ“š KV cache can still dominate depending on context length, batch size, and concurrency
๐Ÿ”€ Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
๐Ÿš€ Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
blanchonย 
posted an update about 2 months ago
view post
Post
2765
I'm releasing OpenCS2 a 11TB dataset of around 5000 hours of counter strike gameplay recording.
- HD resolution - 1280ร—720 ยท 32 fps
- For each frame keyboard and mouse + world state (player position, velocity, weapon ...)
- HD Stereo audio
- All 10 players perspective

https://huggingface.co/collections/blanchon/opencs2
  • 1 reply
ยท
anakin87ย 
posted an update 2 months ago
view post
Post
3405
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

๐Ÿง‘โ€๐Ÿณ Here's how:

1๏ธโƒฃ Build a solid RL env with Verifiers (Prime Intellect)
2๏ธโƒฃ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3๏ธโƒฃ SFT warm-up to teach format
4๏ธโƒฃ Group-based RL (CISPO) against opponents making 20-70% random moves
5๏ธโƒฃ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini ๐Ÿ†

---

๐ŸŽฎ Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

๐Ÿค— Model: anakin87/LFM2-2.6B-mr-tictactoe

๐Ÿ“š Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

๐Ÿค— Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe