We got Qwen 3.5 to count Rs in Strawberry correctly! 🚨
Building on Sawtone, we’ve been testing a different way to feed language into an LLM to build the next generation of multilingual AI.
The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn’t. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.
So we tried adding a second path.
In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.
Same LLM. Better interface.
In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.
Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
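Sawtone's actual encoding isn't spelled out here, but the general byte-level idea is easy to see in plain Python (the `byte_view` helper below is a hypothetical illustration, not Sawtone's API):

```python
def byte_view(word: str) -> list[int]:
    """Return the raw UTF-8 bytes of a word -- accents, character order,
    and exact spelling all survive, unlike in subword tokenization."""
    return list(word.encode("utf-8"))

# A diacritic is just extra bytes, never silently dropped:
print(byte_view("cafe"))  # [99, 97, 102, 101]
print(byte_view("café"))  # [99, 97, 102, 195, 169] -- é is two bytes

# Form-sensitive tasks become trivial at the byte level:
print(bytes("strawberry", "utf-8").count(b"r"))  # 3
```

This is exactly the class of behavior (counting Rs, preserving accents) where a tokenized-only view gets shaky.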
this is big... 50 AI researchers from ByteDance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc).
key highlights:
> small LLMs can beat proprietary giants. RL (RLVR specifically) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.
> models have a hard time learning Python. mixing programming languages during pre-training is good, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create high positive synergy. mixing Python heavily into the training of statically typed languages can actually hurt because of Python's dynamic typing.
> not all languages are equal (coding scaling laws). the amount of data required to specialize a model on a language drastically depends on the language. the paper argues languages like C# and Java are easier to learn (less training data required), while languages like Python and JavaScript are trickier, ironically (the very languages AI gets used for most :)
> MoE vs dense (ability vs stability). MoE models offer higher capacity but are much more fragile during SFT than dense models. training hyperparams have a more drastic effect on MoE models, while dense models stay stable. MoE models also need constant learning rate schedules to avoid routing instability.
> code models are "insecure" by default (duh). training on public repos makes models absorb years of accumulated insecure coding patterns. safety fine-tuning often does little for code: a model might refuse to write a hate speech email but will happily generate a SQL-injection-vulnerable function because it "works."
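the MoE scheduling point above is just about schedule shape, not specific numbers (the peak LR and step count below are made-up placeholders, not the paper's hyperparameters):

```python
import math

PEAK_LR = 2e-5       # placeholder value
TOTAL_STEPS = 1000   # placeholder value

def cosine_lr(step: int) -> float:
    """Typical dense-model SFT schedule: decay smoothly to zero."""
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * step / TOTAL_STEPS))

def constant_lr(step: int) -> float:
    """Flat schedule, as recommended for MoE SFT: the router never
    sees a shifting effective step size."""
    return PEAK_LR

print(cosine_lr(0), cosine_lr(TOTAL_STEPS))      # 2e-05 0.0
print(constant_lr(0), constant_lr(TOTAL_STEPS))  # 2e-05 2e-05
```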
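the SQL-injection failure mode described above, sketched side by side (a minimal illustration with sqlite3; the "works" version is the pattern scraped from old repos):

```python
import sqlite3

def lookup_insecure(conn, name):
    # String interpolation: the "working" pattern models absorb from public repos.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def lookup_secure(conn, name):
    # Parameterized query: the input is treated as data, never as SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

payload = "alice' OR '1'='1"
print(len(lookup_insecure(conn, payload)))  # 1 -- the OR clause matched every row
print(len(lookup_secure(conn, payload)))    # 0 -- nobody is literally named that
```

both functions pass a naive "does it return rows" test, which is exactly why reward signals based on "it works" miss this.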