AI & ML interests

None defined yet.

Ujjwal-TyagiΒ 
posted an update 6 days ago
view post
Post
2748
I am sharing my study material for AI & ML, these books are really a "bible" and gives very strong foundation, I also have given guidance, introduction and my master notes in the dataset repo card! I hope you will find them helpful, if you have any queries, just start a discussion and I am always there to help you out!
Ujjwal-Tyagi/ai-ml-foundations-book-collection
  • 3 replies
Β·
codelionΒ 
posted an update 27 days ago
view post
Post
3181
Scaling Pedagogical Pre-training to 10 Billion Tokens

New blog post exploring what happens when you take optimal data mixing insights and scale up the data generation itself.

We built Sutra, a multi-stage framework for generating pedagogical pre-training data guided by a knowledge graph of ~2,000 concepts across 9 domains. The pipeline includes structured content generation, six-dimension quality evaluation, diversity management across 20 content styles, and a cleaning stage to prevent collapse.

The result is codelion/sutra-10B, a 10.2 billion token pedagogical dataset with rich metadata (domain, complexity, prerequisites, quality scores) on every entry.

We trained codelion/SmolLM2-70M on it for 3 full epochs (30.6B tokens) on a single A10 GPU in ~78 hours.

Key finding: perplexity kept improving across epochs, but benchmark gains plateaued fast. At 70M parameters, the model hits a representational ceiling that more data alone can't break through.

Full writeup with comparisons against 7 other datasets, detailed benchmark breakdowns, and connections to recent work on synthetic data scaling, curriculum learning, and data mixing laws: https://huggingface.co/blog/codelion/scaling-pedagogical-pretraining-10-billion-tokens

All datasets at multiple scales (10M, 100M, 1B, 10B) plus seed concepts and an SFT variant are in the Sutra Pedagogical Datasets collection.
  • 2 replies
Β·
Ujjwal-TyagiΒ 
posted an update 29 days ago
view post
Post
406
We have now LTX 2.3 with more better visual quality and richer sound, check it out! Lightricks/LTX-2.3
Ujjwal-TyagiΒ 
posted an update about 1 month ago
view post
Post
2916
Public reports allege that Anthropic gobbled up trillions of tokens of copyrighted material and public data to build their castle. πŸ°πŸ“„ Now that they're sitting on top, they're begging for special laws to protect their profits while pulling the ladder up behind them. πŸͺœπŸš«

But the hypocrisy meter just broke! πŸ“‰ They are accusing Chinese labs like DeepSeek, Minimax, and Kimi of "huge distillation attacks. The Reality is that You can't just loot the entire internet's library, lock the door, and then sue everyone else for reading through the window. Stop trying to gatekeep the tech you didn't own in the first place. Read the complete article on it: https://huggingface.co/blog/Ujjwal-Tyagi/the-dark-underbelly-of-anthropic
  • 3 replies
Β·
Ujjwal-TyagiΒ 
posted an update about 2 months ago
view post
Post
224
Qwen 3.5 Model is here! Supporting 1m context length by default, It is giving much good performance and competitive to Claude Opus 4.6, Qwen/Qwen3.5-397B-A17B, here it's GGUF: unsloth/Qwen3.5-397B-A17B-GGUF, Follow me and turn on the notification for the latest news!
Ujjwal-TyagiΒ 
posted an update about 2 months ago
view post
Post
3030
GLM 5 is insane, it ranks #4 Globally!
  • 4 replies
Β·
Sri-Vigneshwar-DJΒ 
posted an update about 2 months ago
view post
Post
1436
Just released a new dataset designed for training reasoning models on Meta (Facebook/Instagram) advertising fatigue detection!

What is it? A GRPO (Group Relative Policy Optimization) training dataset with 200+ carefully crafted scenarios covering:

πŸ” Fatigue Signal Detection: CTR drops, CPM spikes, frequency analysis
🩺 Performance Diagnosis: Root cause analysis frameworks
πŸ“‹ Strategy: Creative refresh cadence, testing frameworks
πŸ“Š Analysis: ROI calculations, metric interpretation
Why GRPO? GRPO training helps models learn structured reasoning. Each response follows the <thinking> and <answer> format.

Check it out here: Sri-Vigneshwar-DJ/meta-fatigue-grpo-dataset
Ujjwal-TyagiΒ 
posted an update 2 months ago
Sri-Vigneshwar-DJΒ 
posted an update 2 months ago
view post
Post
228
πŸ™οΈ Hugging Face Community Post
Title: 🧬 Experimenting with "Dynamic Chaos" in Tamil SLMs

Hi everyone! I just published a new experimental study on Small Language Model (SLM) resilience.

I took the Qwen2.5-0.5B model and put it through a "Chaos Phase" to see how much weight data a tiny model can lose before its understanding of classical Tamil grammar breaks.

Key highlights of the study:

Target Data: Fine-tuned on the Thirukkural (1,330 couplets + modern explanations).
The Chaos Step: Applied 20% random weight pruning but implemented "Layer Protection" for the Token Embeddings and LM Head to keep the characters readable.
Compression: 4-bit (Q4_K_M) quantization for extreme efficiency.
Result: A surrealist classical Tamil model that is ultra-light (~300MB) and ultra-fast!

Check out the model and the experiment logic here: Sri-Vigneshwar-DJ/qwen-tamil-chaos-v1
codelionΒ 
posted an update 2 months ago
view post
Post
3242
Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop
  • 1 reply
Β·
Ujjwal-TyagiΒ 
posted an update 2 months ago
view post
Post
1818
There is a new open-source music generation model called HeartMuLa. It offers strong, competitive performance compared to Suno and supports English, Chinese, Japanese, Korean, and Spanish. It is optimized to run easily on RTX GPUs and other consumer-grade hardware. HeartMuLa/HeartMuLa-oss-3B
https://github.com/HeartMuLa/heartlib
  • 1 reply
Β·
Ujjwal-TyagiΒ 
posted an update 3 months ago
view post
Post
2791
So, Koreans are also doing great progress behind Chinese,
Their two open source ai models that are actually good in coding. upstage/Solar-Open-100B skt/A.X-K1
  • 1 reply
Β·
Ujjwal-TyagiΒ 
posted an update 3 months ago
Sri-Vigneshwar-DJΒ 
posted an update 3 months ago
view post
Post
322
Performance Marketing meets "Thinking Mode" 🧠

I’m excited to release hawky-ai-Qwen3-0.6B-Marketing-MoT, a specialized SLM designed for deep strategic reasoning in performance marketing.

While small at 0.6B parameters, this model punches way above its weight class by utilizing a Mixture of Thoughts (MoT) framework. It doesn't just give you an answer; it thinks through the logic of Meta Ads scaling, GA4 attribution, and unit economics before providing a strategic recommendation.

Key Features:

Thinking-First: Trained on 1,500+ critical thinking scenarios.
MoT Framework: 5 distinct reasoning styles (Linear, Exploratory, Critical, Deconstructive, Analogical).
SLM Speed: Perfect for low-latency, high-precision marketing audits.
Check it out on Hugging Face: πŸ”— Sri-Vigneshwar-DJ/hawky-ai-Qwen3-0.6B-Marketing-MoT
mmhamdyΒ 
posted an update 3 months ago
view post
Post
3119
The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're probably releasing all these papers to make the V4 report of reasonable lengthπŸ˜„

Here's a nice short summary from Gemini
Ujjwal-TyagiΒ 
posted an update 3 months ago
view post
Post
2614
I am very excited to see the release of nyuuzyou/gitee-code. This is exactly what I have been looking for. Thank you to @nyuuzyou for his hard work on this.
  • 3 replies
Β·
Ujjwal-TyagiΒ 
posted an update 3 months ago
view post
Post
2310
I’m looking for AI engineers and researchers to join my company as part of the core team. We’ll be working on cutting-edge research and hands-on implementation across LLMs and related systems. I’m especially interested in founding engineers for my ai startup, who want to build from the ground up and shape both the product and the research direction. If this sounds interesting to you, reply to this post and message me on Discord β€” my username is "ujjwal_tyagi.shirova", Please also attach your Resume and Details of your open source projects (if any related to LLMs) on discord, avoid sharing here as a reply to this post.
Sri-Vigneshwar-DJΒ 
posted an update 3 months ago
view post
Post
2194
Introducing Hawky-AI H1 4B PM: The First Open-Source LLM for Performance Marketing 🎯

Hey HF Community! πŸ‘‹

Just released the first LLM fine-tuned specifically for Performance Marketing.
What is it?
Gemma 3 4B distilled from Claude Opus 4.5 with expert-level marketing knowledge.
Covers:
πŸ“± Meta Ads (campaign structure, bidding, scaling, creative fatigue)
πŸ” Google Ads (Quality Score, Performance Max, lead gen)
πŸ“Š Measurement (ROAS vs MER, incrementality, LTV:CAC)
🎨 Creative Strategy (hook rates, A/B testing, funnel creative)
Why we built it:
Generic LLMs say "optimize your targeting" β€” not helpful. This model gives specific frameworks like "frequency at 4.5 + CTR drop = creative fatigue, here's the fix..."
Technical:

Base: Gemma 3 4B
Method: QLoRA (r=64)
Teacher: Claude Opus 4.5

πŸ”— Model: Sri-Vigneshwar-DJ/hawky-ai-H1-4b-PM
Built by Hawky.ai

Try it and let us know what you think! πŸš€
Ujjwal-TyagiΒ 
posted an update 3 months ago
view post
Post
2692
For more better details and analysis, you can read the article here: https://huggingface.co/blog/Ujjwal-Tyagi/steering-not-censoring, We are sleepwalking into a crisis. I am deeply concerned about AI model safety right now because, as the community rushes to roll out increasingly powerful open-source models, we are completely neglecting the most critical aspect: safety. It seems that nobody is seriously thinking about the potential consequences of unregulated model outputs or the necessity of robust guardrails. We are essentially planting the seeds for our own destruction if we prioritize raw performance over security.

This negligence is terrifyingly evident when you look at the current landscape. Take Qwen Image 2512, for example; while it delivers undeniably strong performance, it has incredibly weak guardrails that make it dangerous to deploy. In stark contrast, Z Image might not get as much hype for its power, but it has much better safety guardrails than Qwen Image 2512.

It is imperative that the open-source community and developers recognize that capability without responsibility is a liability. We must actively work on protecting these models from bad actors who seek to exploit them for malicious purposes, such as generating disinformation, creating non-consensual imagery, or automating cyberattacks. It is no longer enough to simply release a powerful model; we must build layers of defense that make it resistant to jailbreaking and adversarial attacks. Developers need to prioritize alignment and robust filtering techniques just as much as they prioritize benchmark scores. We cannot hand such potent tools to the world without ensuring they have the safety mechanisms to prevent them from being turned against us.
  • 17 replies
Β·