AI & ML interests

None defined yet.

Recent Activity

rajkumarrawal posted an update about 4 hours ago
I submitted the paper "FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning" by Tanyu Chen, Tairan Chen, Kai Shen, Zhenghua Bao, Zhihui Zhang, Man Yuan, and Yi Shi from FlashLabs to Daily Papers on Hugging Face.

Chroma 1.0 enables real-time spoken dialogue with personalized voice cloning through discrete speech representations and interleaved text-audio token scheduling.
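The interleaved scheduling idea can be illustrated with a minimal sketch: merge a text-token stream and an audio-token stream into one sequence in fixed-size alternating chunks. The 1:4 text-to-audio ratio and the `interleave` helper below are illustrative assumptions, not the paper's actual schedule.

```python
# Minimal sketch of interleaved text-audio token scheduling.
# The 1:4 chunk ratio is an assumption for illustration only.
def interleave(text_tokens, audio_tokens, text_chunk=1, audio_chunk=4):
    """Merge two token streams into one sequence, alternating fixed-size
    chunks so a decoder can emit text and audio incrementally."""
    out, t, a = [], 0, 0
    while t < len(text_tokens) or a < len(audio_tokens):
        out.extend(text_tokens[t:t + text_chunk]); t += text_chunk
        out.extend(audio_tokens[a:a + audio_chunk]); a += audio_chunk
    return out

# One text token is followed by up to four audio tokens, then the pattern repeats.
stream = interleave(["T0", "T1"], ["A0", "A1", "A2", "A3", "A4"])
```

A single interleaved stream like this lets one autoregressive decoder drive both modalities without waiting for the full text response before speech begins.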

Chroma 1.0 is the world's first open-source, real-time speech-to-speech model with voice cloning.

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning (2601.11141)
codelion posted an update 10 days ago
Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop
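The key insight above can be shown with a toy sketch: if each key maps to exactly one token, retrieval reduces to an exact lookup in an external memory, so accuracy does not degrade as the number of stored pairs grows. The class and names below are illustrative, not the MALM implementation.

```python
# Toy illustration of single-token keys enabling perfect retrieval:
# each key string gets one token id, and retrieval is an exact lookup,
# independent of how many pairs are stored. Not the MALM implementation.
class TokenKeyMemory:
    def __init__(self):
        self.vocab = {}   # key string -> token id (one token per key)
        self.store = {}   # token id -> value

    def write(self, key, value):
        tok = self.vocab.setdefault(key, len(self.vocab))
        self.store[tok] = value

    def read(self, key):
        return self.store.get(self.vocab.get(key))

mem = TokenKeyMemory()
for i in range(100_000):          # a "context" far beyond any attention window
    mem.write(f"hash_{i}", f"val_{i}")

result = mem.read("hash_99999")   # exact retrieval regardless of scale
```

A transformer attending over 100k pairs in-context would have to spread attention over all of them; the lookup table sidesteps that entirely, which is why the approach could plausibly scale to 100M+ tokens.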
rajkumarrawal posted an update 12 days ago
I submitted the paper "AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts" by @weizhihao1, Keyu Li, Junhao Shi, @dqwang (Dequan Wang), @YangXiao-nlp (Yang Xiao), Mohan Jiang, @Sunshine279 (Jie Sun), Yunze Wu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, and Pengfei Liu from Shanghai Jiao Tong University (SJTU), The Hong Kong Polytechnic University (PolyU), and SII-GAIR to Daily Papers on Hugging Face.

Potentially another direction for benchmarking the frontiers of autonomous agents in 2026.

Some of the reported observations:

- Long-horizon tasks remain challenging:
Even frontier models struggle with sustained reasoning over real-world tasks that require 1M tokens and 90 tool calls, indicating limits in long-context autonomy.

- Proprietary models outperform open-source models:
Closed-source models achieve a higher average score (48.4%) than open-source counterparts (32.1%), revealing a persistent performance gap on complex agentic tasks.

- Feedback-driven self-correction varies widely:
Models like GPT 5.2 and Claude show strong gains from iterative feedback, while others (e.g. DeepSeek V3.2) exhibit minimal or no improvement after feedback.

- Efficiency trade-offs are significant:
High-performing models often consume far more tokens and time; some models (e.g. Grok 4.1 Fast) are more token-efficient despite lower absolute scores.

- Agentic scaffolds strongly influence performance:
Models tend to perform best within their native or optimized ecosystems, highlighting that agent performance depends on tight coupling between the model and its scaffold, not the model alone.

...and many more.

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts (2601.11044)
rajkumarrawal posted an update 17 days ago
codelion posted an update about 1 month ago
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% of parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
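The WSD schedule mentioned above can be sketched as a three-phase learning-rate function. The peak LR and phase fractions below are illustrative assumptions, not the values used to train Dhara-70M.

```python
# Sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule.
# Peak LR and phase fractions are illustrative assumptions.
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.05, decay_frac=0.2):
    warmup_end = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_end:                      # phase 1: linear warmup
        return peak_lr * step / max(warmup_end, 1)
    if step < decay_start:                     # phase 2: long stable plateau
        return peak_lr
    # phase 3: linear decay to zero; a checkpoint from the stable phase
    # can be branched and decayed separately (e.g. for a conversion run)
    return peak_lr * (total_steps - step) / max(total_steps - decay_start, 1)
```

The practical appeal for conversion is the long stable phase: any plateau checkpoint can be forked and given its own short decay, which is one plausible way a diffusion conversion could need only ~100M extra tokens.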

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m
codelion posted an update about 1 month ago
Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.
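One simple way to see how a token's impact on success probability can be estimated is Monte Carlo resampling: compare rollout success rates from a prefix before and after appending a candidate token. The functions and toy task below are stand-ins for illustration, not the PTS implementation.

```python
import random

# Hedged sketch: a token is "pivotal" if appending it shifts the estimated
# success probability of rollouts from the prefix. `rollout_success` stands
# in for sampling completions from a real model and grading them.
def pivotal_delta(prefix, token, rollout_success, n=200, seed=0):
    rng = random.Random(seed)
    p_before = sum(rollout_success(prefix, rng) for _ in range(n)) / n
    p_after = sum(rollout_success(prefix + [token], rng) for _ in range(n)) / n
    return p_after - p_before

# Toy task: rollouts "succeed" more often once "therefore" is in the
# prefix, simulating a reasoning-critical token.
def rollout_success(prefix, rng):
    base = 0.8 if "therefore" in prefix else 0.3
    return rng.random() < base

delta = pivotal_delta(["x", "="], "therefore", rollout_success)
# delta is clearly positive here, flagging the token as pivotal
```

Running this per token over a full reasoning trace is what produces the kind of token-level impact map the visualizer displays.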

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts
rajkumarrawal posted an update about 2 months ago
r2r-protocol (Robot2Robot Protocol), "an open, standardized protocol enabling autonomous robots to exchange data, coordinate tasks, and collaborate in real-time environments in the age of AI," is now officially open source! 🔓

"pip install r2r-protocol"

Whether you're a developer, researcher, or tech enthusiast, we invite you to explore, use, and contribute to the project.

🔗 Check it out here: [ https://github.com/Tech-Parivartan/r2r-protocol?tab=readme-ov-file ]

Let's build the future together! 💡
AiParivartanResearchLab
techparivartan


Documentation of the r2r-protocol: [ https://techparivartanai.notion.site/Robot-to-Robot-r2r-Protocol-1f008f0fb18780439d70e8b9bbbdb869 ]

The R2R Protocol enables seamless robot-to-robot interaction across industrial automation, swarm robotics, logistics, and multi-agent systems. It defines structured message formats, negotiation logic, discovery mechanisms, and extensible APIs.
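To make "structured message formats" concrete, here is a hypothetical sketch of a robot-to-robot message envelope. Every field name below is an illustrative assumption, not the actual r2r-protocol schema; the linked documentation defines the real format.

```python
import json
import time
import uuid

# Hypothetical robot-to-robot message envelope. Field names are
# illustrative assumptions, NOT the actual r2r-protocol schema.
def make_message(sender, receiver, msg_type, payload):
    return {
        "id": str(uuid.uuid4()),    # unique message id for acks/dedup
        "timestamp": time.time(),   # wall-clock send time
        "sender": sender,           # robot identifier
        "receiver": receiver,       # target robot, or e.g. "broadcast"
        "type": msg_type,           # e.g. "task_offer", "status", "ack"
        "payload": payload,         # task-specific structured data
    }

msg = make_message("amr-01", "amr-02", "task_offer",
                   {"task": "move_pallet", "zone": "B3"})
wire = json.dumps(msg)              # serialized for transport
```

A fixed envelope like this is what lets heterogeneous robots negotiate: discovery and negotiation logic only need to parse the envelope, while the payload stays task-specific.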

#r2r_protocol #robot2robot_protocol #ai #aiparivartanresearchlab #techparivartan

https://huggingface.co/blog/rajkumarrawal/rawalraj
codelion posted an update about 2 months ago
Recently, Essential AI released a new 8B base model, EssentialAI/rnj-1, and highlighted the importance of the data mix for pretraining:

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this. "

This resonates with recent work we did on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency:
https://huggingface.co/blog/codelion/optimal-dataset-mixing
rajkumarrawal posted an update about 2 months ago
September 2025 LLM Core Knowledge & Reasoning Benchmarks Report by AiParivartanResearchLab (AIPRL-LIR)

Monthly LLM Intelligence Reports for AI Decision Makers:
Our "aiprl-llm-intelligence-report" repo establishes the AIPRL-LIR framework for overall evaluation and analysis of large language models through systematic monthly intelligence reports. Unlike typical AI research papers or commercial reports, it provides structured insights into AI model performance, benchmarking methodologies, multi-hosting-provider analysis, industry trends, and more.

All in one monthly report: leading models & companies, 23 benchmarks in 6 categories, global hosting providers, and research highlights.

Here's what you'll find inside this month's intelligence report:

Leading Models & Companies:
openai, Anthropic, meta-llama, deepmind, google, mistralai, Cohere, Qwen, deepseek-ai, MicrosoftResearch, amazonwebservices, nvidia, grokgpt-org, and more.

23 Benchmarks in 6 Categories:
With a special focus on Core Knowledge & Reasoning performance across diverse tasks.

The repository link is in the comments below:

https://huggingface.co/blog/rajkumarrawal/september-2025-aiprl-lir-core-knowledge-reasoning#6935ab96da60512619b7cf1f
codelion posted an update about 2 months ago
Perplexity released a dataset and benchmark (BrowseSafe) to catch and prevent malicious prompt-injection instructions in real time.

We trained a prompt injection classifier on BrowseSafe using adaptive-classifier with ModernBERT-base embeddings.

74.9% F1 on detecting prompt injection in web content.

Model -> adaptive-classifier/browsesafe
Dataset -> perplexity-ai/browsesafe-bench
Repo -> https://github.com/codelion/adaptive-classifier
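The overall idea, a lightweight classifier head over sentence embeddings, can be sketched with stand-ins: random vectors below play the role of ModernBERT-base embeddings of BrowseSafe examples, and a nearest-centroid rule plays the role of the adaptive-classifier head. The real pipeline differs; this only illustrates the embed-then-classify structure.

```python
import random

# Stand-in sketch of embed-then-classify prompt-injection detection.
# Random Gaussian vectors replace real ModernBERT-base embeddings, and
# a nearest-centroid rule replaces the adaptive-classifier head.
rng = random.Random(0)
DIM = 768  # ModernBERT-base hidden size

def fake_embedding(shift):
    """Synthetic 'embedding'; injected pages are shifted so the toy task
    is learnable, mimicking a separable embedding space."""
    return [rng.gauss(shift, 1.0) for _ in range(DIM)]

benign = [fake_embedding(0.0) for _ in range(200)]     # label 0
injected = [fake_embedding(0.5) for _ in range(200)]   # label 1

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

c0, c1 = centroid(benign), centroid(injected)

def classify(vec):
    d0 = sum((a - b) ** 2 for a, b in zip(vec, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(vec, c1))
    return int(d1 < d0)  # 1 = prompt injection

correct = sum(classify(v) == 0 for v in benign) + \
          sum(classify(v) == 1 for v in injected)
acc = correct / 400
```

Because the heavy lifting happens in the frozen embedding model, the trainable head stays tiny, which is what makes this kind of detector cheap enough to run on web content in real time.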