AI & ML interests

AI & Blockchain & Robotics

Recent Activity

rajkumarrawal 
posted an update 12 days ago
I submitted the paper "AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts" by Keyu Li (@weizhihao1), Junhao Shi, Dequan Wang (@dqwang), Yang Xiao (@YangXiao-nlp), Mohan Jiang, Jie Sun (@Sunshine279), Yunze Wu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, and Pengfei Liu, from Shanghai Jiao Tong University (SJTU), The Hong Kong Polytechnic University (PolyUHK), and SII-GAIR (GAIR), to Daily Papers on Hugging Face.

Potentially another direction for benchmarking the frontiers of autonomous agents in 2026.

Some of the observations found are :-

-- Long-horizon tasks remain challenging :
Even frontier models struggle with sustained reasoning over real-world tasks that require 1M tokens and 90 tool calls, indicating limits in long-context autonomy.

-- Proprietary models outperform open-source models :
Closed-source models achieve a higher average score (48.4%) than their open-source counterparts (32.1%), revealing a persistent performance gap on complex agentic tasks.

-- Feedback-driven self-correction varies widely :
Models like GPT 5.2 and Claude show strong gains from iterative feedback, while others (e.g. DeepSeek V3.2) exhibit minimal or no improvement after feedback.

-- Efficiency trade-offs are significant :
High-performing models often consume far more tokens and time, while some models (e.g. Grok 4.1 Fast) are more token-efficient despite lower absolute scores.

-- Agentic scaffolds strongly influence performance :
Models tend to perform best within their native or optimized ecosystems, highlighting that agent performance depends on tight coupling between the model and its scaffold, not on the model alone.

... and many more.

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts (2601.11044)
·
rajkumarrawal 
posted an update 17 days ago
rajkumarrawal 
posted an update about 2 months ago
"An open, standardized protocol that lets autonomous robots exchange data, coordinate tasks, and collaborate in real-time environments in the age of AI." r2r-protocol (Robot2Robot Protocol) is now officially open source! 🔓

"pip install r2r-protocol"

Whether you're a developer, researcher, or tech enthusiast, we invite you to explore, use, and contribute to the project.

🔗 Check it out here: [ https://github.com/Tech-Parivartan/r2r-protocol?tab=readme-ov-file ]

Let’s build the future together! 💡
AiParivartanResearchLab
techparivartan


Documentation for r2r-protocol : [ https://techparivartanai.notion.site/Robot-to-Robot-r2r-Protocol-1f008f0fb18780439d70e8b9bbbdb869 ]

The R2R Protocol enables seamless robot-to-robot interaction across industrial automation, swarm robotics, logistics, and multi-agent systems. It defines structured message formats, negotiation logic, discovery mechanisms, and extensible APIs.

#r2r_protocol #robot2robot_protocol #ai #aiparivartanresearchlab #techparivartan

https://huggingface.co/blog/rajkumarrawal/rawalraj
rajkumarrawal 
posted an update about 2 months ago
September (2025) LLM Core Knowledge & Reasoning Benchmarks Report by AiParivartanResearchLab (AIPRL-LIR)

Monthly LLM Intelligence Reports for AI Decision Makers :
Our "aiprl-llm-intelligence-report" repo establishes the (AIPRL-LIR) framework for overall Large Language Model evaluation and analysis through systematic monthly intelligence reports. Unlike typical AI research papers or commercial reports, it provides structured insights into AI model performance, benchmarking methodologies, multi-hosting provider analysis, industry trends ...

( all in one monthly report ) Leading Models & Companies, 23 Benchmarks in 6 Categories, Global Hosting Providers, & Research Highlights

Here’s what you’ll find inside this month’s intelligence report:-

Leading Models & Companies :
openai, Anthropic, meta-llama, deepmind google, mistralai, Cohere, Qwen, deepseek-ai, MicrosoftResearch, amazonwebservices, nvidia, grokgpt-org and more.

23 Benchmarks in 6 Categories :
With a special focus on Core Knowledge & Reasoning performance across diverse tasks.

Repository link is in comments below :

https://huggingface.co/blog/rajkumarrawal/september-2025-aiprl-lir-core-knowledge-reasoning#6935ab96da60512619b7cf1f
·
rajkumarrawal 
posted an update 2 months ago
September (2025) LLM Commonsense & Social Benchmarks Report by AiParivartanResearchLab (AIPRL-LIR)

Monthly LLM Intelligence Reports for AI Decision Makers :
Our "aiprl-llm-intelligence-report" repo establishes the (AIPRL-LIR) framework for overall Large Language Model evaluation and analysis through systematic monthly intelligence reports. Unlike typical AI research papers or commercial reports, it provides structured insights into AI model performance, benchmarking methodologies, multi-hosting provider analysis, industry trends ...

( all in one monthly report ) Leading Models & Companies, 23 Benchmarks in 6 Categories, Global Hosting Providers, & Research Highlights

Here’s what you’ll find inside this month’s intelligence report:-

Leading Models & Companies :
openai, Anthropic, meta-llama, google deepmind, mistralai, Cohere, Qwen, deepseek-ai, MicrosoftResearch, amazonwebservices, nvidia, grokgpt-org and more.

23 Benchmarks in 6 Categories :
With a special focus on Commonsense & Social performance across diverse tasks.

Repository link is in comments below :

https://huggingface.co/blog/rajkumarrawal/september-2025-aiprl-lir-commonsense-social

·
rajkumarrawal 
posted an update 2 months ago
September (2025) LLM Safety & Reliability Benchmarks Report by AI Parivartan Research Lab (AIPRL-LIR)

Monthly LLM Intelligence Reports for AI Decision Makers :

Our "aiprl-llm-intelligence-report" repo establishes the (AIPRL-LIR) framework for overall Large Language Model evaluation and analysis through systematic monthly intelligence reports. Unlike typical AI research papers or commercial reports, it provides structured insights into AI model performance, benchmarking methodologies, multi-hosting provider analysis, industry trends ...

( all in one monthly report ) Leading Models & Companies, 23 Benchmarks in 6 Categories, Global Hosting Providers, & Research Highlights

Here’s what you’ll find inside this month’s intelligence report:-

Leading Models & Companies :

23 Benchmarks in 6 Categories :
With a special focus on Safety & Reliability performance across diverse tasks.

Global Hosting Providers :

Research Highlights :
Comparative insights, evaluation methodologies, and industry trends for AI decision makers.

Disclaimer:
This comprehensive Safety & Reliability analysis represents the current state of large language model capabilities as of September 2025. All performance metrics are based on standardized evaluations and may vary based on specific implementation details, hardware configurations, and testing methodologies. Users are advised to consult original research papers and official documentation for detailed technical insights and application guidelines. Individual model performance may differ in real-world scenarios and should be validated accordingly. If there are any discrepancies or updates beyond this report, please refer to the respective model providers for the most current information.

Repository link is in comments below :

https://huggingface.co/blog/rajkumarrawal/september-2025-aiprl-lir-safety-reliability

AiParivartanResearchLab


·
rajkumarrawal 
posted an update 2 months ago
Annual AI Investment ROI Calculator:

Current Manual Process Cost: $X
AI Implementation Cost: $Y
Expected Efficiency Gain: Z%

Annual Savings = X × Z% = A
Annual AI Cost = Y
Net Annual Benefit = A - Y = B

ROI = (B ÷ Y) × 100%
Payback Period = (Y ÷ B) × 12 months
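
For anyone who wants to plug numbers into the formulas above, here is a minimal Python sketch. The function name `ai_investment_roi`, the `RoiResult` container, and the sample figures are illustrative assumptions, not part of the original calculator. Treating a non-positive net benefit as "never pays back" is also an assumption made here.

```python
from dataclasses import dataclass


@dataclass
class RoiResult:
    annual_savings: float      # A = X × Z%
    net_annual_benefit: float  # B = A - Y
    roi_percent: float         # (B ÷ Y) × 100
    payback_months: float      # (Y ÷ B) × 12


def ai_investment_roi(manual_cost: float, ai_cost: float,
                      efficiency_gain_pct: float) -> RoiResult:
    """Apply the post's formulas with X=manual_cost, Y=ai_cost, Z=efficiency_gain_pct."""
    if ai_cost <= 0:
        raise ValueError("ai_cost (Y) must be positive")
    annual_savings = manual_cost * efficiency_gain_pct / 100
    net_benefit = annual_savings - ai_cost
    roi_percent = net_benefit * 100 / ai_cost
    # Assumption: with no positive net benefit the cost is never recovered.
    payback_months = ai_cost * 12 / net_benefit if net_benefit > 0 else float("inf")
    return RoiResult(annual_savings, net_benefit, roi_percent, payback_months)


# Hypothetical inputs: X = $200,000, Y = $50,000, Z = 40%
result = ai_investment_roi(200_000, 50_000, 40)
print(f"ROI: {result.roi_percent:.1f}%  Payback: {result.payback_months:.1f} months")
```

With these sample inputs, annual savings are $80,000 and net annual benefit is $30,000, giving an ROI of 60% and a payback period of 20 months.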

Monthly LLM Intelligence Reports for AI Decision Makers :

Repository link is in comments below :
AiParivartanResearchLab

·
rajkumarrawal 
posted an update 3 months ago