
RakshitAralimatti posted an update about 2 months ago
Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into moonshotai Kimi 2.5.
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/

The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself


RakshitAralimatti posted an update 2 months ago
I built a crazy ultra-low-latency voice assistant agent using Pipecat, NVIDIA Riva, NVIDIA NIM, and an MCP-powered tool stack. It can talk in real time, search the web, and manage your project files and documentation hands-free (create, read, summarise, and clean up).

Link - https://github.com/rakshit2020/Voice-Agent-using-Nvidia-Riva-NIM-Pipecat
I put everything into a small demo repo with the full architecture diagram and a short demo video so you can see exactly how it works and adapt it to your own projects.

Check out the GitHub, play with the agent, and let me know if it’s useful or if you want a breakdown of any part of the setup.
RakshitAralimatti posted an update 2 months ago
One of the most practical and genuinely useful use cases of agentic systems is a research assistant.

I built a Deep Research multi-agent system using NVIDIA’s Nemotron-3-Nano-30B-A3B model and CrewAI.
Try it out yourself 👇
🔗 GitHub: https://github.com/rakshit2020/Deep-Research-Agent-using-CrewAI
What truly made this system feel next-level was powering it with NVIDIA Nemotron-3-Nano-30B-A3B; it's built for real-world agentic applications.

The agentic system I built:

1. First talks to you and clarifies what you actually want, removing ambiguity
2. Then creates a proper research plan based on that clarity
3. Performs deep research using web search and content extraction tools
4. Finally produces a well-structured research report grounded in sources
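The four stages above can be sketched as a minimal pipeline. This is plain Python showing the control flow only, not the repo's CrewAI implementation, and every function name here is hypothetical:

```python
# Structural sketch of the four-stage research flow (hypothetical names,
# not the author's CrewAI code). Each stage's real work (LLM calls, web
# search, content extraction) is replaced by a simple stand-in.

def clarify(query: str, answers: dict) -> str:
    """Stage 1: remove ambiguity by folding the user's answers into the query."""
    details = "; ".join(f"{k}={v}" for k, v in answers.items())
    return f"{query} ({details})" if details else query

def plan(clarified: str) -> list[str]:
    """Stage 2: turn the clarified query into concrete research steps."""
    return [f"search: {clarified}", f"extract sources for: {clarified}"]

def research(steps: list[str]) -> list[str]:
    """Stage 3: execute each step (web search / extraction would go here)."""
    return [f"findings from '{s}'" for s in steps]

def report(clarified: str, findings: list[str]) -> str:
    """Stage 4: assemble a structured report grounded in the findings."""
    body = "\n".join(f"- {f}" for f in findings)
    return f"# Report: {clarified}\n{body}"

q = clarify("LLM quantization", {"scope": "inference only"})
summary = report(q, research(plan(q)))
```

In the actual system each of these stand-ins would be a CrewAI agent with its own role, tools, and task definition; the value of the pattern is that ambiguity is resolved *before* any expensive research happens.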


RakshitAralimatti posted an update 3 months ago
I built something crazy you've never seen before.

Please check - https://huggingface.co/blog/RakshitAralimatti/streaming-data-rag

A real-time Streaming Data to RAG system that listens to live radio, transcribes it on-the-fly, and lets you query across TIME.

Not just "what was discussed" – but "what happened in the last 10 minutes on channel 0?" or "at 9 AM, what was the breaking news?" This is RAG that understands temporal context.
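A temporal query layer like this can be sketched as a time-indexed transcript store. This is a minimal pure-Python illustration of the idea, not the actual system; the class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One transcribed chunk of a live stream (illustrative fields)."""
    channel: int
    timestamp: float  # seconds since stream start (hypothetical unit)
    text: str

class TemporalStore:
    """Sketch: index transcribed segments by time and channel, so a query
    like 'last 10 minutes on channel 0' becomes a simple range scan.
    Not the author's implementation; names are illustrative."""

    def __init__(self):
        self.segments = []

    def add(self, channel: int, timestamp: float, text: str) -> None:
        self.segments.append(Segment(channel, timestamp, text))

    def window(self, channel: int, start: float, end: float) -> list:
        """Return transcripts on `channel` with start <= t < end."""
        return [s.text for s in self.segments
                if s.channel == channel and start <= s.timestamp < end]

store = TemporalStore()
store.add(0, 100.0, "traffic update")
store.add(0, 700.0, "breaking news")
store.add(1, 710.0, "music show")

# "what happened in the last 10 minutes on channel 0?" asked at t = 720 s:
recent = store.window(channel=0, start=720 - 600, end=720)
```

In a real deployment the `text` field would come from streaming transcription and the range scan would run against a vector store with timestamp metadata filters, but the temporal-RAG trick is exactly this: retrieval constrained by a time window, not just by semantic similarity.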

RakshitAralimatti posted an update 4 months ago
OCR has absolutely blown up in 2025, and honestly, my perspective on document processing has completely changed.

This year has been wild. Vision Language Models like Nanonets OCR2-3B hit the scene, and suddenly we're getting accuracy on complex forms that traditional OCR never managed. We're talking handwritten checkboxes, watermarked documents, multi-column layouts, even LaTeX equations, all handled in a single pass.

The market numbers say it all: OCR accuracy passed 98% for printed text, AI integration is everywhere, and real-time processing is now standard. The entire OCR market is hitting $25.13 billion in 2025 because this tech actually works now.

I wrote a detailed Medium article walking through:

1. Why vision LMs changed the game
2. NVIDIA NeMo Retriever architecture
3. Complete code breakdown
4. Real government/healthcare use cases
5. Production deployment guide

Article: https://medium.com/@rakshitaralimatti2001/nvidia-nemo-retriever-ocr-building-document-intelligence-systems-for-enterprise-and-government-42a6684c37a1

RakshitAralimatti posted an update 6 months ago
Have you ever wanted to easily deploy a cutting-edge speech recognition system that actually works in real time? How about one powered by NVIDIA GPUs on Kubernetes, but without the headache of complicated installs?

Well, your wait is over! My latest blog shows how to deploy NVIDIA Riva ASR in just 5 minutes using Helm charts. From validating GPU readiness in Kubernetes to customizing your ASR models and spinning up the service, this guide covers it all.
Read it here - https://medium.com/@rakshitaralimatti2001/deploy-nvidia-riva-asr-on-kubernetes-gpu-ready-in-minutes-30955d6ed7b8

BONUS: I even built simple Streamlit apps so you can test with your mic or upload audio files to see the magic live.

✨ Bookmark this post and the blog for your next voice AI project or production-ready speech application!
RakshitAralimatti posted an update 7 months ago
When you ask ChatGPT, Claude, or Gemini a really tough question,
you might notice that little "thinking..." moment before it answers.

But what does it actually mean when an LLM is “thinking”?

Imagine a chess player pausing before their next move not because they don’t know how to play, but because they’re running through possibilities, weighing options, and choosing the best one.
LLMs do something similar… except they’re not really thinking like us.

Here’s the surprising part:
You might think these reasoning skills come from futuristic architectures or alien neural networks.
In reality, most reasoning LLMs still use the same decoder-only transformer architecture as other models.
The real magic?
It’s in how they’re trained and what data they learn from.

Can AI actually think, or is it just insanely good at faking it?
I broke it down in a simple, 4-minute Medium read.
Bet you’ll walk away with at least one “aha!” moment. 🚀

Read here - https://lnkd.in/edZ8Ceyg
RakshitAralimatti posted an update 7 months ago
🤔 Ever wondered how OpenAI’s massive GPT‑OSS‑20B runs on just 16 GB of memory or how GPT‑OSS‑120B runs on a single H100 GPU?

Seems impossible, right?

The secret is native MXFP4 quantization: a 4-bit floating-point format that’s making AI models faster, lighter, and more deployable than ever.

🧠 What’s MXFP4?

MXFP4, or Microscaling FP4, is a specialized 4-bit floating-point format (E2M1) standardized by the Open Compute Project under the MX (Microscaling) specification. It compresses groups of 32 values using a shared 8-bit scale (E8M0), dramatically lowering memory usage while preserving dynamic range, perfect for compact AI model deployment.

💡 Think of it like this:

Instead of everyone ordering their own expensive meal (full-precision weights), a group shares a family meal (shared scaling). It’s cheaper, lighter, and still gets the job done.
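To make the shared-scale idea concrete, here's a toy MXFP4-style quantizer in pure Python. It follows the structure described above (E2M1 element values, one shared power-of-two scale per block) but is a simplified sketch, not a bit-exact implementation of the OCP MX spec:

```python
import math

# Representable magnitudes of the E2M1 4-bit float (sign handled separately).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """MXFP4-style quantization sketch (simplified, not bit-exact OCP MX):
    a block of up to 32 values shares one power-of-two scale (the E8M0
    part), and each value is rounded to the nearest E2M1 magnitude."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared power-of-two scale so the largest magnitude fits E2M1's max (6.0).
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    quantized = []
    for x in block:
        mag = min(E2M1, key=lambda v: abs(abs(x) / scale - v))
        quantized.append(math.copysign(mag, x))
    return scale, quantized

def dequantize_block(scale, quantized):
    return [q * scale for q in quantized]

# 4 bits per value plus one shared 8-bit scale per 32 values is about
# 4.25 bits per weight, versus 16 bits for fp16.
weights = [0.8, -2.4, 3.1, 0.05, -6.0, 1.5, 0.0, 4.2]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
```

Notice that small values in a block dominated by a large outlier get crushed toward zero; that's the trade-off the per-32-value block scaling is designed to limit, compared to sharing one scale across an entire tensor.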

✍️ I’ve broken all of this down in my first Medium blog:

What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware
Link - https://medium.com/@rakshitaralimatti2001/4-bit-alchemy-how-mxfp4-makes-massive-models-like-gpt-oss-feasible-for-everyone-573d6630b56c

HF - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-me
RakshitAralimatti posted an update 8 months ago
🚀 Introducing Multi-Model RAG with LangChain!
Understand and query across images, tables, text, and files — all in one pipeline.
Get smart answers with relevant visuals or tables as references.

🔗 GitHub: https://github.com/rakshit2020/Multi-Model-RAG-LangChain
🎥 Demo video included — see it in action!

✅ Built for developers & researchers
⭐ Try it out, explore the code, and drop a star if you find it useful!