AI & ML interests

Local LLMs

Recent Activity

Abhaykoul 
posted an update 8 days ago
view post
Post
194
Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship 10k+ token system prompts. Vtx is ~2,200. Less prompt overhead means more room for your code in the model's context window.

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.

Apache 2.0. uv tool install vtx-coding-agent and you're running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.
prithivMLmods 
posted an update 9 days ago
view post
Post
3772
Wan2.2-I2V-Fast with highly upscaled sequential frame sampling is now available as a Spaces demo, built using Wan2.2-I2V and FLUX.2-Klein. Try the demo using the links below.👇

➠ wan2.2-i2v-fast : prithivMLmods/wan2.2-i2v-fast
➠ github: https://github.com/prithivsakthiur/wan2.2-i2v-fast
➠ collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

⤷ To learn more, visit the app page or the respective model pages.
OzTianlu 
posted an update 11 days ago
view post
Post
6297
ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.

The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.
  • 1 reply
·
pankajpandey-dev 
posted an update 18 days ago
view post
Post
908
🇮🇳 Gemma-3-1B Hindi Instruct — a Hindi LLM that runs fully offline, anywhere.
Last week I shipped Qwen3-4B Hindi. This week I went the other direction: how tiny can a useful Hindi model get? So I fine-tuned Gemma-3-1B on quality-filtered Hindi instruction data and shipped the full GGUF ladder.
✅ Fine-tune (16-bit): pankajpandey-dev/gemma-3-1b-hindi-instruct
✅ GGUF (Q4/Q5/Q8): pankajpandey-dev/gemma-3-1b-hindi-instruct-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 806 MB — runs on CPU, a cheap laptop, even a Raspberry Pi.
What I tried this round: chrF-filtered the training data to drop weak translations, and used response-only loss so the model learns how to answer, not how to repeat prompts.
Honest note: at 1B, Hindi fluency is strong but coherence is bounded by size — it's a lightweight/edge experiment, not a 4B replacement. Gemma-3-4B Hindi is next.
Part of my Hindi LLM Series — openly-licensed Indic models for local & edge use. Feedback welcome 🙏
#Hindi #IndicNLP #GGUF #LocalLLM #Gemma #EdgeAI
prithivMLmods 
posted an update 24 days ago
pankajpandey-dev 
posted an update 25 days ago
view post
Post
14982
🇮🇳 Qwen3-4B Hindi Instruct v2 — a Hindi LLM that runs on your own machine
Most strong Hindi-capable models are either huge or cloud-only. I wanted one that's small enough to run locally but actually follows instructions in Hindi — so I fine-tuned Qwen3-4B on 10K Hindi instruction pairs and shipped it with a full GGUF quant ladder.
✅ Fine-tune (16-bit): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2
✅ GGUF (Q4/Q5/Q8): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 2.5 GB — fits comfortably on a laptop, CPU or GPU.
Part of my Hindi LLM Series — building openly-licensed Indic models for local and edge use. More coming (Gemma next). Feedback welcome 🙏
#Hindi #IndicNLP #GGUF #LocalLLM #Qwen
  • 4 replies
·
prithivMLmods 
posted an update 27 days ago
view post
Post
6182
PiD — Pixel Diffusion Decoder Image Edit Upscale and Image Generation Upscale, an all-in-one demo, is now live on Spaces! Great improvements in realism-based image generation and editing are powered by FLUX.2-Klein, while image generation is paired with Z-Image, and upscaling is enabled by default!

🤗 Space: prithivMLmods/PiD-Image-Upscaler
🔗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > To learn more, visit the app page or the respective model pages.
pankajpandey-dev 
posted an update 29 days ago
view post
Post
692
🇮🇳 Just shipped: MiniCPM5-1B-Hindi-Instruct (+ GGUF quants)

First Hindi instruction-tuned fine-tune of OpenBMB's brand-new MiniCPM5-1B (released this week).

Trained with Unsloth + LoRA (r=32) on AI4Bharat's anudesh + dolly Hindi splits — ~4k high-quality examples, 2 epochs on a single T4 in 60 minutes.

🔗 Model (16-bit + LoRA adapter):
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct

📦 GGUF quants for llama.cpp / Ollama / LM Studio:
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF

5 quant levels — from Q3_K_M (~560 MB, runs on a Raspberry Pi) to Q8_0 (~1.2 GB, near-lossless). Q4_K_M is the recommended default.

Part of my ongoing 🇮🇳 Hindi LLM Series — bringing strong open-source LLMs to Indian languages.

#Hindi #IndicNLP #MiniCPM5 #LoRA #Unsloth #GGUF #llamacpp #Ollama #LocalLLM
pankajpandey-dev 
posted an update 30 days ago
view post
Post
2699
🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users!
@HuggingFaceBio released the original GGUF in bf16 only — so I added the full quant ladder for CPU/edge inference:
• Q2_K → 1.4 GB
• Q3_K_M → 1.8 GB
• Q4_K_M → 2.1 GB ⭐
• Q5_K_M → 2.4 GB
• Q6_K → 2.7 GB
• Q8_0 → 3.5 GB
🔗 pankajpandey-dev/Carbon-3B-GGUF
Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support).
Huge thanks to the HuggingFaceBio team for the original model 🙏
#GGUF #llamacpp #genomics #DNA

pankajpandey-dev 
posted an update about 1 month ago
view post
Post
276
Just released Qwen3-0.6B fine-tuned on Hindi instruction data 🇮🇳

✅ Full model: pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1
✅ GGUF versions (Q2/Q4/Q5/Q8): pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1-GGUF

Smallest Hindi-capable GGUF — runs on any laptop at 0.37GB.
Next: v2 with more data, better responses.

#Hindi #LLM #GGUF #OpenSource
prithivMLmods 
posted an update about 1 month ago
view post
Post
5594
I've made 8 Spaces in the Qwen-Image-Edit series, and out of them, 5 Spaces reached “Space of the Week”! A few Spaces are still topping the list even after many months.

Cumulatively, the series has crossed 8.2 million+ ZeroGPU runs and nearly 4 million visitors overall.

Thanks for all the community support! 🤗❤️

🔗 Spaces: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
  • 4 replies
·
Aurelien-Morgan 
posted an update about 2 months ago
view post
Post
1094
@retrain-pipelines v0.2.0 is out !
I'm at Station F at My booth with GOSIM Paris 2026 today & tomorrow.
Come meet me for a live in-person demo and a chat !
  • 1 reply
·
Ujjwal-Tyagi 
posted an update about 2 months ago
view post
Post
391
6 Open-Source Libraries to FineTune LLMs
1. Unsloth
GitHub: https://github.com/unslothai/unsloth
→ Fastest way to fine-tune LLMs locally
→ Optimized for low VRAM (even laptops)
→ Plug-and-play with Hugging Face models

2. Axolotl
GitHub: https://github.com/OpenAccess-AI-Collective/axolotl
→ Flexible LLM fine-tuning configs
→ Supports LoRA, QLoRA, multi-GPU
→ Great for custom training pipelines

3. TRL (Transformer Reinforcement Learning)
GitHub: https://github.com/huggingface/trl
→ RLHF, DPO, PPO for LLM alignment
→ Built on Hugging Face ecosystem
→ Essential for post-training optimization

4. DeepSpeed
GitHub: https://github.com/microsoft/DeepSpeed
→ Train massive models efficiently
→ Memory + speed optimization
→ Industry standard for scaling

5. LLaMA-Factory
GitHub: https://github.com/hiyouga/LLaMA-Factory
→ All-in-one fine-tuning UI + CLI
→ Supports multiple models (LLaMA, Qwen, etc.)
→ Beginner-friendly + powerful

6. PEFT
GitHub: https://github.com/huggingface/peft
→ Fine-tune with minimal compute
→ LoRA, adapters, prefix tuning
→ Best for cost-efficient training
  • 1 reply
·
Sri-Vigneshwar-DJ 
posted an update about 2 months ago
view post
Post
154
![Feather DB LongMemEval Results]( Hawky-ai/longmemeval-results)

We ran Feather DB v0.8.0 on LongMemEval (ICLR 2025) — 500 questions across real multi-session conversations, up to 115K tokens each.

**Score: 0.693** · GPT-4o full-context baseline: 0.640
Full 500-question run with Gemini-Flash: **$2.40**

Per-axis breakdown:
→ Info-extraction: **0.942**
→ Knowledge-update: **0.714**
→ Multi-session: **0.606**
→ Temporal: **0.477** ← the hard one, Phase 9 addresses this

Architecture: Hybrid BM25+dense · adaptive temporal decay · embedded (no server) · p50 = 0.19ms · MIT

pip install feather-db

Raw results + audit JSONs: Hawky-ai/longmemeval-results
prithivMLmods 
posted an update about 2 months ago
view post
Post
5945
Multimodal-Edge Demo, a node-based inference canvas demo, is now live on Spaces. It features node-based Transformers for fast inference across 10+ edge-device multimodal models on the Hub, all within a single space. The series includes models from Qwen3.5, Qwen3-VL, Gemma 4, and the LFM 2.5 VL model series, with support for reasoning and grounding tasks.

🤗 Demo: prithivMLmods/Multimodal-Edge-Node
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Multimodal-Edge-Node
✅ Multimodal Apps Collections: https://huggingface.co/collections/prithivMLmods/hall-of-multimodal-apps

🤗 > To learn more, visit the app page or the respective model pages.
Ujjwal-Tyagi 
posted an update 2 months ago
view post
Post
340
This is the best set of AI and ML books and a full guide to learning machine learning from the ground up. This is my study material that I used, so I thought it would be helpful to share it with others. Like, share, and add it to your collection at Ujjwal-Tyagi/ai-ml-foundations-book-collection.