AI & ML interests

CVPR Demo Track @ CVPR 2022

abidlabs 
posted an update about 15 hours ago
Abhaykoul 
posted an update 11 days ago
Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship 10k+ token system prompts. Vtx is ~2,200. Less prompt overhead means more room for your code in the model's context window.

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.
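A minimal sketch of what a register(api) extension could look like. The ToyExtensionAPI class and its add_tool / on_tool_call / call_tool methods are hypothetical stand-ins invented for this illustration. They are not vtx's actual API, so check the repository for the real names and signatures.

```python
from typing import Any, Callable


class ToyExtensionAPI:
    """Hypothetical stand-in for the object passed to register(api).

    This is NOT vtx's real API; it only illustrates the ideas of
    registering tools by name and chaining interceptors.
    """

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        self.interceptors: list[Callable[[str, dict], dict]] = []

    def add_tool(self, name: str, fn: Callable[..., Any]) -> None:
        # Registering under an existing name replaces the earlier tool,
        # which is how an extension could override a built-in.
        self.tools[name] = fn

    def on_tool_call(self, handler: Callable[[str, dict], dict]) -> None:
        # Handlers run in subscription order; each receives the
        # arguments returned by the previous one.
        self.interceptors.append(handler)

    def call_tool(self, name: str, args: dict) -> Any:
        for handler in self.interceptors:
            args = handler(name, args)
        return self.tools[name](**args)


def register(api: ToyExtensionAPI) -> None:
    """Single entry point, as an extension file might define it."""

    def word_count(text: str) -> int:
        return len(text.split())

    def strip_strings(name: str, args: dict) -> dict:
        # Interceptor: trim whitespace from every string argument.
        return {k: v.strip() if isinstance(v, str) else v for k, v in args.items()}

    api.add_tool("word_count", word_count)
    api.on_tool_call(strip_strings)


api = ToyExtensionAPI()
register(api)
result = api.call_tool("word_count", {"text": "  fix the failing test  "})
print(result)
```

Keeping everything behind one register function means an extension is just a plain Python file with no required base class, which fits the minimalist design described above.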

Apache 2.0. uv tool install vtx-coding-agent and you're running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.
victor 
posted an update about 1 month ago
Sharing how I built the LongCat-Video-Avatar 1.5 Space (500k+ views on X) in one agent session. I gave a coding agent its own AI lab on ZeroGPU, framed the goal, and walked away. It designed, deployed, tested against the live API, fixed, and shipped.

Full recipe with the copy-paste prompt: https://huggingface.co/blog/victor/building-zerogpu-spaces-autonomously
victor 
posted an update 2 months ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). I built a Three.js racing game to evaluate it, and the result is extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: it's awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and it proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. It built telemetry tools to compare player vs AI speed curves and tuned parameters from the data

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
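The vector math mentioned in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code the model wrote: the axis convention (y up, track in the x/z plane) and the helper names cross and curvature are hypothetical choices made for this example.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


# Assumed convention: y is up, the road runs along +x and spans +z.
forward = (1.0, 0.0, 0.0)
lateral = (0.0, 0.0, 1.0)

# Operand order decides the sign of the normal, which is the essence
# of a winding bug: one order points the normal down, the other up.
normal_a = cross(forward, lateral)
normal_b = cross(lateral, forward)
print("forward x lateral y-component:", normal_a[1])
print("lateral x forward y-component:", normal_b[1])


def curvature(p_prev, p, p_next):
    """Turning angle at p divided by the mean adjacent segment length.

    Points are (x, z) pairs on the track centreline. Dividing by arc
    length makes the value independent of how densely the track is sampled.
    """
    ax, az = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, bz = p_next[0] - p[0], p_next[1] - p[1]
    angle = math.atan2(ax * bz - az * bx, ax * bx + az * bz)
    arc = (math.hypot(ax, az) + math.hypot(bx, bz)) / 2
    return abs(angle) / arc


# Sanity check on a circle of radius 10, where curvature should be near 1/10.
radius, step = 10.0, 2 * math.pi / 360
pts = [(radius * math.cos(i * step), radius * math.sin(i * step)) for i in range(3)]
k = curvature(*pts)
print("curvature on radius-10 circle:", k)
```

A racing AI could then cap cornering speed as a decreasing function of this curvature, though the exact mapping used in the game is not stated in the post.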

You are going to hear about this model a lot in the coming months. Open source, let's go, and thanks z-ai 🚀🚀
victor 
posted an update 5 months ago
Interesting article: use Claude Code to help open models write CUDA kernels (for example) by turning Claude Code traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill
victor 
posted an update 6 months ago
abidlabs 
posted an update 8 months ago
Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code or computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.
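To make the latency argument concrete, here is a back-of-the-envelope sketch. All numbers are assumptions chosen for illustration (1,000 steps, 300 ms local inference, 200 ms remote inference plus a 500 ms network round trip); they are not measurements, and total_time_ms is a hypothetical helper.

```python
def total_time_ms(steps: int, inference_ms: int, network_ms: int = 0) -> int:
    """Wall-clock time for a sequential agent loop, in milliseconds.

    Each step pays for one model call plus any network round trip,
    and steps cannot overlap because each depends on the previous result.
    """
    return steps * (inference_ms + network_ms)


STEPS = 1000  # assumed length of an agentic task

# Assumed: the local model is slower per call but has no network hop.
local_ms = total_time_ms(STEPS, inference_ms=300)

# Assumed: the remote model is faster per call but pays a round trip.
remote_ms = total_time_ms(STEPS, inference_ms=200, network_ms=500)

print(f"local:  {local_ms / 1000:.0f} s")
print(f"remote: {remote_ms / 1000:.0f} s")
```

Under these assumed figures the per-step overhead dominates as the step count grows, which is the regime the post describes. With different numbers the comparison could go the other way.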