Instructions to use WithinUsAI/Agent.Nano.Coder-2B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="WithinUsAI/Agent.Nano.Coder-2B-gguf", filename="Agent.Nano.Coder-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
Use Docker
docker model run hf.co/WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with Ollama:
ollama run hf.co/WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
- Unsloth Studio
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Agent.Nano.Coder-2B-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Agent.Nano.Coder-2B-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for WithinUsAI/Agent.Nano.Coder-2B-gguf to start chatting
- Docker Model Runner
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with Docker Model Runner:
docker model run hf.co/WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
- Lemonade
How to use WithinUsAI/Agent.Nano.Coder-2B-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull WithinUsAI/Agent.Nano.Coder-2B-gguf:Q4_K_M
Run and chat with the model
lemonade run user.Agent.Nano.Coder-2B-gguf-Q4_K_M
List all available models
lemonade list
| datasets: | |
| - bigcode/the-stack-v2 | |
| - yulan-team/YuLan-Mini-Datasets | |
| - HuggingFaceFW/fineweb-edu | |
| - bigcode/the-stack-v2 | |
| - mlfoundations/dclm-baseline-1.0 | |
| - math-ai/AutoMathText | |
| - gair-prox/open-web-math-pro | |
| - RUC-AIBOX/long_form_thought_data_5k | |
| - internlm/Lean-Workbook | |
| - internlm/Lean-Github | |
| - deepseek-ai/DeepSeek-Prover-V1 | |
| - ScalableMath/Lean-STaR-base | |
| - ScalableMath/Lean-STaR-plus | |
| - ScalableMath/Lean-CoT-base | |
| - ScalableMath/Lean-CoT-plus | |
| - opencsg/chinese-fineweb-edu | |
| - liwu/MNBVC | |
| - vikp/textbook_quality_programming | |
| - HuggingFaceTB/smollm-corpus | |
| - OpenCoder-LLM/opc-annealing-corpus | |
| - OpenCoder-LLM/opc-sft-stage1 | |
| - OpenCoder-LLM/opc-sft-stage2 | |
| - XinyaoHu/AMPS_mathematica | |
| - deepmind/math_dataset | |
| - mrfakename/basic-math-10m | |
| - microsoft/orca-math-word-problems-200k | |
| - AI-MO/NuminaMath-CoT | |
| - HuggingFaceTB/cosmopedia | |
| - MU-NLPC/Calc-ape210k | |
| - manu/project_gutenberg | |
| - storytracer/LoC-PD-Books | |
| - allenai/dolma | |
| Agent.Nano.Coder-2B (GGUF) | |
| 📌 Model Overview | |
| Model Name: WithinUsAI/Agent.Nano.Coder-2B-gguf | |
| Organization: Within Us AI | |
| Model Type: Lightweight Agentic Code LLM | |
| Parameter Size: 2B | |
| Format: GGUF (quantized for local inference) | |
| Primary Focus: Ultra-efficient coding + agent workflows | |
| This model is a compact, high-efficiency coding agent, designed to deliver useful software engineering reasoning in extremely small compute environments. | |
| It belongs to the Within Us AI family of agentic coders, emphasizing action-oriented outputs over passive text generation.  | |
| ⸻ | |
| 🧬 Architecture & Lineage | |
| * Model Class: Small-scale transformer (2B parameter range) | |
| * Design Goal: Maximize reasoning-per-parameter | |
| * Format Conversion: GGUF quantization for local runtime compatibility | |
| Ecosystem Context | |
| Part of a broader WithinUsAI lineup including: | |
| * 4B agentic coders | |
| * reasoning-distilled Gemma variants | |
| * nano-scale experimental models | |
| The Nano series focuses on: | |
| “Minimum size, maximum usefulness.” | |
| ⸻ | |
| 🧠 Core Design Philosophy | |
| This model is built around a sharp constraint: | |
| If a model only has 2B parameters… every neuron has to earn its place. | |
| Key ideas: | |
| * Prioritize coding over general chat | |
| * Bias toward structured outputs | |
| * Encourage step-based reasoning | |
| * Optimize for tool-augmented environments | |
| ⸻ | |
| ⚙️ Key Capabilities | |
| 💻 Coding | |
| * Python, JavaScript, C++, and more | |
| * Function generation and refactoring | |
| * Lightweight debugging assistance | |
| 🤖 Agentic Behavior | |
| * Task decomposition | |
| * Instruction-following for multi-step tasks | |
| * Compatible with external tool pipelines | |
| 🧠 Reasoning (Compact) | |
| * Basic chain-of-thought patterns | |
| * Logical step breakdowns | |
| * Efficient problem-solving within tight parameter limits | |
| ⸻ | |
| 📦 GGUF Format & Deployment | |
| Designed for fast, local inference with minimal hardware. | |
| Compatible Runtimes: | |
| * llama.cpp | |
| * LM Studio | |
| * Ollama (GGUF-compatible builds) | |
| Typical Quantization Sizes (2B class): | |
| * Q4_K_M (~1.1–1.4GB) | |
| * Q5_K_M (~1.3–1.6GB) | |
| ⸻ | |
| 🚀 Intended Use | |
| ✅ Ideal Use Cases | |
| * Low-resource coding assistants | |
| * Embedded / edge AI systems | |
| * Fast iteration environments | |
| * Local copilots on consumer hardware | |
| * Multi-agent systems with many small models | |
| ⚠️ Limitations | |
| * Smaller parameter count limits deep reasoning depth | |
| * Not suited for highly complex multi-domain reasoning | |
| * Performance depends heavily on prompt clarity | |
| ⸻ | |
| 🛠️ Usage Example (llama.cpp) | |
| ./main -m Agent.Nano.Coder-2B.Q4_K_M.gguf \ | |
| -p "Write a Python function to validate email addresses using regex." \ | |
| -n 256 | |
| ⸻ | |
| 🧪 Training & Methodology | |
| Within Us AI approach emphasizes: | |
| * Agentic coding datasets | |
| * Instruction-tuned workflows | |
| * Reasoning traces (lightweight) | |
| * Evaluation-driven refinement | |
| Data Sources | |
| * Proprietary datasets created by Within Us AI | |
| * Third-party datasets may be used without ownership claims | |
| * Focus on: | |
| * Code tasks | |
| * Debugging patterns | |
| * Structured outputs | |
| ⸻ | |
| 📊 Expected Performance Profile | |
| Capability Strength | |
| Coding (basic–intermediate) High | |
| Speed / efficiency Very High | |
| Reasoning depth Moderate | |
| General knowledge Moderate | |
| Tool-use readiness High | |
| ⸻ | |
| 📜 License | |
| License Type: Custom / Other (Within Us AI License Model)** | |
| Terms: | |
| * Base architectures originate from third-party LLM ecosystems | |
| * Within Us AI developed: | |
| * Fine-tuning methodology | |
| * Merging processes | |
| * Training pipelines | |
| * Third-party datasets are used without ownership claims | |
| * Full credit belongs to original creators | |
| ⸻ | |
| 🙏 Acknowledgements | |
| * Open-source LLM community | |
| * GGUF / llama.cpp ecosystem | |
| * Dataset contributors across Hugging Face | |
| * Researchers advancing small-model efficiency | |
| ⸻ | |
| 🔗 Links | |
| * Model: https://huggingface.co/WithinUsAI/Agent.Nano.Coder-2B-gguf | |
| * Organization: https://huggingface.co/WithinUsAI | |
| ⸻ | |
| 🧩 Closing Note | |
| This model is like a pocket-sized engineer 🧰⚡ | |
| Not built to dominate benchmarks… | |
| but to quietly get things done fast, locally, and efficiently. | |