- ⚙️ Code Repository
- 🧰 Streamlit UI
- ⚙️ Vectorized Datasets
- Base Model Description
- Inputs and outputs
- Intended Usage
- Example: llama-cpp-python
- Example: llama.cpp
- Prompting Notes
- Quantization Notes
- Provenance and What Is Known
- What Is Not Claimed Here
- Limitations
- Safety and Responsible Use
- Hardware Considerations
- License and Upstream Terms
- Files
- Relationship to the Main buddy Repository
- Recommended Repository Description
- Acknowledgments
buddy-3-270m-it-Q4_K_M.gguf is a quantized GGUF deployment artifact used by the buddy
application. The application source indicates that this model is loaded locally through
llama-cpp-python and is based on google/gemma-3-270m-it.
Use the following sampling settings:
temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0
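As a minimal sketch, the settings above can be passed to llama-cpp-python as keyword arguments of `create_chat_completion`. The `SAMPLING` dictionary and the `chat` helper are hypothetical names introduced only for illustration; they are not part of the buddy source.

```python
# Sampling settings quoted in this model card.
SAMPLING = {"temperature": 1.0, "top_k": 64, "top_p": 0.95, "min_p": 0.0}


def chat(llm, user_text, **overrides):
    """Run one chat turn with the card's sampling settings.

    `llm` is assumed to be a llama_cpp.Llama instance (or any object with a
    compatible create_chat_completion method). This helper is hypothetical.
    """
    params = {**SAMPLING, **overrides}  # caller-supplied values win
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": user_text}],
        **params,
    )
    return response["choices"][0]["message"]["content"]
```

With a loaded model, `chat(llm, "Summarize this paragraph.")` applies the defaults, and `chat(llm, "...", temperature=0.2)` overrides a single value.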
This repository is intended to host the GGUF model file used by the application, while the main application code is hosted separately on GitHub:
- Main application repository: https://github.com/is-leeroy-jenkins/buddy.git
- Model file: buddy-3-270m-it-Q4_K_M.gguf
- Base model: google/gemma-3-270m-it
- Primary runtime for this artifact: llama.cpp / llama-cpp-python
- Role in the application: local fallback model for text generation when the primary provider is unavailable or when a local path is preferred
⚙️ Code Repository
🧰 Streamlit UI
Within that architecture, buddy-3-270m-it-Q4_K_M.gguf serves as a lightweight local option for
text generation when a local model is preferred or when the primary remote provider path is not being
used.
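The fallback role described above can be sketched as follows. This is an illustration under stated assumptions, not the buddy implementation: `generate_with_fallback`, `remote_generate`, and `local_generate` are hypothetical names, and the two callables stand in for whatever the application uses to call its primary provider and the local GGUF model.

```python
def generate_with_fallback(prompt, remote_generate, local_generate, prefer_local=False):
    """Return (text, source), trying the remote provider first unless told otherwise.

    Both callables are hypothetical: each takes a prompt string and returns text.
    """
    if not prefer_local:
        try:
            return remote_generate(prompt), "remote"
        except Exception:
            # Any provider failure falls through to the local model.
            pass
    return local_generate(prompt), "local"
```

Returning the source alongside the text lets the caller show which path produced the answer.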
The application configuration points to the following default local model path:
models/buddy-3-270m-it-Q4_K_M.gguf
The runtime loads the model through Llama(...) from llama_cpp, and the application defaults to a
4096-token context window for local inference.
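A minimal loading sketch using the default path and context window stated above. The `load_local_model` helper and the two constants are hypothetical; only `Llama(model_path=..., n_ctx=...)` comes from llama-cpp-python. The file check is an added convenience, not something the buddy source is known to do.

```python
from pathlib import Path

# Defaults quoted in this model card.
DEFAULT_MODEL_PATH = Path("models/buddy-3-270m-it-Q4_K_M.gguf")
DEFAULT_N_CTX = 4096


def load_local_model(model_path=DEFAULT_MODEL_PATH, n_ctx=DEFAULT_N_CTX):
    """Load the GGUF file, failing early with a clear error if it is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"GGUF model not found at {path}")
    # Imported lazily so the path check above works without llama-cpp-python.
    from llama_cpp import Llama
    return Llama(model_path=str(path), n_ctx=n_ctx)
```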
⚙️ Vectorized Datasets
Vectorization is the process of converting textual data into numerical vectors, usually applied after the text has been cleaned. It can help improve the execution speed and reduce the training time of your code. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine learning:
- Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
- Regulations - Collection of federal regulations on the use of appropriated funds
- SF-133 - The Report on Budget Execution and Budgetary Resources
- Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
- Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
- Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
- Fastbook - Treasury guidance on federal ledger accounts
- Title 31 CFR - Money & Finance
- Redbook - The Principles of Appropriations Law (Volumes I & II).
- US Standard General Ledger - Account Definitions
- Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
Base Model Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes.
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output context up to 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes per request, subtracting the request input tokens
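The output limit above is the context window minus the input tokens. A small worked sketch, assuming "32K" means 32 × 1024 tokens (the card does not spell out the exact figure); `max_output_tokens` is a hypothetical helper.

```python
# Assumption: "32K" is read as 32 * 1024 tokens.
CONTEXT_32K = 32 * 1024


def max_output_tokens(input_tokens, context_window):
    """Tokens left for generation after the prompt has used its share."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return max(context_window - input_tokens, 0)


# Upstream limit for the 270M size with a 1,000-token prompt.
print(max_output_tokens(1000, CONTEXT_32K))  # 31768
# Same prompt under a 4096-token runtime context window.
print(max_output_tokens(1000, 4096))  # 3096
```

The second call shows that the runtime's configured context window, not the upstream maximum, bounds generation when it is smaller.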
Intended Usage
This model is intended for local inference in GGUF-compatible runtimes, especially:
- llama.cpp
- llama-cpp-python
- desktop applications or Streamlit applications that load GGUF models directly
In the context of buddy, this repository should be understood as the model-hosting companion to the main GitHub application repository rather than as the complete application itself.
Typical usage scenarios include:
- lightweight local assistants
- document-grounded Q&A
- prompt-based drafting and summarization
- local fallback inference for a larger multimodal GPT-5.x application
- experimentation with small-footprint local Gemma-family deployments
Example: llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="buddy-3-270m-it-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what this model is for."},
    ]
)

print(response["choices"][0]["message"]["content"])
Example: llama.cpp
./llama-cli \
    -m buddy-3-270m-it-Q4_K_M.gguf \
    -c 4096 \
    -p "Write a short description of buddy."
Prompting Notes
Because this file is based on an instruction-tuned Gemma-family model, best results generally come from:
- clear task-oriented prompts
- concise system instructions
- grounded context when using RAG
- short to moderate generations for factual tasks
For document Q&A workflows, pair the model with retrieved context rather than relying on parametric memory alone.
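One way to pair the model with retrieved context is to place the passages in the user message and instruct the model to stay within them. The sketch below is illustrative only: `build_grounded_messages`, the character budget, and the instruction wording are assumptions, not taken from the buddy source.

```python
def build_grounded_messages(question, passages, max_chars=6000):
    """Build chat messages that ground the answer in retrieved passages.

    `max_chars` is a crude, hypothetical budget that keeps the prompt well
    inside the context window; passages that would exceed it are dropped.
    """
    parts, used = [], 0
    for i, passage in enumerate(passages, start=1):
        block = f"[{i}] {passage.strip()}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    system = (
        "Answer using only the provided context. "
        "If the answer is not in the context, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed as `messages` to `create_chat_completion`.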
Quantization Notes
This repository hosts a GGUF quantized artifact rather than an original full-precision checkpoint. The file name indicates a Q4_K_M quantization variant.
Quantization typically reduces model size and memory requirements, making local inference easier on consumer hardware, but it may also reduce generation quality relative to higher-precision variants.
Provenance and What Is Known
The following points are supported by the buddy source files:
- the application expects a model file named buddy-3-270m-it-Q4_K_M.gguf
- the configured model path is models/buddy-3-270m-it-Q4_K_M.gguf
- inference is performed through llama-cpp-python
- the application uses a 4096-token context window by default
- the application combines local text generation with embedding-based retrieval components
What Is Not Claimed Here
This model card does not claim any of the following unless separate evidence is provided:
- a custom fine-tuning dataset
- a specific post-training alignment procedure
- benchmark results
- exact quantization tooling or conversion commands
- safety evaluations beyond those of the upstream model family
If you have those details, they should be added explicitly in a later revision.
Limitations
As a small quantized instruction model, this artifact may:
- hallucinate facts
- struggle with long multi-step reasoning
- lose fidelity on highly technical or domain-dense tasks
- perform worse than larger or less aggressively quantized models
- require careful retrieval support for document-heavy workflows
It should be treated as an assistive generation component, not as an authoritative source.
Safety and Responsible Use
Users should review outputs before acting on them, especially for:
- legal matters
- financial decisions
- medical or health-related questions
- employment or compliance workflows
- any task requiring high factual precision
Do not rely on model output as a substitute for professional judgment or verified source material.
Hardware Considerations
Because this is a small GGUF quantized model, it is suitable for lightweight local inference relative to larger checkpoints. Actual performance will depend on:
- runtime configuration
- CPU versus GPU offloading
- available RAM / VRAM
- context length
- batch size and thread settings
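The factors above map onto constructor arguments of `llama_cpp.Llama`. The sketch below only assembles those arguments; `llama_kwargs` is a hypothetical helper, and the default values are illustrative starting points rather than tuned recommendations.

```python
import os


def llama_kwargs(model_path, n_ctx=4096, use_gpu=False, n_batch=512, n_threads=None):
    """Collect Llama(...) keyword arguments for the hardware factors listed above."""
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,                                  # context length
        "n_threads": n_threads or os.cpu_count() or 1,   # CPU threads
        "n_batch": n_batch,                              # prompt batch size
        "n_gpu_layers": -1 if use_gpu else 0,            # -1 offloads all layers, 0 stays on CPU
    }


# Usage (requires llama-cpp-python and the model file):
#   from llama_cpp import Llama
#   llm = Llama(**llama_kwargs("models/buddy-3-270m-it-Q4_K_M.gguf"))
```

GPU offloading only takes effect when llama-cpp-python was built with a GPU backend.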
License and Upstream Terms
This artifact is based on google/gemma-3-270m-it. Use of this repository and any redistributed
artifacts should comply with:
- the license and usage terms attached to the upstream Gemma model
- any additional redistribution requirements that apply to converted or quantized derivatives
Before publishing, confirm that your intended distribution of the GGUF file is consistent with the applicable upstream license terms.
Files
This repository is expected to contain:
README.md
buddy-3-270m-it-Q4_K_M.gguf
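A possible way to fetch the model file into the path the application expects, using `hf_hub_download` from the `huggingface_hub` package. This is a sketch under assumptions: the repository id `leeroy-jankins/buddy` is taken from the Hugging Face page for this card, the `models` target directory mirrors the default path stated earlier, and `download_model` is a hypothetical helper.

```python
REPO_ID = "leeroy-jankins/buddy"            # assumed from the Hugging Face page
FILENAME = "buddy-3-270m-it-Q4_K_M.gguf"    # model file named in this card


def download_model(local_dir="models"):
    """Download the GGUF file and return its local path."""
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_model()` requires network access and should place the file at models/buddy-3-270m-it-Q4_K_M.gguf.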
Relationship to the Main buddy Repository
This repository is best used alongside the main buddy codebase hosted on GitHub:
https://github.com/is-leeroy-jenkins/buddy.git
The GitHub repository contains the application logic and user-facing features, while this repository is intended to host the GGUF model artifact used for local fallback inference.
Recommended Repository Description
A GGUF-hosted quantized local fallback model for the buddy application, based on
google/gemma-3-270m-it, intended for local inference with llama.cpp / llama-cpp-python, and
used alongside the main buddy application hosted on GitHub.
Acknowledgments
- Google for the upstream Gemma model family
- The llama.cpp and llama-cpp-python communities for GGUF-compatible local inference tooling
- The buddy application source, which documents how this model is loaded and used in practice
Model tree for leeroy-jankins/buddy
Base model
google/gemma-3-270m