buddy-3-270m-it-Q4_K_M.gguf is a quantized GGUF deployment artifact used by the buddy application. The application source indicates that this model is loaded locally through llama-cpp-python and is based on google/gemma-3-270m-it.

Use the following sampling settings with this model: temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0.
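As a minimal sketch, these settings can be kept in one place and passed to llama-cpp-python's `create_chat_completion`, which accepts `temperature`, `top_k`, `top_p`, and `min_p` as keyword arguments. The names `SAMPLING_DEFAULTS` and `chat` are illustrative assumptions and are not taken from the buddy source.

```python
# Sampling settings listed on this card, kept in one place.
SAMPLING_DEFAULTS = {"temperature": 1.0, "top_k": 64, "top_p": 0.95, "min_p": 0.0}


def chat(llm, prompt, **overrides):
    """Send one user prompt to a llama_cpp.Llama-like object and return the reply text.

    `llm` is assumed to expose create_chat_completion(), as llama_cpp.Llama does.
    """
    params = {**SAMPLING_DEFAULTS, **overrides}  # caller-supplied values win
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        **params,
    )
    return result["choices"][0]["message"]["content"]
```

With a loaded model, `chat(llm, "Summarize this paragraph.")` would use the card's settings, and `chat(llm, "...", temperature=0.2)` would override a single value.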

This repository is intended to host the GGUF model file used by the application, while the main application code is hosted separately on GitHub:

Main application repository: https://github.com/is-leeroy-jenkins/buddy.git
Model file: buddy-3-270m-it-Q4_K_M.gguf
Base model: google/gemma-3-270m-it
Primary runtime for this artifact: llama.cpp / llama-cpp-python
Role in the application: local fallback model for text generation when the primary provider is unavailable or when a local path is preferred

⚙️ Code Repository

🧰 Streamlit UI

Within the buddy application architecture, buddy-3-270m-it-Q4_K_M.gguf serves as a lightweight local option for text generation, used when a local model is preferred or when the primary remote provider path is not in use.
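The fallback role described here can be sketched as follows. This is an illustration of the general pattern only: the function name and the `primary` and `local` callables are hypothetical, and the buddy source may structure this logic differently.

```python
def generate_with_fallback(prompt, primary, local, prefer_local=False):
    """Return (text, source) for a prompt.

    `primary` and `local` are hypothetical callables that take a prompt string
    and return generated text. `primary` may raise if the provider is unavailable.
    """
    if not prefer_local:
        try:
            return primary(prompt), "primary"
        except Exception:
            # Primary provider failed: fall through to the local model.
            pass
    return local(prompt), "local"
```

Returning the source alongside the text lets the caller show which path produced the answer.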

The application configuration points to the following default local model path:

models/buddy-3-270m-it-Q4_K_M.gguf

The runtime loads the model through Llama(...) from llama_cpp, and the application defaults to a 4096-token context window for local inference.
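A minimal loading sketch under those defaults might look like the following. The helper name `load_local_model` and the constants are assumptions made for illustration; only the path, the 4096-token context, and the use of `Llama(...)` come from the description above.

```python
from pathlib import Path

DEFAULT_MODEL_PATH = Path("models/buddy-3-270m-it-Q4_K_M.gguf")  # path stated on this card
DEFAULT_N_CTX = 4096  # default context window stated on this card


def load_local_model(model_path=DEFAULT_MODEL_PATH, n_ctx=DEFAULT_N_CTX):
    """Load the GGUF file with llama-cpp-python, failing early if it is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"GGUF model file not found: {path}")
    # Imported lazily so this module can be imported without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama(model_path=str(path), n_ctx=n_ctx)
```

Checking for the file first gives a clearer error than a failure inside the native loader.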

⚙️ Vectorized Datasets

Vectorization converts text into numerical vectors and is usually applied after the text has been cleaned. Working with vectors instead of raw text can speed up execution and reduce training time. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine learning:

Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
Regulations - Collection of federal regulations on the use of appropriated funds
SF-133 - The Report on Budget Execution and Budgetary Resources
Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
Fastbook - Treasury guidance on federal ledger accounts
Title 31 CFR - Money & Finance
Redbook - The Principles of Appropriations Law (Volumes I & II).
US Standard General Ledger - Account Definitions
Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
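To make the idea of vectorization concrete, here is a toy bag-of-words sketch in plain Python. It is an illustration only: the vector stores listed above would normally hold learned embeddings rather than word counts, and the function names and sample texts are invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simple cleaning step)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Return a sorted list of every distinct token across the documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def vectorize(text, vocabulary):
    """Map text to a count vector with one position per vocabulary term."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


docs = ["Budget authority", "budget outlays"]
vocab = build_vocabulary(docs)
vector = vectorize("budget budget outlays", vocab)
```

Each position in `vector` counts how often the corresponding vocabulary term appears in the text.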

Base Model Description

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Inputs and outputs

Input:
- Text string, such as a question, a prompt, or a document to be summarized
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each (4B, 12B, and 27B sizes; the 270M size behind this artifact is text-only)
- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes.
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output context up to 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes per request, subtracting the request input tokens
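The "subtracting the request input tokens" rule is simple arithmetic, sketched below. The constant and function names are illustrative, and reading "32K" as 32 * 1024 tokens is an assumption.

```python
UPSTREAM_CONTEXT_270M = 32 * 1024  # "32K" read as 32,768 tokens (assumption)
APP_DEFAULT_CONTEXT = 4096  # the buddy application's default for local inference


def max_output_tokens(prompt_tokens, n_ctx=APP_DEFAULT_CONTEXT):
    """Tokens left for generation once the prompt has been counted against the context."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(n_ctx - prompt_tokens, 0)
```

With the application's 4096-token default, a 1,000-token prompt leaves at most 3,096 tokens for the reply, well below the upstream limit.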

Intended Usage

This model is intended for local inference in GGUF-compatible runtimes, especially:

llama.cpp
llama-cpp-python
desktop applications or Streamlit applications that load GGUF models directly

In the context of buddy, this repository should be understood as the model-hosting companion to the main GitHub application repository rather than as the complete application itself.

Typical usage scenarios include:

lightweight local assistants
document-grounded Q&A
prompt-based drafting and summarization
local fallback inference for a larger multimodal GPT-5.x application
experimentation with small-footprint local Gemma-family deployments

Example: `llama-cpp-python`

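from llama_cpp import Llama

# Load the quantized GGUF file; n_ctx matches the application's 4096-token default.
llm = Llama(
    model_path="buddy-3-270m-it-Q4_K_M.gguf",
    n_ctx=4096,
)

# Request a chat completion using the sampling settings listed at the top of this card.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what this model is for."},
    ],
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    min_p=0.0,
)

# The reply text is in the first choice's message.
print(response["choices"][0]["message"]["content"])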

Example: `llama.cpp`

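# Run one prompt with a 4096-token context and the sampling settings listed at the top of this card.
./llama-cli \
  -m buddy-3-270m-it-Q4_K_M.gguf \
  -c 4096 \
  --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0 \
  -p "Write a short description of buddy."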

Prompting Notes

Because this file is based on an instruction-tuned Gemma-family model, best results generally come from:

clear task-oriented prompts
concise system instructions
grounded context when using RAG
short to moderate generations for factual tasks

For document Q&A workflows, pair the model with retrieved context rather than relying on parametric memory alone.
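One way to pair the model with retrieved context is to place the passages directly in the chat messages. The sketch below assumes retrieval has already happened elsewhere. The function name, prompt wording, and passage limit are illustrative choices, not the buddy application's actual prompt.

```python
def build_grounded_messages(question, passages, max_passages=3):
    """Build chat messages that ask the model to answer from the given passages only."""
    # Number the passages so the answer can refer back to them.
    context = "\n\n".join(
        f"[{i}] {passage.strip()}"
        for i, passage in enumerate(passages[:max_passages], start=1)
    )
    system = (
        "Answer using only the numbered context passages. "
        "If the answer is not in them, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed as `messages` to `create_chat_completion`. Capping the number of passages keeps the prompt within the context window.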

Quantization Notes

This repository hosts a GGUF quantized artifact rather than an original full-precision checkpoint. The file name indicates a Q4_K_M quantization variant.

Quantization typically reduces model size and memory requirements, making local inference easier on consumer hardware, but it may also reduce generation quality relative to higher-precision variants.
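The size reduction follows from back-of-the-envelope arithmetic: weight storage is roughly the parameter count times the bits per weight, divided by 8. The sketch below is an approximation under stated assumptions. The 270,000,000 figure is the nominal count implied by the model name, and 4.5 bits per weight is a hypothetical average, since Q4_K_M mixes precisions across tensors. The real file size should be read from the file itself.

```python
def estimated_weight_bytes(n_params, bits_per_weight):
    """Rough weight-storage estimate: parameters x bits per weight / 8 bits per byte."""
    return n_params * bits_per_weight / 8


NOMINAL_PARAMS = 270_000_000  # nominal count implied by "270m" (assumption)

fp16_bytes = estimated_weight_bytes(NOMINAL_PARAMS, 16)  # 16-bit baseline
q4_bytes = estimated_weight_bytes(NOMINAL_PARAMS, 4.5)  # hypothetical 4-bit-class average
ratio = fp16_bytes / q4_bytes  # approximate reduction factor
```

The estimate covers weights only. Runtime memory also includes the KV cache, which grows with context length.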

Provenance and What Is Known

The following points are supported by the buddy source files:

the application expects a model file named buddy-3-270m-it-Q4_K_M.gguf
the configured model path is models/buddy-3-270m-it-Q4_K_M.gguf
inference is performed through llama-cpp-python
the application uses a 4096-token context window by default
the application combines local text generation with embedding-based retrieval components

What Is Not Claimed Here

This model card does not claim any of the following unless separate evidence is provided:

a custom fine-tuning dataset
a specific post-training alignment procedure
benchmark results
exact quantization tooling or conversion commands
safety evaluations beyond those of the upstream model family

If you have those details, they should be added explicitly in a later revision.

Limitations

As a small quantized instruction model, this artifact may:

hallucinate facts
struggle with long multi-step reasoning
lose fidelity on highly technical or domain-dense tasks
perform worse than larger or less aggressively quantized models
require careful retrieval support for document-heavy workflows

It should be treated as an assistive generation component, not as an authoritative source.

Safety and Responsible Use

Users should review outputs before acting on them, especially for:

legal matters
financial decisions
medical or health-related questions
employment or compliance workflows
any task requiring high factual precision

Do not rely on model output as a substitute for professional judgment or verified source material.

Hardware Considerations

Because this is a small GGUF quantized model, it is suitable for lightweight local inference relative to larger checkpoints. Actual performance will depend on:

runtime configuration
CPU versus GPU offloading
available RAM / VRAM
context length
batch size and thread settings
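Several of these factors map onto keyword arguments of the llama-cpp-python `Llama` constructor (`n_ctx`, `n_batch`, `n_threads`, `n_gpu_layers`). The helper below is a hypothetical sketch for assembling them. The default values are illustrative starting points, not tuned recommendations.

```python
import os


def runtime_kwargs(use_gpu=False, n_ctx=4096, n_batch=512, n_threads=None):
    """Assemble Llama(...) keyword arguments for the hardware factors listed above."""
    return {
        "n_ctx": n_ctx,  # context length
        "n_batch": n_batch,  # prompt-processing batch size
        "n_threads": n_threads or os.cpu_count() or 1,  # CPU threads
        "n_gpu_layers": -1 if use_gpu else 0,  # -1 offloads all layers, 0 keeps inference on CPU
    }
```

Usage would look like `Llama(model_path="models/buddy-3-270m-it-Q4_K_M.gguf", **runtime_kwargs())`. GPU offloading takes effect only if llama-cpp-python was built with GPU support.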

License and Upstream Terms

This artifact is based on google/gemma-3-270m-it. Use of this repository and any redistributed artifacts should comply with:

the license and usage terms attached to the upstream Gemma model
any additional redistribution requirements that apply to converted or quantized derivatives

Before publishing, confirm that your intended distribution of the GGUF file is consistent with the applicable upstream license terms.

Files

This repository is expected to contain:

README.md
buddy-3-270m-it-Q4_K_M.gguf

Relationship to the Main buddy Repository

This repository is best used alongside the main buddy codebase hosted on GitHub:

https://github.com/is-leeroy-jenkins/buddy.git

The GitHub repository contains the application logic and user-facing features, while this repository is intended to host the GGUF model artifact used for local fallback inference.

Recommended Repository Description

A GGUF-hosted quantized local fallback model for the buddy application, based on google/gemma-3-270m-it, intended for local inference with llama.cpp / llama-cpp-python, and used alongside the main buddy application hosted on GitHub.

Acknowledgments

Google for the upstream Gemma model family
The llama.cpp and llama-cpp-python communities for GGUF-compatible local inference tooling
The buddy application source, which documents how this model is loaded and used in practice


Model tree for leeroy-jankins/buddy

Base model: google/gemma-3-270m
Finetuned: google/gemma-3-270m-it
Quantized: unsloth/gemma-3-270m-it-GGUF
Finetuned: this model
