buddy-3-270m-it-Q4_K_M.gguf is a quantized GGUF deployment artifact used by the buddy application. The application source indicates that this model is loaded locally through llama-cpp-python and is based on google/gemma-3-270m-it.

For best results, use the following sampling settings: temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0.
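As a minimal sketch, the settings above can be passed as keyword arguments to `create_chat_completion` in llama-cpp-python. The `SAMPLING` dictionary and the `generate` helper are illustrative names, not taken from the buddy source, and the model path assumes the file sits in the working directory.

```python
# Recommended sampling settings from this model card, collected in one place.
SAMPLING = {
    "temperature": 1.0,
    "top_k": 64,
    "top_p": 0.95,
    "min_p": 0.0,
}


def generate(prompt: str, model_path: str = "buddy-3-270m-it-Q4_K_M.gguf") -> str:
    """Run one chat turn with the recommended sampling settings (illustrative helper)."""
    # Imported lazily so the settings can be inspected without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096)
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        **SAMPLING,
    )
    return response["choices"][0]["message"]["content"]
```

Keeping the values in one dictionary makes it harder for individual call sites to drift from the recommended configuration.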

This repository is intended to host the GGUF model file used by the application, while the main application code is hosted separately on GitHub:

  • Main application repository: https://github.com/is-leeroy-jenkins/buddy.git
  • Model file: buddy-3-270m-it-Q4_K_M.gguf
  • Base model: google/gemma-3-270m-it
  • Primary runtime for this artifact: llama.cpp / llama-cpp-python
  • Role in the application: local fallback model for text generation when the primary provider is unavailable or when a local path is preferred

⚙️ Code Repository

🧰 Streamlit UI

Within that architecture, buddy-3-270m-it-Q4_K_M.gguf serves as a lightweight local option for text generation when a local model is preferred or when the primary remote provider path is not being used.
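The fallback role described here can be sketched as a small routing function. This is an assumption about the general pattern, not the buddy implementation: `generate_with_fallback`, `primary`, `local`, and `prefer_local` are hypothetical names, and the two providers are modeled as plain callables.

```python
from typing import Callable


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    local: Callable[[str], str],
    prefer_local: bool = False,
) -> str:
    """Return text from the primary provider, falling back to the local model."""
    if prefer_local:
        return local(prompt)
    try:
        return primary(prompt)
    except Exception:
        # Any failure of the primary path (network, auth, quota) routes to the local model.
        return local(prompt)
```

In practice `local` would wrap a call into the GGUF model, and an application may want to catch narrower exception types than `Exception`.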

The application configuration points to the following default local model path:

models/buddy-3-270m-it-Q4_K_M.gguf

The runtime loads the model through Llama(...) from llama_cpp, and the application defaults to a 4096-token context window for local inference.
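A hedged sketch of that loading step, using the default path and context size stated above. The `load_local_model` helper and the two constants are illustrative names; the existence check is an addition for clearer error messages and is not confirmed to be in the buddy source.

```python
from pathlib import Path

# Defaults taken from this model card; the constant names are illustrative.
DEFAULT_MODEL_PATH = Path("models/buddy-3-270m-it-Q4_K_M.gguf")
DEFAULT_N_CTX = 4096


def load_local_model(model_path: Path = DEFAULT_MODEL_PATH, n_ctx: int = DEFAULT_N_CTX):
    """Load the GGUF file with llama-cpp-python, failing early if it is missing."""
    if not model_path.is_file():
        raise FileNotFoundError(f"GGUF model not found at {model_path}")
    # Lazy import: llama-cpp-python is only needed once the file is known to exist.
    from llama_cpp import Llama

    return Llama(model_path=str(model_path), n_ctx=n_ctx)
```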

⚙️ Vectorized Datasets

Vectorization converts textual data into numerical vectors and is usually applied after the text has been cleaned. Working with vectors rather than raw text can improve execution speed and reduce training time. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine learning:

  • Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
  • Regulations - Collection of federal regulations on the use of appropriated funds
  • SF-133 - The Report on Budget Execution and Budgetary Resources
  • Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
  • Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
  • Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
  • Fastbook - Treasury guidance on federal ledger accounts
  • Title 31 CFR - Money & Finance
  • Redbook - The Principles of Appropriations Law (Volumes I & II).
  • US Standard General Ledger - Account Definitions
  • Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
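To illustrate what vectorization means in practice, the following sketch turns short texts into term-count vectors and compares them with cosine similarity. It is a deliberately simple stand-in: the vector stores listed above are hosted on the OpenAI platform and presumably use learned embeddings, not the bag-of-words counts shown here. All function names and sample sentences are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (simple cleaning)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(texts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a shared vocabulary and one term-count vector per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[word] for word in vocab] for c in counts]


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


docs = [
    "Enacted appropriations by fiscal year",
    "Outlays are actual disbursements of funds",
    "Appropriations enacted for the fiscal year",
]
vocab, vectors = vectorize(docs)
# The first and third texts share four terms; the first and second share none.
print(cosine(vectors[0], vectors[2]) > cosine(vectors[0], vectors[1]))  # True
```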

Base Model Description

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

Inputs and outputs

  • Input:

    • Text string, such as a question, a prompt, or a document to be summarized
    • Images, normalized to 896 x 896 resolution and encoded to 256 tokens each (image input applies to the 4B, 12B, and 27B sizes; the 1B and 270M sizes are text-only)
    • Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes.
  • Output:

    • Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
    • Total output of up to 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes, per request, minus the tokens used by the request input
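The relationship between context size, input tokens, and output tokens is simple subtraction. The sketch below assumes "32K" means 32,768 tokens and uses a hypothetical 1,000-token prompt; `max_output_tokens` is an illustrative helper, not part of any library.

```python
def max_output_tokens(n_ctx: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt has been counted."""
    if n_ctx <= 0 or prompt_tokens < 0:
        raise ValueError("n_ctx must be positive and prompt_tokens non-negative")
    return max(n_ctx - prompt_tokens, 0)


# Upstream limit for the 270M size versus the 4096-token default used by the application.
print(max_output_tokens(32_768, 1_000))  # 31768
print(max_output_tokens(4_096, 1_000))   # 3096
```

With the application's 4096-token default, long retrieved contexts leave noticeably less room for the answer than the upstream limit would suggest.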

Intended Usage

This model is intended for local inference in GGUF-compatible runtimes, especially:

  • llama.cpp
  • llama-cpp-python
  • desktop applications or Streamlit applications that load GGUF models directly

In the context of buddy, this repository should be understood as the model-hosting companion to the main GitHub application repository rather than as the complete application itself.

Typical usage scenarios include:

  • lightweight local assistants
  • document-grounded Q&A
  • prompt-based drafting and summarization
  • local fallback inference for a larger multimodal GPT-5.x application
  • experimentation with small-footprint local Gemma-family deployments

Example: llama-cpp-python

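from llama_cpp import Llama

# Load the quantized GGUF file with the 4096-token context window used by buddy.
llm = Llama(
    model_path="buddy-3-270m-it-Q4_K_M.gguf",
    n_ctx=4096,
)

# Send a system instruction and one user message as a single chat turn.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what this model is for."},
    ]
)

# Print the text of the assistant's reply.
print(response["choices"][0]["message"]["content"])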

Example: llama.cpp

./llama-cli \
  -m buddy-3-270m-it-Q4_K_M.gguf \
  -c 4096 \
  -p "Write a short description of buddy."

Prompting Notes

Because this file is based on an instruction-tuned Gemma-family model, best results generally come from:

  • clear task-oriented prompts
  • concise system instructions
  • grounded context when using RAG
  • short to moderate generations for factual tasks

For document Q&A workflows, pair the model with retrieved context rather than relying on parametric memory alone.
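One way to pair the model with retrieved context is to place the passages directly in the user message and instruct the model to stay within them. The sketch below is an assumption about a reasonable prompt layout, not the format used by buddy; `build_grounded_messages`, the instruction wording, and the `max_chars` limit are all illustrative.

```python
def build_grounded_messages(question: str, passages: list[str], max_chars: int = 6000) -> list[dict]:
    """Assemble chat messages that ask the model to answer from the supplied passages only."""
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        block = f"[{i}] {passage.strip()}"
        if used + len(block) > max_chars:
            break  # keep the prompt short enough for a small context window
        context_parts.append(block)
        used += len(block)
    context = "\n\n".join(context_parts)
    system = "Answer using only the numbered context. If the answer is not there, say so."
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed as the `messages` argument of `create_chat_completion`. The character cap is a rough proxy for a token budget; counting tokens with the model's tokenizer would be more precise.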

Quantization Notes

This repository hosts a GGUF quantized artifact rather than an original full-precision checkpoint. The file name indicates a Q4_K_M quantization variant.

Quantization typically reduces model size and memory requirements, making local inference easier on consumer hardware, but it may also reduce generation quality relative to higher-precision variants.
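The size reduction follows from simple arithmetic: weight storage is roughly the parameter count times the bits per weight, divided by eight. The sketch below uses the nominal 270 million parameters from the file name and round bit widths for illustration only. Q4_K_M mixes precisions across tensors and the GGUF file also carries metadata, so the real file will not match the 4-bit figure exactly.

```python
def estimated_weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone: parameters x bits per weight / 8."""
    return n_params * bits_per_weight / 8


n_params = 270_000_000  # nominal size taken from the "270m" in the file name
for label, bits in [("16-bit", 16.0), ("8-bit", 8.0), ("4-bit (illustrative)", 4.0)]:
    print(f"{label}: ~{estimated_weight_bytes(n_params, bits) / 1e6:.0f} MB")
```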

Provenance and What Is Known

The following points are supported by the buddy source files:

  • the application expects a model file named buddy-3-270m-it-Q4_K_M.gguf
  • the configured model path is models/buddy-3-270m-it-Q4_K_M.gguf
  • inference is performed through llama-cpp-python
  • the application uses a 4096-token context window by default
  • the application combines local text generation with embedding-based retrieval components

What Is Not Claimed Here

This model card does not claim any of the following unless separate evidence is provided:

  • a custom fine-tuning dataset
  • a specific post-training alignment procedure
  • benchmark results
  • exact quantization tooling or conversion commands
  • safety evaluations beyond those of the upstream model family

If you have those details, they should be added explicitly in a later revision.

Limitations

As a small quantized instruction model, this artifact may:

  • hallucinate facts
  • struggle with long multi-step reasoning
  • lose fidelity on highly technical or domain-dense tasks
  • perform worse than larger or less aggressively quantized models
  • require careful retrieval support for document-heavy workflows

It should be treated as an assistive generation component, not as an authoritative source.

Safety and Responsible Use

Users should review outputs before acting on them, especially for:

  • legal matters
  • financial decisions
  • medical or health-related questions
  • employment or compliance workflows
  • any task requiring high factual precision

Do not rely on model output as a substitute for professional judgment or verified source material.

Hardware Considerations

Because this is a small GGUF quantized model, it is suitable for lightweight local inference relative to larger checkpoints. Actual performance will depend on:

  • runtime configuration
  • CPU versus GPU offloading
  • available RAM / VRAM
  • context length
  • batch size and thread settings
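These factors map onto constructor arguments of `llama_cpp.Llama`. The sketch below gathers them into one dictionary; `runtime_kwargs` is an illustrative helper, and the specific values (all CPU cores, a batch of 512) are starting points to tune, not recommendations from the buddy source.

```python
import os


def runtime_kwargs(use_gpu: bool = False, n_ctx: int = 4096) -> dict:
    """Build keyword arguments for llama_cpp.Llama from a few hardware choices."""
    return {
        "n_ctx": n_ctx,                        # context length: larger values use more memory
        "n_threads": os.cpu_count() or 1,      # CPU threads used for generation
        "n_batch": 512,                        # prompt tokens processed per batch
        "n_gpu_layers": -1 if use_gpu else 0,  # -1 offloads all layers; 0 keeps inference on CPU
    }
```

Usage would look like `Llama(model_path="models/buddy-3-270m-it-Q4_K_M.gguf", **runtime_kwargs())`. GPU offloading only takes effect when llama-cpp-python was built with a GPU backend.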

License and Upstream Terms

This artifact is based on google/gemma-3-270m-it. Use of this repository and any redistributed artifacts should comply with:

  • the license and usage terms attached to the upstream Gemma model
  • any additional redistribution requirements that apply to converted or quantized derivatives

Before publishing, confirm that your intended distribution of the GGUF file is consistent with the applicable upstream license terms.

Files

This repository is expected to contain:

README.md
buddy-3-270m-it-Q4_K_M.gguf
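After downloading, a quick sanity check is to confirm that the model file carries a GGUF header. The sketch below assumes the little-endian layout used by GGUF version 2 and later (4-byte magic, 32-bit version, then 64-bit tensor and metadata counts); `read_gguf_header` is an illustrative helper and does not parse the metadata itself.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header: magic, version, tensor count, metadata count."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata key/value count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}
```

Calling `read_gguf_header("buddy-3-270m-it-Q4_K_M.gguf")` on a truncated or mislabeled download raises `ValueError` before any runtime tries to load it.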

Relationship to the Main buddy Repository

This repository is best used alongside the main buddy codebase hosted on GitHub:

https://github.com/is-leeroy-jenkins/buddy.git

The GitHub repository contains the application logic and user-facing features, while this repository is intended to host the GGUF model artifact used for local fallback inference.

Recommended Repository Description

A GGUF-hosted quantized local fallback model for the buddy application, based on google/gemma-3-270m-it, intended for local inference with llama.cpp / llama-cpp-python, and used alongside the main buddy application hosted on GitHub.

Acknowledgments

  • Google for the upstream Gemma model family
  • The llama.cpp and llama-cpp-python communities for GGUF-compatible local inference tooling
  • The buddy application source, which documents how this model is loaded and used in practice