Gipity (gipity-oss-20b-Q4_K_XL.GGUF) is a fine-tuned LLM based on OpenAI’s Chat GPT-5. This release packages the fine-tuned weights (or adapters) for practical, low-latency instruction following, summarization, reasoning, and light code generation. It is intended for local or self-hosted environments and RAG (Retrieval-Augmented Generation) stacks that require predictable, fast outputs.

Quantized and fine-tuned GGUF based on OpenAI’s gpt-oss-20b

  • Format: GGUF (for llama.cpp and compatible runtimes)
  • Quantization: Q4_K_XL (4-bit, K-grouped, extra-low loss)

Gipity is a multimodal LLM for AI workflows based on OpenAI GPT-5.x. It is designed to provide a unified workspace for text, image and vision, audio, embeddings, files, vector stores, prompt engineering, and document-grounded analysis that comes with an optional UI.


📥 Download the Gipity Model

  1. Download the GGUF file:

    gipity-oss-20b.Q4_K_M.gguf
    
  2. Place the file anywhere on your system, for example:

    C:\Users\<you>\leeroy-jankins\gipity\gipity-oss-20b.Q4_K_M.gguf
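
After placing the file, it can be worth confirming that the download is intact before pointing a runtime at it. The sketch below relies on the GGUF container layout (a 4-byte `GGUF` magic followed by a little-endian 32-bit version number). The helper name `read_gguf_version` is hypothetical and not part of this release.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or raise ValueError if the header is wrong.

    Hypothetical helper for illustration; it only inspects the first 8 bytes.
    """
    with Path(path).open("rb") as fh:
        header = fh.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # The magic is followed by a little-endian uint32 format version.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Call `read_gguf_version` with the path where you saved the model. A `ValueError` usually points to a truncated or mislabeled download.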
    

βš™οΈ Streamlit UI

Open In Streamlit

Highlights

  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk, which makes it well suited to experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain-of-thought is not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, which allows the gipity-oss-20b model to run within 16 GB of memory.
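
As a rough sanity check on the 16 GB figure, the arithmetic can be sketched as follows. It assumes the MXFP4 layout of 4-bit elements in blocks of 32 that share one 8-bit scale, and it pretends that all 21B parameters are stored that way. The card says only the MoE layer uses MXFP4, so treat the result as an approximate floor for weight storage; the KV cache and activations are not counted.

```python
def mxfp4_bits_per_weight(block_size=32, element_bits=4, scale_bits=8):
    """Effective bits per weight when each block of weights shares one scale."""
    return element_bits + scale_bits / block_size


def estimate_weight_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB (weights only, no runtime buffers)."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: every parameter is MXFP4, which understates the real footprint.
bits = mxfp4_bits_per_weight()
size = estimate_weight_gib(21e9, bits)
print(f"{bits} bits/weight, roughly {size:.1f} GiB of weights")
```

The gap between this estimate and 16 GB is what remains for higher-precision tensors, the KV cache, and runtime overhead.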

βš™οΈ Datasets

Vectorization is the process of converting textual data into numerical vectors, and it is usually applied once the text has been cleaned. It can help improve execution speed and reduce training time. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine learning:

  • Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
  • Regulations - Collection of federal regulations on the use of appropriated funds
  • SF-133 - The Report on Budget Execution and Budgetary Resources
  • Balances - U.S. federal agency account balances (File A) submitted under the DATA Act of 2014
  • Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
  • Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
  • Fastbook - Treasury guidance on federal ledger accounts
  • Title 31 CFR - Money & Finance
  • Redbook - The Principles of Appropriations Law (Volumes I & II).
  • US Standard General Ledger - Account Definitions
  • Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
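
To make the idea of vectorization concrete, here is a toy sketch that turns cleaned text into word-count vectors and compares them with cosine similarity. It is not how the hosted vector stores work (those presumably use learned embeddings). The function names are hypothetical, and the sample strings are adapted from the dataset descriptions above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a fixed vector position."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocabulary):
    """Return a count vector with one slot per vocabulary token."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


docs = [
    "Enacted appropriations from 1996-2024",
    "Guidance from OMB on the preparation, submission, and execution of the federal budget",
    "The actual disbursements of funds by the U.S. federal government",
]
vocab = build_vocabulary(docs)
query = vectorize("federal budget execution guidance", vocab)
scores = [cosine_similarity(query, vectorize(d, vocab)) for d in docs]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Ranking documents by similarity to a query vector is the same retrieval pattern a RAG stack uses, only with far richer vectors.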

Base Model Details

Read our How to GPT Guide here!

See our collection for all versions of gpt-oss including GGUF, 4-bit & 16-bit formats.

Learn to run gpt-oss correctly - Read the Guide.

See Dynamic 2.0 GGUFs for quantization benchmarks.

The F32 quant is MXFP4 upcast to BF16 for every single layer and is unquantized.

gpt-oss-20b

Try gpt-oss · Guides · System card · OpenAI blog

Inference examples

Transformers

You can use gipity-oss-20b with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.

To get started, install the necessary dependencies to set up your environment:

pip install -U transformers kernels torch 

Once set up, you can run the model with the snippet below:

from transformers import pipeline
import torch

model_id = "leeroy-jankins/gipity-oss-20b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Alternatively, you can run the model via Transformers Serve to spin up an OpenAI-compatible web server:

transformers serve
transformers chat localhost:8000 --model-name-or-path leeroy-jankins/gipity-oss-20b

Learn more about how to use gpt-oss with Transformers.
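
Once an OpenAI-compatible server is running, any HTTP client can talk to it. The sketch below uses only the standard library. It assumes the server listens on `localhost:8000` (the address used in the `transformers chat` command above) and exposes the conventional `/v1/chat/completions` route. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed address, taken from the chat command above
MODEL_ID = "leeroy-jankins/gipity-oss-20b"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID, max_tokens=256):
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",  # assumed OpenAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("Summarize the purpose of the SF-133 report.")` should return the model's reply as a string. Splitting request construction from sending keeps the payload easy to inspect and test offline.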


vLLM

vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

vllm serve leeroy-jankins/gipity-oss-20b

Learn more about how to use gipity-oss with vLLM.


PyTorch / Triton

To learn about how to use this model with PyTorch and Triton, check out our reference implementations in the gpt-oss repository.

Ollama

If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama.

# gipity-oss-20b
ollama pull gipity-oss:20b
ollama run gipity-oss:20b

Learn more about how to use gpt-oss with Ollama.

LM Studio

If you are using LM Studio, you can use the following command to download the model.

# gipity-oss-20b
lms get leeroy-jankins/gipity-oss-20b

Check out our awesome list for a broader collection of gpt-oss resources and inference partners.


Download the model

You can download the model weights directly from the Hugging Face Hub using the Hugging Face CLI:

# gipity-oss-20b
huggingface-cli download leeroy-jankins/gipity-oss-20b --include "original/*" --local-dir gipity-oss-20b/
pip install gpt-oss
python -m gpt_oss.chat gipity-oss-20b/original/

Reasoning levels

You can choose among three reasoning levels to suit your task:

  • Low: Fast responses for general dialogue.
  • Medium: Balanced speed and detail.
  • High: Deep and detailed analysis.

The reasoning level can be set in the system prompt, e.g., "Reasoning: high".
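
A minimal sketch of how the reasoning level might be attached to a conversation: it builds a chat message list whose system message carries the `Reasoning: <level>` line described above. The helper `build_messages` is a hypothetical name, and medium is assumed as the default only for illustration.

```python
VALID_LEVELS = ("low", "medium", "high")


def build_messages(user_prompt, reasoning="medium"):
    """Return chat messages with the reasoning level set in the system prompt."""
    level = reasoning.lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"reasoning must be one of {VALID_LEVELS}, got {reasoning!r}")
    return [
        {"role": "system", "content": f"Reasoning: {level}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain quantum mechanics clearly and concisely.", "high")
```

The resulting list can be passed as `messages` to the Transformers pipeline call shown earlier or sent to an OpenAI-compatible endpoint.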

Tool use

The gpt-oss models are excellent for:

  • Web browsing (using built-in browsing tools)
  • Function calling with defined schemas
  • Agentic operations like browser tasks
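
Function calling with defined schemas generally means two pieces: a JSON description of each tool that is sent to the model, and local code that executes whichever call the model requests. The sketch below assumes the OpenAI-style `tools` layout. The `word_count` tool and the `dispatch_tool_call` helper are hypothetical and exist only for illustration.

```python
import json

# Tool description in the OpenAI-style layout (an assumption, not confirmed by this card).
WORD_COUNT_TOOL = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}


def word_count(text):
    """Local implementation backing the tool schema above."""
    return len(text.split())


TOOL_REGISTRY = {"word_count": word_count}


def dispatch_tool_call(name, arguments_json):
    """Run the tool the model asked for, using its JSON-encoded arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**arguments)


result = dispatch_tool_call("word_count", '{"text": "report on budget execution"}')
```

In a full loop, the tool result would be appended to the conversation as a tool message so the model can use it in its final answer.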

Fine-tuning

Both gpt-oss models can be fine-tuned for a variety of specialized use cases.

The smaller gipity-oss-20b model can be fine-tuned on consumer hardware, whereas the larger gpt-oss-120b can be fine-tuned on a single H100 node.

Model metadata

  • Format: GGUF
  • Model size: 21B params
  • Architecture: gpt-oss
  • Quantization: 4-bit
