Monostich GGUF

GGUF format of Monostich 100M for use with llama.cpp and compatible tools.

Files

monostich-f16.gguf — FP16 (full precision)

Download

# All GGUF files
huggingface-cli download kerzgrr/Monostich-100M --include "*.gguf" --local-dir .

# Or a specific file
huggingface-cli download kerzgrr/Monostich-100M monostich-f16.gguf --local-dir .

Direct URL (for wget/curl):

https://huggingface.co/kerzgrr/Monostich-100M/resolve/main/monostich-f16.gguf
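The direct URL follows Hugging Face's standard `resolve` pattern, so any file in the repo can be addressed the same way; a minimal sketch in Python (the `hf_resolve_url` helper name is ours, not part of any library):

```python
# Build a Hugging Face "resolve" download URL for a file in a repo.
# Pattern: https://huggingface.co/<repo_id>/resolve/<revision>/<filename>
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(hf_resolve_url("kerzgrr/Monostich-100M", "monostich-f16.gguf"))
```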

Run with llama.cpp

1. Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON   # -DGGML_CUDA=ON enables CUDA; omit it for a CPU-only build
cmake --build build --config Release

2. Interactive chat

./build/bin/llama-cli -m monostich-f16.gguf \
  -c 1024 \
  --temp 0.28 \
  --top-p 0.9 \
  -i

  • -c 1024 — context size (this model's maximum context is 1024 tokens)
  • --temp 0.28 — sampling temperature
  • --top-p 0.9 — nucleus sampling
  • -i — interactive mode

3. Single prompt (no chat UI)

./build/bin/llama-cli -m monostich-f16.gguf \
  -p "Hello, how are you?" \
  -n 128 \
  -c 1024 \
  --temp 0.28

  • -p — prompt
  • -n — max new tokens

4. Chat template (instruction / assistant style)

For instruction-tuned behavior, use the Llama-3-style chat format:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Your question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Example prompt:

./build/bin/llama-cli -m monostich-f16.gguf \
  -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

" \
  -n 128 -c 1024 --temp 0.28
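Typing the template by hand is error-prone; it can be assembled with a small helper (`format_prompt` is our name, not part of llama.cpp):

```python
def format_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama-3-style chat template shown above."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_prompt("What is 2+2?"))
```

The returned string can be passed directly to `-p` (mind shell quoting) or to the Python bindings below.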

Run with llama-cpp-python (Python)

pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="monostich-f16.gguf", n_ctx=1024)

out = llm(
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    max_tokens=128,
    temperature=0.28,
    top_p=0.9,
)
print(out["choices"][0]["text"])
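The call returns an OpenAI-style completion dict. If generation runs past the end-of-turn marker, the reply can be trimmed at `<|eot_id|>`; a minimal sketch (`extract_reply` is our helper, and the sample dict is illustrative, not real model output):

```python
def extract_reply(out: dict) -> str:
    """Take the first choice's text and cut it at the end-of-turn token."""
    text = out["choices"][0]["text"]
    return text.split("<|eot_id|>", 1)[0].strip()

# Illustrative completion dict in the shape llama-cpp-python returns.
sample = {"choices": [{"text": "Fine, thanks!<|eot_id|>trailing tokens"}]}
print(extract_reply(sample))  # Fine, thanks!
```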

Model card

For architecture, training, and license details, see the main model card in this repo or kerzgrr/Monostich-100M.
