HuggingFaceFW/fineweb-edu
How to use kerzgrr/Monostich-100M with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="kerzgrr/Monostich-100M",
    filename="monostich-f16.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

How to use kerzgrr/Monostich-100M with llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kerzgrr/Monostich-100M:F16
# Run inference directly in the terminal:
llama-cli -hf kerzgrr/Monostich-100M:F16

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kerzgrr/Monostich-100M:F16
# Run inference directly in the terminal:
llama-cli -hf kerzgrr/Monostich-100M:F16

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kerzgrr/Monostich-100M:F16
# Run inference directly in the terminal:
./llama-cli -hf kerzgrr/Monostich-100M:F16

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kerzgrr/Monostich-100M:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kerzgrr/Monostich-100M:F16
How to use kerzgrr/Monostich-100M with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kerzgrr/Monostich-100M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "kerzgrr/Monostich-100M",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
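The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `chat` are hypothetical, and the reply path `choices[0].message.content` is the usual shape of an OpenAI-compatible chat response rather than something this model card specifies.

```python
import json
import urllib.request


def build_chat_request(prompt, model="kerzgrr/Monostich-100M",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Requires a running server; the response layout is assumed to follow
    the OpenAI-compatible chat completion format.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string.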
How to use kerzgrr/Monostich-100M with Ollama:
ollama run hf.co/kerzgrr/Monostich-100M:F16
How to use kerzgrr/Monostich-100M with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kerzgrr/Monostich-100M to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kerzgrr/Monostich-100M to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kerzgrr/Monostich-100M to start chatting
How to use kerzgrr/Monostich-100M with Docker Model Runner:
docker model run hf.co/kerzgrr/Monostich-100M:F16
How to use kerzgrr/Monostich-100M with Lemonade:
# Download Lemonade from https://lemonade-server.ai/
lemonade pull kerzgrr/Monostich-100M:F16
lemonade run user.Monostich-100M-F16
lemonade list
GGUF format of Monostich 100M for use with llama.cpp and compatible tools.
| File | Description |
|---|---|
| monostich-f16.gguf | FP16 (16-bit floats, unquantized) |
# All GGUF files
huggingface-cli download kerzgrr/Monostich-100M --include "*.gguf" --local-dir .
# Or a specific file
huggingface-cli download kerzgrr/Monostich-100M monostich-f16.gguf --local-dir .
Direct URL (for wget/curl):
https://huggingface.co/kerzgrr/Monostich-100M/resolve/main/monostich-f16.gguf
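For scripted downloads, the direct URL above can be assembled from the repo id and file name. This is a minimal sketch that assumes the `https://huggingface.co/<repo>/resolve/<revision>/<file>` pattern visible in that URL. `resolve_url` is a hypothetical helper, not part of any Hugging Face library.

```python
from urllib.parse import quote


def resolve_url(repo_id, filename, revision="main"):
    """Compose a direct download URL following the pattern of the URL above."""
    # Percent-encode the revision and file name in case they contain
    # characters that are not URL-safe; "/" in the file name is kept.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


print(resolve_url("kerzgrr/Monostich-100M", "monostich-f16.gguf"))
```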
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON # optional: GPU
cmake --build build --config Release
./build/bin/llama-cli -m monostich-f16.gguf \
-c 1024 \
--temp 0.28 \
--top-p 0.9 \
-i
- `-c 1024`: context length (max 1024)
- `--temp 0.28`: sampling temperature
- `--top-p 0.9`: nucleus sampling
- `-i`: interactive mode

./build/bin/llama-cli -m monostich-f16.gguf \
  -p "Hello, how are you?" \
  -n 128 \
  -c 1024 \
  --temp 0.28

- `-p`: prompt
- `-n`: max new tokens

For instruction-tuned behavior, use the Llama-3-style chat format:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Your question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Example prompt:
./build/bin/llama-cli -m monostich-f16.gguf \
-p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>
What is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
" \
-n 128 -c 1024 --temp 0.28
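To give a feel for what the `--temp 0.28` and `--top-p 0.9` values used in the commands above do, here is a conceptual sketch of temperature scaling followed by nucleus (top-p) truncation over a toy list of logits. It is not llama.cpp's sampler implementation, which chains additional samplers and may order them differently. `top_p_filter` and the example logits are made up for illustration.

```python
import math


def top_p_filter(logits, temperature=0.28, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p truncation."""
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


toy_logits = [2.0, 1.0, 0.0]
print(top_p_filter(toy_logits))                   # low temperature: one candidate survives
print(top_p_filter(toy_logits, temperature=1.0))  # higher temperature: two candidates survive
```

With these toy logits, the low temperature concentrates nearly all probability on the top token, so the 0.9 cutoff keeps only that one. At temperature 1.0 the distribution is flatter and the cutoff admits the second token as well.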
pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="monostich-f16.gguf", n_ctx=1024)
out = llm(
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    max_tokens=128,
    temperature=0.28,
    top_p=0.9,
)
print(out["choices"][0]["text"])
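Writing the chat template by hand is error-prone, so a small helper can build the prompt string. This sketch reproduces the single-turn layout of the Python example above, including the `\n\n` after each header. `build_prompt` is a hypothetical name, and multi-turn or system-prompt handling is not covered because the card does not describe it.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the Llama-3-style chat format shown above."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_prompt("Hello")
print(repr(prompt))
```

The resulting string can be passed as the first argument to `llm(...)` in place of the hand-written literal.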
For architecture, training, and license details, see the main model card in this repo or kerzgrr/Monostich-100M.