TypeScript support?

#1
by randobobot - opened

How can this model be used in TypeScript?

Hi! Thanks for your interest in using the chess-gemma-commentary model with TypeScript. Here are your main options:

  1. Browser-Based (.task version + Gemma library)
    The .task version of the model is available on Hugging Face and can be run directly in the browser by following Google's official Gemma documentation. Note that this requires a browser with GPU support (WebGPU) to load and run the model.
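
Here's a minimal browser sketch using Google's MediaPipe LLM Inference API, which is the library that consumes .task files. The model path below is an assumption; point it at wherever you host the downloaded file:

import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai'

// Load the WASM runtime for the GenAI tasks
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
)

// modelAssetPath is hypothetical; use your own hosted copy of the .task file
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/chess-gemma-commentary.task' },
  maxTokens: 512
})

const commentary = await llm.generateResponse('FEN position and move data')
console.log(commentary)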

  2. Backend Integration (GGUF + Ollama)
    For production TypeScript apps, I recommend using the GGUF version via Ollama, which has an official JavaScript/TypeScript client library. You can run the model locally on your backend and connect your TypeScript frontend through REST endpoints.

Example using Ollama's TypeScript library:

import { Ollama } from 'ollama'

// Connect to a locally running Ollama server (11434 is the default port)
const client = new Ollama({ host: 'http://localhost:11434' })

// Replace the placeholder prompts with your actual system prompt and position data
const response = await client.chat({
  model: 'chess-gemma-commentary',
  messages: [
    { role: 'system', content: 'Your system prompt here' },
    { role: 'user', content: 'FEN position and move data' }
  ]
})

console.log(response.message.content)
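
To connect a frontend, you can wrap that call in a small REST endpoint on your backend. Here's a sketch using Express; the /commentary route and the { fen } request shape are made up for illustration:

import express from 'express'
import { Ollama } from 'ollama'

const app = express()
app.use(express.json())

const client = new Ollama({ host: 'http://localhost:11434' })

// Hypothetical route and payload shape; adapt to your app
app.post('/commentary', async (req, res) => {
  const response = await client.chat({
    model: 'chess-gemma-commentary',
    messages: [{ role: 'user', content: req.body.fen }]
  })
  res.json({ commentary: response.message.content })
})

app.listen(3000)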
  3. Hugging Face Inference API
    You can also use the Hugging Face Inference API with TypeScript for serverless deployment:
import { HfInference } from '@huggingface/inference'

// Authenticate with your Hugging Face access token
const hf = new HfInference('your_hf_token')

const result = await hf.textGeneration({
  model: 'NAKSTStudio/chess-gemma-commentary',
  inputs: 'Your chess data input'
})

console.log(result.generated_text)

I'm planning to release GGUF files in different quantizations soon for easier Ollama integration. Let me know if you need more specific guidance!

Best,
NAKST Studio

I tried the GGUF models; the F16 one seems to be working. I'm not sure what the difference is between the formats, though. Which one is most recommended?

Hi! Great to hear the F16 model is working well for you – thanks for testing it out!

The difference between the quantization formats comes down to a trade-off between precision, speed, and model size:

F16 (Float16): This uses 16-bit floating point precision. It offers the highest quality and accuracy, closest to the original model, but requires more memory and storage. Best for systems with adequate RAM/VRAM where you prioritize maximum accuracy.

INT8_0 (8-bit Integer): This quantizes weights to 8-bit integers, reducing the model size by roughly half compared to F16 while maintaining very good quality. It's a balanced option that works well on most systems.

INT4 (4-bit Integer): This is the most compressed version, using 4-bit quantization. The model becomes significantly smaller and faster, making it ideal for edge devices, mobile deployments, or systems with limited resources. There is a precision drop, but for many use cases the quality remains acceptable.

Recommendation:

  • If you have sufficient resources (desktop, server, or powerful laptop), start with F16 or INT8_0 for best quality
  • For edge devices, mobile apps, or resource-constrained environments, use INT8_0 or INT4
  • For production TypeScript/browser deployments via Ollama, INT8_0 is usually the sweet spot
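
Whichever quantization you pick, if you've downloaded a raw GGUF file you can register it with Ollama through a minimal Modelfile. The filename below is just an example; point it at the file you actually downloaded:

# Modelfile
FROM ./chess-gemma-commentary-f16.gguf

Then create and run the model:

ollama create chess-gemma-commentary -f Modelfile
ollama run chess-gemma-commentary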

It really depends on your specific needs – feel free to experiment with different quantizations to find what works best for your use case!

Best,
NAKST Studio

randobobot changed discussion status to closed
