TypeScript support?

#1
by randobobot - opened

How can this model be used in TypeScript?

Hi! Thanks for your interest in using the chess-gemma-commentary model with TypeScript. Here are your main options:

  1. Browser-Based (.task version + Gemma library)
    The .task version of the model is available on Hugging Face and can be run directly in the browser by following Google's official Gemma documentation. Note that this requires a browser with GPU support (WebGPU) to load and run the model.
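
Here's a minimal browser sketch using Google's MediaPipe LLM Inference API, which is the library that consumes .task files. The model path below is an assumption; point it at wherever you host the downloaded file:

import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai'

// Load the WASM runtime for the GenAI tasks
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
)

// modelAssetPath is hypothetical; use your own hosted copy of the .task file
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/chess-gemma-commentary.task' },
  maxTokens: 512
})

const commentary = await llm.generateResponse('FEN position and move data')
console.log(commentary)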

  2. Backend Integration (GGUF + Ollama)
    For production TypeScript apps, I recommend using the GGUF version via Ollama, which has an official JavaScript/TypeScript client library. You can run the model locally on your backend and connect your TypeScript frontend through REST endpoints.

Example using Ollama's TypeScript library:

import { Ollama } from 'ollama'

// Connect to a locally running Ollama server (11434 is the default port)
const client = new Ollama({ host: 'http://localhost:11434' })

// Replace the placeholder prompts with your actual system prompt and position data
const response = await client.chat({
  model: 'chess-gemma-commentary',
  messages: [
    { role: 'system', content: 'Your system prompt here' },
    { role: 'user', content: 'FEN position and move data' }
  ]
})

console.log(response.message.content)
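
To connect a frontend, you can wrap that call in a small REST endpoint on your backend. Here's a sketch using Express; the /commentary route and the { fen } request shape are made up for illustration:

import express from 'express'
import { Ollama } from 'ollama'

const app = express()
app.use(express.json())

const client = new Ollama({ host: 'http://localhost:11434' })

// Hypothetical route and payload shape; adapt to your app
app.post('/commentary', async (req, res) => {
  const response = await client.chat({
    model: 'chess-gemma-commentary',
    messages: [{ role: 'user', content: req.body.fen }]
  })
  res.json({ commentary: response.message.content })
})

app.listen(3000)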
  3. Hugging Face Inference API
    You can also use the Hugging Face Inference API with TypeScript for serverless deployment:
import { HfInference } from '@huggingface/inference'

// Authenticate with your Hugging Face access token
const hf = new HfInference('your_hf_token')

const result = await hf.textGeneration({
  model: 'NAKSTStudio/chess-gemma-commentary',
  inputs: 'Your chess data input'
})

console.log(result.generated_text)

I'm planning to release GGUF files in different quantizations soon for easier Ollama integration. Let me know if you need more specific guidance!

Best,
NAKST Studio

I tried the GGUF models; the F16 one seems to be working. I'm not sure what the difference is between the formats, though. Which one is most recommended?

Hi! Great to hear the F16 model is working well for you – thanks for testing it out!

The difference between the quantization formats comes down to a trade-off between precision, speed, and model size:

F16 (Float16): This uses 16-bit floating point precision. It offers the highest quality and accuracy, closest to the original model, but requires more memory and storage. Best for systems with adequate RAM/VRAM where you prioritize maximum accuracy.

INT8_0 (8-bit Integer): This quantizes weights to 8-bit integers, reducing the model size by roughly half compared to F16 while maintaining very good quality. It's a balanced option that works well on most systems.

INT4 (4-bit Integer): This is the most compressed version, using 4-bit quantization. The model becomes significantly smaller and faster, making it ideal for edge devices, mobile deployments, or systems with limited resources. There is a precision drop, but for many use cases the quality remains acceptable.

Recommendation:

  • If you have sufficient resources (desktop, server, or powerful laptop), start with F16 or INT8_0 for best quality
  • For edge devices, mobile apps, or resource-constrained environments, use INT8_0 or INT4
  • For production TypeScript/browser deployments via Ollama, INT8_0 is usually the sweet spot
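
Whichever quantization you pick, if you've downloaded a raw GGUF file you can register it with Ollama through a minimal Modelfile. The filename below is just an example; point it at the file you actually downloaded:

# Modelfile
FROM ./chess-gemma-commentary-f16.gguf

Then create and run the model:

ollama create chess-gemma-commentary -f Modelfile
ollama run chess-gemma-commentary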

It really depends on your specific needs – feel free to experiment with different quantizations to find what works best for your use case!

Best,
NAKST Studio

randobobot changed discussion status to closed
