FINSTROM-AI-V1.5

FINSTROM-AI-V1.5 is a 2B-parameter Qwen2-style causal language model, distributed both as BF16 Transformers weights and as an F16 GGUF build for local inference.
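For the Transformers weights, a minimal loading sketch (it assumes the repo id used in the commands below; device_map="auto" needs the accelerate package):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EREN121232/FINSTROM-AI-V1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The bundled chat_template.jinja is applied automatically.
messages = [{"role": "user", "content": "Hello FINSTROM, introduce yourself."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))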

Files

  • model.safetensors - Transformers model weights.
  • finstrom-ai-v1.f16.gguf - GGUF build for Ollama, llama.cpp, LM Studio, and similar runtimes.
  • tokenizer.json, tokenizer_config.json, chat_template.jinja - tokenizer and chat formatting files.

Run With Ollama

If Ollama is installed and on your PATH:

ollama run hf.co/EREN121232/FINSTROM-AI-V1.5:finstrom-ai-v1.f16.gguf

On Windows, if Ollama is installed but not on your PATH:

& "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe" run hf.co/EREN121232/FINSTROM-AI-V1.5:finstrom-ai-v1.f16.gguf

You can also pull it first:

ollama pull hf.co/EREN121232/FINSTROM-AI-V1.5:finstrom-ai-v1.f16.gguf

After pulling, you can give the model a shorter local alias using the repo's Modelfile.hf:

hf download EREN121232/FINSTROM-AI-V1.5 Modelfile.hf --local-dir finstrom-ai-v1.5
ollama create finstrom-ai-v1.5 -f finstrom-ai-v1.5/Modelfile.hf
ollama run finstrom-ai-v1.5

Local Modelfile Import

Download the GGUF and Modelfile, then create a shorter local model name:

hf download EREN121232/FINSTROM-AI-V1.5 finstrom-ai-v1.f16.gguf Modelfile --local-dir finstrom-ai-v1.5
cd finstrom-ai-v1.5
ollama create finstrom-ai-v1.5 -f Modelfile
ollama run finstrom-ai-v1.5

The included Modelfile keeps the default context practical for local machines. The model config advertises a larger maximum context, so increase num_ctx only if your hardware has enough memory.
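If you do raise it, the override lives in the Modelfile; a minimal sketch (the num_ctx value below is illustrative, not a recommendation):

FROM ./finstrom-ai-v1.f16.gguf
# Illustrative value; raise only if your RAM/VRAM can hold the larger KV cache.
PARAMETER num_ctx 8192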

Use In Apps

Once the Ollama app or daemon is running, it serves a local HTTP API at http://localhost:11434.
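A quick way to confirm the server is reachable (the /api/tags endpoint lists installed models):

curl http://localhost:11434/api/tags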

Python

import requests

# Non-streaming call to Ollama's generate endpoint; the full completion
# comes back as a single JSON object.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "finstrom-ai-v1.5",
        "prompt": "Hello FINSTROM, introduce yourself.",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()

print(response.json()["response"])
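For incremental output, the same endpoint streams newline-delimited JSON when "stream" is true; a minimal sketch:

import json
import requests

# Each line of the streamed body is one JSON chunk with a "response" fragment.
with requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "finstrom-ai-v1.5",
        "prompt": "Hello FINSTROM, introduce yourself.",
        "stream": True,
    },
    stream=True,
    timeout=120,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break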

JavaScript

// Call Ollama's generate endpoint; stream: false returns one JSON payload.
const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "finstrom-ai-v1.5",
    prompt: "Hello FINSTROM, introduce yourself.",
    stream: false
  })
});

if (!response.ok) throw new Error(`Ollama request failed: ${response.status}`);

const data = await response.json();
console.log(data.response);

OpenAI-Compatible Local Client
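Ollama also exposes an OpenAI-compatible API under /v1, so the official OpenAI client can target the local model: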

from openai import OpenAI

# Ollama ignores the key, but the client requires a non-empty api_key value.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

completion = client.chat.completions.create(
    model="finstrom-ai-v1.5",
    messages=[
        {"role": "user", "content": "Write one short test response."}
    ],
)

print(completion.choices[0].message.content)

Notes

  • The GGUF file is F16: higher fidelity than a Q4/Q5 quantized build, but larger and more memory-hungry.
  • If local memory is tight, a quantized GGUF such as Q4_K_M is worth producing; see the sketch after this list.
  • For production apps, use the Ollama API model string exactly as shown above, or create a local alias with the included Modelfile.
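A sketch of producing that quantized build with llama.cpp's llama-quantize tool (the binary location and output filename here are assumptions based on a standard llama.cpp build):

# Assumes llama.cpp is built locally and llama-quantize is in the build directory.
./llama-quantize finstrom-ai-v1.f16.gguf finstrom-ai-v1.q4_k_m.gguf Q4_K_M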