Instructions for using prithivMLmods/Gliese-4B-OSS-0410 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use prithivMLmods/Gliese-4B-OSS-0410 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Gliese-4B-OSS-0410")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Gliese-4B-OSS-0410")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Gliese-4B-OSS-0410", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
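
The final slice in the snippet above drops the prompt tokens: for a decoder-only model, `generate` returns the prompt followed by the newly generated tokens. A toy illustration with plain Python lists, using made-up token ids in place of real tensors (the helper `strip_prompt` is hypothetical):

```python
# Made-up token ids; real code operates on tensors from the tokenizer and model.
prompt_ids = [101, 2040, 2024, 2017, 102]
# For a decoder-only model, generate() returns prompt + continuation.
output_ids = prompt_ids + [1045, 2572, 1037, 2944]


def strip_prompt(output_ids, prompt_len):
    """Return only the tokens produced after the prompt."""
    return output_ids[prompt_len:]


new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)  # [1045, 2572, 1037, 2944]
```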


vLLM

How to use prithivMLmods/Gliese-4B-OSS-0410 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Gliese-4B-OSS-0410"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Gliese-4B-OSS-0410",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
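
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the default `base_url` assumes the server started by `vllm serve` is reachable at `http://localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Gliese-4B-OSS-0410"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(send_chat_request(build_chat_request("What is the capital of France?")))
```

Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.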

Use Docker

docker model run hf.co/prithivMLmods/Gliese-4B-OSS-0410

SGLang

How to use prithivMLmods/Gliese-4B-OSS-0410 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Gliese-4B-OSS-0410" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Gliese-4B-OSS-0410",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Gliese-4B-OSS-0410" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Gliese-4B-OSS-0410",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use prithivMLmods/Gliese-4B-OSS-0410 with Docker Model Runner:

```
docker model run hf.co/prithivMLmods/Gliese-4B-OSS-0410
```

Quantized (GGUF) builds of this model can be used with llama.cpp, Ollama, LM Studio, or any compatible app.

Gliese-4B-OSS-0410

Gliese-4B-OSS-0410 is a reasoning-focused model fine-tuned from Qwen-4B for stronger reasoning and refined token probability distributions, delivering balanced multilingual generation across mathematics and general-purpose reasoning tasks. It is fine-tuned on curated entries from a GPT-OSS synthetic dataset, which improves its handling of structured reasoning, probabilistic inference, and multilingual tasks.

GGUF: https://huggingface.co/prithivMLmods/Gliese-4B-OSS-0410-GGUF

Key Features

Enhanced Reasoning Precision: Refined token probability distributions improve reasoning quality and ensure balanced, context-aware outputs.
Event Simulation and Logical Analysis: Capable of modeling random events, probability-driven reasoning, and structured decision-making with strong logical consistency.
Multilingual Mathematical and General-Purpose Problem Solving: Delivers robust performance in mathematics, probability, and structured multilingual tasks, enabling broad applicability in research and education.
Hybrid Symbolic–Probabilistic Thinking: Combines structured logic, probabilistic inference, and reasoning fluency to improve performance on uncertainty-driven tasks.
Structured Output Generation: Generates well-formatted outputs in LaTeX, Markdown, JSON, CSV, and YAML, supporting technical workflows and data-oriented research.
Optimized Lightweight Footprint: With 4B parameters, it runs efficiently on mid-range GPUs, offline clusters, and edge devices without compromising reasoning performance.
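
Structured output is easiest to consume when it is validated before use. A minimal sketch, assuming the model was asked to answer in JSON and may wrap the object in extra prose; the helper `extract_json` and the sample response string are hypothetical and not taken from the model card:

```python
import json


def extract_json(response_text):
    """Return the first valid JSON object found in response_text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(response_text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(response_text, start)
        except json.JSONDecodeError:
            continue
        return obj
    return None


# Hypothetical model response, used only to exercise the helper.
sample = 'Here is the result:\n{"event": "sum > 9", "favorable": 6, "total": 36}\nDone.'
parsed = extract_json(sample)
print(parsed["favorable"], parsed["total"])
```

Returning `None` instead of raising lets the caller decide whether to retry the generation or fall back to plain text.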

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Gliese-4B-OSS-0410"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Simulate the probability of rolling two dice and getting a sum greater than 9. Show the reasoning."

messages = [
    {"role": "system", "content": "You are a reasoning tutor skilled in probability, logic, and multilingual problem-solving."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
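
To check the model's answer to the prompt above, the exact probability can be computed by enumerating all 36 outcomes of two dice. This reference calculation is an addition for verification and assumes fair six-sided dice:

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
# Outcomes whose sum is strictly greater than 9 (sums of 10, 11, or 12).
favorable = [pair for pair in outcomes if sum(pair) > 9]
probability = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), probability)  # 6 36 1/6
```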

Intended Use

Balanced multilingual reasoning and probability modeling
Event simulation, uncertainty analysis, and structured problem solving
Educational and research-focused reasoning tasks
Deployment in mid-resource environments with efficient inference
Structured technical content and data format generation

Limitations

Primarily focused on reasoning and mathematics; less suited for creative writing
Given its 4B size, extremely complex multi-hop reasoning tasks may remain challenging
Prioritizes structured reasoning and probabilistic accuracy over conversational tone
May produce inconsistent results with very long contexts or cross-domain multi-document inputs