Instructions for using prithivMLmods/Cetus-Qwen3_4B-GeneralThought with libraries and local apps.

Libraries

How to use prithivMLmods/Cetus-Qwen3_4B-GeneralThought with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Cetus-Qwen3_4B-GeneralThought")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Cetus-Qwen3_4B-GeneralThought")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Cetus-Qwen3_4B-GeneralThought")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Cetus-Qwen3_4B-GeneralThought with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Cetus-Qwen3_4B-GeneralThought",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
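
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat`, the `max_tokens` default, and the timeout are illustrative assumptions, not part of the model card. It assumes the server started above is reachable at `http://localhost:8000` and returns the usual OpenAI-style response shape.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"


def build_chat_payload(user_text: str, model: str = MODEL_ID, max_tokens: int = 256) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,  # assumed default; tune for your use case
    }


def chat(user_text: str, base_url: str = "http://localhost:8000") -> str:
    """POST a single-turn chat request and return the assistant's reply text."""
    payload = json.dumps(build_chat_payload(user_text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style responses place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` should print the reply. Passing `base_url="http://localhost:30000"` should target an SGLang server on its default port in the same way.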

Use Docker

docker model run hf.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought

SGLang

How to use prithivMLmods/Cetus-Qwen3_4B-GeneralThought with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Cetus-Qwen3_4B-GeneralThought" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Cetus-Qwen3_4B-GeneralThought",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Cetus-Qwen3_4B-GeneralThought" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Cetus-Qwen3_4B-GeneralThought",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Cetus-Qwen3_4B-GeneralThought with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Cetus-Qwen3_4B-GeneralThought

Cetus-Qwen3_4B-GeneralThought is a fine-tuned variant of Qwen3-4B, trained on the GeneralThought-430K dataset to improve broad-spectrum reasoning, logical coherence, and structured multi-domain problem solving. It targets general-purpose tasks, including instruction following, technical question answering, and reasoning-based generation across diverse knowledge fields.

GGUF: https://huggingface.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought-Q4_K_M-GGUF

Key Features

Broad Reasoning with GeneralThought-430K: Trained on GeneralThought-430K, a carefully curated 430,000-sample dataset spanning:
- Mathematical and logical reasoning
- Scientific and factual QA
- Multistep instructions and problem decomposition
- Abstract and applied reasoning tasks

Multi-Domain Task Versatility: Equipped to handle use cases across STEM, humanities, code reasoning, and general knowledge workflows with consistency and structure.

Structured Output Control: Outputs well-formatted answers in Markdown, LaTeX, JSON, and tabular formats, suitable for documentation, education, and technical reporting.

Instruction-Following with Multi-Step Fidelity: Capable of following detailed prompts involving layered reasoning or procedural guidance with high step-to-step coherence.

Multilingual and Cross-Cultural Understanding: Supports over 20 languages for global comprehension tasks and technical translation in education and public sector applications.

Efficient Qwen3-4B Base: Balances capability and computational efficiency, making it suitable for deployment on consumer-grade GPUs and scalable services.
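
Because the model is described as producing JSON output, downstream code will often need to recover that JSON from a free-text response. The sketch below is one possible approach, not something the model card prescribes: it assumes the JSON may be wrapped in a Markdown code fence or surrounded by prose, and the helper name `extract_json` is hypothetical.

```python
import json
import re

# Markdown code fence marker, built indirectly so this example embeds cleanly.
FENCE = "`" * 3


def extract_json(response: str):
    """Return the first JSON object or array found in a model response, or None."""
    # Prefer the contents of a fenced block (optionally tagged "json") if present.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, flags=re.DOTALL)
    candidate = match.group(1) if match else response

    # Try to decode starting at each character that could open an object or array.
    decoder = json.JSONDecoder()
    for index, char in enumerate(candidate):
        if char in "{[":
            try:
                value, _ = decoder.raw_decode(candidate[index:])
                return value
            except json.JSONDecodeError:
                continue
    return None
```

Returning `None` rather than raising lets the caller decide whether to retry the prompt or fall back to plain text.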

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics in simple terms."

messages = [
    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
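
Qwen3-family chat models commonly emit their reasoning between `<think>` and `</think>` before the final answer. Whether this fine-tune keeps that convention is an assumption; the model card does not say. If it does, a small helper like the hypothetical `split_thinking` below could separate the reasoning trace from the answer in the decoded `response` string.

```python
from typing import Tuple


def split_thinking(
    response: str,
    open_tag: str = "<think>",
    close_tag: str = "</think>",
) -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    Assumes the reasoning, if any, ends at the last close_tag. When no
    close_tag is present, the whole text is treated as the answer.
    """
    head, separator, tail = response.rpartition(close_tag)
    if not separator:
        return "", response.strip()
    # Remove the opening tag if it survived decoding, then trim whitespace.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()
```

For example, `reasoning, answer = split_thinking(response)` would leave `answer` holding only the text after the reasoning block.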

Intended Use

- General reasoning and educational Q&A
- Technical concept explanation and summarization
- Structured content generation in Markdown, LaTeX, and JSON
- Code and logic support in instruction-rich workflows
- Multi-language academic and public knowledge tools

Limitations

- Not optimized for purely creative or fictional content
- Smaller context window compared to frontier models
- May be sensitive to ambiguous or poorly structured prompts
- Can occasionally hallucinate in niche or adversarial scenarios
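
Given the limited context window, long conversations may need trimming before generation. The following is a rough sketch of one way to do that. The helper name `trim_history`, the `budget` parameter, and the token-counting callable are all hypothetical, and the estimate ignores the extra tokens a chat template adds around each message.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[Message]:
    """Drop the oldest non-system turns until the estimated total fits the budget.

    System messages are always kept, and the most recent turn is never dropped,
    so the result can still exceed the budget if that turn alone is too long.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(selected: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in selected)

    while len(turns) > 1 and total(system + turns) > budget:
        turns.pop(0)  # discard the oldest remaining turn
    return system + turns
```

With a loaded Hugging Face tokenizer, `count_tokens` could be `lambda text: len(tokenizer.encode(text))`; the budget itself should come from the model's actual configuration rather than a guessed value.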
