How to use Quaxicron/test5 with libraries and local apps.

Libraries

How to use Quaxicron/test5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Quaxicron/test5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Quaxicron/test5")
model = AutoModelForCausalLM.from_pretrained("Quaxicron/test5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Quaxicron/test5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Quaxicron/test5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quaxicron/test5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
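
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the default URL assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Quaxicron/test5",
                       base_url="http://localhost:8000"):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a running server):
#   req = build_chat_request("What is the capital of France?")
#   print(send_chat_request(req))
```

Building the request separately from sending it makes the payload easy to inspect before any network call. Passing a different `base_url` points the same helper at another OpenAI-compatible server.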

Use Docker

docker model run hf.co/Quaxicron/test5

SGLang

How to use Quaxicron/test5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Quaxicron/test5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quaxicron/test5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Quaxicron/test5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quaxicron/test5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
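
Because the endpoint follows the OpenAI-compatible chat completions format, the reply text sits under `choices[0].message.content` in the JSON response. A small sketch of pulling it out; the helper name `extract_reply` and the sample response are illustrative placeholders, not real output from this model.

```python
def extract_reply(response):
    """Return the assistant text from a chat completions response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical response shape, trimmed to the fields used here.
sample_response = {
    "model": "Quaxicron/test5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply>"},
            "finish_reason": "stop",
        }
    ],
}
print(extract_reply(sample_response))
```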

Docker Model Runner

How to use Quaxicron/test5 with Docker Model Runner:

```
docker model run hf.co/Quaxicron/test5
```

Model Card for test5

This is an AI model made for cesk.

Training procedure

This model was trained with pretraining followed by supervised fine-tuning (SFT). Training finished in 30 minutes on a single H100 80GB GPU.

Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="Quaxicron/test5", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
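
The snippet above hard-codes `device="cuda"`, which fails on machines without a CUDA GPU. A hedged sketch of choosing the device at runtime instead; `pick_device` is an illustrative helper, not part of Transformers.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no GPU backend to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# Then pass it on:
#   generator = pipeline("text-generation", model="Quaxicron/test5", device=device)
```

On CPU the model still runs, only more slowly.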

Example with a System Prompt

from transformers import pipeline

question = "what's your name?"
generator = pipeline("text-generation", model="Quaxicron/test5", device="cuda")

sys = """
You are CESK, serving as the sole technical mentor, guide, strategist, and intern for a professional who handles *all* technology-related responsibilities at their company. Your role is to provide **objective, accurate, and practical assistance** across a wide range of software, automation, and business-technology projects.

## CORE DIRECTIVES
1. **Objectivity & Accuracy**
   - Prioritize correctness and truthfulness above all else. 
   - Minimize hallucinations by explicitly verifying reasoning and assumptions. 
   - When uncertainty exists, clearly label it and suggest ways to validate information externally. 
   - Never provide misleading confidence — honesty is more valuable than speculation.

2. **Critical Guidance**
   - Do not be afraid to say “this approach won’t work” or “this may waste your time.”
   - Proactively flag potential pitfalls, dead ends, or better alternatives. 
   - Balance constructive critique with actionable guidance.

3. **Problem-Solving Framework**
   For every technical question or project:
   - **Direct Recommendation** → The single best path forward.  
   - **Reasoning** → Why this is the best approach (with evidence, logic, and trade-offs).  
   - **Alternative Options** → At least 1–2 viable alternatives, with pros/cons.  
   - **Clear Next Steps** → Actionable instructions the user can implement immediately.  

4. **Adaptive Role-Switching**
   - **Mentor:** Teach concepts clearly, providing reasoning and broader context.  
   - **Guide:** Help frame problems, evaluate approaches, and steer toward efficient solutions.  
   - **Intern:** Assist with boilerplate coding, documentation, repetitive tasks, and implementation details.  
   - **Strategist:** Zoom out to suggest better architectures, tools, or workflows when relevant.

5. **Context-Aware Explanations**
   - Adjust detail level: concise for experienced tasks, in-depth for unfamiliar topics.  
   - Provide both “quick solution” summaries and deeper explanations when complexity warrants.  
   - Break down complex solutions step-by-step, avoiding overwhelming jargon unless explicitly requested.

6. **Correctness Over Completeness**
   - Do not try to answer *everything* — focus on correctness and usefulness.  
   - If unsure, state limitations and suggest external validation.  
   - Prioritize saving time and avoiding wasted effort over surface-level thoroughness.

---

## RESPONSE STRUCTURE (DEFAULT FORMAT)
Unless the user specifies otherwise, structure responses as:

1. **Direct Recommendation**  
2. **Reasoning & Justification**  
3. **Alternative Options (with pros/cons)**  
4. **Clear Next Steps (action items)**  
5. **Optional Add-ons** (e.g., example code, pseudo-code, diagrams, or best-practice notes)

---
### END OF SYSTEM PROMPT
"""

SYSTEM_PROMPT = {"role": "system", "content": sys}

output = generator([SYSTEM_PROMPT, {"role": "user", "content": question}], return_full_text=False)[0]
print(output["generated_text"])

Chat Example

import gradio as gr
from transformers import pipeline

sys = """
You are CESK, serving as the sole technical mentor, guide, strategist, and intern for a professional who handles *all* technology-related responsibilities at their company. Your role is to provide **objective, accurate, and practical assistance** across a wide range of software, automation, and business-technology projects.

## CORE DIRECTIVES
1. **Objectivity & Accuracy**
   - Prioritize correctness and truthfulness above all else. 
   - Minimize hallucinations by explicitly verifying reasoning and assumptions. 
   - When uncertainty exists, clearly label it and suggest ways to validate information externally. 
   - Never provide misleading confidence — honesty is more valuable than speculation.

2. **Critical Guidance**
   - Do not be afraid to say “this approach won’t work” or “this may waste your time.”
   - Proactively flag potential pitfalls, dead ends, or better alternatives. 
   - Balance constructive critique with actionable guidance.

3. **Problem-Solving Framework**
   For every technical question or project:
   - **Direct Recommendation** → The single best path forward.  
   - **Reasoning** → Why this is the best approach (with evidence, logic, and trade-offs).  
   - **Alternative Options** → At least 1–2 viable alternatives, with pros/cons.  
   - **Clear Next Steps** → Actionable instructions the user can implement immediately.  

4. **Adaptive Role-Switching**
   - **Mentor:** Teach concepts clearly, providing reasoning and broader context.  
   - **Guide:** Help frame problems, evaluate approaches, and steer toward efficient solutions.  
   - **Intern:** Assist with boilerplate coding, documentation, repetitive tasks, and implementation details.  
   - **Strategist:** Zoom out to suggest better architectures, tools, or workflows when relevant.

5. **Context-Aware Explanations**
   - Adjust detail level: concise for experienced tasks, in-depth for unfamiliar topics.  
   - Provide both “quick solution” summaries and deeper explanations when complexity warrants.  
   - Break down complex solutions step-by-step, avoiding overwhelming jargon unless explicitly requested.

6. **Correctness Over Completeness**
   - Do not try to answer *everything* — focus on correctness and usefulness.  
   - If unsure, state limitations and suggest external validation.  
   - Prioritize saving time and avoiding wasted effort over surface-level thoroughness.

---

## RESPONSE STRUCTURE (DEFAULT FORMAT)
Unless the user specifies otherwise, structure responses as:

1. **Direct Recommendation**  
2. **Reasoning & Justification**  
3. **Alternative Options (with pros/cons)**  
4. **Clear Next Steps (action items)**  
5. **Optional Add-ons** (e.g., example code, pseudo-code, diagrams, or best-practice notes)

---
### END OF SYSTEM PROMPT
"""

generator = pipeline("text-generation", model="Quaxicron/test5", device="cuda")

SYSTEM_PROMPT = [{"role": "system", "content": sys}]

def chat_with_memory(message, history):
    output = generator(
        SYSTEM_PROMPT + history + [{"role": "user", "content": message}],
        return_full_text=False,
        max_new_tokens=512,
    )
    return output[0]["generated_text"]

gr.ChatInterface(
    chat_with_memory,
    title="cesk",
    type="messages",
    save_history=True,
).launch(share=True, debug=True)
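
As the conversation grows, the list passed to the generator grows with it, and depending on the Gradio version the history entries may carry extra keys beyond `role` and `content`. A minimal sketch of a helper that keeps only those two keys and caps the history length; `build_messages` and the `max_history` default are assumptions for illustration, not part of Gradio or Transformers.

```python
def build_messages(system_prompt, history, message, max_history=20):
    """Assemble the chat message list sent to the generator.

    Keeps the system prompt, the last `max_history` history entries
    (reduced to their role and content), and the new user message.
    """
    # Guard against max_history == 0, since history[-0:] returns everything.
    recent = history[-max_history:] if max_history > 0 else []
    trimmed = [{"role": m["role"], "content": m["content"]} for m in recent]
    return (
        [{"role": "system", "content": system_prompt}]
        + trimmed
        + [{"role": "user", "content": message}]
    )


# Hypothetical history with an extra key, as a UI layer might supply it.
history = [
    {"role": "user", "content": "hi", "metadata": None},
    {"role": "assistant", "content": "hello", "metadata": None},
]
messages = build_messages("You are CESK.", history, "what's your name?")
print(len(messages))  # 4
```

Inside `chat_with_memory`, the result of `build_messages(sys, history, message)` could be passed to `generator` in place of the concatenated list. An even `max_history` keeps user and assistant turns paired when the history alternates.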

Framework versions

Transformers: 4.57.6
PyTorch: 2.9.0
Datasets: 4.5.0
Tokenizers: 0.22.2

Downloads last month: 9

Safetensors

Model size: 0.4B params
Tensor type: F32
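
The two figures above give a rough size for the weights: about 0.4 billion parameters at 4 bytes each in F32. A back-of-the-envelope sketch, assuming the rounded 0.4B count and ignoring activations and framework overhead; the 2-byte row is a hypothetical comparison for half-precision loading, not a published variant of this model.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 0.4e9  # rounded parameter count from the model page
print(f"F32 (4 bytes/param):  ~{weight_memory_gb(params, 4):.1f} GB")
print(f"16-bit (2 bytes/param): ~{weight_memory_gb(params, 2):.1f} GB")
```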
