Instructions to use qrk-labs/akeel-cot with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use qrk-labs/akeel-cot with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="qrk-labs/akeel-cot")
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("qrk-labs/akeel-cot", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use qrk-labs/akeel-cot with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "qrk-labs/akeel-cot"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "qrk-labs/akeel-cot",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
```shell
docker model run hf.co/qrk-labs/akeel-cot
```
- SGLang
How to use qrk-labs/akeel-cot with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "qrk-labs/akeel-cot" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "qrk-labs/akeel-cot",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "qrk-labs/akeel-cot" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "qrk-labs/akeel-cot",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use qrk-labs/akeel-cot with Docker Model Runner:
```shell
docker model run hf.co/qrk-labs/akeel-cot
```
Overview
akeel-cot is a research prototype exploring Thought Injection: a novel approach to retrieval-augmented generation where the model learns to request external knowledge mid-generation using explicit `<knowledge>` tags.
Unlike traditional RAG (which retrieves before generation) or tool-use (which requires special training), Thought Injection allows the model to:
- Reason about what it knows and doesn't know
- Request specific information at the moment it's needed
- Integrate retrieved context seamlessly into its response
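The serving layer has to cooperate with this format: pause generation when the model emits a `<knowledge>` request, run retrieval, splice the result back in, and resume. A minimal sketch of that loop is below; `generate` and `retrieve` are hypothetical stand-ins for a real model call and a real retriever, and only the tag names come from this card.

```python
import re

# Match a <knowledge>...</knowledge> request at the end of the generated text,
# i.e. the point where generation was stopped to wait for retrieval.
KNOWLEDGE_RE = re.compile(r"<knowledge>(.*?)</knowledge>\s*$", re.DOTALL)

def thought_injection_loop(generate, retrieve, prompt, max_rounds=4):
    """Run generation, injecting retrieved context after each <knowledge> tag.

    `generate(prompt, stop=...)` should return the model's continuation,
    including the stop string; `retrieve(query)` should return a text passage.
    Both are placeholders for real infrastructure.
    """
    text = ""
    for _ in range(max_rounds):
        # Continue generating, stopping once the model closes a knowledge tag.
        text += generate(prompt + text, stop="</knowledge>")
        match = KNOWLEDGE_RE.search(text)
        if not match:
            return text  # no pending retrieval request: generation finished
        # Inject the retrieved passage and close it with the </k_res> tag
        # the model card's format expects, then resume generation.
        text += f"\n[Retrieved: {retrieve(match.group(1))}]\n</k_res>\n"
    return text
```

The `max_rounds` cap is a safety valve against the model requesting retrieval indefinitely; a production loop would also need token budgets and error handling.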
This model is part of QRK Labs' research into human-centric AI systems.
How It Works
```
User: What is the capital of France?

Model: <think>
The user is asking about the capital of France. This is common knowledge.
</think>
The capital of France is Paris.
```

```
User: What were QRK Labs' Q4 2025 revenues?

Model: <think>
This is asking for specific financial data I don't have. I need to retrieve this.
<knowledge>QRK Labs Q4 2025 revenue financial results</knowledge>
[Retrieved: QRK Labs reported Q4 2025 revenues of $2.3M, up 45% YoY...]
</k_res>
Based on the retrieved information, I can now answer.
</think>
QRK Labs reported Q4 2025 revenues of $2.3 million, representing a 45% year-over-year increase.
```
Architecture
- Base Model: Qwen3-0.6B
- Training: Fine-tuned on thought injection reasoning traces
- Format: ChatML with `<think>`, `<knowledge>`, and `</k_res>` tags
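Since the card does not spell out a chat template, a prompt in this format can be assembled by hand. The sketch below uses the standard ChatML role markers (`<|im_start|>`/`<|im_end|>`); pre-seeding the assistant turn with an open `<think>` tag is an assumption about how the model was trained, not something the card confirms.

```python
def build_chatml_prompt(user_message: str) -> str:
    """Assemble a ChatML prompt that leaves the model inside a <think> block."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # the model continues its reasoning from here
    )

prompt = build_chatml_prompt("What were QRK Labs' Q4 2025 revenues?")
```

In practice, prefer the tokenizer's own `apply_chat_template` if the repository ships a chat template, and fall back to manual assembly only if it does not.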
Intended Use
This is a research prototype for exploring thought injection techniques. It is intended for:
- Academic research on RAG and reasoning
- Experimentation with knowledge-grounded generation
- Understanding model uncertainty and knowledge boundaries
Not intended for production use.
Limitations
- Small model size (0.6B) limits general capabilities
- Requires compatible inference infrastructure to inject retrieved content
- Research prototype: not optimized for real-world deployment
- May hallucinate or generate incorrect `<knowledge>` queries
Citation
If you use this model in your research, please cite:
```bibtex
@misc{akeel-cot-2026,
  author = {QRK Labs},
  title = {Akeel-CoT: Thought Injection for Grounded Reasoning},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qrk-labs/akeel-cot}
}
```
Links
- QRK Labs: qrk.ng
- Research: Coming soon
- Contact: research@qrk.ng