Instructions for using czyarl/CogFlow-Llama3.1-8b with libraries and local apps.

Libraries

How to use czyarl/CogFlow-Llama3.1-8b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="czyarl/CogFlow-Llama3.1-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the snippet also shows output when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("czyarl/CogFlow-Llama3.1-8b")
model = AutoModelForCausalLM.from_pretrained("czyarl/CogFlow-Llama3.1-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use czyarl/CogFlow-Llama3.1-8b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "czyarl/CogFlow-Llama3.1-8b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "czyarl/CogFlow-Llama3.1-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
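
The same endpoint can also be called from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "czyarl/CogFlow-Llama3.1-8b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Same response layout that the curl call above returns.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the model's reply as a string. Passing `base_url="http://localhost:30000"` would point the same helper at a server on another port.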

SGLang

How to use czyarl/CogFlow-Llama3.1-8b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "czyarl/CogFlow-Llama3.1-8b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "czyarl/CogFlow-Llama3.1-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "czyarl/CogFlow-Llama3.1-8b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "czyarl/CogFlow-Llama3.1-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use czyarl/CogFlow-Llama3.1-8b with Docker Model Runner:
```
docker model run hf.co/czyarl/CogFlow-Llama3.1-8b
```

CogFlow-Llama3.1-8B

CogFlow-Llama3.1-8B is a language model fine-tuned with Cognitive Reasoning, a paradigm modeled on human social cognition. It enables the model to navigate complex social situations by generating a structured "cognitive flow" of interconnected cognitive units (e.g., observation, attribution, behavior).

This model is based on Llama-3.1-8B-Instruct and trained via Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) with a preference-based reward model.

📄 Paper | 💻 Code

How to Use

Generation Model

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("thu-coai/CogFlow-Llama3.1-8b")
model = AutoModelForCausalLM.from_pretrained("thu-coai/CogFlow-Llama3.1-8b")

prompt = """On a Saturday morning, Mark and Sarah joined a neighborhood garden cleanup. Mark borrowed Sarah's shovel to dig near the oak tree, returning it with dirt on the handle. As rain clouds gathered, Sarah rushed to store tools in the community shed while Mark checked tomato plants. Young Tom approached Sarah, eyeing the muddy shovel she was wiping with a blue rag.
What does Tom say to Sarah?
Answer in one short sentence."""

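# Wrap the prompt in the chat template and append the assistant turn header.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The Llama 3.1 chat template already inserts the BOS token, so avoid adding a second one.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))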

Example Output

Input:

On a Saturday morning, Mark and Sarah joined a neighborhood garden cleanup. Mark borrowed Sarah's shovel to dig near the oak tree, returning it with dirt on the handle. As rain clouds gathered, Sarah rushed to store tools in the community shed while Mark checked tomato plants. Young Tom approached Sarah, eyeing the muddy shovel she was wiping with a blue rag. What does Tom say to Sarah? Answer in one short sentence.

Output:

Mark borrowed Sarah's shovel and returned it with dirt on its handle after digging near an oak tree. Sarah is wiping the muddy shovel with a blue rag while rushing to store tools in the community shed as rain clouds gather, simultaneously with Mark checking tomato plants. Young Tom approaches Sarah while eyeing the muddy shovel she is holding. Tom's attention is causally attributable to the shovel's highly visible mud deposit and its ongoing active handling by Sarah, as the dirt—being the shovel's most prominent feature and focus of her cleaning action—serves as the primary sensory trigger for a young observer approaching this scene. This specific stimulus drives evaluation of his motivation toward inquiry about the mess, since conspicuous dirt naturally prompts questions from curious children encountering tools mid-cleaning, particularly in a communal setting where ownership or responsibility might concern them.

</Attribution and Evaluation>

To express curiosity about the muddy shovel's origin since young children typically ask simple direct questions when noticing unusual dirt on objects, Tom might ask Sarah about the source of the mud while she cleans it.

Based on the stimulus triggering his curiosity about the shovel's prominent mud, Tom asks a simple, direct question about its origin: "Where did all that mud come from?"

Cognitive Flow Tags

The model generates reasoning through structured cognitive units:

| Tag | Description |
| --- | --- |
| `<Observation>` | Identifies key elements and facts from the social scene |
| `<Attribution and Evaluation>` | Infers causes, motivations, and emotional states of characters |
| `<Behavior>` | Predicts likely actions or responses |
| `<Self-regulatory>` | Reflects on and refines previous reasoning steps |
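
To work with the cognitive units programmatically, the generated text can be split on these tags. The following is a minimal sketch, assuming each unit is wrapped in a matching pair of opening and closing tags and that the final answer follows the last unit. The function name `split_cognitive_flow` is hypothetical, and the exact raw output format should be checked against real generations.

```python
import re

# Tag names taken from the table above.
TAGS = ("Observation", "Attribution and Evaluation", "Behavior", "Self-regulatory")

# Assumed format: <Tag> ... </Tag>. Adjust the pattern if the raw output differs.
_UNIT_PATTERN = re.compile(
    r"<(?P<tag>" + "|".join(re.escape(t) for t in TAGS) + r")>(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


def split_cognitive_flow(text):
    """Return (units, answer): tagged units in order, plus trailing untagged text."""
    units = []
    last_end = 0
    for match in _UNIT_PATTERN.finditer(text):
        units.append((match.group("tag"), match.group("body").strip()))
        last_end = match.end()
    # Whatever follows the last closed unit is treated as the final answer.
    answer = text[last_end:].strip()
    return units, answer
```

Text without any recognized tag pairs yields an empty unit list and is returned unchanged (apart from stripped whitespace) as the answer.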

Training Pipeline

1. SFT: Fine-tuned on cognitive reasoning demonstrations generated via tree-structured simulation
2. Reward Model: Trained a comparative preference reward model (RM) to rank response quality
3. RL: Optimized with Group Relative Policy Optimization (GRPO), using the preference RM as the reward signal

For full training details, see the CogFlow repository.
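
As a rough illustration of the RL step, GRPO scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows the commonly used group normalization of rewards. It is not taken from the CogFlow code: the function name, the `eps` constant, and the use of the population standard deviation are assumptions, and the rewards would come from the preference RM.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt."""
    baseline = mean(rewards)   # group mean acts as the baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward.
    return [(r - baseline) / (spread + eps) for r in rewards]
```

Responses scored above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to supply a baseline.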

Citation

If you use this model, please cite our paper:

@article{cogflow2025,
  title={Think Socially via Cognitive Reasoning},
  author={CogFlow Team},
  journal={arXiv preprint arXiv:2509.22546},
  year={2025},
  url={https://arxiv.org/abs/2509.22546}
}