CogFlow-Llama3.1-8B
CogFlow-Llama3.1-8B is a language model fine-tuned with Cognitive Reasoning, a paradigm modeled on human social cognition. It enables the model to navigate complex social situations by generating a structured "cognitive flow" of interconnected cognitive units (e.g., observation, attribution, behavior).
This model is based on Llama-3.1-8B-Instruct and trained via Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) with a preference-based reward model.
How to Use
Generation Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("thu-coai/CogFlow-Llama3.1-8b")
model = AutoModelForCausalLM.from_pretrained("thu-coai/CogFlow-Llama3.1-8b")

prompt = """On a Saturday morning, Mark and Sarah joined a neighborhood garden cleanup. Mark borrowed Sarah's shovel to dig near the oak tree, returning it with dirt on the handle. As rain clouds gathered, Sarah rushed to store tools in the community shed while Mark checked tomato plants. Young Tom approached Sarah, eyeing the muddy shovel she was wiping with a blue rag.
What does Tom say to Sarah?
Answer in one short sentence."""

# Format the prompt with the model's chat template and tokenize it.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (skipping the prompt).
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Example Output
Input:
On a Saturday morning, Mark and Sarah joined a neighborhood garden cleanup. Mark borrowed Sarah's shovel to dig near the oak tree, returning it with dirt on the handle. As rain clouds gathered, Sarah rushed to store tools in the community shed while Mark checked tomato plants. Young Tom approached Sarah, eyeing the muddy shovel she was wiping with a blue rag. What does Tom say to Sarah? Answer in one short sentence.
Output:
```
Mark borrowed Sarah's shovel and returned it with dirt on its handle after digging near an oak tree. Sarah is wiping the muddy shovel with a blue rag while rushing to store tools in the community shed as rain clouds gather, simultaneously with Mark checking tomato plants. Young Tom approaches Sarah while eyeing the muddy shovel she is holding. Tom's attention is causally attributable to the shovel's highly visible mud deposit and its ongoing active handling by Sarah, as the dirt—being the shovel's most prominent feature and focus of her cleaning action—serves as the primary sensory trigger for a young observer approaching this scene. This specific stimulus drives evaluation of his motivation toward inquiry about the mess, since conspicuous dirt naturally prompts questions from curious children encountering tools mid-cleaning, particularly in a communal setting where ownership or responsibility might concern them.</Attribution and Evaluation>
To express curiosity about the muddy shovel's origin since young children typically ask simple direct questions when noticing unusual dirt on objects, Tom might ask Sarah about the source of the mud while she cleans it.
Based on the stimulus triggering his curiosity about the shovel's prominent mud, Tom asks a simple, direct question about its origin: "Where did all that mud come from?"
```
Cognitive Flow Tags
The model generates reasoning through structured cognitive units:
| Tag | Description |
|---|---|
| `<Observation>` | Identifies key elements and facts from the social scene |
| `<Attribution and Evaluation>` | Infers causes, motivations, and emotional states of characters |
| `<Behavior>` | Predicts likely actions or responses |
| `<Self-regulatory>` | Reflects on and refines previous reasoning steps |
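
To work with the cognitive units programmatically, the generation can be split by tag. The snippet below is a minimal sketch, not part of the CogFlow codebase: it assumes each unit is wrapped in a matching `<Tag>...</Tag>` pair and that the final answer is whatever text follows the last closing tag. The function name `parse_cognitive_flow` and the sample string are hypothetical.

```python
import re

# Tag names taken from the table above. The paired <Tag>...</Tag> layout
# is an assumption about the output format, not a documented guarantee.
COGNITIVE_TAGS = (
    "Observation",
    "Attribution and Evaluation",
    "Behavior",
    "Self-regulatory",
)

# Match one complete unit: an opening tag, its body, and the same closing tag.
_UNIT_PATTERN = re.compile(
    r"<(?P<tag>" + "|".join(re.escape(t) for t in COGNITIVE_TAGS) + r")>"
    r"(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


def parse_cognitive_flow(text):
    """Return ([(tag, body), ...], answer) for one model generation.

    `answer` is the text after the last complete unit; if no units are
    found, it is the whole (stripped) input.
    """
    units = []
    last_end = 0
    for match in _UNIT_PATTERN.finditer(text):
        units.append((match.group("tag"), match.group("body").strip()))
        last_end = match.end()
    return units, text[last_end:].strip()


# Hypothetical, hand-written sample (not real model output).
sample = (
    "<Observation>Tom sees a muddy shovel.</Observation>\n"
    "<Behavior>Tom asks about the mud.</Behavior>\n"
    "Where did all that mud come from?"
)
units, answer = parse_cognitive_flow(sample)
for tag, body in units:
    print(f"[{tag}] {body}")
print("Answer:", answer)
```

Unmatched or truncated tags are simply left in the trailing text, so it is worth checking `units` for emptiness before relying on `answer`.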
Training Pipeline
- SFT: Fine-tuned on cognitive reasoning demonstrations generated via tree-structured simulation
- Reward Model: A comparative preference reward model (RM) trained to rank response quality
- RL: Optimized with GRPO using the preference RM as a reward signal
For full training details, see the CogFlow repository.
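
As a rough illustration of the RL step, GRPO is generally described as scoring a group of sampled responses to the same prompt and normalizing each reward against the group's mean and standard deviation. The sketch below shows only that normalization; it is not the authors' training code. The function name, the `eps` term, the use of the population standard deviation, and the example reward values are all assumptions, and how the comparative RM produces scalar rewards is not specified here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The eps
    term (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four sampled responses to one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses scored above the group average receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.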
Citation
If you use this model, please cite our paper:
@article{cogflow2025,
title={Think Socially via Cognitive Reasoning},
author={CogFlow Team},
journal={arXiv preprint arXiv:2509.22546},
year={2025},
url={https://arxiv.org/abs/2509.22546}
}