Improve model card: Add comprehensive information and usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +105 -3
README.md CHANGED
@@ -1,3 +1,105 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e06a4870fe02aa52095717ce69d4dea985e6f10849ffbdb472864ce8ba43b259
- size 85
+ ---
+ license: other
+ license_name: cognitive-kernel-pro
+ license_link: LICENSE
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
+
+ This repository hosts **Qwen3-8B-CK-Pro**, an 8B-parameter open-source agent foundation model trained with the **Cognitive Kernel-Pro** framework. Cognitive Kernel-Pro aims to democratize the development and evaluation of advanced AI agents: it relies only on open-source and free tools to enable complex reasoning, web interaction, coding, and autonomous research; curates high-quality training data for agent foundation models; and introduces strategies for agent test-time reflection and voting, achieving state-of-the-art results among open-source agents on GAIA.
+
+ - 📚 **Paper**: [Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training](https://huggingface.co/papers/2508.00414)
+ - 💻 **Code**: [https://github.com/Tencent/CognitiveKernel-Pro](https://github.com/Tencent/CognitiveKernel-Pro)
+
+ ## Quick Start
+
+ Qwen3-8B-CK-Pro is a text-based agent foundation model fine-tuned from Qwen3-8B. It is compatible with the Hugging Face `transformers` library.
+
+ First, ensure you have the necessary dependencies installed:
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ Here is a Python code snippet demonstrating how to perform inference with the model:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model and tokenizer
+ model_id = "CognitiveKernel/Qwen3-8B-CK-Pro"
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ question = "Outline a step-by-step plan for researching recent open-source work on deep research agents."
+
+ # Prepare messages and apply the chat template
+ messages = [{"role": "user", "content": question}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate a response
+ generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
+
+ # Strip the prompt tokens and decode the completion
+ output_ids = generated_ids[0][inputs.input_ids.shape[1]:]
+ output_text = tokenizer.decode(output_ids, skip_special_tokens=True)
+
+ print(f"User: {question}\nAssistant: {output_text}")
+ ```
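
The framework's test-time voting aggregates the final answers of several independently sampled agent rollouts. As a minimal, illustrative sketch (not the paper's exact implementation — the `vote` helper and candidate answers below are hypothetical, and the reflection step is omitted), simple majority voting over candidate answers could look like this:

```python
from collections import Counter

def vote(answers):
    """Return the most frequent final answer among sampled agent rollouts.

    Minimal majority-voting sketch; ties resolve to the answer seen first.
    """
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Hypothetical final answers from three independent rollouts
candidates = ["42.1%", "42.1%", "41.8%"]
print(vote(candidates))  # -> 42.1%
```

In practice, each candidate would come from a separate `model.generate` call with `do_sample=True`, trading extra compute at test time for a more reliable final answer.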
+
+ ## Citation
+
+ If you find this work helpful, please cite our paper:
+
+ ```bibtex
+ @misc{fang2025cognitivekernelpro,
+   title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
+   author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
+   year={2025},
+   eprint={2508.00414},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2508.00414},
+ }
+ ```