nielsr HF Staff committed on
Commit b8b8d74 · verified · 1 Parent(s): 9be3025

Improve model card: Add comprehensive information and usage


This PR significantly enhances the model card for the `CognitiveKernel/Qwen3-8B-CK-Pro` model by adding crucial metadata and comprehensive information:

- **Metadata**:
  - Adds `pipeline_tag: image-text-to-text`, which correctly categorizes the model for tasks involving both images and text, improving discoverability on the Hugging Face Hub.
  - Adds `library_name: transformers`, indicating the model's compatibility with the Hugging Face Transformers library.
- **Paper Link**: Includes a direct link to the research paper: [Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training](https://huggingface.co/papers/2508.00414).
- **Project Page**: Adds a link to the official project homepage: [https://osatlas.github.io/](https://osatlas.github.io/).
- **Code Repository**: Provides a direct link to the GitHub repository: [https://github.com/OS-Copilot/OS-Atlas](https://github.com/OS-Copilot/OS-Atlas).
- **Model Description and Usage**: Adds a clear overview of the model's capabilities and a practical Python code snippet for inference using the `transformers` library, complete with an example image input and question.
- **Visual Aid**: Includes an image from the project's GitHub repository for better visual context.
- **Citation**: Adds the official BibTeX citation for the paper.

These additions will greatly improve the visibility, usability, and reproducibility of the model for the community.
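
The two added metadata keys live in the README's YAML front matter. As a point of reference, here is a minimal, stdlib-only sketch (hand-rolled parsing rather than PyYAML; the `parse_front_matter` helper is hypothetical, not part of any library) of extracting such keys from a model card:

```python
def parse_front_matter(readme_text):
    """Extract flat key: value pairs from a README's YAML front matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

readme = """---
license: other
pipeline_tag: image-text-to-text
library_name: transformers
---

# Model card body
"""

meta = parse_front_matter(readme)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

This only handles flat `key: value` lines, which is enough to see how the Hub reads `pipeline_tag` and `library_name` from the card.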

Files changed (1)
  1. README.md +105 -3
README.md CHANGED
@@ -1,3 +1,105 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e06a4870fe02aa52095717ce69d4dea985e6f10849ffbdb472864ce8ba43b259
- size 85
+ ---
+ license: other
+ license_name: cognitive-kernel-pro
+ license_link: LICENSE
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
+
+ This repository hosts **Qwen3-8B-CK-Pro**, an 8B-parameter open-source agent foundation model developed as part of the **Cognitive Kernel-Pro** framework. Cognitive Kernel-Pro is designed to democratize the development and evaluation of advanced AI agents, relying on open-source and free tools to enable complex reasoning, web interaction, coding, and autonomous research. It explores high-quality training-data curation for agent foundation models, as well as novel strategies for agent test-time reflection and voting, achieving state-of-the-art results on GAIA.
+
+ - 📚 **Paper**: [Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training](https://huggingface.co/papers/2508.00414)
+ - 🌐 **Project Page**: [https://osatlas.github.io/](https://osatlas.github.io/)
+ - 💻 **Code**: [https://github.com/OS-Copilot/OS-Atlas](https://github.com/OS-Copilot/OS-Atlas)
+
+ <p align="center"><img src="https://github.com/OS-Copilot/OS-Atlas/raw/main/results.png" alt="Cognitive Kernel-Pro Overview" width="90%"/></p>
+
+ ## Quick Start
+
+ This model processes GUI screenshots along with text instructions to produce grounded actions or text responses. It is compatible with the Hugging Face `transformers` library.
+
+ First, ensure you have the necessary dependencies installed:
+
+ ```bash
+ pip install transformers torch Pillow
+ ```
+
+ Here is a Python code snippet demonstrating how to perform inference with the model:
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ # Load the model and processor
+ model_id = "CognitiveKernel/Qwen3-8B-CK-Pro"
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+
+ # Example image and question.
+ # Replace with your actual image path, e.g. a screenshot from the GitHub repo:
+ # image_path = "./examples/images/web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png"
+ # image = Image.open(image_path).convert("RGB")
+
+ # Or use a dummy image for testing without a file:
+ image = Image.new("RGB", (500, 500), color="red")
+
+ question = "In the screenshot of this web page, please give me the coordinates of the element I want to click on according to my instructions (with point).\"'Champions League' link\""
+
+ # Prepare messages for the chat template
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": question},
+         ],
+     }
+ ]
+
+ # Apply the chat template and process the inputs
+ text = processor.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = processor(
+     text=[text],
+     images=[image],
+     padding=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ # Generate a response
+ generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
+
+ # Trim the prompt tokens from each sequence, then decode
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]
+
+ print(f"User: {question}\nAssistant: {output_text}")
+ ```
+
+ ## Citation
+
+ If you find this work helpful, please cite our paper:
+
+ ```bibtex
+ @misc{fang2025cognitivekernelpro,
+       title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
+       author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
+       year={2025},
+       eprint={2508.00414},
+       archivePrefix={arXiv},
+       primaryClass={cs.AI},
+       url={https://arxiv.org/abs/2508.00414},
+ }
+ ```
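
One step in the usage snippet worth spelling out is the trimming of prompt tokens before decoding: `generate` returns each full sequence with the prompt echoed at the front, so the slice `out_ids[len(in_ids):]` keeps only the newly generated tokens. A minimal sketch of the same slice with plain lists (the token ids below are made up for illustration):

```python
# Hypothetical token-id sequences: each generated sequence begins with
# an echo of its prompt, so decoding should skip the prompt prefix.
input_ids = [
    [101, 7592, 102],        # prompt A: 3 tokens
    [101, 2129, 2024, 102],  # prompt B: 4 tokens
]
generated_ids = [
    [101, 7592, 102, 2023, 2003, 103],  # prompt A + 3 new tokens
    [101, 2129, 2024, 102, 2204, 103],  # prompt B + 2 new tokens
]

# Same slice as in the model card: keep only tokens past each prompt's length.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[2023, 2003, 103], [2204, 103]]
```

This mirrors the list comprehension in the card, where `inputs.input_ids` plays the role of `input_ids` and the slice works identically on PyTorch tensors.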