nielsr HF Staff committed on
Commit b8b8d74 · verified · 1 Parent(s): 9be3025

Improve model card: Add comprehensive information and usage


This PR significantly enhances the model card for the `CognitiveKernel/Qwen3-8B-CK-Pro` model by adding crucial metadata and comprehensive information:

- **Metadata**:
  - Adds `pipeline_tag: image-text-to-text`, which correctly categorizes the model for tasks involving both images and text, improving discoverability on the Hugging Face Hub.
  - Adds `library_name: transformers`, indicating the model's compatibility with the Hugging Face Transformers library.
- **Paper Link**: Includes a direct link to the research paper: [Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training](https://huggingface.co/papers/2508.00414).
- **Project Page**: Adds a link to the official project homepage: [https://osatlas.github.io/](https://osatlas.github.io/).
- **Code Repository**: Provides a direct link to the GitHub repository: [https://github.com/OS-Copilot/OS-Atlas](https://github.com/OS-Copilot/OS-Atlas).
- **Model Description and Usage**: Adds a clear overview of the model's capabilities and a practical Python code snippet for inference using the `transformers` library, complete with an example image input and question.
- **Visual Aid**: Includes an image from the project's GitHub repository for better visual context.
- **Citation**: Adds the official BibTeX citation for the paper.

These additions will greatly improve the visibility, usability, and reproducibility of the model for the community.
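
The two added metadata keys live in the README's YAML front matter. As a point of reference, here is a minimal, stdlib-only sketch (hand-rolled parsing rather than PyYAML; the `parse_front_matter` helper is hypothetical, not part of any library) of extracting such keys from a model card:

```python
def parse_front_matter(readme_text):
    """Extract flat key: value pairs from a README's YAML front matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

readme = """---
license: other
pipeline_tag: image-text-to-text
library_name: transformers
---

# Model card body
"""

meta = parse_front_matter(readme)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

This only handles flat `key: value` lines, which is enough to see how the Hub reads `pipeline_tag` and `library_name` from the card.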

Files changed (1)
  1. README.md +105 -3
README.md CHANGED
@@ -1,3 +1,105 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e06a4870fe02aa52095717ce69d4dea985e6f10849ffbdb472864ce8ba43b259
- size 85
+ ---
+ license: other
+ license_name: cognitive-kernel-pro
+ license_link: LICENSE
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
+
+ This repository hosts **Qwen3-8B-CK-Pro**, an 8B-parameter open-source agent foundation model developed as part of the **Cognitive Kernel-Pro** framework. Cognitive Kernel-Pro is designed to democratize the development and evaluation of advanced AI agents, relying on open-source and free tools to enable complex reasoning, web interaction, coding, and autonomous research. It explores high-quality training-data curation for agent foundation models, as well as novel strategies for agent test-time reflection and voting, achieving state-of-the-art results on GAIA.
+
+ - 📚 **Paper**: [Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training](https://huggingface.co/papers/2508.00414)
+ - 🌐 **Project Page**: [https://osatlas.github.io/](https://osatlas.github.io/)
+ - 💻 **Code**: [https://github.com/OS-Copilot/OS-Atlas](https://github.com/OS-Copilot/OS-Atlas)
+
+ <p align="center"><img src="https://github.com/OS-Copilot/OS-Atlas/raw/main/results.png" alt="Cognitive Kernel-Pro Overview" width="90%"/></p>
+
+ ## Quick Start
+
+ This model processes GUI screenshots along with text instructions to produce grounded actions or text responses. It is compatible with the Hugging Face `transformers` library.
+
+ First, ensure you have the necessary dependencies installed:
+
+ ```bash
+ pip install transformers torch Pillow
+ ```
+
+ Here is a Python code snippet demonstrating how to perform inference with the model:
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ # Load the model and processor
+ model_id = "CognitiveKernel/Qwen3-8B-CK-Pro"
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+
+ # Example image and question.
+ # Replace with your actual image path, e.g. a screenshot from the GitHub repo:
+ # image_path = "./examples/images/web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png"
+ # image = Image.open(image_path).convert("RGB")
+
+ # Or use a dummy image for testing without a file:
+ image = Image.new("RGB", (500, 500), color="red")
+
+ question = "In the screenshot of this web page, please give me the coordinates of the element I want to click on according to my instructions (with point).\"'Champions League' link\""
+
+ # Prepare messages for the chat template
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": question},
+         ],
+     }
+ ]
+
+ # Apply the chat template and process the inputs
+ text = processor.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = processor(
+     text=[text],
+     images=[image],
+     padding=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ # Generate a response
+ generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
+
+ # Trim the prompt tokens from each sequence, then decode
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]
+
+ print(f"User: {question}\nAssistant: {output_text}")
+ ```
+
+ ## Citation
+
+ If you find this work helpful, please cite our paper:
+
+ ```bibtex
+ @misc{fang2025cognitivekernelpro,
+       title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
+       author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
+       year={2025},
+       eprint={2508.00414},
+       archivePrefix={arXiv},
+       primaryClass={cs.AI},
+       url={https://arxiv.org/abs/2508.00414},
+ }
+ ```
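
One step in the usage snippet worth spelling out is the trimming of prompt tokens before decoding: `generate` returns each full sequence with the prompt echoed at the front, so the slice `out_ids[len(in_ids):]` keeps only the newly generated tokens. A minimal sketch of the same slice with plain lists (the token ids below are made up for illustration):

```python
# Hypothetical token-id sequences: each generated sequence begins with
# an echo of its prompt, so decoding should skip the prompt prefix.
input_ids = [
    [101, 7592, 102],        # prompt A: 3 tokens
    [101, 2129, 2024, 102],  # prompt B: 4 tokens
]
generated_ids = [
    [101, 7592, 102, 2023, 2003, 103],  # prompt A + 3 new tokens
    [101, 2129, 2024, 102, 2204, 103],  # prompt B + 2 new tokens
]

# Same slice as in the model card: keep only tokens past each prompt's length.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[2023, 2003, 103], [2204, 103]]
```

This mirrors the list comprehension in the card, where `inputs.input_ids` plays the role of `input_ids` and the slice works identically on PyTorch tensors.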