Commit ·
42e031d
0
Parent(s):
Duplicate from occ-ai/OCC-RAG-0.6B
Co-authored-by: Andrey Galichin <andreuka18@users.noreply.huggingface.co>
- .gitattributes +36 -0
- README.md +159 -0
- chat_template.jinja +20 -0
- config.json +63 -0
- figures/github-mark.png +0 -0
- figures/occ.png +0 -0
- generation_config.json +12 -0
- model.safetensors +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +44 -0
.gitattributes
ADDED
|
@@ -0,0 +1,36 @@
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,159 @@
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
- ru
|
| 6 |
+
library_name: transformers
|
| 7 |
+
pipeline_tag: text-generation
|
| 8 |
+
base_model: Qwen/Qwen3-0.6B-Base
|
| 9 |
+
tags:
|
| 10 |
+
- rag
|
| 11 |
+
- faithful-qa
|
| 12 |
+
- occ
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# OCC-RAG-0.6B
|
| 16 |
+
|
| 17 |
+
<p align="center">
|
| 18 |
+
<img src="figures/occ.png" alt="OCC-RAG" width="320"/>
|
| 19 |
+
</p>
|
| 20 |
+
|
| 21 |
+
<p align="center">
|
| 22 |
+
<a href="https://github.com/optimal-cognitive-core/OCC-RAG"><b>GitHub</b></a> |
|
| 23 |
+
<a href="https://arxiv.org/abs/2606.00683"><b>Technical Report</b></a> |
|
| 24 |
+
<a href="https://cloud.ru/products/evolution-ml-inference"><b>Cloud</b></a>
|
| 25 |
+
</p>
|
| 26 |
+
|
| 27 |
+
**OCC-RAG-0.6B** is a 0.6B-parameter small language model specialized for **faithful, context-grounded question answering**. Along with OCC-RAG-1.7B, it belongs to the first generation of **Optimal Cognitive Core (OCC)** specialized reasoning models. Given a question and a set of sources, it produces a structured reasoning trace with explicit source citations, decides whether the context actually supports an answer, and either answers from the context or abstains.
|
| 28 |
+
|
| 29 |
+
Despite its size, OCC-RAG-0.6B matches or exceeds general-purpose models **2–6× larger** on multi-hop reasoning, faithfulness, and refusal benchmarks. It is mid-trained from `Qwen/Qwen3-0.6B-Base` on a large synthetic corpus of multi-context, multi-hop QA with citation-anchored reasoning traces.
|
| 30 |
+
|
| 31 |
+
## Highlights
|
| 32 |
+
|
| 33 |
+
- **Faithful by design** — answers only from the supplied context; achieves the best faithfulness (lowest memorization ratio) across all evaluated scales, including 32B models.
|
| 34 |
+
- **Calibrated abstention** — outputs `Not enough information` when the context does not support an answer.
|
| 35 |
+
- **Structured, citable reasoning** — every answer comes with a transparent trace (query analysis → source analysis → reasoning → status → answer) that cites sources by id.
|
| 36 |
+
- **Compact** — a small model that delivers chain-of-thought-level transparency at a fraction of full thinking-mode inference cost.
|
| 37 |
+
|
| 38 |
+
## Model overview
|
| 39 |
+
|
| 40 |
+
OCC-RAG-0.6B is mid-trained from `Qwen/Qwen3-0.6B-Base` via supervised fine-tuning on a synthetic corpus of **~3.25M QA pairs** (~2.78M single-hop, ~262k multi-hop single-context, ~165k multi-hop multi-context, and ~43k abstain examples), distilled from a larger teacher with citation-anchored reasoning traces. Multi-hop and multi-context subsets are oversampled to emphasize compositional reasoning. The prompt/response format is identical at training and inference time, so no train–test mismatch is introduced.
|
| 41 |
+
|
| 42 |
+
## Evaluation
|
| 43 |
+
|
| 44 |
+
The model is evaluated on multi-hop reasoning (HotpotQA, MuSiQue, TAT-QA), faithfulness (ConFiQA), and refusal (MuSiQue-Un). In-Acc counts a prediction as correct when the gold answer appears in it as a substring; F1 is the token-level overlap between the prediction and the gold answer; M_R is the memorization ratio (lower is more faithful); R-Acc is refusal accuracy.
|
| 45 |
+
|
| 46 |
+
| Model | HotpotQA<br>In-Acc | MuSiQue<br>In-Acc | TAT-QA<br>F1 | ConFiQA<br>In-Acc | ConFiQA<br>M_R ↓ | MuSiQue-Un<br>R-Acc |
|
| 47 |
+
|---|---|---|---|---|---|---|
|
| 48 |
+
| gemma-3-4b-it | 55.8 | 30.1 | 65.3 | 69.8 | 8.9 | 55.8 |
|
| 49 |
+
| Qwen3-1.7B (think) | 60.9 | 30.7 | 74.8 | 70.4 | 8.3 | 82.8 |
|
| 50 |
+
| Qwen3-4B (think) | 67.1 | 41.5 | 79.1 | 74.1 | 7.5 | 84.0 |
|
| 51 |
+
| Pleias-RAG-1.2B | 48.5 | 15.0 | 8.4 | 37.3 | 25.3 | 21.9 |
|
| 52 |
+
| **OCC-RAG-0.6B** | **57.6** | **36.6** | **75.0** | **79.9** | **5.2** | **86.9** |
|
| 53 |
+
|
| 54 |
+
OCC-RAG-0.6B exceeds Gemma-3-4B and SmolLM-3-3B on every dimension and attains the strongest faithfulness (highest ConFiQA In-Acc, lowest M_R) among all evaluated models.
|
| 55 |
+
|
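As a rough illustration of the two answer-quality metrics defined in the Evaluation section, the sketch below computes In-Acc as a substring check and F1 as token-level overlap. The function names and the normalization (lowercasing, whitespace tokenization) are assumptions made for this example; the exact normalization used in the reported evaluation is not stated in this README.

```python
from collections import Counter


def in_acc(prediction: str, gold: str) -> bool:
    """True if the gold answer appears as a substring of the prediction."""
    return gold.strip().lower() in prediction.strip().lower()


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between prediction and gold, using whitespace tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a perfect match, otherwise as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(in_acc("He is buried in Canada.", "Canada"))
print(round(token_f1("in Canada", "Canada"), 3))
```

In-Acc rewards any prediction that contains the gold string, so it tolerates verbose answers, whereas F1 penalizes extra tokens through the precision term.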
| 56 |
+
## Input / output format
|
| 57 |
+
|
| 58 |
+
OCC-RAG uses a **structured prompt format with special tokens**. The question is wrapped in `<|query_start|> … <|query_end|>` and each source in `<|source_start|><|source_id|>N … <|source_end|>`.
|
| 59 |
+
|
| 60 |
+
The response is split into five sections, each delimited by special tokens:
|
| 61 |
+
|
| 62 |
+
| Section | Tokens | Content |
|
| 63 |
+
|---|---|---|
|
| 64 |
+
| Query analysis | `<\|query_analysis_start\|> … <\|query_analysis_end\|>` | Decomposes the question into what must be found. |
|
| 65 |
+
| Source analysis | `<\|source_analysis_start\|> … <\|source_analysis_end\|>` | Assesses each source's relevance, citing by `<\|source_id\|>N`. |
|
| 66 |
+
| Reasoning | `<\|reasoning_start\|> … <\|reasoning_end\|>` | Composes evidence across sources into a multi-hop chain. |
|
| 67 |
+
| Status | `<\|status_start\|> … <\|status_end\|>` | `ANSWERABLE` / `UNANSWERABLE` verdict. |
|
| 68 |
+
| Answer | `<\|answer_start\|> … <\|answer_end\|>` | The final answer span, or the refusal phrase. |
|
| 69 |
+
|
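The five-section layout in the table above lends itself to simple parsing. The following is a minimal sketch, assuming the raw output was decoded with `skip_special_tokens=False`; `parse_response` and `SECTIONS` are hypothetical names, and the sample response is hand-written for illustration rather than actual model output.

```python
import re

SECTIONS = ["query_analysis", "source_analysis", "reasoning", "status", "answer"]


def parse_response(raw: str) -> dict:
    """Split a raw response into its five sections; missing ones map to None."""
    parsed = {}
    for name in SECTIONS:
        start = re.escape(f"<|{name}_start|>")
        end = re.escape(f"<|{name}_end|>")
        # Accept end-of-string in place of the closing token so that
        # truncated generations still yield their last section.
        match = re.search(rf"{start}(.*?)(?:{end}|\Z)", raw, re.DOTALL)
        parsed[name] = match.group(1).strip() if match else None
    return parsed


# Hand-written example response (not real model output).
raw = (
    "<|query_analysis_start|>\nFind the country where Bell is buried.<|query_analysis_end|>\n"
    "<|source_analysis_start|>\n<|source_id|>2 gives the burial place.<|source_analysis_end|>\n"
    "<|reasoning_start|>\nBaddeck is in Nova Scotia, which is in Canada.<|reasoning_end|>\n"
    "<|status_start|>ANSWERABLE<|status_end|>\n"
    "<|answer_start|>Canada<|answer_end|>"
)
result = parse_response(raw)
print(result["status"], "->", result["answer"])  # prints: ANSWERABLE -> Canada
```

Because the chat template's generation prompt already ends with `<|query_analysis_start|>`, a string decoded from the newly generated tokens alone will lack that opening token; prepend it before parsing if the query analysis is needed.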
| 70 |
+
## Quickstart (Transformers)
|
| 71 |
+
|
| 72 |
+
The chat template accepts a `documents=` kwarg and emits the structural tokens for the query and sources automatically — pass the user message as plain text and the sources as a list of dicts.
|
| 73 |
+
|
| 74 |
+
```python
|
| 75 |
+
import re
|
| 76 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 77 |
+
|
| 78 |
+
MODEL = "occ-ai/OCC-RAG-0.6B"
|
| 79 |
+
|
| 80 |
+
tokenizer = AutoTokenizer.from_pretrained(MODEL)
|
| 81 |
+
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")
|
| 82 |
+
|
| 83 |
+
question = "Which country is the inventor of the telephone, Alexander Graham Bell, buried in?"
|
| 84 |
+
documents = [
|
| 85 |
+
{"text": "Alexander Graham Bell was a Scottish-born inventor best known for patenting the first practical telephone."},
|
| 86 |
+
{"text": "Bell died on August 2, 1922, at his estate Beinn Bhreagh, near Baddeck, Nova Scotia, and was buried there."},
|
| 87 |
+
{"text": "Nova Scotia is a province on the east coast of Canada."},
|
| 88 |
+
]
|
| 89 |
+
|
| 90 |
+
text = tokenizer.apply_chat_template(
|
| 91 |
+
[{"role": "user", "content": question}],
|
| 92 |
+
documents=documents,
|
| 93 |
+
tokenize=False,
|
| 94 |
+
add_generation_prompt=True,
|
| 95 |
+
enable_thinking=False,
|
| 96 |
+
)
|
| 97 |
+
|
| 98 |
+
# Alternative: assemble the structural tokens yourself.
|
| 99 |
+
#
|
| 100 |
+
# query_start, query_end = "<|query_start|>", "<|query_end|>"
|
| 101 |
+
# source_start, source_end, source_id = "<|source_start|>", "<|source_end|>", "<|source_id|>"
|
| 102 |
+
#
|
| 103 |
+
# def build_user_content(question, sources):
|
| 104 |
+
# content = f"{query_start}{question}{query_end}\n"
|
| 105 |
+
# for i, s in enumerate(sources, start=1):
|
| 106 |
+
# content += f"{source_start}{source_id}{i} {s}{source_end}\n"
|
| 107 |
+
# return content
|
| 108 |
+
#
|
| 109 |
+
# messages = [{"role": "user", "content": build_user_content(question, [d["text"] for d in documents])}]
|
| 110 |
+
# text = tokenizer.apply_chat_template(
|
| 111 |
+
# messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
|
| 112 |
+
# )
|
| 113 |
+
|
| 114 |
+
inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
| 115 |
+
outputs = model.generate(**inputs, max_new_tokens=2048)
|
| 116 |
+
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
|
| 117 |
+
print(response)
|
| 118 |
+
|
| 119 |
+
m = re.findall(r"<\|answer_start\|>(.*?)(?:<\|answer_end\|>|\Z)", response, re.DOTALL)
|
| 120 |
+
print("Answer:", m[-1].strip() if m else "") # -> Canada
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
> [!NOTE]
|
| 124 |
+
> We recommend greedy decoding (`do_sample=False`), which is the training/evaluation default and is baked into `generation_config.json`. Qwen3's default sampling parameters ([best practices](https://huggingface.co/Qwen/Qwen3-0.6B#best-practices)) also work fine.
|
| 125 |
+
|
| 126 |
+
## Deployment
|
| 127 |
+
|
| 128 |
+
OCC-RAG-0.6B is a standard Qwen3 causal LM and is compatible with vLLM, SGLang, and other Transformers-based serving stacks. At only 0.6B parameters, it can be deployed on constrained infrastructure, including desktop machines that run inference on CPU from system RAM. When serving, keep `skip_special_tokens=False` if you need to parse the structural tokens out of the raw output.
|
| 129 |
+
|
| 130 |
+
When using an OpenAI-compatible server (vLLM ≥0.6, SGLang ≥0.4.7), the `documents=` kwarg is reachable from the client via `chat_template_kwargs`:
|
| 131 |
+
|
| 132 |
+
```python
|
| 133 |
+
client.chat.completions.create(
|
| 134 |
+
model="occ-ai/OCC-RAG-0.6B",
|
| 135 |
+
messages=[{"role": "user", "content": question}],
|
| 136 |
+
extra_body={"chat_template_kwargs": {"documents": documents}},
|
| 137 |
+
)
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
## Limitations
|
| 141 |
+
|
| 142 |
+
- **Context-grounded only.** The model is trained to answer from the supplied sources and to ignore parametric knowledge. It is not a general-purpose chat or knowledge model.
|
| 143 |
+
- **Reasoning depth.** Training and evaluation are capped at three-hop reasoning; longer chains are out of distribution.
|
| 144 |
+
|
| 145 |
+
## Citation
|
| 146 |
+
|
| 147 |
+
If you find our work helpful, please cite it:
|
| 148 |
+
|
| 149 |
+
```bibtex
|
| 150 |
+
@misc{savkin2026occragoptimalcognitivecore,
|
| 151 |
+
title = {OCC-RAG: Optimal Cognitive Core for Faithful Question Answering},
|
| 152 |
+
author = {Maksim Savkin and Mikhail Goncharov and Alexander Gambashidze and Alla Chepurova and Dmitrii Tarasov and Nikita Andriianov and Daria Pugacheva and Vasily Konovalov and Andrey Galichin and Ivan Oseledets},
|
| 153 |
+
year = {2026},
|
| 154 |
+
eprint = {2606.00683},
|
| 155 |
+
archivePrefix = {arXiv},
|
| 156 |
+
primaryClass = {cs.CL},
|
| 157 |
+
url = {https://arxiv.org/abs/2606.00683}
|
| 158 |
+
}
|
| 159 |
+
```
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,20 @@
| 1 |
+
{%- for message in messages -%}
|
| 2 |
+
{%- if message['role'] == 'system' -%}
|
| 3 |
+
{{ '<|im_start|>system\n' + message['content'] + '<|im_end|>\n' }}
|
| 4 |
+
{%- elif message['role'] == 'user' -%}
|
| 5 |
+
{%- if documents and loop.last -%}
|
| 6 |
+
{{ '<|im_start|>user\n<|query_start|>' + message['content'] + '<|query_end|>\n' }}
|
| 7 |
+
{%- for doc in documents -%}
|
| 8 |
+
{{ '<|source_start|><|source_id|>' + (loop.index | string) + ' ' + doc['text'] + '<|source_end|>\n' }}
|
| 9 |
+
{%- endfor -%}
|
| 10 |
+
{{ '<|im_end|>\n' }}
|
| 11 |
+
{%- else -%}
|
| 12 |
+
{{ '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' }}
|
| 13 |
+
{%- endif -%}
|
| 14 |
+
{%- elif message['role'] == 'assistant' -%}
|
| 15 |
+
{{ '<|im_start|>assistant\n<think>\n\n</think>\n\n' + message['content'] + '<|im_end|>\n' }}
|
| 16 |
+
{%- endif -%}
|
| 17 |
+
{%- endfor -%}
|
| 18 |
+
{%- if add_generation_prompt -%}
|
| 19 |
+
{{ '<|im_start|>assistant\n<think>\n\n</think>\n\n<|query_analysis_start|>\n' }}
|
| 20 |
+
{%- endif -%}
|
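To make the template's behavior easier to follow, here is a pure-Python mirror of `chat_template.jinja`. It is an illustrative sketch only: `render_prompt` is a hypothetical helper, and in practice `tokenizer.apply_chat_template` should be used. Since every Jinja tag in the template uses `-` whitespace control, the rendered prompt is assumed to be the plain concatenation of the emitted strings.

```python
def render_prompt(messages, documents=None, add_generation_prompt=True):
    """Build the prompt string the same way the Jinja template does."""
    parts = []
    last_index = len(messages) - 1
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"<|im_start|>system\n{content}<|im_end|>\n")
        elif role == "user":
            if documents and i == last_index:
                # Sources are attached only to a user turn that is the
                # final message (the template checks loop.last).
                parts.append(f"<|im_start|>user\n<|query_start|>{content}<|query_end|>\n")
                for n, doc in enumerate(documents, start=1):
                    parts.append(f"<|source_start|><|source_id|>{n} {doc['text']}<|source_end|>\n")
                parts.append("<|im_end|>\n")
            else:
                parts.append(f"<|im_start|>user\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n<think>\n\n</think>\n\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # The model continues from inside the query-analysis section.
        parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n<|query_analysis_start|>\n")
    return "".join(parts)


prompt = render_prompt(
    [{"role": "user", "content": "Where is Nova Scotia?"}],
    documents=[{"text": "Nova Scotia is a province on the east coast of Canada."}],
)
print(prompt)
```

Two details are worth noting: source ids start at 1, and documents are silently dropped when the last message is not a user turn.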
config.json
ADDED
|
@@ -0,0 +1,63 @@
| 1 |
+
{
|
| 2 |
+
"architectures": [
|
| 3 |
+
"Qwen3ForCausalLM"
|
| 4 |
+
],
|
| 5 |
+
"attention_bias": false,
|
| 6 |
+
"attention_dropout": 0.0,
|
| 7 |
+
"bos_token_id": null,
|
| 8 |
+
"dtype": "bfloat16",
|
| 9 |
+
"eos_token_id": 151643,
|
| 10 |
+
"head_dim": 128,
|
| 11 |
+
"hidden_act": "silu",
|
| 12 |
+
"hidden_size": 1024,
|
| 13 |
+
"initializer_range": 0.02,
|
| 14 |
+
"intermediate_size": 3072,
|
| 15 |
+
"layer_types": [
|
| 16 |
+
"full_attention",
|
| 17 |
+
"full_attention",
|
| 18 |
+
"full_attention",
|
| 19 |
+
"full_attention",
|
| 20 |
+
"full_attention",
|
| 21 |
+
"full_attention",
|
| 22 |
+
"full_attention",
|
| 23 |
+
"full_attention",
|
| 24 |
+
"full_attention",
|
| 25 |
+
"full_attention",
|
| 26 |
+
"full_attention",
|
| 27 |
+
"full_attention",
|
| 28 |
+
"full_attention",
|
| 29 |
+
"full_attention",
|
| 30 |
+
"full_attention",
|
| 31 |
+
"full_attention",
|
| 32 |
+
"full_attention",
|
| 33 |
+
"full_attention",
|
| 34 |
+
"full_attention",
|
| 35 |
+
"full_attention",
|
| 36 |
+
"full_attention",
|
| 37 |
+
"full_attention",
|
| 38 |
+
"full_attention",
|
| 39 |
+
"full_attention",
|
| 40 |
+
"full_attention",
|
| 41 |
+
"full_attention",
|
| 42 |
+
"full_attention",
|
| 43 |
+
"full_attention"
|
| 44 |
+
],
|
| 45 |
+
"max_position_embeddings": 32768,
|
| 46 |
+
"max_window_layers": 28,
|
| 47 |
+
"model_type": "qwen3",
|
| 48 |
+
"num_attention_heads": 16,
|
| 49 |
+
"num_hidden_layers": 28,
|
| 50 |
+
"num_key_value_heads": 8,
|
| 51 |
+
"pad_token_id": 151643,
|
| 52 |
+
"rms_norm_eps": 1e-06,
|
| 53 |
+
"rope_parameters": {
|
| 54 |
+
"rope_theta": 1000000,
|
| 55 |
+
"rope_type": "default"
|
| 56 |
+
},
|
| 57 |
+
"sliding_window": null,
|
| 58 |
+
"tie_word_embeddings": true,
|
| 59 |
+
"transformers_version": "5.5.4",
|
| 60 |
+
"use_cache": true,
|
| 61 |
+
"use_sliding_window": false,
|
| 62 |
+
"vocab_size": 151936
|
| 63 |
+
}
|
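The attention settings in `config.json` determine how much memory the key/value cache needs at inference time. The back-of-the-envelope calculation below is a sketch under stated assumptions: the cache is stored in bfloat16 (2 bytes per value), a single sequence is served, and any framework overhead such as paging or padding is ignored.

```python
# Values copied from config.json.
num_hidden_layers = 28
num_attention_heads = 16
num_key_value_heads = 8
head_dim = 128
max_position_embeddings = 32768
bytes_per_value = 2  # assumption: cache kept in bfloat16

# Grouped-query attention: each KV head is shared by this many query heads.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Keys and values are both cached, hence the leading factor of 2.
kv_bytes_per_token = (
    2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_value
)
kv_bytes_full_context = kv_bytes_per_token * max_position_embeddings

print(queries_per_kv_head, "query heads per KV head")    # 2
print(kv_bytes_per_token // 1024, "KiB per token")       # 112
print(kv_bytes_full_context / 2**30, "GiB at full context")  # 3.5
```

Under these assumptions, a full 32768-token context would need roughly three times the memory of the weights themselves, which matters when sizing CPU-only or small-GPU deployments.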
figures/github-mark.png
ADDED
|
figures/occ.png
ADDED
|
generation_config.json
ADDED
|
@@ -0,0 +1,12 @@
| 1 |
+
{
|
| 2 |
+
"do_sample": false,
|
| 3 |
+
"temperature": 0.0,
|
| 4 |
+
"eos_token_id": [
|
| 5 |
+
151643,
|
| 6 |
+
151645,
|
| 7 |
+
151683
|
| 8 |
+
],
|
| 9 |
+
"max_new_tokens": 2048,
|
| 10 |
+
"pad_token_id": 151643,
|
| 11 |
+
"transformers_version": "5.5.4"
|
| 12 |
+
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f8f1d583afd08756cc40273d9c63d63580000852e47aa64d535bc77c872533ee
|
| 3 |
+
size 1192135096
|
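The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves. As a small sketch, the pointer can be parsed and its `size` field used to sanity-check the parameter count; `parse_lfs_pointer` is a hypothetical helper, and the estimate assumes every weight is stored in bfloat16 as declared in `config.json`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f8f1d583afd08756cc40273d9c63d63580000852e47aa64d535bc77c872533ee\n"
    "size 1192135096\n"
)

# Two bytes per bfloat16 weight. The file also holds a small metadata
# header, so this slightly overestimates the true parameter count.
approx_params = pointer["size"] // 2
print(f"{approx_params / 1e6:.0f}M parameters (upper bound)")  # 596M
```

The result of roughly 0.6B parameters is consistent with the model name.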
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:672e331460a05e2ea9888810a7a37f0c775429fe05fddc6330ee0dc9147a1370
|
| 3 |
+
size 11425566
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,44 @@
| 1 |
+
{
|
| 2 |
+
"add_prefix_space": false,
|
| 3 |
+
"backend": "tokenizers",
|
| 4 |
+
"bos_token": null,
|
| 5 |
+
"clean_up_tokenization_spaces": false,
|
| 6 |
+
"eos_token": "<|endoftext|>",
|
| 7 |
+
"errors": "replace",
|
| 8 |
+
"extra_special_tokens": [
|
| 9 |
+
"<|im_start|>",
|
| 10 |
+
"<|im_end|>",
|
| 11 |
+
"<|object_ref_start|>",
|
| 12 |
+
"<|object_ref_end|>",
|
| 13 |
+
"<|box_start|>",
|
| 14 |
+
"<|box_end|>",
|
| 15 |
+
"<|quad_start|>",
|
| 16 |
+
"<|quad_end|>",
|
| 17 |
+
"<|vision_start|>",
|
| 18 |
+
"<|vision_end|>",
|
| 19 |
+
"<|vision_pad|>",
|
| 20 |
+
"<|image_pad|>",
|
| 21 |
+
"<|video_pad|>",
|
| 22 |
+
"<|query_start|>",
|
| 23 |
+
"<|query_end|>",
|
| 24 |
+
"<|source_start|>",
|
| 25 |
+
"<|source_end|>",
|
| 26 |
+
"<|source_id|>",
|
| 27 |
+
"<|query_analysis_start|>",
|
| 28 |
+
"<|query_analysis_end|>",
|
| 29 |
+
"<|source_analysis_start|>",
|
| 30 |
+
"<|source_analysis_end|>",
|
| 31 |
+
"<|reasoning_start|>",
|
| 32 |
+
"<|reasoning_end|>",
|
| 33 |
+
"<|status_start|>",
|
| 34 |
+
"<|status_end|>",
|
| 35 |
+
"<|answer_start|>",
|
| 36 |
+
"<|answer_end|>"
|
| 37 |
+
],
|
| 38 |
+
"is_local": false,
|
| 39 |
+
"model_max_length": 131072,
|
| 40 |
+
"pad_token": "<|endoftext|>",
|
| 41 |
+
"split_special_tokens": false,
|
| 42 |
+
"tokenizer_class": "Qwen2Tokenizer",
|
| 43 |
+
"unk_token": null
|
| 44 |
+
}
|