baaderso36 commited on Apr 28

Commit

0bd2e10

verified ·

1 Parent(s): 59302f5

Upload folder using huggingface_hub

Browse files

Files changed (23) hide show

.gitattributes +1 -0
README.md +208 -0
chat_template.jinja +154 -0
config.json +131 -0
model-00001-of-00015.safetensors +3 -0
model-00002-of-00015.safetensors +3 -0
model-00003-of-00015.safetensors +3 -0
model-00004-of-00015.safetensors +3 -0
model-00005-of-00015.safetensors +3 -0
model-00006-of-00015.safetensors +3 -0
model-00007-of-00015.safetensors +3 -0
model-00008-of-00015.safetensors +3 -0
model-00009-of-00015.safetensors +3 -0
model-00010-of-00015.safetensors +3 -0
model-00011-of-00015.safetensors +3 -0
model-00012-of-00015.safetensors +3 -0
model-00013-of-00015.safetensors +3 -0
model-00014-of-00015.safetensors +3 -0
model-00015-of-00015.safetensors +3 -0
model.safetensors.index.json +0 -0
tokenizer.json +3 -0
tokenizer_config.json +34 -0
vision_tower.safetensors +3 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,208 @@

+---
+license: apache-2.0
+language:
+  - en
+  - de
+base_model: andrzejmontano/Qwen3.5-122B-A10B-Vision-MLX-Mixed-4bit
+tags:
+  - chimera
+  - qwen3.5
+  - moe
+  - lora
+  - fine-tuned
+  - mlx
+  - apple-silicon
+  - coding
+  - function-calling
+  - reasoning
+  - vision
+pipeline_tag: text-generation
+library_name: mlx
+model-index:
+  - name: Chimera-122B
+    results:
+      - task:
+          type: text-generation
+          name: Code Generation
+        dataset:
+          name: HumanEval
+          type: openai/openai_humaneval
+        metrics:
+          - name: pass@1
+            type: pass@1
+            value: 95.7
+            verified: true
+---
+# 🐉 Chimera-122B
+**A 122B-parameter MoE model fine-tuned entirely on Apple Silicon (M5 Max 128GB) through 3 sequential LoRA training rounds — Reasoning, Coding, and Function Calling.**
+Chimera-122B achieves **95.7% on HumanEval** (up from 86% base), **10/10 on Function Calling**, and **zero repetition loops** — all trained locally on a single Mac in ~6 hours.
+---
+## Benchmark Results
+| Metric | Chimera-122B | Base (Qwen3.5-122B) | Improvement |
+|---|---|---|---|
+| **HumanEval pass@1** | **95.7%** (157/164) | 86.0% (141/164) | **+9.7%** |
+| **FC/Tool Calling** | **100%** (10/10) | — | — |
+| **Repetition** | **0 loops** (5/5 clean) | — | — |
+| **MMLU (20-question)** | **95%** (19/20) | — | — |
+### HumanEval Error Breakdown
+| Problem | Error | Root Cause |
+|---|---|---|
+| #38, #50 | NameError: encode_* not defined | Test harness issue — helper function not included in prompt |
+| #39, #129 | SyntaxError: unterminated string | Thinking tokens leaked into code output |
+| #132, #145, #163 | AssertionError | Logic errors on edge cases |
+**Adjusted score (excluding test harness issues): 159/164 = 97.0%**
+---
+## Architecture
+- **Base Model:** [Qwen3.5-122B-A10B-Vision-MLX-Mixed-4bit](https://huggingface.co/andrzejmontano/Qwen3.5-122B-A10B-Vision-MLX-Mixed-4bit)
+- **Type:** Mixture-of-Experts (MoE) — 122B total / 10B active parameters
+- **Quantization:** Mixed 4-bit (experts compressed, attention + vision tower at full precision)
+- **Context Window:** 262,144 tokens
+- **Vision:** Preserved (full-precision vision tower from base model)
+- **Thinking:** Native `<think>` reasoning traces supported
+---
+## Training
+### Sequential 3-Round LoRA Fine-Tuning
+All training performed on a single **Apple M5 Max (128GB unified memory)** using `mlx-lm lora`. Each round resumes from the best checkpoint of the previous round with decreasing learning rate.
+| Round | Focus | Dataset | Samples | LR | Iters | Best Val Loss |
+|---|---|---|---|---|---|---|
+| **1** | Reasoning | [TeichAI/lordx64-claude-opus-4.7-max-cleaned](https://huggingface.co/datasets/TeichAI/lordx64-claude-opus-4.7-max-cleaned) | 4,313 | 1e-5 | 400 | **0.920** |
+| **2** | Coding | [AlicanKiraz0/Agentic-CoT-Coding-SFT-v1.1](https://huggingface.co/datasets/AlicanKiraz0/Agentic-Chain-of-Thought-Coding-SFT-Dataset-v1.1) | 3,318 | 5e-6 | 200 | **0.585** |
+| **3** | Function Calling | [zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory](https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory) | 3,555 | 2e-6 | 150 | **0.070** |
+**Total: ~11,186 training samples, ~6 hours wall time on M5 Max**
+### Val Loss Journey
+Round 1 (Reasoning):     1.393 → 0.920
+Round 2 (+ Coding):      0.995 → 0.585
+Round 3 (+ FC):          1.873 → 0.070
+### LoRA Configuration
+```yaml
+num_layers: 4
+batch_size: 1
+max_seq_length: 768
+grad_checkpoint: true
+clear_cache_threshold: 0.9
+trainable_parameters: 102.6M / 122,111.5M (0.084%)
+```
+### Sequential Resume Strategy
+Round 1 → Best checkpoint at Iter 275 (Val 0.920)
+Round 2 → Resumes from Round 1 best, new best at Iter 125 (Val 0.585)
+Round 3 → Resumes from Round 2 best, new best at Iter 125 (Val 0.070)
+Final model fused from Round 3 best checkpoint
+### Hardware
+| | |
+|---|---|
+| **Device** | Apple M5 Max, 128GB unified memory |
+| **Peak Memory** | 111.96 GB during training |
+| **Training Framework** | [mlx-lm](https://github.com/ml-explore/mlx-examples) (Apple MLX) |
+| **Serving** | [vMLX](https://github.com/AugmentCode/vmlx) (OpenAI-compatible) |
+| **Model Size on Disk** | ~72 GB (15 safetensor shards) |
+---
+## Usage
+### With mlx-lm
+```python
+from mlx_lm import load, generate
+model, tokenizer = load("baaderso36/Chimera-122B")
+response = generate(
+    model, tokenizer,
+    prompt="Write a Python function to merge two sorted lists.",
+    max_tokens=2048,
+    temp=0.6,
+    top_p=0.95,
+)
+```
+### With vMLX (OpenAI-compatible server)
+```bash
+vmlx serve baaderso36/Chimera-122B --host 127.0.0.1 --port 11434
+```
+```python
+import httpx
+r = httpx.post("http://127.0.0.1:11434/v1/chat/completions", json={
+    "model": "Chimera-122B",
+    "messages": [{"role": "user", "content": "Debug this Python traceback..."}],
+    "max_tokens": 4096,
+    "temperature": 0.6,
+    "top_p": 0.95,
+})
+```
+---
+## What Makes Chimera Different
+**Sequential skill stacking without catastrophic forgetting.** Each training round builds on the previous with decreasing learning rate:
+1. **Round 1 (1e-5):** Learns Claude-style structured reasoning from Opus 4.7 traces
+2. **Round 2 (5e-6):** Adds agentic coding with chain-of-thought from real GitHub data
+3. **Round 3 (2e-6):** Adds multi-turn tool calling with reasoning from Qwen 3.6+ trajectories
+The result is a model that thinks before it acts, writes working code, and knows when to use tools — trained on a desktop Mac in an afternoon.
+---
+## Intended Use
+Chimera-122B is designed as a **local development assistant** for:
+- Code generation and debugging with step-by-step reasoning
+- Function calling and tool use in agentic workflows
+- Document generation (PDF, DOCX, XLSX, PPTX via Python)
+- Technical Q&A with structured thinking
+## Limitations
+- Mixed 4-bit quantized — some precision loss vs full-precision weights
+- Training limited to 768 token sequences due to Metal GPU memory constraints
+- 72GB model size requires high-memory Apple Silicon (M4 Pro 48GB minimum)
+- HumanEval tested with pass@1 only (greedy/low-temp, no pass@10)
+- Vision capability preserved but not yet benchmarked
+---
+## Citation
+```bibtex
+@misc{chimera122b2026,
+  title={Chimera-122B: Sequential LoRA Fine-Tuning of Qwen3.5-122B-A10B on Apple Silicon},
+  author={baaderso36},
+  year={2026},
+  howpublished={\url{https://huggingface.co/baaderso36/Chimera-122B}},
+}
+```
+## Acknowledgments
+- **Base Model:** [andrzejmontano](https://huggingface.co/andrzejmontano) for the surgical mixed-4bit quantization preserving the vision tower
+- **Datasets:** [TeichAI](https://huggingface.co/TeichAI), [AlicanKiraz0](https://huggingface.co/AlicanKiraz0), [zake7749](https://huggingface.co/zake7749) for high-quality open training data
+- **Framework:** Apple MLX team for making local LLM training on Apple Silicon possible
+- **Serving:** [AugmentCode](https://github.com/AugmentCode/vmlx) for the vMLX inference server

chat_template.jinja ADDED Viewed

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- else %}
+        {{- '<think>\n' }}
+    {%- endif %}
+{%- endif %}

config.json ADDED Viewed

	@@ -0,0 +1,131 @@

+{
+  "architectures": [
+    "Qwen3_5MoeForConditionalGeneration"
+  ],
+  "image_token_id": 248056,
+  "model_type": "qwen3_5_moe",
+  "quantization": {
+    "group_size": 64,
+    "bits": 4
+  },
+  "quantization_config": {
+    "group_size": 64,
+    "bits": 4
+  },
+  "text_config": {
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "attn_output_gate": true,
+    "dtype": "bfloat16",
+    "eos_token_id": 248044,
+    "full_attention_interval": 4,
+    "head_dim": 256,
+    "hidden_act": "silu",
+    "hidden_size": 3072,
+    "initializer_range": 0.02,
+    "layer_types": [
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention",
+      "linear_attention",
+      "linear_attention",
+      "linear_attention",
+      "full_attention"
+    ],
+    "linear_conv_kernel_dim": 4,
+    "linear_key_head_dim": 128,
+    "linear_num_key_heads": 16,
+    "linear_num_value_heads": 64,
+    "linear_value_head_dim": 128,
+    "max_position_embeddings": 262144,
+    "mlp_only_layers": [],
+    "model_type": "qwen3_5_moe_text",
+    "moe_intermediate_size": 1024,
+    "mtp_num_hidden_layers": 1,
+    "mtp_use_dedicated_embeddings": false,
+    "num_attention_heads": 32,
+    "num_experts": 256,
+    "num_experts_per_tok": 8,
+    "num_hidden_layers": 48,
+    "num_key_value_heads": 2,
+    "rms_norm_eps": 1e-06,
+    "router_aux_loss_coef": 0.001,
+    "shared_expert_intermediate_size": 1024,
+    "use_cache": true,
+    "vocab_size": 248320,
+    "mamba_ssm_dtype": "float32",
+    "rope_parameters": {
+      "mrope_interleaved": true,
+      "mrope_section": [
+        11,
+        11,
+        10
+      ],
+      "rope_theta": 10000000,
+      "partial_rotary_factor": 0.25,
+      "type": "default"
+    }
+  },
+  "tie_word_embeddings": false,
+  "transformers_version": "4.57.0.dev0",
+  "video_token_id": 248057,
+  "vision_end_token_id": 248054,
+  "vision_start_token_id": 248053,
+  "vision_config": {
+    "deepstack_visual_indexes": [],
+    "depth": 27,
+    "hidden_act": "gelu_pytorch_tanh",
+    "hidden_size": 1152,
+    "in_channels": 3,
+    "initializer_range": 0.02,
+    "intermediate_size": 4304,
+    "model_type": "qwen3_5_moe",
+    "num_heads": 16,
+    "num_position_embeddings": 2304,
+    "out_hidden_size": 3072,
+    "patch_size": 16,
+    "spatial_merge_size": 2,
+    "temporal_patch_size": 2
+  }
+}

model-00001-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ae5b22da4f512cbfdbb5def157c07b48ac949ab64fd914e0a0ae2f8bfaa6a6f5
+size 5243106254

model-00002-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f962ff2c3f9bac1d4d5d8d6afd41f639a98bbe0817043c4724820f6f6e76f5b4
+size 5061940401

model-00003-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e85d9317ccf42deb438eed10a9035ef389b44c1c5707f36475f48cfbb6de5c08
+size 5245889444

model-00004-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fae2a75543a1d01dfad4dbfd5a9a10bd5dad8afc37f86eec1408c8218ecae673
+size 5061940473

model-00005-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6aeeba049a1f9fd2dc44de1c0451149c7f96112c3fd3fcdad83c7d43ec3365ca
+size 5061940484

model-00006-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:813ec834396e2a0c3a04546a1c0d970d93507542978c0911facd9945b6132df3
+size 5245889543

model-00007-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64318a79f14e4117ec4e95a3fc911dd307f3a7954317fadd934d1219f1a04d28
+size 5081699877

model-00008-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d3dfebf43e542d0da5935182d53dffd00a8ec0b7c6bb07a42e4326ee115be857
+size 5061940482

model-00009-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cdb1cd8ffd657c7827194ce4acdb7106d47887ee4cbeeb3bfd4545e815aea3e6
+size 5245889521

model-00010-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a6301e61f910262aba22201bffebabaf8ab6f84e9cdc668e39c9599fc8a3323
+size 5061940490

model-00011-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:afc4eee41573320aadfa6e956899835a60f624365a692b4963578595a6c8bb47
+size 5061940468

model-00012-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f09a25de5fe84652f59e759a75d88d23a8b80cacc32d15e18836dd8c8b22aeeb
+size 5245889565

model-00013-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ad7173242d20ed2704dd6e0b7413c551e6ea3a07f23ee83022be5af1a3df401
+size 5081699877

model-00014-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba4f8adb19d6d88d31cbd9f426868958791194bf5b1d62fec064671a4500065b
+size 5061940482

model-00015-of-00015.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b80c571258bee4a3b7e83847afcdc9c1644c3fd67f94794fd9434bc6b20b9c57
+size 5050036030

model.safetensors.index.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+size 19989343

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,34 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": true,
+  "local_files_only": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "processor_class": "Qwen3VLProcessor",
+  "split_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "tool_parser_type": "qwen3_coder",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}

vision_tower.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6807d691d2363e44f36e5ef17c78dcb5348d70b1e44e63f1c553ca843f84d76d
+size 902618707