Crystalcareai committed
Commit feba8db · verified · 1 Parent(s): 0492aea

Update README.md

Files changed (1): README.md +65 -17

README.md CHANGED
@@ -1,39 +1,87 @@
-
  ---
  language:
  - en
  license: apache-2.0
  library_name: transformers
  base_model:
- - mistralai/Mistral-Nemo-Base-2407
- - Qwen/Qwen3-235B-A22B
  ---

  ![Homunculus Logo](https://huggingface.co/arcee-ai/Homunculus/resolve/main/logo.jpg)

- ## Overview

- Arcee Homunculus is a 12B parameter model developed by Arcee.ai, based on the Mistral Nemo architecture.

- It was produced by distilling of Qwen3 235B logits onto Mistral Nemo after tokenizer replacement.

- Like Qwen3, it features both thinking and non-thinking modes. As suggested by the name, it is a weird little guy produced through alchemy.

- Homunculus is a surprisingly powerful model for its size, showing strong performance in real-world applications despite being able to fit on consumer GPUs.

- ## Basic use

- ```
- # Use a pipeline as a high-level helper
- from transformers import pipeline
-
- pipe = pipeline("text-generation", model="arcee-ai/Homunculus")
- messages = [
-     {"role": "user", "content": "Who are you?"},
  ]
- pipe(messages)

  ```
  ---
  language:
  - en
  license: apache-2.0
  library_name: transformers
  base_model:
+ - mistralai/Mistral-Nemo-Base-2407 # lightweight student
+ - Qwen/Qwen3-235B-A22B # thinking + non-thinking teacher
+ tags:
+ - distillation
+ - /think
+ - /nothink
+ - reasoning-transfer
+ - arcee-ai
  ---

  ![Homunculus Logo](https://huggingface.co/arcee-ai/Homunculus/resolve/main/logo.jpg)

+ # Arcee **Homunculus-12B**

+ **Homunculus** is a 12-billion-parameter instruction model distilled from **Qwen3-235B** onto the **Mistral-Nemo** backbone.
+ It was purpose-built to preserve Qwen’s two-mode interaction style—`/think` (deliberate chain-of-thought) and `/nothink` (concise answers)—while running on a single consumer GPU.

+ ---

+ ## What’s special?

+ | Feature | Detail |
+ | --- | --- |
+ | **Reasoning-trace transfer** | Instead of copying just final probabilities, we align *full* logit trajectories, yielding more faithful reasoning. |
+ | **Total-Variation-Distance loss** | Matches the teacher’s confidence distribution more closely and smooths the loss landscape. |
+ | **Tokenizer replacement** | The original Mistral tokenizer was swapped for Qwen3's tokenizer. |
+ | **Dual interaction modes** | Use `/think` for transparent step-by-step reasoning (good for analysis & debugging); use `/nothink` for terse, production-ready answers. The flag is most reliable in the system role. |

+ ---
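The Total-Variation-Distance objective named in the table can be sketched in a few lines. This is an illustrative stand-alone implementation over a pair of next-token distributions, not Arcee's training code; the function names and toy logits are assumptions for demonstration only.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tvd_loss(student_logits, teacher_logits):
    """Total Variation Distance: half the L1 gap between the two distributions.

    0.0 means the student matches the teacher exactly; 1.0 is maximal disagreement.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Identical logits give zero distance; diverging logits grow the loss toward 1.
print(tvd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
```

In training, a loss like this would be averaged over every token position, which is one way to align full logit trajectories rather than only final answers.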
+ ## 🔧 Quick Start

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM

+ model_id = "arcee-ai/Homunculus"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",
+     device_map="auto",
+ )

+ # /think mode - chain-of-thought reasoning
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant. /think"},
+     {"role": "user", "content": "Why is the sky blue?"},
  ]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))

+ # /nothink mode - direct answers
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant. /nothink"},
+     {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
  ```
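Because the mode flag lives in the system message, a tiny helper can keep the two prompt styles above consistent. `make_messages` is a hypothetical convenience for this README, not part of the model or the `transformers` API:

```python
def make_messages(user_prompt, think=False):
    """Build a chat message list with the /think or /nothink flag in the system role."""
    mode = "/think" if think else "/nothink"
    return [
        {"role": "system", "content": f"You are a helpful assistant. {mode}"},
        {"role": "user", "content": user_prompt},
    ]

print(make_messages("Why is the sky blue?", think=True)[0]["content"])
# → You are a helpful assistant. /think
```

The resulting list drops straight into `tokenizer.apply_chat_template(...)` as in the Quick Start.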
+
+ ## 💡 Intended Use & Limitations
+
+ Homunculus is designed for:
+
+ * **Research** on reasoning-trace distillation, logit imitation, and mode-switchable assistants.
+ * **Lightweight production** deployments that need strong reasoning at <12 GB VRAM.
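The <12 GB figure assumes quantized weights. A back-of-envelope estimate (weights only, ignoring KV cache and activations, with the per-parameter byte counts below taken as assumptions) shows why:

```python
def approx_weight_gb(n_params_billion, bytes_per_param):
    # Weights-only memory: billions of params * bytes each, reported in GB (1 GB = 1e9 bytes).
    return n_params_billion * bytes_per_param

print(approx_weight_gb(12, 2))    # bf16 weights  → 24
print(approx_weight_gb(12, 1))    # int8 weights  → 12
print(approx_weight_gb(12, 0.5))  # 4-bit weights → 6.0
```

So bf16 needs roughly 24 GB, while 8-bit or 4-bit quantization brings the weights under the 12 GB budget of a single consumer GPU.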
+
+ ### Known limitations
+
+ * May inherit biases from the Qwen3 teacher and internet-scale pretraining data.
+ * Long-context use (>32k tokens) is experimental; expect extra latency and memory overhead.
+
+ ---