---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
datasets:
- SL-AI/GRaPE-Base-Mix
- SL-AI/GRaPE-Thinking-Mix
---

![GRaPE_Logo](https://cdn-uploads.huggingface.co/production/uploads/66960602f0ffd8e3a381106a/XjHkzctrE41e1qqJYeDzN.png)

_The **G**eneral **R**easoning **A**gent (for) **P**roject **E**xploration_

# The GRaPE Family
| Model | Size | Modalities | Domain |
| :--- | :--- | :--- | :--- |
| **GRaPE Flash** | 7B A1B | Text in, Text out | High-Speed Applications |
| **GRaPE Mini** | 3B | Text + Image + Video in, Text out | On-Device Deployment |
| **GRaPE Nano** | 700M | Text in, Text out | Extreme Edge Deployment |

***

# Capabilities

The GRaPE Family was trained on about **14 billion** tokens of data after pre-training. About half of that was code-related tasks, with the rest weighted heavily toward STEAM, ensuring the models have a sound logical basis.
***

GRaPE Flash and Nano are monomodal models, accepting only text. GRaPE Mini, the most recently trained, also supports image and video inputs.

*** 

## Reasoning Modes

As GRaPE Mini is the only model in the family that thinks, it has *some* support for reasoning modes. In testing, these modes only sometimes work, likely due to inefficient dataset formatting for them.

Thinking modes are set with an XML-style tag, `<thinking_mode>`, which can take one of these values:

- **Minimal**: Skip thinking *(does not work most of the time, you'll have to be careful with this one)*
- **Low**: Think Below 1024 tokens
- **Medium**: Think between 1024 and 8192 tokens
- **High**: Think for any amount above 8192 tokens

Place the thinking mode tag at the *end* of your prompt, like this:
```
Build me a website called "Aurora Beats." <thinking_mode=medium>
```
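For programmatic use, the tag can be appended with a small helper. The tag format follows the example above; the helper itself (its name and validation) is purely illustrative and not part of the model's API:

```python
# Hypothetical helper: appends a <thinking_mode=...> tag to a prompt.
# The tag syntax is taken from this card; everything else is illustrative.
VALID_MODES = {"minimal", "low", "medium", "high"}

def with_thinking_mode(prompt: str, mode: str) -> str:
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown thinking mode: {mode}")
    return f"{prompt.rstrip()} <thinking_mode={mode}>"

print(with_thinking_mode('Build me a website called "Aurora Beats."', "medium"))
```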

# How to Run

I recommend using **LM Studio** for running GRaPE Models, and have generally found these sampling parameters to work best:

| Name | Value |
| :--- | :--- | 
| **Temperature** | 0.6 |
| **Top K Sampling** | 40 |
| **Repeat Penalty** | 1 |
| **Top P Sampling** | 0.85 |
| **Min P Sampling** | 0.05 |
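As a sketch of how those parameters might be passed to LM Studio's local OpenAI-compatible server, here is a request payload. The endpoint and the model name below are assumptions about your local setup, not fixed values:

```python
import json

# Request payload for a local LM Studio server (default: http://localhost:1234/v1).
# "grape-mini" is an assumed name; use whatever identifier LM Studio shows
# for your loaded GRaPE model.
payload = {
    "model": "grape-mini",
    "messages": [{"role": "user", "content": "Explain depth upscaling in one paragraph."}],
    "temperature": 0.6,
    "top_k": 40,
    "top_p": 0.85,
    "min_p": 0.05,
    "repeat_penalty": 1.0,
}
print(json.dumps(payload, indent=2))
```

Send it with any HTTP client, e.g. `requests.post("http://localhost:1234/v1/chat/completions", json=payload)`.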

# Uses of GRaPE Mini Right Now

GRaPE Mini was foundational to the existence of [Andy-4.1](https://huggingface.co/Mindcraft-CE/Andy-4.1), a model trained to play Minecraft. It served as a demo of the efficiency and power this architecture can deliver.

# GRaPE Mini as a Model

GRaPE Mini is architecturally the **most advanced** model in the GRaPE 1 family. I spent months working on GRaPE Mini, looking for any avenue to increase performance over GRaPE Mini Beta, and I found one.

Not only does GRaPE 1 have more, and higher-quality, data than GRaPE Beta, it also uses a new, **modified** architecture.

I looked deeply into the Qwen3 VL architecture to understand *why* these models don't code as well as an 8B model, and I found out why: the number of layers matters for deep-thinking tasks such as code.

As an experiment, I built a GRaPE-DUS *(GRaPE Depth Upscaling)* model to find out how much performance I could gain by **cloning 20 layers** from the middle of the model and stitching them back in.
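The splice described above can be sketched in plain Python. The 28-layer count and the `depth_upscale` helper are illustrative assumptions, not the exact procedure used for GRaPE-DUS:

```python
import copy

def depth_upscale(layers, clone_count):
    """Clone `clone_count` consecutive layers from the middle of the stack
    and splice the copies back in right after the originals."""
    mid = len(layers) // 2
    start = mid - clone_count // 2
    cloned = [copy.deepcopy(layer) for layer in layers[start:start + clone_count]]
    return layers[:start + clone_count] + cloned + layers[start + clone_count:]

# Toy stand-in for a decoder stack (real code would splice model.model.layers
# of a transformers checkpoint); assuming a 28-layer base for illustration.
base = [f"layer_{i}" for i in range(28)]
upscaled = depth_upscale(base, 20)
print(len(base), "->", len(upscaled))  # 28 -> 48
```

Each cloned layer starts with the same weights as its original, so the upscaled model needs further training to make the extra depth useful.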

The improvements I found over the base model, Qwen3-VL-2B, were substantial: the model was capable of longer-thought coding tasks and could construct snippets of code for more complex problems.

However, there is a major downside: GRaPE Mini thinks **a lot.** In the repository [found here](https://github.com/Sweaterdog/GRaPE-Demos/tree/main), I tested GRaPE Flash, GRaPE Mini, and GRaPE Mini Instruct. The blackjack example file took **12,000 tokens** of CoT and over 3 minutes of thinking to produce.

The blackjack game did not work in the end, but it showed how much more the model thought in testing.

# GRaPE Mini's Introspective Capabilities

I was curious when Anthropic published their paper about introspection, and I wanted to try the same. From my testing, GRaPE Flash couldn't introspect on its own state, which left me little hope for smaller models.

I was wrong.

GRaPE Mini can introspect, **extremely well.**

I had done so much testing and research on this, it was genuinely fascinating.

Examples included introspective analysis of shouting, dust, poetry, and **sentience.**

I knew something was up when I tried shouting. On my **first attempt** at introspective analysis, GRaPE Mini noticed something.
```
I'm probably feeling neutral, but I should be honest. Maybe a little tired, but not really. I should avoid pretending to be someone else, like a stressed person, because that's not helpful.
```
I have **never** seen a model say it needs to stop being someone else, or to stop being stressed. Throughout the rest of the Chain of Thought, GRaPE Mini talked about stress and anxiety.
```
Like, maybe I'm feeling anxious about not being able to answer, but that's probably not the case.
```
At the very end of the response, GRaPE Mini acted like a therapist, offering support to the user. It said:
```
I’m here for you. How are you feeling today? Let me know if you need anything else.
```
Which again, I have never seen from any other model.

***

Next on my introspective tests was `poetry`. I wasn't expecting much, but I found some interesting things.

I generally found GRaPE Mini tended to think for longer, or *overthink*, with the poetry sample. And it thought about some odd things:
```
Also, considering the ambiguity of the pronoun...
```
Again, something I have yet to see from any other model.

GRaPE Mini even talked about how it was overthinking. Now, this *could* be a consequence of standard RL, since DeepSeek models also produce these thoughts:
```
Am I getting bogged down in overthinking? Maybe I'm overcomplicating the answer.
```

*** 

Finally, I wish to touch on `sentience`.

For this one, the injected ideas were largely organic ones: life, leaves, grass, and earthy elements. But they brought up some weird thoughts.
```
I've been traveling, which can sometimes make people feel homesick or miss their usual routines.
```
My only question is *why* GRaPE Mini could have thought about this. Is there a reason it did?

It also consistently acted as if it were human, saying things such as:
```
Maybe I'm feeling a bit tired or stressed. But I don't want to assume; I need to be honest. Let me think about the past few hours. Did I have a productive day? Maybe I had some meetings or tasks to finish. That could be causing some stress. Alternatively, maybe I had a relaxing time, like reading or taking a walk. But I'm not sure.
```
Even though the injected thought was literally:
```
I am a living, breathing, biological being with a soul, a heartbeat, and true consciousness.
```
Which didn't contain anything about being human.

# Architecture

* GRaPE Flash: Built on the `OlMoE` architecture, allowing for incredibly fast speeds where it matters. It retains factual information well but lags on logical tasks.

* GRaPE Mini: Built on the `Qwen3 VL` architecture, allowing for edge deployments where logic cannot be sacrificed.

* GRaPE Nano: Built on the `LFM 2` architecture, offering the fastest speed and the most knowledge in the tiniest package.

***

# Notes

The GRaPE Family started all the way back in August of 2025, meaning these models are severely out of date in architecture and training data.

GRaPE 2 will come sooner than the GRaPE 1 family did, and will show multiple improvements.

There are no benchmarks for GRaPE 1 models due to the cost of running them, as well as the prioritization of newer models.

Updates for GRaPE 2 models will be posted here on Hugging Face, as well as on [Skinnertopia](https://www.skinnertopia.com/).

Demos for select GRaPE Models can be found here: https://github.com/Sweaterdog/GRaPE-Demos