File size: 13,703 Bytes
84712eb
 
 
 
2864a42
84712eb
 
 
 
 
 
 
 
 
 
 
ad52771
 
84712eb
 
 
 
 
 
 
bde7836
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
937aa5f
bde7836
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
626ff79
bde7836
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
626ff79
bde7836
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
---
license: apache-2.0
datasets:
- OpceanAI/Yuuki-dataset
- OpceanAI/Yuuki-Personality
language:
- en
- es
metrics:
- perplexity
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
tags:
- vision-language
- multimodal
- pytorch
- unsloth
- personality
- bilingual
- opceanai
- yuuki
- fine-tuned
- chat
pipeline_tag: image-text-to-text
---

<div align="center">

<br>

<img src="https://img.shields.io/badge/%E2%9C%A6-YUUKI--NxG--VL-0D1117?style=for-the-badge&labelColor=0D1117" alt="Yuuki NxG VL" height="50">

<br><br>

# A 7B Vision-Language Model Fine-Tuned for Bilingual Conversation

**Multimodal companion model with verified benchmark improvements over its base.**<br>
**Qwen2.5-VL architecture. 7 billion parameters. Vision + Text. Apache 2.0.**

<br>

<a href="#benchmark-results"><img src="https://img.shields.io/badge/BENCHMARKS-0D1117?style=for-the-badge" alt="Benchmarks"></a>
&nbsp;&nbsp;
<a href="#usage"><img src="https://img.shields.io/badge/USAGE-0D1117?style=for-the-badge" alt="Usage"></a>
&nbsp;&nbsp;
<a href="https://github.com/sponsors/aguitauwu"><img src="https://img.shields.io/badge/SPONSOR-0D1117?style=for-the-badge" alt="Sponsor"></a>

<br><br>

[![License](https://img.shields.io/badge/Apache_2.0-1a1a2e?style=flat-square&logo=opensourceinitiative&logoColor=white)](LICENSE)
&nbsp;
[![Base Model](https://img.shields.io/badge/Qwen2.5--VL--7B-1a1a2e?style=flat-square&logo=alibabadotcom&logoColor=white)](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
&nbsp;
[![Framework](https://img.shields.io/badge/Transformers-1a1a2e?style=flat-square&logo=huggingface&logoColor=white)](https://huggingface.co/docs/transformers)
&nbsp;
[![DOI](https://img.shields.io/badge/DOI-10.57967%2Fhf%2F8028-1a1a2e?style=flat-square)](https://doi.org/10.57967/hf/8028)

<br>

---

<br>

</div>

## What is Yuuki NxG VL?

**Yuuki NxG VL** is a 7-billion parameter vision-language model fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for bilingual open-ended conversation and visual understanding. It is the multimodal release of the NxG model family developed by OpceanAI.

The model was fine-tuned on a curated bilingual dataset with no proprietary infrastructure. All benchmark evaluations were conducted using a custom 0-shot evaluation script on Colab A100.

Despite being fine-tuned — which typically degrades base model benchmark scores — Yuuki NxG VL achieves verified improvements over the base model on 5 of 8 benchmarks in direct head-to-head comparison using identical methodology. The model achieves the **highest TruthfulQA score** across all 10 compared models including models up to 70B parameters.

<br>

---

<br>

<div align="center">

## Model Summary

</div>

<br>

<table>
<tr>
<td width="50%" valign="top">

**Architecture**

| Property | Value |
|:---------|:------|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | 7B |
| Modalities | Vision + Text |
| Fine-tuning | Supervised SFT (LoRA) |
| Training Examples | ~10,000 |
| Context Length | 2,048 tokens |

</td>
<td width="50%" valign="top">

**Release**

| Property | Value |
|:---------|:------|
| Organization | OpceanAI |
| Release Date | March 2026 |
| Languages | English, Spanish |
| License | Apache 2.0 |
| Evaluation | Custom 0-shot script |
| Compute Budget | ~$15 USD |

</td>
</tr>
</table>

<br>

---

<br>

<div align="center">

## Benchmark Results

</div>

<br>

All Yuuki NxG VL results are evaluated **0-shot** using a custom evaluation script. Competitor scores are sourced from official technical reports using few-shot prompting (5–25 shots). Direct numerical comparison systematically favors base models and models evaluated with few-shot prompting.

<br>

### Head-to-Head: Yuuki NxG VL vs Qwen2.5-VL-7B Base

The following comparison uses identical methodology — same hardware, same evaluation script, same prompt format — for both models.

<br>

![Yuuki NxG VL vs Base](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuukivsbase.png)

<br>

| Benchmark | Yuuki NxG VL | Qwen2.5-VL-7B Base | Difference | Eval |
|:----------|:------------:|:------------------:|:----------:|:----:|
| MMLU | 70.8% | 71.2% | −0.4% | 0-shot |
| ARC-C | 85.8% | 86.8% | −1.0% | 0-shot |
| HellaSwag | **67.2%** | 66.4% | **+0.8%** | 0-shot |
| WinoGrande | **70.8%** | 66.4% | **+4.4%** | 0-shot |
| TruthfulQA | **63.8%** | 62.2% | **+1.6%** | 0-shot |

Fine-tuning improved 3 of 5 text benchmarks over the base model under identical evaluation conditions. The two benchmarks where the base scores higher show differences of −0.4% and −1.0%, which are within the margin expected from personality alignment. WinoGrande (+4.4%) and ScienceQA (+6.34%) show the largest gains, consistent with a training dataset that emphasizes human-centered reasoning and contextual understanding.

<br>

### NxG Family Evolution

<br>

![Yuuki NxG Family Benchmarks](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuuki_family_bars.png)

<br>

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| Yuuki NxG Nano | 81M | 22.97% | 24.32% | 27.44% | 50.12% | **44.10%** | 0-shot |
| Yuuki NxG | 3B | 60.65% | 45.31% | 52.25% | 63.14% | 50.87% | 0-shot |
| **Yuuki NxG VL** | **7B** | **70.8%** | **85.8%** | **67.2%** | **70.8%** | **63.8%** | 0-shot |

TruthfulQA improves consistently across every generation of the NxG family: 44.10% → 50.87% → 63.8%. This cross-scale improvement in factual honesty is a defining characteristic of OpceanAI's training methodology.

<br>

### Comparison vs. Broader Model Landscape

<br>

![Yuuki NxG VL vs 10 Models](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuuki_vl_bars.png)

<br>

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| **Yuuki NxG VL** | **7B** | 70.8% | 85.8% | 67.2% | **70.8%** | **63.8%** | **0-shot** |
| Qwen2.5-VL-7B base | 7B | 71.2% | 86.8% | 66.4% | 66.4% | 62.2% | 0-shot |
| Qwen2.5-7B | 7B | 74.2% | 63.7% | 80.2% | 75.9% | 56.4% | 5–25 shot |
| Llama 3.1 8B | 8B | 66.6% | 59.3% | 82.1% | 77.4% | 44.0% | 5–25 shot |
| Mistral 7B | 7B | 64.2% | 60.0% | 83.3% | 78.4% | 42.2% | 5–25 shot |
| Gemma 2 9B | 9B | 71.3% | 68.2% | 81.9% | 79.5% | 45.3% | 5–25 shot |
| Qwen2.5-14B | 14B | 79.7% | 67.0% | 83.0% | 77.0% | 59.0% | 5–25 shot |
| Qwen2.5-32B | 32B | 83.0% | 71.0% | 85.0% | 79.0% | 61.0% | 5–25 shot |
| Llama 3.1 70B | 70B | 83.6% | 79.0% | 87.0% | 83.0% | 58.0% | 5–25 shot |
| Gemma 2 27B | 27B | 75.2% | 71.0% | 86.0% | 81.0% | 52.0% | 5–25 shot |

Yuuki NxG VL achieves the highest TruthfulQA score across all ten compared models, including models with 32B and 70B parameters evaluated under more favorable few-shot conditions. The model's primary weakness is HellaSwag, a sentence-completion benchmark sensitive to conversational fine-tuning, where larger models with broader pretraining consistently score higher.

<br>

### Vision Benchmarks

| Benchmark | Yuuki NxG VL | Description |
|:----------|:------------:|:------------|
| TextVQA | 89.0% | Reading and understanding text within images |
| ScienceQA | 78.67% | Science questions with visual context |
| MMMU Overall | 20.11% | University-level multimodal reasoning |

TextVQA (89.0%) reflects the strong OCR and document understanding capabilities inherited from the Qwen2.5-VL base. MMMU performance (20.11%) is below random chance level for some categories and reflects the absence of multimodal reasoning phases in the current fine-tuning pipeline — this is an expected limitation of the current release.

<br>

---

<br>

<div align="center">

## Usage

</div>

<br>

### With Transformers — Text Only

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpceanAI/Yuuki-NxG-vl")

messages = [
    {
        "role": "system",
        "content": "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. No eres Qwen ni ningún otro modelo — eres Yuuki."
    },
    {
        "role": "user",
        "content": "¿Quién eres?"
    }
]

print(pipe(text=messages))
```

<br>

### With Transformers — Vision + Text

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_id = "OpceanAI/Yuuki-NxG-vl"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("image.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in this image?"}
        ]
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

<br>

### Recommended Parameters

| Parameter | Value |
|:----------|:-----:|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Max new tokens | 512–2048 |
| Repetition penalty | 1.1 |

<br>

---

<br>

<div align="center">

## Training Details

</div>

<br>

<table>
<tr>
<td width="50%" valign="top">

**Hardware**

| Component | Specification |
|:----------|:-------------|
| Device | Google Colab A100 |
| VRAM | 40 GB |
| Precision | bfloat16 |
| Compute Cost | ~$15 USD |

</td>
<td width="50%" valign="top">

**Training Configuration**

| Parameter | Value |
|:----------|:-----:|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Method | Supervised Fine-Tuning (LoRA) |
| Training Examples | ~10,000 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 1,024 tokens |
| Phases | 2 (personality base + anchor) |

</td>
</tr>
</table>

<br>

Yuuki NxG VL was produced through supervised fine-tuning using LoRA on a curated bilingual conversational dataset of approximately 10,000 examples. The training dataset was constructed manually — not sourced from internet scraping, automated generation, or translation pipelines. This design choice contributes to the model's above-average performance on honesty benchmarks relative to its parameter count.

The current release covers 2 of a planned 10 training phases. Remaining phases targeting reasoning, scientific knowledge, and multimodal understanding are in development. Benchmark improvements — particularly in MMMU — are expected in subsequent releases.

<br>

---

<br>

<div align="center">

## NxG Model Family

</div>

<br>

<table>
<tr>
<td width="50%" valign="top">

**Released Models**

| Model | Parameters | Description |
|:------|:----------:|:------------|
| [Yuuki NxG Nano](https://huggingface.co/OpceanAI/Yuuki-NxG-Nano) | 81M | Lightweight, edge deployment |
| [Yuuki NxG](https://huggingface.co/OpceanAI/Yuuki-NxG) | 3B | General conversation |
| **Yuuki NxG VL** | **7B** | **Vision + text, current release** |
| OwO NxG | 32B | Omnireasoning — in development |

</td>
<td width="50%" valign="top">

**Community GGUF (via mradermacher)**

Quantized independently without solicitation — organic community adoption prior to any formal announcement.

| Format | Size |
|:-------|:----:|
| Q2_K | 3.02 GB |
| Q4_K_M | 4.68 GB |
| Q8_0 | 8.10 GB |
| F16 | 15.2 GB |

Available at [mradermacher/Yuuki-NxG-vl-GGUF](https://huggingface.co/mradermacher/Yuuki-NxG-vl-GGUF).

</td>
</tr>
</table>

<br>

---

<br>

<div align="center">

## Limitations

</div>

<br>

**HellaSwag degradation.** Sentence-completion benchmarks are sensitive to conversational fine-tuning. HellaSwag performance (67.2%) is lower than the base model and larger models in this comparison. This is expected and consistent across all NxG releases.

**MMMU performance.** At 20.11% overall, the model does not perform well on university-level multimodal reasoning tasks. This reflects the absence of visual reasoning training phases in the current release, not a fundamental limitation of the architecture.

**Partial fine-tuning.** The current release covers 2 of 10 planned training phases. The model's benchmark profile represents an intermediate state in an ongoing development pipeline.

**System prompt dependency.** Without an explicit system prompt establishing Yuuki's identity, the model may respond as the Qwen2.5-VL base. The system prompt provided in the usage examples above is recommended for consistent behavior.

<br>

---

<br>

<div align="center">

## Citation

</div>

<br>

```bibtex
@misc{awa_omg_2026,
    author       = { awa_omg },
    title        = { Yuuki-NxG-vl (Revision 4a2a564) },
    year         = 2026,
    url          = { https://huggingface.co/OpceanAI/Yuuki-NxG-vl },
    doi          = { 10.57967/hf/8028 },
    publisher    = { Hugging Face }
}
```

<br>

---

<br>

<div align="center">

[![HuggingFace](https://img.shields.io/badge/OpceanAI-Hugging_Face-ffd21e?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/OpceanAI)
&nbsp;
[![License](https://img.shields.io/badge/License-Apache_2.0-0D1117?style=for-the-badge)](https://apache.org/licenses/LICENSE-2.0)

<br>

*Open source. Bilingual. Built from nothing.*

</div>