---
license: apache-2.0
language:
  - en
tags:
  - causal-lm
  - text-generation
  - transformer
  - custom-code
  - kv-cache
  - pytorch
pipeline_tag: text-generation
library_name: transformers

---

# summerMC/summerV2

`summerMC/summerV2` is an experimental causal language model based on a custom `VanFastForCausalLM` architecture.

This model was developed by a first-year vocational school student in Japan, age 18, as an independent research and engineering project.

The project focuses on building and testing a custom fast causal language model with:

- custom Hugging Face-compatible model code
- KV-cache enabled autoregressive inference
- streaming decode support
- anti-repetition sampling utilities
- NaN/Inf guarded logits handling
- local `modeling_van_fast.py` loading support

The model is primarily intended for research and experimentation, not production deployment.

---

## Model Details

| Item | Value |
|---|---|
| Model name | `summerMC/summerV2` |
| Architecture | `VanFastForCausalLM` |
| Task | Causal language modeling |
| Framework | PyTorch / Hugging Face Transformers |
| Inference style | Autoregressive text generation |
| Cache support | KV-cache enabled |
| Primary language | English |
| Developer | First-year vocational school student, age 18 |
| Status | Experimental |

---

## Developer Note

This model was developed by an 18-year-old first-year vocational school student as part of an independent AI research project.

The goal is to explore practical custom language-model architecture design, Hugging Face compatibility, fast inference, and KV-cache decoding. The project is experimental, but it is designed to be reproducible and open to inspection by other researchers, students, and engineers.

---

## Intended Use

This model is intended for:

- language-model architecture research
- custom Transformer inference experiments
- KV-cache decoding tests
- sampling strategy experiments
- small-to-mid scale causal LM prototyping
- comparison against GPT-style baselines
- student-led AI research demonstrations

This model is not intended for:

- safety-critical use
- medical, legal, or financial advice
- autonomous decision-making
- deployment without additional evaluation
- factual answering without retrieval or verification

---

## Installation

```bash
pip install -U torch transformers accelerate safetensors
```

For GPU inference, install a CUDA-compatible PyTorch build.
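
To confirm which PyTorch build is installed before loading the model, a quick check along these lines may help. This is a minimal sketch; the helper name `describe_torch_environment` is hypothetical and not part of this repository.

```python
import torch


def describe_torch_environment() -> dict:
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    cuda_ok = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": cuda_ok,
        "device_name": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


for key, value in describe_torch_environment().items():
    print(f"{key}: {value}")
```

If `cuda_available` is `False` on a machine with a GPU, the installed wheel is probably a CPU-only build or does not match the local driver.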

---

## Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/summerV2"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
)

model.to(device)
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain Transformer models in simple terms.\n\nAnswer:"

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.85,
        top_k=80,
        top_p=0.92,
        repetition_penalty=1.25,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(text)
```

---

## Direct Local Import Inference

If remote-code loading causes cache or import issues, the model can be loaded by directly importing `modeling_van_fast.py`.

```python
import os
import sys
import json
import importlib.util
import torch
from transformers import AutoTokenizer

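# Local directory holding modeling_van_fast.py, config.json, the tokenizer files,
# and the weights. The path below is an example; adjust it to your setup.
HF_OUT_DIR = "/content/van_fast_transformer/hf_compatible"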
MODELING_PATH = os.path.join(HF_OUT_DIR, "modeling_van_fast.py")
CONFIG_PATH = os.path.join(HF_OUT_DIR, "config.json")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float32

module_name = "modeling_van_fast_runtime"

# Drop any previously imported copy so the file is re-executed from disk.
if module_name in sys.modules:
    del sys.modules[module_name]

spec = importlib.util.spec_from_file_location(module_name, MODELING_PATH)
mod = importlib.util.module_from_spec(spec)
sys.modules[module_name] = mod
spec.loader.exec_module(mod)

VanFastConfig = mod.VanFastConfig
VanFastForCausalLM = mod.VanFastForCausalLM

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    cfg_json = json.load(f)

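# Override config.json: force KV-cache use and untied input/output embeddings.
cfg_json["use_cache"] = True
cfg_json["tie_word_embeddings"] = False

config = VanFastConfig(**cfg_json)
# Set again on the object in case the constructor ignores the keyword.
config.use_cache = True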

tokenizer = AutoTokenizer.from_pretrained(
    HF_OUT_DIR,
    use_fast=True,
    trust_remote_code=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = VanFastForCausalLM.from_pretrained(
    HF_OUT_DIR,
    config=config,
    torch_dtype=DTYPE,
)

model.to(DEVICE)
model.eval()
```

---

## KV-cache Test

```python
import torch

# Assumes `model` and `tokenizer` are already loaded as shown in the sections above.

@torch.inference_mode()
def test_kv_cache(prompt="Hello world"):
    input_ids = tokenizer(
        prompt,
        return_tensors="pt",
        add_special_tokens=False,
    ).input_ids.to(model.device)

    out = model(
        input_ids=input_ids,
        use_cache=True,
        return_dict=True,
    )

    print("input shape:", tuple(input_ids.shape))
    print("logits:", tuple(out.logits.shape))
    print("past_key_values is None:", out.past_key_values is None)

    if out.past_key_values is None:
        raise RuntimeError("KV cache is inactive.")

    print("layers:", len(out.past_key_values))

    k0, v0 = out.past_key_values[0]
    print("layer0 k:", tuple(k0.shape))
    print("layer0 v:", tuple(v0.shape))

    next_id = torch.argmax(out.logits[:, -1, :], dim=-1, keepdim=True)

    out2 = model(
        input_ids=next_id,
        past_key_values=out.past_key_values,
        use_cache=True,
        return_dict=True,
    )

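    k1, v1 = out2.past_key_values[0]
    # For a cache that grows per token, the sequence dimension here is
    # typically one longer than in k0/v0; compare the printed shapes.
    print("after decode layer0 k:", tuple(k1.shape))
    print("after decode layer0 v:", tuple(v1.shape))
    print("KV cache OK")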

test_kv_cache()
```
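
The two-step test above extends naturally to a full decode loop: run the prompt once, then feed only the newest token together with the cache. The sketch below illustrates that pattern under stated assumptions. `greedy_decode_with_cache` and `ToyCachedLM` are hypothetical names, and the toy stub stands in for a real model so the example runs without downloading weights. It is not the repository's streaming decoder.

```python
from types import SimpleNamespace

import torch
import torch.nn.functional as F


@torch.inference_mode()
def greedy_decode_with_cache(model, input_ids, max_new_tokens=16, eos_token_id=None):
    """Greedy decoding: prefill once, then pass one token per step plus the cache."""
    out = model(input_ids=input_ids, use_cache=True, return_dict=True)
    past = out.past_key_values
    generated = input_ids
    for step in range(max_new_tokens):
        next_id = torch.argmax(out.logits[:, -1, :], dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)
        hit_eos = eos_token_id is not None and bool((next_id == eos_token_id).all())
        if hit_eos or step == max_new_tokens - 1:
            break
        # Only the new token is passed; earlier positions come from the cache.
        out = model(input_ids=next_id, past_key_values=past, use_cache=True, return_dict=True)
        past = out.past_key_values
    return generated


class ToyCachedLM:
    """Hypothetical stub, not VanFastForCausalLM: predicts (last token + 1) % vocab."""

    def __init__(self, vocab_size=10):
        self.vocab_size = vocab_size
        self.tokens_per_call = []  # how many tokens each forward pass received

    def __call__(self, input_ids, past_key_values=None, use_cache=True, return_dict=True):
        self.tokens_per_call.append(input_ids.shape[1])
        logits = F.one_hot((input_ids + 1) % self.vocab_size, self.vocab_size).float()
        # Fake single-layer cache whose sequence dimension grows with each call.
        new_kv = input_ids.unsqueeze(-1).float()
        if past_key_values is not None:
            old_k, old_v = past_key_values[0]
            k, v = torch.cat([old_k, new_kv], dim=1), torch.cat([old_v, new_kv], dim=1)
        else:
            k, v = new_kv, new_kv
        return SimpleNamespace(logits=logits, past_key_values=((k, v),))


toy = ToyCachedLM()
tokens = greedy_decode_with_cache(toy, torch.tensor([[3, 4]]), max_new_tokens=4)
print(tokens.tolist())      # [[3, 4, 5, 6, 7, 8]]
print(toy.tokens_per_call)  # [2, 1, 1, 1]
```

The second printed list shows the point of the cache: after the two-token prefill, every forward pass receives a single token. With a real model, `model` and `input_ids` from the sections above could be passed in place of the stub, assuming its forward signature matches the calls used in the test.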

---

## Recommended Sampling Settings

The following settings were used during local KV-cache inference testing:

```python
max_new_tokens = 160
temperature = 0.85
top_k = 80
top_p = 0.92
repetition_penalty = 1.35
no_repeat_ngram_size = 3
```
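
To make the effect of `no_repeat_ngram_size = 3` concrete, the sketch below shows the general idea behind no-repeat n-gram blocking: any token that would complete an n-gram already present in the sequence is banned for the next step. The function name `banned_next_tokens` is hypothetical, and this is an illustration of the technique rather than the code used by this model or by Transformers.

```python
def banned_next_tokens(token_ids, ngram_size=3):
    """Return token ids that would repeat an n-gram already present in token_ids."""
    if ngram_size <= 0 or len(token_ids) < ngram_size:
        return set()
    prefix_len = ngram_size - 1
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(token_ids[len(token_ids) - prefix_len:])
    banned = set()
    for start in range(len(token_ids) - ngram_size + 1):
        ngram = token_ids[start:start + ngram_size]
        if tuple(ngram[:prefix_len]) == prefix:
            banned.add(ngram[-1])
    return banned


# The sequence ends in (5, 6), and the trigram (5, 6, 7) already occurred,
# so 7 is banned as the next token.
print(banned_next_tokens([5, 6, 7, 5, 6], ngram_size=3))  # {7}
```

During sampling, the logits of the banned ids would be set to negative infinity before the next token is drawn.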

For more stable output, try:

```python
temperature = 0.7
top_k = 50
top_p = 0.9
repetition_penalty = 1.4
```

For more diverse output, try:

```python
temperature = 1.0
top_k = 100
top_p = 0.95
repetition_penalty = 1.2
```
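
For readers who want to see how these knobs interact, here is a minimal standalone sketch of one sampling step on a 1-D logits vector: repetition penalty, then temperature, then top-k, then top-p. The function names `filter_logits` and `sample_next_token` are hypothetical. The ordering is a common convention, and the repository's own sampling utilities may differ.

```python
import torch


def filter_logits(logits, prev_ids, temperature=0.85, top_k=80, top_p=0.92,
                  repetition_penalty=1.35):
    """Apply repetition penalty, temperature, top-k, and top-p to 1-D logits."""
    logits = logits.clone().float()
    # Repetition penalty: make already-generated tokens less likely.
    if prev_ids.numel() > 0 and repetition_penalty != 1.0:
        seen = torch.unique(prev_ids)
        scores = logits[seen]
        logits[seen] = torch.where(scores > 0, scores / repetition_penalty,
                                   scores * repetition_penalty)
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    logits = logits / max(temperature, 1e-6)
    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        k = min(top_k, logits.numel())
        kth_value = torch.topk(logits, k).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    # Top-p: keep the smallest set of tokens whose cumulative mass reaches top_p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        mass_before = torch.cumsum(probs, dim=-1) - probs
        logits[sorted_idx[mass_before >= top_p]] = float("-inf")
    return logits


def sample_next_token(logits, prev_ids, generator=None, **kwargs):
    """Draw one token id from the filtered distribution."""
    probs = torch.softmax(filter_logits(logits, prev_ids, **kwargs), dim=-1)
    return int(torch.multinomial(probs, num_samples=1, generator=generator))


logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
prev_ids = torch.tensor([0])  # token 0 was already generated
print(sample_next_token(logits, prev_ids, temperature=0.7, top_k=50, top_p=0.9,
                        repetition_penalty=1.4))
```

Lower `temperature`, `top_k`, and `top_p` shrink the candidate set, which is why the "more stable" preset above reduces all three, while a higher `repetition_penalty` pushes harder against tokens that were already emitted.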

---

## Example Prompt

```text
Explain Transformer models in simple terms.

Answer:
```

---

## Current Limitations

This is an experimental model. Outputs may show:

- repetition
- grammatical instability
- factual hallucination
- incomplete reasoning
- degraded long-form coherence
- unstable behavior at very high sampling temperatures
- weak instruction following compared with instruction-tuned models

The model should be evaluated carefully before any downstream use.

---

## Safety Notice

This model may generate incorrect, biased, unsafe, or misleading content.

Do not use it as the sole source of truth for high-stakes decisions.

Recommended mitigations:

- use retrieval for factual tasks
- apply output filtering
- evaluate on task-specific benchmarks
- use human review for sensitive outputs
- avoid deployment without safety tuning

---

## Research Notes

`summerV2` is part of an experimental model-development line focused on fast training and inference for custom causal language models.

The current implementation emphasizes:

- Hugging Face compatibility
- direct model-code import fallback
- KV-cache streaming decode
- custom sampling controls
- inference stability checks
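
As an illustration of what an inference stability check on logits can look like, the sketch below replaces NaN and infinite entries before softmax so that sampling never sees a non-finite probability. The function name `sanitize_logits` and the clamp value are assumptions made for this example. The guard implemented in `modeling_van_fast.py` may work differently.

```python
import torch


def sanitize_logits(logits: torch.Tensor, clamp_value: float = 1e4) -> torch.Tensor:
    """Replace NaN/Inf entries and clamp the range so softmax stays finite."""
    logits = logits.float()
    # NaN is mapped to a very low score so the affected token is effectively never chosen.
    logits = torch.nan_to_num(
        logits, nan=-clamp_value, posinf=clamp_value, neginf=-clamp_value
    )
    return logits.clamp(min=-clamp_value, max=clamp_value)


raw = torch.tensor([1.0, float("nan"), float("inf"), float("-inf")])
clean = sanitize_logits(raw)
probs = torch.softmax(clean, dim=-1)
print(torch.isfinite(probs).all().item())  # True
```

Mapping NaN to a large negative value is one possible policy. Raising an error instead can be preferable during debugging, since a NaN usually points to an upstream numerical problem.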

Future work may include:

- better pretraining data mixture
- instruction tuning
- DPO or preference optimization
- stronger tokenizer/model alignment
- long-context stability improvements
- benchmark reporting
- model card expansion with training details

---

## Citation

If you use this model in experiments, cite the repository:

```bibtex
@misc{summerV2,
  title        = {summerMC/summerV2},
  author       = {summerMC},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/summerMC/summerV2}}
}
```

---

## Disclaimer

This repository contains an experimental research model.

No warranty is provided regarding factuality, safety, performance, or fitness for a particular use case.