---
license: apache-2.0
language:
  - en
tags:
  - causal-lm
  - text-generation
  - transformer
  - custom-code
  - kv-cache
  - pytorch
pipeline_tag: text-generation
library_name: transformers

---

# summerMC/summerV2

`summerMC/summerV2` is an experimental causal language model based on a custom `VanFastForCausalLM` architecture.

This model was developed by a first-year vocational school student in Japan, age 18, as an independent research and engineering project.

The project focuses on building and testing a custom fast causal language model with:

- custom Hugging Face-compatible model code
- KV-cache enabled autoregressive inference
- streaming decode support
- anti-repetition sampling utilities
- NaN/Inf guarded logits handling
- local `modeling_van_fast.py` loading support

The model is primarily intended for research and experimentation, not production deployment.

---

## Model Details

| Item | Value |
|---|---|
| Model name | `summerMC/summerV2` |
| Architecture | `VanFastForCausalLM` |
| Task | Causal language modeling |
| Framework | PyTorch / Hugging Face Transformers |
| Inference style | Autoregressive text generation |
| Cache support | KV-cache enabled |
| Primary language | English |
| Developer | First-year vocational school student, age 18 |
| Status | Experimental |

---

## Developer Note

This model was developed by an 18-year-old first-year vocational school student as part of an independent AI research project.

The goal is to explore the practical design of a custom language-model architecture, together with Hugging Face compatibility, fast inference, and KV-cache decoding. Although experimental, the project is meant to be reproducible and inspectable by other researchers, students, and engineers.

---

## Intended Use

This model is intended for:

- language-model architecture research
- custom Transformer inference experiments
- KV-cache decoding tests
- sampling strategy experiments
- small-to-mid scale causal LM prototyping
- comparison against GPT-style baselines
- student-led AI research demonstrations

This model is not intended for:

- safety-critical use
- medical, legal, or financial advice
- autonomous decision-making
- deployment without additional evaluation
- factual answering without retrieval or verification

---

## Installation

```bash
pip install -U torch transformers accelerate safetensors
```

For GPU inference, install a PyTorch build that matches the CUDA version on your system.

---

## Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/summerV2"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
)

model.to(device)
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain Transformer models in simple terms.\n\nAnswer:"

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.85,
        top_k=80,
        top_p=0.92,
        repetition_penalty=1.25,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(text)
```

---

## Direct Local Import Inference

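If remote-code loading causes cache or import issues, the model can be loaded by importing `modeling_van_fast.py` directly. The example assumes a local directory (`HF_OUT_DIR`) that contains `modeling_van_fast.py`, `config.json`, the tokenizer files, and the model weights; adjust the path to your own copy.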

```python
import os
import sys
import json
import importlib.util
import torch
from transformers import AutoTokenizer

HF_OUT_DIR = "/content/van_fast_transformer/hf_compatible"
MODELING_PATH = os.path.join(HF_OUT_DIR, "modeling_van_fast.py")
CONFIG_PATH = os.path.join(HF_OUT_DIR, "config.json")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float32

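module_name = "modeling_van_fast_runtime"

# Drop any previously imported copy so the file is re-executed from disk.
if module_name in sys.modules:
    del sys.modules[module_name]

# Import modeling_van_fast.py directly from its file path.
spec = importlib.util.spec_from_file_location(module_name, MODELING_PATH)
mod = importlib.util.module_from_spec(spec)
sys.modules[module_name] = mod
spec.loader.exec_module(mod)

VanFastConfig = mod.VanFastConfig
VanFastForCausalLM = mod.VanFastForCausalLM

# Build the config from config.json, forcing the KV cache on and
# disabling weight tying between the input and output embeddings.
with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    cfg_json = json.load(f)

cfg_json["use_cache"] = True
cfg_json["tie_word_embeddings"] = False

config = VanFastConfig(**cfg_json)
# Also set the flag on the constructed instance.
config.use_cache = True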

tokenizer = AutoTokenizer.from_pretrained(
    HF_OUT_DIR,
    use_fast=True,
    trust_remote_code=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = VanFastForCausalLM.from_pretrained(
    HF_OUT_DIR,
    config=config,
    torch_dtype=DTYPE,
)

model.to(DEVICE)
model.eval()
```

---

## KV-cache Test

```python
import torch

@torch.inference_mode()
def test_kv_cache(prompt="Hello world"):
    input_ids = tokenizer(
        prompt,
        return_tensors="pt",
        add_special_tokens=False,
    ).input_ids.to(model.device)

    out = model(
        input_ids=input_ids,
        use_cache=True,
        return_dict=True,
    )

    print("input shape:", tuple(input_ids.shape))
    print("logits:", tuple(out.logits.shape))
    print("past_key_values is None:", out.past_key_values is None)

    if out.past_key_values is None:
        raise RuntimeError("KV cache is inactive.")

    print("layers:", len(out.past_key_values))

    k0, v0 = out.past_key_values[0]
    print("layer0 k:", tuple(k0.shape))
    print("layer0 v:", tuple(v0.shape))

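    # Greedy-pick the next token from the last position.
    next_id = torch.argmax(out.logits[:, -1, :], dim=-1, keepdim=True)

    # Decode step: feed only the new token and reuse the cache.
    out2 = model(
        input_ids=next_id,
        past_key_values=out.past_key_values,
        use_cache=True,
        return_dict=True,
    )

    # With a growing cache, the sequence dimension printed here
    # should be one longer than in the prefill step above.
    k1, v1 = out2.past_key_values[0]
    print("after decode layer0 k:", tuple(k1.shape))
    print("after decode layer0 v:", tuple(v1.shape))
    print("KV cache OK")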

test_kv_cache()
```
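
The test above exercises a single cached decode step. As a minimal sketch of how the same two calls extend to a full greedy loop, the helper below runs the prompt once and then feeds only the newest token at each step. It assumes the model returns `logits` and `past_key_values` exactly as in the test; `greedy_decode_with_cache` is a hypothetical name and is not part of the repository.

```python
import torch


@torch.inference_mode()
def greedy_decode_with_cache(model, input_ids, max_new_tokens=32, eos_token_id=None):
    """Greedy decoding that reuses the KV cache (illustrative sketch)."""
    # Prefill: run the whole prompt once and keep the returned cache.
    out = model(input_ids=input_ids, use_cache=True, return_dict=True)
    past = out.past_key_values
    generated = input_ids

    for step in range(max_new_tokens):
        # Pick the highest-scoring token at the last position.
        next_id = torch.argmax(out.logits[:, -1, :], dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)

        # Stop on EOS (for every row in the batch) or when the budget is used up.
        hit_eos = eos_token_id is not None and bool((next_id == eos_token_id).all())
        if hit_eos or step == max_new_tokens - 1:
            break

        # Decode step: only the new token is fed; the cache carries the history.
        out = model(
            input_ids=next_id,
            past_key_values=past,
            use_cache=True,
            return_dict=True,
        )
        past = out.past_key_values

    return generated
```

With the `model` and `tokenizer` loaded earlier, the prompt ids can be built as in the test and the result decoded with `tokenizer.decode(generated[0], skip_special_tokens=True)`.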

---

## Recommended Sampling Settings

The following settings were used during local KV-cache inference testing:

```python
max_new_tokens = 160
temperature = 0.85
top_k = 80
top_p = 0.92
repetition_penalty = 1.35
no_repeat_ngram_size = 3
```
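
The two anti-repetition values above, `repetition_penalty` and `no_repeat_ngram_size`, can be illustrated with a framework-free sketch. It follows the commonly used definitions: logits of already-generated tokens are divided by the penalty when positive and multiplied by it when negative, and any token that would complete an n-gram already present in the output is forbidden. This is not the code in `modeling_van_fast.py`, and both function names are hypothetical.

```python
def apply_repetition_penalty(logits, generated, penalty):
    """Return a copy of `logits` with already-generated token ids made less likely."""
    out = list(logits)
    for tok in set(generated):
        # Positive scores shrink, negative scores move further from zero.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_next_tokens(generated, ngram_size):
    """Return token ids that would repeat an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix the next token would extend.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = generated[i:i + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


# Token ids 0 and 1 were already generated, so their scores are penalized.
print(apply_repetition_penalty([2.0, -2.0, 1.0], generated=[0, 1], penalty=2.0))
# The output ends in (1, 2), and (1, 2, 3) already occurred, so 3 is banned.
print(banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3))
```

A sampler would set the logits of banned ids to negative infinity before applying temperature, top-k, and top-p.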

For more stable output, try:

```python
temperature = 0.7
top_k = 50
top_p = 0.9
repetition_penalty = 1.4
```

For more diverse output, try:

```python
temperature = 1.0
top_k = 100
top_p = 0.95
repetition_penalty = 1.2
```
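
One way to keep the three groups of settings above in one place is a small preset table that is expanded into keyword arguments for `model.generate`. This is only a sketch: `SAMPLING_PRESETS` and `build_generation_kwargs` are hypothetical names, and letting the "stable" and "diverse" presets inherit `max_new_tokens` and `no_repeat_ngram_size` from the tested settings is an assumption, since the original lists do not specify them.

```python
# Values copied from the settings listed in this section.
SAMPLING_PRESETS = {
    "default": {
        "max_new_tokens": 160,
        "temperature": 0.85,
        "top_k": 80,
        "top_p": 0.92,
        "repetition_penalty": 1.35,
        "no_repeat_ngram_size": 3,
    },
    "stable": {
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "repetition_penalty": 1.4,
    },
    "diverse": {
        "temperature": 1.0,
        "top_k": 100,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
    },
}


def build_generation_kwargs(preset="default", **overrides):
    """Merge the default settings, the chosen preset, and explicit overrides."""
    if preset not in SAMPLING_PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    return {
        "do_sample": True,
        **SAMPLING_PRESETS["default"],  # assumed fallback for unspecified keys
        **SAMPLING_PRESETS[preset],
        **overrides,
    }


print(build_generation_kwargs("stable", max_new_tokens=64))
```

The resulting dictionary can be passed as `model.generate(**inputs, **build_generation_kwargs("stable"), pad_token_id=tokenizer.pad_token_id)`.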

---

## Example Prompt

```text
Explain Transformer models in simple terms.

Answer:
```

---

## Current Limitations

This is an experimental model. Outputs may exhibit:

- repetition
- grammatical instability
- factual hallucination
- incomplete reasoning
- degraded long-form coherence
- unstable behavior at very high temperatures
- weak instruction following compared with instruction-tuned models

The model should be evaluated carefully before any downstream use.
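
As one example of such an evaluation, repetition can be quantified with a simple repeated n-gram ratio computed over generated token ids or whitespace-split words. The function below is a minimal sketch; `repeated_ngram_ratio` is a hypothetical helper, not a metric reported for this model.

```python
def repeated_ngram_ratio(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 means no repetition)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Sequence shorter than n: nothing to compare.
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Works on token ids...
print(repeated_ngram_ratio([1, 2, 3, 1, 2, 3], n=3))
# ...or on words from decoded text.
print(repeated_ngram_ratio("the cat sat on the mat".split(), n=2))
```

Comparing this ratio across sampling settings gives a rough, automatic signal of how much a configuration reduces looping output.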

---

## Safety Notice

This model may generate incorrect, biased, unsafe, or misleading content.

Do not use it as the sole source of truth for high-stakes decisions.

Recommended mitigations:

- use retrieval for factual tasks
- apply output filtering
- evaluate on task-specific benchmarks
- use human review for sensitive outputs
- avoid deployment without safety tuning

---

## Research Notes

`summerV2` is part of an experimental model-development line focused on fast training and inference for custom causal language models.

The current implementation emphasizes:

- Hugging Face compatibility
- direct model-code import fallback
- KV-cache streaming decode
- custom sampling controls
- inference stability checks
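
An inference stability check of the kind listed above can be as simple as sanitizing logits before sampling. The sketch below is framework-free and illustrative only: `guard_logits` and the clamp value are assumptions, not the repository's implementation. In PyTorch the same idea is usually expressed with `torch.nan_to_num` followed by a clamp.

```python
import math


def guard_logits(logits, clamp=1e4):
    """Replace NaN with a very low score and clamp all values to [-clamp, clamp]."""
    cleaned = []
    for x in logits:
        if math.isnan(x):
            # A NaN score carries no information, so make the token very unlikely.
            cleaned.append(-clamp)
        else:
            # Also maps +inf to clamp and -inf to -clamp.
            cleaned.append(max(-clamp, min(clamp, x)))
    return cleaned


print(guard_logits([1.0, float("nan"), float("inf"), float("-inf")]))
```

After this step, a softmax over the logits stays finite even if the forward pass produced NaN or infinite values.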

Future work may include:

- better pretraining data mixture
- instruction tuning
- DPO or preference optimization
- stronger tokenizer/model alignment
- long-context stability improvements
- benchmark reporting
- model card expansion with training details

---

## Citation

If you use this model in experiments, cite the repository:

```bibtex
@misc{summerV2,
  title        = {summerMC/summerV2},
  author       = {summerMC},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/summerMC/summerV2}}
}
```

---

## Disclaimer

This repository contains an experimental research model.

No warranty is provided regarding factuality, safety, performance, or fitness for a particular use case.