---
license: apache-2.0
language:
- en
tags:
- causal-lm
- text-generation
- transformer
- custom-code
- kv-cache
- pytorch
pipeline_tag: text-generation
library_name: transformers
---
# summerMC/summerV2

`summerMC/summerV2` is an experimental causal language model based on a custom `VanFastForCausalLM` architecture.

This model was developed by a first-year vocational school student in Japan, age 18, as an independent research and engineering project.

The project focuses on building and testing a custom fast causal language model with:

- custom Hugging Face-compatible model code
- KV-cache enabled autoregressive inference
- streaming decode support
- anti-repetition sampling utilities
- NaN/Inf guarded logits handling (sketched below)
- local `modeling_van_fast.py` loading support

The model is primarily intended for research and experimentation, not production deployment.
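As an illustration of the NaN/Inf guard idea, a logits sanitizer of roughly this shape can be expressed with the standard `transformers` `LogitsProcessor` interface. This is a hypothetical sketch, not the guard actually implemented inside `VanFastForCausalLM`:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class NanInfGuard(LogitsProcessor):
    """Illustrative sketch only: sanitize non-finite logits before sampling."""

    def __call__(self, input_ids, scores):
        # NaN -> very low score (effectively never sampled);
        # +/-inf -> large finite values so softmax stays well-defined.
        return torch.nan_to_num(scores, nan=-1e9, posinf=1e9, neginf=-1e9)

# Usage with any Hugging Face causal LM:
# model.generate(..., logits_processor=LogitsProcessorList([NanInfGuard()]))
```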
---

## Model Details

| Item | Value |
|---|---|
| Model name | `summerMC/summerV2` |
| Architecture | `VanFastForCausalLM` |
| Task | Causal language modeling |
| Framework | PyTorch / Hugging Face Transformers |
| Inference style | Autoregressive text generation |
| Cache support | KV-cache enabled |
| Primary language | English |
| Developer | First-year vocational school student, age 18 |
| Status | Experimental |

---
## Developer Note

This model was developed by an 18-year-old first-year vocational school student as part of an independent AI research project.

The goal is to explore practical custom language-model architecture design, Hugging Face compatibility, fast inference, and KV-cache decoding. The project is experimental, but it is designed to be reproducible and inspectable for other researchers, students, and engineers.

---

## Intended Use

This model is intended for:

- language-model architecture research
- custom Transformer inference experiments
- KV-cache decoding tests
- sampling strategy experiments
- small-to-mid scale causal LM prototyping
- comparison against GPT-style baselines
- student-led AI research demonstrations

This model is not intended for:

- safety-critical use
- medical, legal, or financial advice
- autonomous decision-making
- deployment without additional evaluation
- factual answering without retrieval or verification

---
## Installation

```bash
pip install -U torch transformers accelerate safetensors
```

For GPU inference, install a CUDA-compatible PyTorch build.
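For example, the CUDA 12.1 wheels can be installed from the official PyTorch index (adjust the `cu121` tag to your local CUDA version; see pytorch.org for the current compatibility matrix):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```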
---

## Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/summerV2"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
)
model.to(device)
model.eval()

# The tokenizer may ship without a pad token; reuse EOS so generate() can pad.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain Transformer models in simple terms.\n\nAnswer:"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.85,
        top_k=80,
        top_p=0.92,
        repetition_penalty=1.25,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(text)
```
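Since the card lists streaming decode support, tokens can also be printed as they are generated. A minimal sketch using the stock `transformers.TextIteratorStreamer` utility (this assumes the `model`, `tokenizer`, and `device` from above; the streamer is standard `transformers`, not part of this repository):

```python
from threading import Thread

from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
)
inputs = tokenizer(
    "Explain KV caching in one paragraph.\n\nAnswer:",
    return_tensors="pt",
).to(device)

# generate() blocks until decoding finishes, so run it on a worker thread
# and consume decoded text pieces from the streamer as they arrive.
thread = Thread(
    target=model.generate,
    kwargs=dict(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.85,
        streamer=streamer,
        pad_token_id=tokenizer.pad_token_id,
    ),
)
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```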
---

## Direct Local Import Inference

If remote-code loading causes cache or import issues, the model can be loaded by directly importing `modeling_van_fast.py`.

```python
import os
import sys
import json
import importlib.util

import torch
from transformers import AutoTokenizer

HF_OUT_DIR = "/content/van_fast_transformer/hf_compatible"
MODELING_PATH = os.path.join(HF_OUT_DIR, "modeling_van_fast.py")
CONFIG_PATH = os.path.join(HF_OUT_DIR, "config.json")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float32

# Import modeling_van_fast.py as a fresh module from an explicit file path,
# bypassing the Hugging Face remote-code cache entirely.
module_name = "modeling_van_fast_runtime"
if module_name in sys.modules:
    del sys.modules[module_name]
spec = importlib.util.spec_from_file_location(module_name, MODELING_PATH)
mod = importlib.util.module_from_spec(spec)
sys.modules[module_name] = mod
spec.loader.exec_module(mod)

VanFastConfig = mod.VanFastConfig
VanFastForCausalLM = mod.VanFastForCausalLM

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    cfg_json = json.load(f)
cfg_json["use_cache"] = True
cfg_json["tie_word_embeddings"] = False

config = VanFastConfig(**cfg_json)
config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(
    HF_OUT_DIR,
    use_fast=True,
    trust_remote_code=True,
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = VanFastForCausalLM.from_pretrained(
    HF_OUT_DIR,
    config=config,
    torch_dtype=DTYPE,
)
model.to(DEVICE)
model.eval()
```
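Once loaded this way, the model behaves like any Hugging Face causal LM. A quick smoke test (illustrative prompt, using the `model`, `tokenizer`, and `DEVICE` defined above):

```python
ids = tokenizer("Hello", return_tensors="pt").input_ids.to(DEVICE)
with torch.inference_mode():
    out = model.generate(ids, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```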
---

## KV-cache Test

```python
import torch

@torch.inference_mode()
def test_kv_cache(prompt="Hello world"):
    # Prefill: run the full prompt once and keep the returned cache.
    input_ids = tokenizer(
        prompt,
        return_tensors="pt",
        add_special_tokens=False,
    ).input_ids.to(model.device)
    out = model(
        input_ids=input_ids,
        use_cache=True,
        return_dict=True,
    )
    print("input shape:", tuple(input_ids.shape))
    print("logits:", tuple(out.logits.shape))
    print("past_key_values is None:", out.past_key_values is None)
    if out.past_key_values is None:
        raise RuntimeError("KV cache is inactive.")
    print("layers:", len(out.past_key_values))
    k0, v0 = out.past_key_values[0]
    print("layer0 k:", tuple(k0.shape))
    print("layer0 v:", tuple(v0.shape))
    # Decode step: feed only the next token plus the cached keys/values.
    next_id = torch.argmax(out.logits[:, -1, :], dim=-1, keepdim=True)
    out2 = model(
        input_ids=next_id,
        past_key_values=out.past_key_values,
        use_cache=True,
        return_dict=True,
    )
    k1, v1 = out2.past_key_values[0]
    print("after decode layer0 k:", tuple(k1.shape))
    print("after decode layer0 v:", tuple(v1.shape))
    print("KV cache OK")

test_kv_cache()
```
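To see why the cache matters, the following rough timing sketch contrasts full-recompute greedy decoding with cached decoding (same `model`/`tokenizer` as above; numbers are illustrative only, not a benchmark):

```python
import time

import torch

@torch.inference_mode()
def time_greedy_decode(use_cache, steps=32, prompt="Hello world"):
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    past = None
    t0 = time.perf_counter()
    for _ in range(steps):
        if use_cache and past is not None:
            # Cached path: feed only the newest token.
            out = model(input_ids=ids[:, -1:], past_key_values=past, use_cache=True)
        else:
            # Uncached path: recompute the whole sequence every step.
            out = model(input_ids=ids, use_cache=use_cache)
        past = out.past_key_values if use_cache else None
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return time.perf_counter() - t0

print("no cache :", time_greedy_decode(False))
print("with cache:", time_greedy_decode(True))
```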
---

## Recommended Sampling Settings

The following settings were used during local KV-cache inference testing:

```python
max_new_tokens = 160
temperature = 0.85
top_k = 80
top_p = 0.92
repetition_penalty = 1.35
no_repeat_ngram_size = 3
```

For more stable output, try:

```python
temperature = 0.7
top_k = 50
top_p = 0.9
repetition_penalty = 1.4
```

For more diverse output, try:

```python
temperature = 1.0
top_k = 100
top_p = 0.95
repetition_penalty = 1.2
```
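These settings map one-to-one onto `model.generate` keyword arguments. For example, the test configuration above can be applied like this (assuming `inputs` from Basic Usage):

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=160,
    do_sample=True,          # sampling must be on for temperature/top_k/top_p
    temperature=0.85,
    top_k=80,
    top_p=0.92,
    repetition_penalty=1.35,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.pad_token_id,
)
```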
---

## Example Prompt

```text
Explain Transformer models in simple terms.

Answer:
```

---

## Current Limitations

This is an experimental model. Output quality may include:

- repetition
- grammatical instability
- factual hallucination
- incomplete reasoning
- degraded long-form coherence
- unstable behavior with very high temperature
- weak instruction following compared with instruction-tuned models

The model should be evaluated carefully before any downstream use.
---

## Safety Notice

This model may generate incorrect, biased, unsafe, or misleading content.

Do not use it as the sole source of truth for high-stakes decisions.

Recommended mitigations:

- use retrieval for factual tasks
- apply output filtering
- evaluate on task-specific benchmarks
- use human review for sensitive outputs
- avoid deployment without safety tuning

---

## Research Notes

`summerV2` is part of an experimental model-development line focused on fast training and inference for custom causal language models.

The current implementation emphasizes:

- Hugging Face compatibility
- direct model-code import fallback
- KV-cache streaming decode
- custom sampling controls
- inference stability checks

Future work may include:

- better pretraining data mixture
- instruction tuning
- DPO or preference optimization
- stronger tokenizer/model alignment
- long-context stability improvements
- benchmark reporting
- model card expansion with training details
---

## Citation

If you use this model in experiments, cite the repository:

```bibtex
@misc{summerV2,
  title        = {summerMC/summerV2},
  author       = {summerMC},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/summerMC/summerV2}}
}
```

---

## Disclaimer

This repository contains an experimental research model.

No warranty is provided regarding factuality, safety, performance, or fitness for a particular use case.