HumanV (Transformers Integration) + Nilla-Story Checkpoint

This repository contains:

HumanV: a lightweight, decoder-only Transformer architecture integrated into the 🤗 Transformers codebase.
Nilla-Story: a small HumanV checkpoint trained for short story generation (TinyStories-style).

Goal: upstream the HumanV architecture into huggingface/transformers so it can be loaded with standard AutoModel* classes (without trust_remote_code=True).

Model: Nilla-Story

Hub: nebularesearchtrain/nilla-story
Tokenizer: GPT-2 tokenizer (gpt2), vocab size 50,257
Context length: 1,024 (trained with sequence length 512)

Quickstart (from the Hub)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nebularesearchtrain/nilla-story"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

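# Sample a continuation; lower temperature / top_p give more conservative text.
out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,  # the GPT-2 tokenizer has no pad token; reuse EOS
)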
print(tokenizer.decode(out[0], skip_special_tokens=True))
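For readers unfamiliar with the sampling arguments used above, here is a minimal pure-Python sketch of what temperature and top_p do to a next-token distribution. It is an illustration only, not the Transformers implementation; the function name nucleus_filter and the toy logits are hypothetical.

```python
import math

def nucleus_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: renormalized_prob} for the tokens kept by top-p."""
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}

toy_logits = [3.0, 2.0, 0.0, -1.0]  # hypothetical next-token scores
kept = nucleus_filter(toy_logits)
print(sorted(kept))
```

Sampling then draws only from the kept tokens, which is why lowering top_p or temperature tends to reduce rambling at the cost of variety.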

If your Transformers build does not yet include HumanV natively and the checkpoint still relies on custom code hosted on the Hub, load with:

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

Architecture: HumanV

HumanV is a decoder-only Transformer inspired by modern LLaMA-style blocks:

Causal self-attention with Rotary Position Embeddings (RoPE)
RMSNorm
SiLU / SwiGLU-style MLP
Optional grouped-query attention via num_key_value_heads (can be equal to num_attention_heads for standard MHA)
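To make the num_key_value_heads option concrete, the sketch below shows how query heads can share key/value heads. It assumes the LLaMA-style convention of contiguous groups; this README does not show HumanV's attention code, so treat the mapping and the helper name kv_head_for_query_head as assumptions.

```python
def kv_head_for_query_head(query_head, num_attention_heads, num_key_value_heads):
    """Index of the key/value head that a given query head attends with.

    Assumes contiguous grouping (LLaMA-style); hypothetical helper.
    """
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    group_size = num_attention_heads // num_key_value_heads
    return query_head // group_size

# Hypothetical sizes: 8 query heads sharing 2 key/value heads (grouped-query attention).
gqa_mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]

# num_key_value_heads == num_attention_heads reduces to standard multi-head attention.
mha_mapping = [kv_head_for_query_head(h, 8, 8) for h in range(8)]

print(gqa_mapping)
print(mha_mapping)
```

Fewer key/value heads shrink the KV cache during generation, while the number of query heads (and so the attention output width) stays the same.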

Precision policy (recommended)

For TPU-friendly stability and speed:

BF16 for most matmul operations
FP32 for numerically sensitive steps (attention softmax + attention mask add, RMSNorm, logits/loss)
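The reason for keeping reductions in FP32 can be seen with a small emulation. The sketch below rounds values to BF16 in pure Python (the helper to_bf16 is hypothetical and skips NaN/inf handling) and compares a running sum kept in BF16 with one kept in FP32. It is not HumanV code, only an illustration of the rounding behavior that affects softmax denominators, RMSNorm statistics, and loss accumulation.

```python
import struct

def to_bf16(x):
    """Round a float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

values = [1.0] * 1000

bf16_sum = 0.0
for v in values:
    bf16_sum = to_bf16(bf16_sum + v)  # accumulator stored in BF16 after every add

fp32_sum = 0.0
for v in values:
    fp32_sum = to_fp32(fp32_sum + v)  # accumulator stored in FP32 after every add

print(bf16_sum, fp32_sum)
```

BF16 carries only 8 significant bits, so once the running sum reaches 256 each further +1 is rounded away; the FP32 accumulator reaches the exact total.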

Training (Nilla-Story)

Dataset: TinyStories (subset)
Sequence length: 512
Precision: BF16 (with FP32 softmax/norm/loss as described above)
Hardware: Google TPU v5e-1
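This README does not say how stories were batched into 512-token sequences. One common approach, shown here purely as an assumption, is to concatenate tokenized stories separated by the end-of-text token and cut the stream into fixed-size blocks. The helper name pack_into_blocks and the toy ids are hypothetical; with the GPT-2 tokenizer the end-of-text id would be 50256.

```python
def pack_into_blocks(tokenized_stories, eos_token_id, block_size=512):
    """Concatenate stories (each followed by EOS) and cut fixed-size blocks."""
    stream = []
    for ids in tokenized_stories:
        stream.extend(ids)
        stream.append(eos_token_id)
    usable = (len(stream) // block_size) * block_size  # drop the ragged tail
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]

# Toy token ids and a tiny block size so the result is easy to inspect.
blocks = pack_into_blocks(
    [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    eos_token_id=0,
    block_size=4,
)
print(blocks)
```

Packing avoids padding short stories, at the cost of some blocks starting or ending in the middle of a story.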

Example generation (sample)

Prompts like:

Once upon a time,
The little bird wanted to

produce short story continuations suitable for toy storytelling tasks.

Contributing / Upstreaming to Transformers

This repository is prepared for an upstream PR to 🤗 Transformers. A typical PR includes:

src/transformers/models/humanv/ implementation (configuration_*.py, modeling_*.py)
Auto-class registration (so AutoModelForCausalLM works)
Unit tests in tests/models/humanv/
Documentation page: docs/source/en/model_doc/humanv.md
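As a convenience, the file-based items in the checklist above can be verified with a short script run from the root of a Transformers checkout. This is a hypothetical helper, not part of the Transformers tooling; the test-file pattern is an assumption, and Auto-class registration is not covered because it is an edit to existing files rather than a new file.

```python
from pathlib import Path

# Glob patterns taken from the checklist above; the test-file pattern is an assumption.
EXPECTED = [
    "src/transformers/models/humanv/configuration_*.py",
    "src/transformers/models/humanv/modeling_*.py",
    "tests/models/humanv/test_*.py",
    "docs/source/en/model_doc/humanv.md",
]

def missing_pr_files(repo_root):
    """Return the patterns from EXPECTED that match no file under repo_root."""
    root = Path(repo_root)
    return [pattern for pattern in EXPECTED if not any(root.glob(pattern))]

if __name__ == "__main__":
    for pattern in missing_pr_files("."):
        print("missing:", pattern)
```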

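Transformers recommends the modular approach for new model contributions: the model is written in a modular_*.py file from which the standalone modeling files are generated. When this approach is used, CI may check that the generated files are in sync with the modular source.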

Limitations

This is a small model trained on a limited dataset. It may repeat phrases, hallucinate details, or generate simplistic stories.
Not intended for safety-critical use cases.

License

Code: Apache-2.0 (compatible with 🤗 Transformers)

Citation

If you use this work, please cite the repository and the Hugging Face model page.

Model size: 19.4M params (Safetensors, tensor type F32)
