---
language:
- en
library_name: transformers
license: apache-2.0
tags:
- sparknet
- causal-lm
- text-generation
- gpt
- pytorch
- 70m
pipeline_tag: text-generation
model-index:
- name: SparkNet-70M-v5
  results: []
datasets:
- codelion/finepdfs-1B
- codelion/dclm-baseline-1B
- codelion/fineweb-edu-1B
---

# SparkNet 70M v5

SparkNet 70M v5 is the final 70M-parameter checkpoint from the SparkNet research run by **DienerTech**. It is a compact GPT-2–style decoder (12 layers, 512 hidden size, 8 attention heads, 1024-token context) trained for ~1B tokens on a custom mixture of high-quality web and document corpora. The release ships with the SparkNet v5 tokenizer and weights stored in `model.safetensors`, ready for direct use via 🤗 Transformers.

Special thanks to [CodeLion](https://huggingface.co/codelion) for inspiring the **One Billion Token Challenge** and for providing the high-quality datasets used in this training run.

## Model Details

- **Developer**: DienerTech
- **Architecture**: GPT-2–style causal decoder (approx. 70M parameters), dropout 0.1, cosine LR schedule, AdamW (fused).
- **Context length**: 1,024 tokens.
- **Tokenizer**: SparkNet v5 byte-level BPE (vocab size 50,257) with a dedicated EOS token and `<|pad|>` padding token.
- **Framework**: PyTorch / 🤗 Transformers 4.46+.
- **Checkpoint**: Converted to `model.safetensors` for safe loading; no `pytorch_model.bin` left in the repo.

## Intended Use

- Lightweight text-generation experiments, story/note drafting, or as a base for instruction tuning / domain adaptation (LoRA, QLoRA, etc.).
- Research on small-model scaling laws or tokenizer experimentation.

## Limitations & Risks

- No RLHF / instruction tuning; outputs are generic next-token predictions and may require prompting tricks.
- Training data is predominantly public web/document text, so bias, toxicity, or outdated information may surface.
- Not evaluated for safety-critical deployments; perform your own alignment and red-teaming before production use.

## Training Data

- 1B tokens packed into 1,024-token blocks (`datasets/sparknet-v5-1b`).
- Sources sampled uniformly across `codelion/finepdfs-1B`, `codelion/dclm-baseline-1B`, and `codelion/fineweb-edu-1B`, plus curated DienerTech blog data.
- Validation set: `wikitext-2-raw-v1` (standard Hugging Face split).

## Training Procedure

- **Optimizer**: AdamW (fused) with β₁ = 0.9, β₂ = 0.95, weight decay 0.1, gradient clipping at 1.0.
- **Learning rate**: 1e-4 peak with 3% warmup, then cosine decay.
- **Batching**: per-device batch size 32, gradient accumulation 2 → 32 × 2 × 1,024 = 65,536 tokens/step.
- **Budget**: 1,000,000,000 effective tokens (≈15,259 steps).
- **Hardware**: single 24 GB+ NVIDIA GPU with TF32 and Flash Attention enabled.
- **Best checkpoint**: step 14,000 with eval loss 4.99 on WikiText-2 (logged via `trainer_state.json`).

## Evaluation

Formal downstream evaluation has not been run yet. Inside `trainer_state.json`, the best validation (WikiText-2) cross-entropy reached **4.9869** at step 14k. If you benchmark the model (e.g., with lm-eval-harness), please consider contributing results back to the card via a PR.
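The repo does not ship an evaluation script, so the snippet below is only a minimal sketch of how the WikiText-2 cross-entropy could be reproduced with 🤗 Transformers and 🤗 Datasets. The non-overlapping 1,024-token chunking, the choice of the `validation` split, and the per-token loss weighting are assumptions and will not exactly match the packing used during training, so expect the number to differ somewhat from the 4.9869 logged in `trainer_state.json`.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DienerTech/sparknet-70m-v5"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

# Concatenate the raw WikiText-2 validation split and tokenize it as one
# stream, loosely mirroring the 1,024-token packing used during training.
# (Split choice and packing are assumptions, not the original recipe.)
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

block_size = 1024  # model context length
nll_sum, token_count = 0.0, 0

with torch.no_grad():
    for start in range(0, ids.size(1) - 1, block_size):
        block = ids[:, start : start + block_size]
        # Labels equal inputs; the model shifts them internally and returns
        # the mean cross-entropy over the predicted tokens in this block.
        loss = model(block, labels=block).loss
        n_predicted = block.size(1) - 1
        nll_sum += loss.item() * n_predicted
        token_count += n_predicted

print(f"cross-entropy: {nll_sum / token_count:.4f}")
print(f"perplexity:    {math.exp(nll_sum / token_count):.2f}")
```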
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "DienerTech/sparknet-70m-v5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or torch.float16 on older GPUs
    device_map="auto",
)

prompt = "In a distant research lab, a tiny transformer model awakened and"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=120,
    temperature=0.9,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Citation

```
@software{sparknet70mv5,
  author = {DienerTech},
  title  = {SparkNet 70M v5},
  year   = {2025},
  url    = {https://huggingface.co/DienerTech/sparknet-70m-v5}
}
```

Please open an issue or PR on the DienerTech Hugging Face repo if you have feedback, evaluations, or fine-tuned variants to share.