Violet 1B4 Completion

Model Summary

Violet is a GPT-NeoX language model trained primarily on period texts (1800–1899). This is the completion (base) version of the model; if you are looking for the chat version, see Violet 1b4 Chat.

It is intended for creative writing, roleplay, period-appropriate correspondence, and Victorian etiquette.

  • Architecture: GPTNeoXForCausalLM
  • Parameters: ~1.41B
  • Context length: 4096
  • Vocab size: 24014
  • Tokenizer: PreTrainedTokenizerFast
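
These figures can be checked against the published config. A minimal sketch (the repo name is taken from the Related repos section below):

from transformers import AutoConfig

# Inspect the published config to confirm the summary figures above.
cfg = AutoConfig.from_pretrained("Zakarth/violet-1b4")
print(cfg.model_type)               # expected: gpt_neox
print(cfg.max_position_embeddings)  # expected: 4096
print(cfg.vocab_size)               # expected: 24014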

Intended Use

Good for

  • Victorian-flavored narrative completions
  • Roleplay and period-appropriate correspondence

Not good for

  • Contemporary factual Q&A
  • Medical/legal/financial advice

Known Issues / Limitations

  • Ages and dates can be unreliable (even within 1800–1899).
  • Because parts of the corpus were derived from OCR, occasional stray modern tokens may appear (e.g., “http”, “Google”, “Internet Archive”); a simple post-filter sketch follows this list.
  • Training data mixes UK and US English from the era, so spelling and idiom may vary between British and American conventions.
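
If stray modern artifacts matter for your use case, a line-level post-filter is one low-effort mitigation. This is a minimal sketch, not part of the model or its tooling; the pattern list is illustrative and should be extended as needed:

import re

# Drop output lines containing obvious modern OCR artifacts (per the list above).
MODERN_ARTIFACTS = re.compile(r"\b(https?|www|Google|Internet Archive)\b", re.IGNORECASE)

def strip_modern_lines(text: str) -> str:
    return "\n".join(
        line for line in text.splitlines()
        if not MODERN_ARTIFACTS.search(line)
    )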

Notes

Violet is not the first LLM trained on a historical-only pretraining corpus; to the author’s knowledge that distinction belongs to TimeCapsuleLLM. Violet was developed independently, and differs in:

  • A different (though somewhat overlapping) pretraining corpus and date range; Violet focuses specifically on 1800–1899
  • A custom Victorian tokenizer

Violet was built on a corpus spanning 1800–1899 sourced from Project Gutenberg, the Internet Archive, the British Library, and other archives.

This project began as an attempt to build a local LLM without relying on copyrighted training sources. The author also values local models that can run on a user’s machine without sending data to the cloud.

Demo Resources

Related repos

  • Zakarth/violet-1b4 (base/completion)
  • Zakarth/violet-1b4-chat (chat)
  • Zakarth/violet-1b4-chat-onnx (WebGPU INT8)

Prompt Format

This is a completion model: give it plain period-style prose and it will continue the text. (The chat variant, Violet 1b4 Chat, was instead trained to generate a mood line, assistant tag, and response after <|violet_mood|>; see the hypothetical template sketch at the end of this section.)

Use this structure:

The morning fog had scarcely lifted when

The model will then generate:

{response...}
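
For the chat variant, the exact prompt template is not reproduced on this card. The sketch below is an assumption built from the special-token list in the next section, ending the prompt at <|violet_mood|> so the model supplies the mood line, assistant tag, and response; consult the chat model card for the authoritative format:

# Hypothetical chat prompt for Zakarth/violet-1b4-chat (NOT this base model).
# The token ordering here is an assumption, not a documented template.
chat_prompt = (
    "<|system|>You are Violet, a Victorian correspondent.\n"
    "<|user|>Describe a walk along the Thames at dawn.\n"
    "<|violet_mood|>"
)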

Tokenization and Special Tokens

Violet 1b4 uses a custom tokenizer built specifically for Victorian text.

Recommended IDs for generation:

  • eos_token_id: 0
  • pad_token_id: 1

Special tokens used during training (typical IDs from training config):

  • <|system|>: 24000
  • <|user|>: 24001
  • <|assistant|>: 24002
  • <|violet_mood|>: 24005

!! Do not mix tokenizers from other Violet variants (e.g. 160M) with this model.
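
A quick sanity check against the IDs above (a sketch; assumes the special tokens are registered in the published tokenizer):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Zakarth/violet-1b4", use_fast=True)

# IDs should match the training config above; a mismatch usually means a
# tokenizer from another Violet variant was loaded.
for token in ["<|system|>", "<|user|>", "<|assistant|>", "<|violet_mood|>"]:
    print(token, tok.convert_tokens_to_ids(token))
print("eos:", tok.eos_token_id, "pad:", tok.pad_token_id)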

How to use (Transformers)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "Zakarth/violet-1b4-chat"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")  # keep the published BF16 weights

prompt = "The morning fog had scarcely lifted when"""

inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
# Filter out token_type_ids if present
inputs = {k: v.to(model.device) for k, v in inputs.items() if k in ['input_ids', 'attention_mask']}

out = model.generate(
    **inputs,
    max_new_tokens=180,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.15,
    eos_token_id=0,  
    pad_token_id=1,  
)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
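
For interactive use, token-by-token streaming via the standard TextStreamer utility may be more convenient (a sketch reusing tok, model, and inputs from above):

from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=180,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    eos_token_id=0,
    pad_token_id=1,
)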

Sample Outputs

The morning fog had scarcely lifted when

The morning fog had scarcely lifted when the first
light streamed through the window, and before it was quite light the
flood of rain came on with a suddenness which seemed to scorch up the
roof. The lightning was as bright as ever, but there were only three or
four flashes in the sky--a bright flash like a meteor--and the thunder
was not so loud nor so deafening as usual.

At last the storm ceased. The storm was over; the stars shone out; the
thunder rolled away, leaving the clouds behind it in an impenetrable
haze, which at once became visible, and soon they disappeared. The wind
blew with fury, driving the snow and ice from off the roofs of the houses

License

Model weights and code in this repository are released under CC0 1.0 (public domain dedication).

Artwork

violet.png is © @rose.grtqndl (Instagram). Used and redistributed with permission; copyright remains with the artist.

Contact

You may contact me on X or anywhere else by searching for my handle (Zakarth).

Citation

@misc{violet2026,
  author = {Zakarth},
  title = {Violet: Victorian Language Models},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Zakarth/violet-1b4}
}