license: cc0-1.0
---

# Violet 1B4 Completion

![image/png](https://cdn-uploads.huggingface.co/production/uploads/686do_XBVtDuMhjbZM2hbr/tKIUH5nTS-nrnUmzn2kDm.png)

## Model Summary
**Violet** is a GPT-NeoX language model trained primarily on period texts (1800–1899). This is the completion version of the model; if you are looking for the chat version, see [Violet 1B4 Chat](https://huggingface.co/zakarth/violet-1b4-chat).

It is intended for **creative writing**, **roleplay**, **period-appropriate correspondence**, and **Victorian etiquette**.

- Architecture: `GPTNeoXForCausalLM`
- Parameters: ~1.41B

## Intended Use
**Good for**
- Victorian-flavored narrative completions

**Not good for**
- Contemporary factual Q&A
- Medical/legal/financial advice

## Known Issues / Limitations
- Ages and dates can be unreliable (even within 1800–1899).
- Because parts of the corpus were derived from OCR, occasional stray modern tokens may appear (e.g., “http”, “Google”, “Internet Archive”).
- Training data mixes UK and US English from the era, so spelling and usage conventions may vary.
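
Stray modern tokens of the kind noted above can be screened for after generation. A minimal post-filter sketch — the `MODERN_MARKERS` list and `flag_anachronisms` name are illustrative, not part of the model or this repo:

```python
# Illustrative post-filter: flag completions containing obviously modern
# tokens of the kind the OCR-derived corpus can leak. The marker list is
# an example only; extend it to taste.
MODERN_MARKERS = ("http", "Google", "Internet Archive")

def flag_anachronisms(text: str) -> list[str]:
    """Return the modern markers that appear in a generated completion."""
    return [m for m in MODERN_MARKERS if m in text]

print(flag_anachronisms("The carriage halted; see http://archive for more."))  # ['http']
print(flag_anachronisms("A quiet lane at dusk, lit by gas-lamps."))            # []
```

A completion that trips the filter can simply be regenerated.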

## Notes
Violet is not the first LLM trained on a historical-only pretraining corpus; to the author’s knowledge that distinction belongs to **TimeCapsuleLLM**. Violet was developed independently, and differs in:

- A different (but somewhat overlapping) pretraining corpus and a different date range: Violet focuses specifically on 1800–1899
- A custom Victorian-era tokenizer

Violet was built on a corpus spanning 1800–1899 sourced from Project Gutenberg, the Internet Archive, the British Library, and other archives.

## Prompt Format
Use this structure:

```text
The morning fog had scarcely lifted when
```

The model will then generate:

```text
{response...}
```
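
Generation is terminated by the `<|endoftext|>` token (id 0). When decoding with `skip_special_tokens=False`, that marker remains in the string; a small helper can trim the continuation at the first occurrence. The `EOT` constant and `trim_completion` name are illustrative, not part of the repo:

```python
# Hypothetical post-processing helper: cut the decoded continuation at the
# first <|endoftext|> marker (token id 0 for this tokenizer) and strip
# trailing whitespace.
EOT = "<|endoftext|>"

def trim_completion(raw: str) -> str:
    """Return the text before the first end-of-text marker."""
    return raw.split(EOT, 1)[0].rstrip()

print(trim_completion("the first light streamed through the window.<|endoftext|>"))
# the first light streamed through the window.
```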

## Tokenization and Special Tokens

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "Zakarth/violet-1b4-chat"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

prompt = "The morning fog had scarcely lifted when"

inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
# Filter out token_type_ids if present, and move tensors to the model device
inputs = {k: v.to(model.device) for k, v in inputs.items() if k != "token_type_ids"}

out = model.generate(
    **inputs,
    max_new_tokens=200,  # illustrative cap on generation length
    do_sample=True,      # sampling must be enabled for top_k to apply
    top_k=40,
    repetition_penalty=1.15,
    eos_token_id=0,
    pad_token_id=1,
)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
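
The final `decode` call prints only the continuation: `generate` returns the prompt tokens followed by the newly sampled ones, so slicing at the prompt length drops the echoed prompt. The same idiom with toy token ids (values made up for illustration):

```python
# Toy illustration of the prompt-slicing idiom: generate() echoes the
# prompt ids before the new ids, so slicing at the prompt length leaves
# only the continuation.
prompt_ids = [311, 42, 97]            # stand-in ids for the prompt
output_ids = [311, 42, 97, 8, 15, 0]  # echoed prompt + new tokens (0 = <|endoftext|>)
continuation = output_ids[len(prompt_ids):]
print(continuation)  # [8, 15, 0]
```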

## Sample Outputs

> The morning fog had scarcely lifted when

```text
The morning fog had scarcely lifted whenthe first
light streamed through the window, and before it was quite light the
flood of rain came on with a suddenness which seemed to scorch up the
roof. The lightning was as bright as ever, but there were only three or
four flashes in the sky--a bright flash like a meteor--and the thunder
was not so loud nor so deafening as usual.

At last the storm ceased. The storm was over; the stars shone out; the
thunder rolled away, leaving the clouds behind it in an impenetrable
haze, which at once became visible, and soon they disappeared. The wind
blew with fury, driving the snow and ice from off the roofs of the houses
```

## License