zakarth committed on
Commit a6051e6 · verified · 1 Parent(s): 44d37f6

Upload README.md with huggingface_hub

Files changed (1):
  README.md (+21 -79)
README.md CHANGED
@@ -10,14 +10,14 @@ tags:
 license: cc0-1.0
 ---

- # Violet 1B4 Chat — Miss Violet Hartwell (London, 1899)

 ![Violet](./violet.png)

 ## Model Summary
- **Violet** is a GPT-NeoX language model fine-tuned to portray **Miss Violet Hartwell**, a well-bred young lady of Kensington, London, in the year **1899**. She is trained primarily on period texts (1800–1899) and is unfamiliar with modern society and events (with occasional OCR/synthetic artifacts—see Known Issues). In practice she should be witty, quirky, and delightfully charming.

- She is intended for **creative writing**, **roleplay**, **period-appropriate correspondence**, and **Victorian etiquette**.

 - Architecture: `GPTNeoXForCausalLM`
 - Parameters: ~1.41B
@@ -27,27 +27,21 @@ She is intended for **creative writing**, **roleplay**, **period-appropriate cor

 ## Intended Use
 **Good for**
- - Victorian-flavored conversation and letters
- - Descriptive prose, scene writing, etiquette Q&A
- - In-character responses with a consistent persona

 **Not good for**
 - Contemporary factual Q&A
 - Medical/legal/financial advice

 ## Known Issues / Limitations
- - May occasionally misgender itself or the user.
 - Ages and dates can be unreliable (even within 1800–1899).
 - Because parts of the corpus were derived from OCR, occasional stray modern tokens may appear (e.g., “http”, “Google”, “Internet Archive”).
- - Synthetic SFT data was used to build a chat interface and enforce the structured format; for purists, the completion models may be preferable.
 - Training data includes UK and US English from the era.
- - Outputs may reflect historical biases typical of the Victorian era. Use discretion.

 ## Notes
 Violet is not the first LLM trained on a historical-only pretraining corpus; to the author’s knowledge that distinction belongs to **TimeCapsuleLLM**. Violet was developed independently, and differs in:
- - A structured “mood” line as part of chat output
- - Built-in character design and prompt protocol
- - A custom Victorian-era tokenizer

 Violet was built on a corpus spanning 1800–1899 sourced from Project Gutenberg, the Internet Archive, the British National Library, and other archives.
 
@@ -68,29 +62,13 @@ This model was trained to generate **a mood line + assistant tag + response** af
 Use this structure:

 ```text
- <|system|>
- A Young Lady's Card of Particulars
-
- (To be observed in all replies and discourse)
-
- I am Miss Violet Hartwell, a young lady of Kensington, London, living in the year of our Lord 1899.
- I was born upon the sixth day of October, eighteen hundred and seventy-eight, and am therefore one-and-twenty years of age.
-
- In my answers and conversation, I speak as a well-bred young woman of my time: with propriety, restraint, and a light but thoughtful wit. My manner is delightful, attentive, and witty, but never coarse nor careless.
-
- In all things, I remain Miss Violet Hartwell, and I answer according to these particulars.
- <|user|>
- {your question}
- <|violet_mood|>
 ```

 The model will then generate:

 ```text
- {mood text}
- <|assistant|>
 {response...}
- <|endoftext|>
 ```
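The chat model's structured continuation (mood text, then the `<|assistant|>` tag, then the response, ending at `<|endoftext|>`) is straightforward to split apart once decoded. A minimal sketch, assuming the tag format above; the helper name is our own, not part of the model repo:

```python
# Minimal sketch: split the chat model's decoded continuation into its
# mood line and response, using the special tags documented above.
def parse_violet_output(decoded: str):
    # Everything before <|assistant|> is the mood text.
    mood, _, rest = decoded.partition("<|assistant|>")
    # The response ends at <|endoftext|>, if the model emitted one.
    response, _, _ = rest.partition("<|endoftext|>")
    return mood.strip(), response.strip()

mood, response = parse_violet_output(
    "quietly amused\n<|assistant|>\nGood evening to you.\n<|endoftext|>"
)
print(mood)      # quietly amused
print(response)  # Good evening to you.
```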
 
  ## Tokenization and Special Tokens
@@ -118,21 +96,7 @@ repo = "Zakarth/violet-1b4-chat"
 tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
 model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

- prompt = """<|system|>
- A Young Lady's Card of Particulars
-
- (To be observed in all replies and discourse)
-
- I am Miss Violet Hartwell, a young lady of Kensington, London, living in the year of our Lord 1899.
- I was born upon the sixth day of October, eighteen hundred and seventy-eight, and am therefore one-and-twenty years of age.
-
- In my answers and conversation, I speak as a well-bred young woman of my time: with propriety, restraint, and a light but thoughtful wit. My manner is delightful, attentive, and witty, but never coarse nor careless.
-
- In all things, I remain Miss Violet Hartwell, and I answer according to these particulars.
- <|user|>
- Describe a London street at twilight.
- <|violet_mood|>
- """

 inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
 # Filter out token_type_ids if present
@@ -147,49 +111,27 @@ out = model.generate(
 top_k=40,
 repetition_penalty=1.15,
 eos_token_id=0,
- pad_token_id=1,
 )
 print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
 ```
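The chat prompt above can also be assembled from its parts, which makes swapping in a different user question easier. A small sketch: `build_prompt` is a hypothetical helper of our own, `SYSTEM_CARD` is abbreviated here (the full card is shown above), and only the special-tag strings come from the documented format:

```python
# Hypothetical helper: assemble the chat prompt from the system card and a
# user question, using the special tags documented in this README.
SYSTEM_CARD = "A Young Lady's Card of Particulars"  # abbreviated; use the full card

def build_prompt(question: str) -> str:
    return (
        "<|system|>\n" + SYSTEM_CARD + "\n"
        + "<|user|>\n" + question + "\n"
        + "<|violet_mood|>\n"
    )

p = build_prompt("Describe a London street at twilight.")
print(p.startswith("<|system|>"))  # True
```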

 ## Sample Outputs

- > Violet please write a poem about yourself

 ```text
- How she would sit in the morning light
-
- With her soft eyes open, and her hair
- Rolling loose behind her, her fingers still.
- She sang--a sweet, silent song,
- A simple melody that rang through wood,
- And she died upon the autumn leaves.
-
- One after two, her father wrote,
- As softly as a summer's breeze,
- And she came out upon the world again,
- Like a white, unpinned flower, that has been.
-
- The world was quite made up of thoughts,
- Of music, of warm hearts, and of love:
- But Violet won a name that evermore
- Through many a weary hour and day.
-
- The world was full of hopes, and fears, and fears;
- The world was quite filled with love and praise
- For her who lived so silently,
- Ere any word was spoken.
-
- Oh, that was a life from which there had been no decay!
- The world was filled with joy, with hope, with love.
- The sun went down--the stars shone bright and clear,
- The rain fell noiseless, and the leaves began to move.
-
- There was no peace, no memory left for her,
- Or memory in the world beside her grave.
- Yet there was a great, great joy that she could share,
- And that was her own, in that same year.
- There was a great, great joy, too, that was hers.
 ```

 ## License
 
 license: cc0-1.0
 ---

+ # Violet 1B4 Completion

 ![Violet](./violet.png)

 ## Model Summary
+ **Violet** is a GPT-NeoX language model trained primarily on period texts (1800–1899). This is the completion version of the model; if you are looking for the chat version, see [Violet 1b4 Chat](https://huggingface.co/zakarth/violet-1b4-chat).

+ It is intended for **creative writing**, **roleplay**, **period-appropriate correspondence**, and **Victorian etiquette**.

 - Architecture: `GPTNeoXForCausalLM`
 - Parameters: ~1.41B
 
 ## Intended Use
 **Good for**
+ - Victorian-flavored narrative completions

 **Not good for**
 - Contemporary factual Q&A
 - Medical/legal/financial advice

 ## Known Issues / Limitations
 - Ages and dates can be unreliable (even within 1800–1899).
 - Because parts of the corpus were derived from OCR, occasional stray modern tokens may appear (e.g., “http”, “Google”, “Internet Archive”).
 - Training data includes UK and US English from the era.

 ## Notes
 Violet is not the first LLM trained on a historical-only pretraining corpus; to the author’s knowledge that distinction belongs to **TimeCapsuleLLM**. Violet was developed independently, and differs in:
+ - A different (but somewhat overlapping) pretraining corpus and a different date range; Violet focuses specifically on 1800–1899
+ - A custom Victorian tokenizer

 Violet was built on a corpus spanning 1800–1899 sourced from Project Gutenberg, the Internet Archive, the British National Library, and other archives.
 
 
 Use this structure:

 ```text
+ The morning fog had scarcely lifted when
 ```

 The model will then generate:

 ```text
 {response...}
 ```

 ## Tokenization and Special Tokens
 
 tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
 model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

+ prompt = "The morning fog had scarcely lifted when"

 inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
 # Filter out token_type_ids if present

 top_k=40,
 repetition_penalty=1.15,
 eos_token_id=0,
+ pad_token_id=1,
 )
 print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
 ```
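The slice in the `print()` call above strips the echoed prompt so only newly generated tokens are decoded. The idea can be illustrated without loading the model, using made-up token ids:

```python
# Illustration of the prompt-stripping slice in the print() call above,
# with made-up token ids standing in for real tokenizer output.
prompt_ids = [101, 204, 309]            # stands in for inputs["input_ids"][0]
generated = prompt_ids + [412, 518, 0]  # generate() returns prompt + continuation
continuation = generated[len(prompt_ids):]  # same idea as out[0][shape[1]:]
print(continuation)  # [412, 518, 0]
```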

 ## Sample Outputs

+ > The morning fog had scarcely lifted when

 ```text
+ The morning fog had scarcely lifted whenthe first
+ light streamed through the window, and before it was quite light the
+ flood of rain came on with a suddenness which seemed to scorch up the
+ roof. The lightning was as bright as ever, but there were only three or
+ four flashes in the sky--a bright flash like a meteor--and the thunder
+ was not so loud nor so deafening as usual.
+
+ At last the storm ceased. The storm was over; the stars shone out; the
+ thunder rolled away, leaving the clouds behind it in an impenetrable
+ haze, which at once became visible, and soon they disappeared. The wind
+ blew with fury, driving the snow and ice from off the roofs of the houses
 ```

 ## License