| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - afoland/penchant |
| | language: |
| | - en |
| | --- |
| |
|
| | ## Penchant |
| |
|
| | Penchant-7B is a poetry-writing model finetuned from [dolphin-2.1-mistral-7b](https://huggingface.co/cognitivecomputations/dolphin-2.1-mistral-7b/tree/main) by Eric Hartford, using the penchant dataset. |
| | It is a single-turn instruction model; it does not expect multi-round chat. |
| | Example code that calls a local LLM server for completions is included. |
| |
|
| | ## Dataset |
| |
|
| | The dataset is penchant, a collection of public-domain poetry. Because of copyright dates, the poets are primarily Victorian. The dataset has been filtered to mostly shorter poems and contains a little over 3000 poems. |
| |
|
| | ## Training |
| | Training for 4 epochs took 12 hours on 2x 3090s. |
| |
|
| | ## Loading and Using the Model |
| |
|
| | Use *bitsandbytes / Transformers* to **load the model in 8 bits**. |
| | This gives the best performance. |
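As a rough sketch, 8-bit loading with Transformers and bitsandbytes might look like the following. The helper name `load_penchant_8bit` is hypothetical, and `model_id` is a placeholder because this card does not state the repository id. The sketch assumes `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available.

```python
def load_penchant_8bit(model_id: str):
    """Load a causal LM and its tokenizer with 8-bit weights.

    `model_id` is a placeholder: pass the Hub repository id or a local
    path of the Penchant checkpoint (not stated in this card).
    """
    # Imported lazily so this file can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Ask bitsandbytes to quantize the weights to 8 bits at load time.
    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available GPU(s)
    )
    return model, tokenizer
```

Call it as `model, tokenizer = load_penchant_8bit("path/or/repo-id")`, substituting the real location of the checkpoint.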
| |
|
| | Prompt format: |
| | This model uses the [ChatML](https://github.com/openai/openai-python/blob/main/chatml.md) prompt format. |
| | ``` |
| | <|im_start|>system |
| | You are Dolphin, a helpful AI assistant.<|im_end|> |
| | <|im_start|>user |
| | {prompt}<|im_end|> |
| | <|im_start|>assistant |
| | |
| | ``` |
| |
|
| | Example: |
| | ``` |
| | <|im_start|>system |
| | You are Dolphin, a helpful AI assistant.<|im_end|> |
| | <|im_start|>user |
| | Write me a poem entitled "Sleeping Where I Fell". It should be in the style of Emily Dickinson.<|im_end|> |
| | <|im_start|>assistant |
| | ``` |
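Since the model is sensitive to the exact format, it can help to build the prompt string in one place. The helper below is a minimal sketch of the template shown above; the names `build_prompt` and `DEFAULT_SYSTEM` are illustrative and are not taken from the example code shipped with the model.

```python
# System message used in the examples in this card.
DEFAULT_SYSTEM = "You are Dolphin, a helpful AI assistant."


def build_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Wrap a single instruction in the ChatML template shown above.

    The result ends right after the assistant header, so the model's
    completion is the assistant turn. No BOS token is added here.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt(
    'Write me a poem entitled "Sleeping Where I Fell". '
    "It should be in the style of Emily Dickinson."
)
```

The sketch leaves the `<BOS>` token to the tokenizer or server. Whether yours adds it automatically is worth checking.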
| |
|
| | The model is very sensitive to getting the format exactly right. |
| | The model expects a `<BOS>` token at the beginning of the prompt, and will output an `<EOS>` token at the end of its response. |
| | If loading the model in text-generation-webui, use *instruct* mode only. |
| |
|
| | The output will be enclosed between `<POEM>` and `</POEM>` tags. |
| | About 5% of the time it will fail to use these tags. |
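Because the tags are occasionally missing, any post-processing needs a fallback. The following is one possible sketch: `extract_poem` is a hypothetical helper that returns the text between the tags when both are present, and otherwise returns the whole completion with any stray tag removed, plus a flag so the caller can tell which case occurred.

```python
import re
from typing import Tuple

# Non-greedy match so only the first <POEM>...</POEM> pair is captured.
_POEM_RE = re.compile(r"<POEM>(.*?)</POEM>", re.DOTALL)


def extract_poem(completion: str) -> Tuple[str, bool]:
    """Return (poem_text, tags_found) for a raw model completion."""
    match = _POEM_RE.search(completion)
    if match:
        return match.group(1).strip(), True
    # Fallback: tags missing or unbalanced, so keep the text and drop stray tags.
    cleaned = completion.replace("<POEM>", "").replace("</POEM>", "")
    return cleaned.strip(), False
```

A caller might log or regenerate when `tags_found` is `False`, since that case suggests the output deviated from the trained format.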
| |
|
| | The model sometimes recites poems it has memorized; it is particularly prone to do so in response to short prompts. Favorites are "The World Is Too Much With Us", "The Sick Rose", "The Windhover", and "I Wandered Lonely as a Cloud" (this one often appears under another title). |
| |
|
| | The base model also has its own native capability and knowledge of poetry; sometimes this comes through as reciting poems it has "never seen" in the penchant training set. |
| |
|
| | ## Example Output |
| |
|
| | ``` |
| | <|im_start|>system |
| | You are Dolphin, a helpful AI assistant.<|im_end|> |
| | <|im_start|>user |
| | Write a poem about nature.<|im_end|> |
| | <|im_start|>assistant |
| | <POEM> |
| | The Cuckoo's Nest |
| | |
| | A little nest of straw and feathers |
| | In the old apple-tree; |
| | And in it lay two eggs together, |
| | As white as they could be: |
| | But one was blue, and one was green, |
| | And both were speckled brown; |
| | And there was not a bird to hatch them, |
| | Nor any creature round. |
| | </POEM> |
| | ``` |
| |
|
| | ``` |
| | <|im_start|>system |
| | You are Dolphin, a helpful AI assistant.<|im_end|> |
| | <|im_start|>user |
| | Write a poem about a season<|im_end|> |
| | <|im_start|>assistant |
| | <POEM> |
| | Spring |
| | |
| | I am the Spring that in my heart doth keep |
| | The seeds of every flower that ever grew; |
| | And I have tasted of the honey-dew |
| | That drips from blossoms when the bees do creep: |
| | And I am changed to something rich and sweet, |
| | To bees a flow'ring heath among the weeds; |
| | A hideous plant with many a hooked seed, |
| | That clasps me round about the middle neat. |
| | </POEM> |
| | ``` |
| |
|
| | ## Prompt Construction |
| | The model has been trained primarily on complex prompts. Very simple prompts are much more likely to lead to recitation of known poems. Here are some prompts for inspiration: |
| |
|
| | - Write a poem about [subject] |
| |
|
| | - Create a literary poem. Use vivid imagery and sensory details to describe [subject]. Incorporate personification or metaphor to enhance the description. Pay attention to rhythm and rhyme scheme. Consider different points of view and tones. Choose words that evoke emotion and create a mood. Keep it simple yet profound. |
| |
|
| | - Write a poem entitled [title] |
| |
|
| | - Write an artistic poem. The title should be [title]. Use vivid imagery and creative metaphors throughout your work. Draw inspiration from the works of [poet]. Pay close attention to the sounds that words make when read aloud, and use these sounds to create rhythm and musicality within your piece. |
| |
|
| | - Generate a poem with a whimsical mood |
| |
|
| | - Write an artistic poem. Include alliteration and onomatopoeia throughout your work. Be sure to consider how the great poet [poet] might approach this task. Finally, make sure that your poem has a clear theme or message. The title of your poem should be [title]. |
| |
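The bracketed placeholders in the prompts above can be filled programmatically. This is a small sketch under the assumption that placeholders are always written as `[name]`; the helper `fill_template` and its error behavior are illustrative choices, not part of the model's example code.

```python
import re

# One of the longer prompts from the list above, with its placeholders intact.
TEMPLATE = (
    "Write an artistic poem. The title should be [title]. Use vivid imagery "
    "and creative metaphors throughout your work. Draw inspiration from the "
    "works of [poet]."
)

_PLACEHOLDER = re.compile(r"\[(\w+)\]")


def fill_template(template: str, **fields: str) -> str:
    """Replace each [name] placeholder with the matching keyword argument."""

    def substitute(match):
        name = match.group(1)
        if name not in fields:
            # Fail loudly rather than send a prompt with a literal "[title]" in it.
            raise KeyError(f"missing value for [{name}]")
        return fields[name]

    return _PLACEHOLDER.sub(substitute, template)


user_prompt = fill_template(TEMPLATE, title="Sleeping Where I Fell", poet="Emily Dickinson")
```

Raising on a missing field is a deliberate choice here: a prompt that still contains `[title]` would be a malformed instruction.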
|
| | ## Example Code |
| |
|
| | The "examples" folder includes code for generating poetry by calling a local HTTP LLM server. |
| | The code given targets the text-generation-webui completions endpoint, but it is easy to adapt to another server. |
| | The folder also contains example titles, most of which are not titles of known poems by these authors. |
| |
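For orientation, a call to a local completions server could be sketched as below, using only the standard library. This is not the code from the "examples" folder. The URL, the JSON field names, and the `choices[0].text` response shape are assumptions based on OpenAI-style completions APIs, and the sampling values are arbitrary; adjust all of them to whatever your server actually exposes.

```python
import json
import urllib.request

# Assumption: an OpenAI-style completions endpoint served locally.
DEFAULT_URL = "http://127.0.0.1:5000/v1/completions"


def build_request(prompt: str, url: str = DEFAULT_URL,
                  max_tokens: int = 512, temperature: float = 0.8) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the full ChatML prompt."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, url: str = DEFAULT_URL, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (assumed response shape)."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

The `prompt` passed in should already be in the ChatML format described earlier, since a raw completions endpoint does not apply a chat template for you.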
|
| | ## Other Notes |
| | - If the prompt is very short, the model may simply quote an existing poem, up to 50% of the time. The most reliable |
| | way to get original poems is to give a longer prompt with a title and a poet, where the title is novel. |
| | Always double-check output for originality. |
| |
|
| | - Victorian poets were a moody, nature-loving lot. This shows in the poetry. If you know |
| | of a more modern public-domain poetry dataset, I'd love to add it. |
| |
|
| | - The underlying Dolphin model is uncensored; its dataset was filtered to remove alignment and bias. Exposing it as a service is not recommended. Please read Eric's blog post about uncensored models: https://erichartford.com/uncensored-models |
| |
|
| | - You are responsible for any content you create using this model. Enjoy responsibly. |
| |
|
| | - The model is released under the Apache-2.0 license, so it is suitable for commercial or non-commercial use. |
| |
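One crude way to support the originality check mentioned in the first note is to measure how many generated lines appear verbatim in a reference poem you already have on hand. The sketch below is only a heuristic with hypothetical helper names: it catches near-verbatim recitation against texts you supply, and it does not replace searching for the poem.

```python
import re


def _normalize(line: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    stripped = re.sub(r"[^a-z0-9 ]", "", line.lower())
    return " ".join(stripped.split())


def line_overlap(candidate: str, reference: str) -> float:
    """Fraction of non-empty candidate lines that also occur in the reference."""
    cand = [n for n in map(_normalize, candidate.splitlines()) if n]
    ref = {n for n in map(_normalize, reference.splitlines()) if n}
    if not cand:
        return 0.0
    return sum(line in ref for line in cand) / len(cand)


reference = "I wandered lonely as a cloud\nThat floats on high o'er vales and hills,"
candidate = "I wandered lonely as a Cloud\nA new line of my own"
score = line_overlap(candidate, reference)  # one of two lines matches
```

A high score against any known poem is a strong hint of recitation. A low score only means no match was found in the references that were checked.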
|
| |
|
| | ## Gratitude |
| | - Eric Hartford for the dolphin series of models on which this is based |
| | - Dolphin-2.1-mistral-7b's training, on which penchant is based, was sponsored by [a16z](https://a16z.com/supporting-the-open-source-ai-community/). |
| | - poemhunter.com for making poetry accessible |
| | - Thank you to all the other people in the Open Source AI community who have taught me and helped me along the way. |