# Lotus-12B (4-bit Quantized)
This is a 4-bit quantized version of Lotus-12B, converted to safetensors and compressed with `llmcompressor`.

Lotus-12B is a GPT-NeoX 12B model fine-tuned on 2.5 GB of light novels, erotica, annotated literature, and public-domain conversations, with the goal of generating novel-style fictional text and dialogue.
## Quantization Details

This model was quantized using the one-shot GPTQ method to reduce memory footprint and improve inference speed while maintaining generation quality.
| Setting | Value |
|---|---|
| Method | GPTQ (W4A16) |
| Group Size | 128 |
| Dampening Fraction | 0.01 |
| Calibration Dataset | neuralmagic/LLM_compression_calibration (512 samples) |
| Ignored Modules | lm_head |
> **Note:** The `lm_head` was kept in full precision to ensure stability in text generation.
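To build intuition for the W4A16 scheme in the table above, here is a minimal, illustrative sketch of group-wise 4-bit weight quantization with one scale per group of 128 weights. It is not the GPTQ algorithm itself (GPTQ additionally minimizes layer output error against calibration data) and is not part of this repository; it only shows the storage format the settings describe.

```python
import random

GROUP_SIZE = 128  # matches the "Group Size" setting above

def quantize_group(weights):
    """Map a group of floats to 4-bit signed ints (-8..7) plus one scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

# Round-trip one group of pseudo-weights and measure the quantization error.
random.seed(0)
group = [random.uniform(-1, 1) for _ in range(GROUP_SIZE)]
q, scale = quantize_group(group)
recovered = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(group, recovered))
print(f"max abs error: {max_err:.4f}")
```

Because each group stores only one scale, the worst-case round-trip error is bounded by half the scale; GPTQ improves on this naive rounding by choosing the codes that best preserve each layer's outputs.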
## Model Description

The base model used for fine-tuning is Pythia 12B Deduped, a 12-billion-parameter autoregressive language model trained on The Pile.
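As a back-of-the-envelope check on what 4-bit quantization buys for a model this size (assuming roughly 12e9 parameters; the true count is slightly lower, and the full-precision `lm_head` adds a little on top):

```python
# FP16 stores 2 bytes per parameter; W4A16 stores 4 bits per parameter
# plus one FP16 scale per group of 128 weights.
params = 12e9
fp16_gb = params * 2 / 1e9
int4_gb = params * 0.5 / 1e9
scales_gb = (params / 128) * 2 / 1e9  # one fp16 scale per 128-weight group
print(f"FP16: ~{fp16_gb:.0f} GB, W4A16: ~{int4_gb + scales_gb:.1f} GB")
```

That is roughly a 4x reduction in weight memory, which is what makes single-GPU inference practical for a 12B model.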
## Usage

### vLLM (Recommended)

This model is optimized for vLLM, which automatically detects the `compressed-tensors` config.
```python
from vllm import LLM, SamplingParams

# Load the model
llm = LLM(
    model="Ryex/Lotus-12B-GPTQ",
    trust_remote_code=True,
    max_model_len=2048,
)

# Annotated prompt
prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

# Generate
params = SamplingParams(temperature=1.0, top_p=0.9, max_tokens=256)
outputs = llm.generate([prompt], sampling_params=params)
print(outputs[0].outputs[0].text)
```
### Transformers

You can also run the model with `transformers`, with `auto_gptq` or `compressed-tensors` installed.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Ryex/Lotus-12B-GPTQ"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.0,
    top_p=0.9,
    repetition_penalty=1.2,
    max_new_tokens=200,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```
## Training Data & Annotative Prompting

The fine-tuning data was gathered from various sources such as Project Gutenberg. The annotated fiction dataset has tags prepended to each sample to steer generation toward a particular style. Here is an example prompt that shows how to use the annotations:
```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror; Tags: 3rdperson, scary; Style: Dark ]
***
When a traveler in north central Massachusetts takes the wrong fork...
```
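If you generate these headers programmatically, a small helper keeps the format consistent. `build_prompt` below is a hypothetical convenience function (not part of the released code); the field names simply mirror the examples in this section, and any subset of them can be supplied.

```python
def build_prompt(text, **annotations):
    """Assemble an annotated prompt: '[ Key: value; ... ]', '***', then text."""
    header = "; ".join(f"{key}: {value}" for key, value in annotations.items())
    return f"[ {header} ]\n***\n{text}"

prompt = build_prompt(
    "When a traveler in north central Massachusetts takes the wrong fork...",
    Title="The Dunwich Horror",
    Author="H. P. Lovecraft",
    Genre="Horror",
    Tags="3rdperson, scary",
    Style="Dark",
)
print(prompt)
```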
And here is the format for conversations, which were scraped from the authors' Discord server and publicly available subreddits on Reddit:
```
[ Title: (2019) Cars getting transported on an open deck catch on fire after salty water shorts their batteries; Genre: CatastrophicFailure ]
***
Anonymous: Daaaaaamn try explaining that one to the owners
EDIT: who keeps reposting this for my comment to get 3k upvotes?
Anonymous: "Your car caught fire from some water"
Irythros: Lol, I wonder if any compensation was in order
Anonymous: Almost all of the carriers offer insurance but it isn't cheap. I guarantee most of those owners declined the insurance.
```
The annotations can be mixed and matched to help generate towards a specific style.
## Downstream Uses
This model can be used for entertainment purposes and as a creative writing assistant for fiction writers and chatbots.
## Team Members and Acknowledgements
This project would not have been possible without the work done by EleutherAI. Thank you!
- Anthony Mercurio
- Imperishable_NEET
To reach us, you can join our Discord server.
Base model: `hakurei/lotus-12B`