# Lotus-12B (4-bit Quantized)
This is a 4-bit quantized version of Lotus-12B, converted to safetensors and compressed with `llmcompressor`.

Lotus-12B is a GPT-NeoX 12B model fine-tuned on 2.5 GB of light novels, erotica, annotated literature, and public-domain conversations, with the goal of generating novel-style fictional text and dialogue.
## Quantization Details

This model was quantized using the one-shot GPTQ method to reduce memory footprint and improve inference speed while maintaining generation quality.
| Setting | Value |
|---|---|
| Method | GPTQ (W4A16) |
| Group Size | 128 |
| Dampening Fraction | 0.01 |
| Calibration Dataset | neuralmagic/LLM_compression_calibration (512 samples) |
| Ignored Modules | lm_head |
> **Note:** The `lm_head` was kept in full precision to ensure stability in text generation.
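To build intuition for the W4A16 scheme in the table above, here is a minimal, illustrative sketch of group-wise 4-bit weight quantization with one scale per group of 128 weights. It is not the GPTQ algorithm itself (GPTQ additionally minimizes layer output error against calibration data) and is not part of this repository; it only shows the storage format the settings describe.

```python
import random

GROUP_SIZE = 128  # matches the "Group Size" setting above

def quantize_group(weights):
    """Map a group of floats to 4-bit signed ints (-8..7) plus one scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

# Round-trip one group of pseudo-weights and measure the quantization error.
random.seed(0)
group = [random.uniform(-1, 1) for _ in range(GROUP_SIZE)]
q, scale = quantize_group(group)
recovered = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(group, recovered))
print(f"max abs error: {max_err:.4f}")
```

Because each group stores only one scale, the worst-case round-trip error is bounded by half the scale; GPTQ improves on this naive rounding by choosing the codes that best preserve each layer's outputs.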
## Model Description

The base model used for fine-tuning is Pythia 12B Deduped, a 12-billion-parameter autoregressive language model trained on The Pile.
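As a back-of-the-envelope check on what 4-bit quantization buys for a model this size (assuming roughly 12e9 parameters; the true count is slightly lower, and the full-precision `lm_head` adds a little on top):

```python
# FP16 stores 2 bytes per parameter; W4A16 stores 4 bits per parameter
# plus one FP16 scale per group of 128 weights.
params = 12e9
fp16_gb = params * 2 / 1e9
int4_gb = params * 0.5 / 1e9
scales_gb = (params / 128) * 2 / 1e9  # one fp16 scale per 128-weight group
print(f"FP16: ~{fp16_gb:.0f} GB, W4A16: ~{int4_gb + scales_gb:.1f} GB")
```

That is roughly a 4x reduction in weight memory, which is what makes single-GPU inference practical for a 12B model.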
## Usage

### vLLM (Recommended)

This model is optimized for vLLM, which automatically detects the `compressed-tensors` config.
```python
from vllm import LLM, SamplingParams

# Load the model
llm = LLM(
    model="Ryex/Lotus-12B-GPTQ",
    trust_remote_code=True,
    max_model_len=2048,
)

# Annotated prompt
prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

# Generate
params = SamplingParams(temperature=1.0, top_p=0.9, max_tokens=256)
outputs = llm.generate([prompt], sampling_params=params)
print(outputs[0].outputs[0].text)
```
### Transformers

You can also run the model with `transformers`, with `auto_gptq` or `compressed-tensors` installed.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Ryex/Lotus-12B-GPTQ"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.0,
    top_p=0.9,
    repetition_penalty=1.2,
    max_new_tokens=200,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```
## Training Data & Annotative Prompting

The fine-tuning data was gathered from various sources such as Project Gutenberg. The annotated fiction dataset has tags prepended to each sample to steer generation toward a particular style. Here is an example prompt that shows how to use the annotations:
```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror; Tags: 3rdperson, scary; Style: Dark ]
***
When a traveler in north central Massachusetts takes the wrong fork...
```
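If you generate these headers programmatically, a small helper keeps the format consistent. `build_prompt` below is a hypothetical convenience function (not part of the released code); the field names simply mirror the examples in this section, and any subset of them can be supplied.

```python
def build_prompt(text, **annotations):
    """Assemble an annotated prompt: '[ Key: value; ... ]', '***', then text."""
    header = "; ".join(f"{key}: {value}" for key, value in annotations.items())
    return f"[ {header} ]\n***\n{text}"

prompt = build_prompt(
    "When a traveler in north central Massachusetts takes the wrong fork...",
    Title="The Dunwich Horror",
    Author="H. P. Lovecraft",
    Genre="Horror",
    Tags="3rdperson, scary",
    Style="Dark",
)
print(prompt)
```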
And here is the format for conversations, which were scraped from the authors' Discord server and publicly available subreddits on Reddit:
```
[ Title: (2019) Cars getting transported on an open deck catch on fire after salty water shorts their batteries; Genre: CatastrophicFailure ]
***
Anonymous: Daaaaaamn try explaining that one to the owners
EDIT: who keeps reposting this for my comment to get 3k upvotes?
Anonymous: "Your car caught fire from some water"
Irythros: Lol, I wonder if any compensation was in order
Anonymous: Almost all of the carriers offer insurance but it isn't cheap. I guarantee most of those owners declined the insurance.
```
The annotations can be mixed and matched to help generate towards a specific style.
## Downstream Uses
This model can be used for entertainment purposes and as a creative writing assistant for fiction writers and chatbots.
## Team Members and Acknowledgements
This project would not have been possible without the work done by EleutherAI. Thank you!
- Anthony Mercurio
- Imperishable_NEET
To reach us, you can join our Discord server.
Base model: `hakurei/lotus-12B`