Writing Prompts
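We ran KTO (Kahneman-Tversky Optimization) with stories from r/WritingPrompts and r/DirtyWritingPrompts as the desirable examples and the stories in [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts) as the undesirable ones, to remove "slop" (the stock, clichéd phrasing typical of AI-generated prose).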
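KTO learns from unpaired examples labelled desirable or undesirable instead of chosen/rejected pairs. Reading the setup above that way, each Reddit story is a desirable completion and each Opus story an undesirable one. The following is a minimal, illustrative sketch of the standard KTO loss computed from precomputed sequence log-probabilities; it is not LLaMA-Factory's implementation. The function name `kto_loss`, the weights `lambda_d`/`lambda_u`, and the simplification of passing the reference point (a KL estimate) in as a constant are assumptions; `beta=0.1` mirrors `--pref_beta 0.1` in the training command.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    desirable: Sequence[bool],
    kl_estimate: float,
    beta: float = 0.1,
    lambda_d: float = 1.0,
    lambda_u: float = 1.0,
) -> float:
    """Mean KTO loss over a batch (hypothetical helper, illustrative only).

    policy_logps / ref_logps: summed log-probabilities of each completion under
    the trained policy and the frozen reference model.
    desirable: True for desirable completions, False for undesirable ones.
    kl_estimate: reference point z0, an estimate of KL(policy || reference),
    treated here as a given constant and clamped at 0.
    """
    if not policy_logps or not (len(policy_logps) == len(ref_logps) == len(desirable)):
        raise ValueError("inputs must be non-empty and of equal length")
    z0 = max(kl_estimate, 0.0)
    losses = []
    for pol, ref, good in zip(policy_logps, ref_logps, desirable):
        reward = pol - ref  # implied reward: log pi(y|x) - log pi_ref(y|x)
        if good:
            # Desirable: loss shrinks as the reward rises above z0.
            losses.append(lambda_d - lambda_d * sigmoid(beta * (reward - z0)))
        else:
            # Undesirable: loss shrinks as the reward falls below z0.
            losses.append(lambda_u - lambda_u * sigmoid(beta * (z0 - reward)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    print(kto_loss([-10.0], [-12.0], [True], kl_estimate=0.0))
```

For example, `kto_loss([-10.0], [-12.0], [True], kl_estimate=0.0)` is about 0.45, below the 0.5 you get when policy and reference agree, because the policy already favours this desirable completion; the same inputs labelled undesirable give about 0.55.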
For best results, use ChatML with no system message and nothing else in the context, and always start your message with `Write a story using this writing prompt:`. For example:
Write a story using this writing prompt: As a prank, a witch detached your cock and suctioned it to the shower in the girls' dorm. Neither of you expected how frequently it was going to be used, nor knew that it couldn't get soft again!
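With no system message, the recommended prompt is a single ChatML user turn followed by an open assistant turn. A minimal sketch, assuming the standard ChatML markers `<|im_start|>` and `<|im_end|>` (consistent with `--template chatml` in the training command); the helper `build_chatml_prompt` is hypothetical, and the tokenizer's own chat template, if the repository ships one, remains the authoritative source.

```python
# Fixed instruction prefix recommended by this model card.
PROMPT_PREFIX = "Write a story using this writing prompt: "


def build_chatml_prompt(writing_prompt: str) -> str:
    """Build a ChatML prompt with one user turn and no system message.

    Hypothetical helper: assumes the standard ChatML markers.
    """
    user_content = PROMPT_PREFIX + writing_prompt.strip()
    return (
        "<|im_start|>user\n"
        f"{user_content}<|im_end|>\n"
        # Open assistant turn: generation continues from here with the story.
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("A lighthouse keeper finds a bottle addressed to her."))
```

Pass the resulting string to your inference backend as raw text; with Transformers, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` on a single user message should produce an equivalent string if the tokenizer's template is ChatML.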
As a side effect, roleplay (RP) output also seems to have become somewhat less sloppy.
We're looking into releasing the datasets, but I'm a bit tired at the moment. In the meantime, you can DIY: grab this [torrent of the entire Reddit dump](https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10), select only the subreddits you want, and build the dataset yourself.
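The DIY route amounts to streaming the dump and keeping only records from the subreddits you care about. A minimal sketch, assuming the files have already been decompressed to newline-delimited JSON with a `subreddit` field on each record (check the actual files before relying on this); `filter_subreddits` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def filter_subreddits(lines: Iterable[str], wanted: Iterable[str]) -> Iterator[dict]:
    """Yield JSON records whose 'subreddit' field is in `wanted`.

    Assumes one JSON object per line with a 'subreddit' key;
    matching is case-insensitive.
    """
    wanted_lower = {name.lower() for name in wanted}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the scan
        if not isinstance(record, dict):
            continue  # only JSON objects can carry a 'subreddit' field
        if str(record.get("subreddit", "")).lower() in wanted_lower:
            yield record


if __name__ == "__main__":
    sample = [
        '{"subreddit": "WritingPrompts", "id": "a"}',
        '{"subreddit": "AskReddit", "id": "b"}',
    ]
    for rec in filter_subreddits(sample, ["WritingPrompts", "DirtyWritingPrompts"]):
        print(rec["id"])
```

If the dump files are compressed (for example with zstandard), decompress them as a stream and feed the decoded lines to the generator, so the full dump never has to fit in memory.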
(For context: this model was a test run on a small dataset. It will be scaled up later.)
Training config:
Thanks a lot to LLaMA-Factory; this was the easiest training run I've done so far.
```bash
llamafactory-cli train \
    --stage kto \
    --do_train True \
    --model_name_or_path cognitivecomputations/dolphin-2.9.1-llama-3-8b \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_bit 8 \
    --template chatml \
    --flash_attn auto \
    --use_unsloth True \
    --dataset_dir /workspace/kto \
    --dataset kto_dataset \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 500 \
    --warmup_steps 50 \
    --optim adamw_torch \
    --packing False \
    --report_to all \
    --output_dir saves/LLaMA3-8B/lora/train_2024-06-15-15-18-25 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 32 \
    --lora_alpha 32 \
    --lora_dropout 0 \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_ftx 0 \
    --pref_loss sigmoid \
    --val_size 0.05 \
    --eval_strategy steps \
    --eval_steps 50 \
    --per_device_eval_batch_size 2
```
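A few quantities follow directly from the command above: `--per_device_train_batch_size 2` with `--gradient_accumulation_steps 8` gives 16 examples per optimizer update per GPU, `--lora_alpha 32` over `--lora_rank 32` gives the usual LoRA scaling factor alpha/r = 1.0 (ignoring rsLoRA), and `--warmup_steps 50` with `--lr_scheduler_type cosine` gives linear warmup to 5e-05 followed by cosine decay. The sketch below reproduces the common warmup-plus-cosine formula (the shape used by Transformers' `get_cosine_schedule_with_warmup`). The card does not state the GPU count or total step count, so `num_gpus=1` and `total=600` are hypothetical.

```python
import math

# Values copied from the training command above.
LEARNING_RATE = 5e-05
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
WARMUP_STEPS = 50
LORA_RANK = 32
LORA_ALPHA = 32


def effective_batch_size(num_gpus: int = 1) -> int:
    # Examples contributing to one optimizer update across all devices.
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_gpus


def lora_scaling() -> float:
    # Standard LoRA factor applied to the low-rank update (no rsLoRA).
    return LORA_ALPHA / LORA_RANK


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = 600  # hypothetical: the real step count depends on dataset size
    print("effective batch (1 GPU):", effective_batch_size())
    print("LoRA scaling:", lora_scaling())
    for s in (0, 25, 50, 325, 600):
        print(s, f"{cosine_lr_with_warmup(s, total):.2e}")
```

With the hypothetical 600 total steps, the learning rate peaks at 5e-05 at step 50, is halved at the midpoint of the decay phase (step 325), and reaches 0 at step 600.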