Link to GGUF version: GGUF
Thanks to everyone who fine-tuned the base Llama 2 model, made (Q)LoRAs, and created the merge scripts: ties-merge, BlockMerge_Gradient, zaraki-tools
Model details:
An experiment in merging models with a large context length. Use these RoPE scaling settings:
--rope-freq-base 10000 --rope-freq-scale 0.25 -c 16384 (llama.cpp)
--ropeconfig 0.25 10000 --contextsize 16384 (koboldcpp)
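The settings above can be sanity-checked with a little arithmetic. The sketch below assumes linear RoPE scaling, where a frequency scale of 0.25 compresses positions by a factor of 4, and assumes the native Llama 2 context of 4096 tokens. The function name `linear_rope_settings` is hypothetical and is not part of llama.cpp or koboldcpp.

```python
# Llama 2 base models were trained with a 4096-token context window.
LLAMA2_NATIVE_CONTEXT = 4096


def linear_rope_settings(rope_freq_scale: float,
                         native_context: int = LLAMA2_NATIVE_CONTEXT) -> dict:
    """Derive the linear scaling factor and usable context from a frequency scale.

    Hypothetical helper: assumes plain linear RoPE scaling, where the
    frequency scale is the reciprocal of the context extension factor.
    """
    if not 0 < rope_freq_scale <= 1:
        raise ValueError("rope_freq_scale must be in the range (0, 1]")
    factor = 1.0 / rope_freq_scale
    return {"factor": factor, "context": int(round(native_context * factor))}


settings = linear_rope_settings(0.25)
print(settings)
```

Under these assumptions, a scale of 0.25 gives a factor of 4 and a context of 16384 tokens, which matches the `-c 16384` and `--contextsize 16384` values listed above.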
You can use various instruct formats:
Alpaca instruct format (Recommended):
### Instruction:
(your instruct prompt is here)
### Response:
Vicuna 1.1 instruct format:
You are a helpful AI assistant.
USER: <prompt>
ASSISTANT:
Metharme instruct format:
<|system|> (your instruct prompt)
<|user|> (user's reply)<|model|> (for model's output)
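The three formats above can be assembled programmatically. This is a minimal sketch: the function names are hypothetical, and the exact newline and whitespace placement is an assumption based on the layouts shown above, so adjust it if your frontend expects something different.

```python
def alpaca_prompt(instruction: str) -> str:
    """Alpaca instruct format (the recommended one for this model)."""
    return f"### Instruction:\n{instruction}\n### Response:\n"


def vicuna_prompt(user_message: str,
                  system: str = "You are a helpful AI assistant.") -> str:
    """Vicuna 1.1 instruct format with the default system line."""
    return f"{system}\nUSER: {user_message}\nASSISTANT:"


def metharme_prompt(system: str, user_message: str) -> str:
    """Metharme instruct format; the model's output follows <|model|>."""
    return f"<|system|>{system}\n<|user|>{user_message}<|model|>"


if __name__ == "__main__":
    print(alpaca_prompt("Describe a quiet harbor town at dawn."))
    print(vicuna_prompt("Describe a quiet harbor town at dawn."))
    print(metharme_prompt("You are a narrator.", "I walk into the tavern."))
```

Each helper returns a string that ends where the model is expected to begin generating, so it can be passed directly as the prompt to whichever backend is in use.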
Models used for the merge:
Part1:
Airoboros L2 13B 2.1 + LLAMA2 13B - Holodeck merged with Creative and Reasoning Airoboros LMoE 13B 2.1
Part2:
Chronos 13B V2 merged with Kimiko-v2-13B + Nous-Hermes-Llama2-13b merged with limarp-llama2 and limarp-llama2-v2 + Synthia-13B merged with BluemoonRP-L2-13B and LLama-2-13b-chat-erp-lora-mk2 + WizardLM-1.0-Uncensored-Llama2-13b merged with Llama-2-13B-Storywriter-LORA
Part3:
Speechless Llama2 13B + Redmond Puffin 13B
Part4:
Tsukasa 13B 16K (repo is deleted) + EverythingLM-13b-V2-16k
Part5:
TerraMix_L2_13B (base) was merged with PIPPA ShareGPT Subset QLoRa 13B
Part6:
Three parts were merged into one, then TsuryLM-L2-16K was merged with TerraMix_L2_13B (base).
The model is intended for creative use (roleplay). It regularly breaks formatting and sometimes understands small details of the ongoing situation poorly.
However, the model has almost no alignment: it can generate direct output, is moderately good at prose, and is good at internet RP style.
Limitations and risks
Llama 2 and its derivatives (finetunes) are licensed under the Llama 2 Community License. The various finetunes and (Q)LoRAs carry their own licenses, depending on the datasets used for finetuning or for training the (Quantized) Low-Rank Adaptations. This mix can generate heavily biased output that isn't suitable for minors or a general audience.