Instructions to use QuixiAI/samantha-13b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use QuixiAI/samantha-13b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuixiAI/samantha-13b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("QuixiAI/samantha-13b")
model = AutoModelForCausalLM.from_pretrained("QuixiAI/samantha-13b")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use QuixiAI/samantha-13b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "QuixiAI/samantha-13b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuixiAI/samantha-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
docker model run hf.co/QuixiAI/samantha-13b
- SGLang
How to use QuixiAI/samantha-13b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "QuixiAI/samantha-13b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuixiAI/samantha-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "QuixiAI/samantha-13b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuixiAI/samantha-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use QuixiAI/samantha-13b with Docker Model Runner:
docker model run hf.co/QuixiAI/samantha-13b
:)
She will not engage in roleplay, romance, or sexual activity?
She will not.
I'm curious about the memory context: does she only have a 2048-token context for tracking her interactions with us? Is that baked into the training of the model itself? I'd be happy to go read a paper explaining it. I've read some of the papers on strategies for overcoming the limit by design, but I've not found a clear explanation of just how the 2048 limit arises in these models, other than statements saying it is inherent to the model... for 'reasons'. If the answer is complex, I'm happy to go study; I'm just not sure where to look in the sea of info.
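For readers wondering what a fixed window means for "tracking interactions" in practice, here is a minimal sketch, not the model's actual serving code. It assumes the 2048-token figure from the question above. The `fit_to_context` helper is hypothetical, and so is its whitespace word count, which only stands in for the model's real tokenizer. The idea is that the newest turns are kept until the budget is used up, and anything older is simply not visible to the model on that request.

```python
from collections import deque


def fit_to_context(turns, max_tokens=2048, count_tokens=None):
    """Return the most recent turns whose combined size fits in max_tokens.

    `count_tokens` is a hypothetical stand-in for a real tokenizer; by default
    it counts whitespace-separated words, which only approximates token counts.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    kept = deque()
    used = 0
    # Walk from the newest turn backwards and stop at the first one that
    # would push the total over the budget.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.appendleft(turn)
        used += cost
    return list(kept)


if __name__ == "__main__":
    history = ["hello there Samantha", "hi, how are you", "tell me about your day"]
    # With a tiny budget only the newest turns survive.
    print(fit_to_context(history, max_tokens=9))
```

In a real application the word count would be replaced by the length of the tokenizer's output for each turn, and the prompt template would also count against the budget.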
I'm going to train her on rwkv and falcon too.
This is a good starting point. Did you build the dataset yourself? And are you planning to open source it?
I discuss my method on my blog:
https://erichartford.com/meet-samantha