Instructions to use mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML")
model = AutoModelForCausalLM.from_pretrained("mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML")
```

Note that these snippets use the standard Transformers loaders; the GGML files in this repository are intended for llama.cpp-compatible runtimes (see below), so loading may fail unless a compatible checkpoint is available.
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML
```
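The curl request above can also be issued from Python. A minimal sketch using only the standard library; the endpoint and parameters mirror the curl example, and `build_completion_request` / `post_completion` are illustrative helper names, not part of vLLM:

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    """Build the same JSON body that the curl example sends to /v1/completions."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def post_completion(payload, url="http://localhost:8000/v1/completions"):
    """POST the payload to a running vLLM server (OpenAI-compatible API)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_completion_request(
    "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML", "Once upon a time,"
)
# post_completion(payload)  # requires the server from the snippet above
```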
- SGLang
How to use mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML with Docker Model Runner:
```shell
docker model run hf.co/mindrage/Manticore-13B-Chat-Pyg-Guanaco-GGML
```
GGML of:
Manticore-13b-Chat-Pyg by openaccess-ai-collective, with the Guanaco 13B QLoRA by TimDettmers applied through Monero; quantized by mindrage; uncensored.
12.06.2023: Added versions quantized with the new method (less precision loss relative to the compression ratio, but slower, for now): q2_K, q3_KM, q4_KS, q4_KM, q5_KS.
Old quant method: q4_0, q5_0, and q8_0 versions available.
The files are quantized using the newest llama.cpp and will therefore only work with llama.cpp versions compiled after May 19th, 2023.
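A llama.cpp invocation might look as follows; the exact model filename and the q4_0 choice are assumptions (use the file you actually downloaded), and flag spellings vary between llama.cpp versions. The prompt and sampling values follow the recommendations later in this card:

```shell
# Hypothetical example: run the q4_0 quantization with llama.cpp's
# example binary. The filename is an assumption.
./main -m ./Manticore-13B-Chat-Pyg-Guanaco-GGML.q4_0.bin \
  --temp 0.15 --top_p 0.1 --top_k 40 --repeat_penalty 1.1 \
  -n 256 \
  -p "This is a conversation between an advanced AI and a human user.
USER: Hello!
ASSISTANT:"
```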
The model seems to have benefited noticeably from the further augmentation with the Guanaco QLoRA. Its capabilities seem broad, even compared with other Wizard or Manticore models, with the expected weaknesses at coding. It is very good at in-context learning and (for its class) reasoning. It both follows instructions well and can be used as a chatbot. Refreshingly, it does not seem to insist on aggressively sticking to narratives to justify previously hallucinated output as much as similar models do. Its output seems... eerily smart at times. I believe the model is fully unrestricted/uncensored and will generally not berate.
Prompting style + settings:
Presumably due to the very diverse training data, the model accepts a variety of prompting styles with relatively few issues, including the ###-variant, but seems to work best using:

user: "USER:" - bot: "ASSISTANT:" - context: "This is a conversation between an advanced AI and a human user."

Turn template: <|user|> <|user-message|>\n<|bot|><|bot-message|>\n

"Naming" the model works well by simply modifying the context. Substantial changes in its behaviour can be caused by appending to "ASSISTANT:", e.g. "ASSISTANT: After careful consideration, thinking step-by-step, my response is:"
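As a sketch, the context, role names, and turn template above can be assembled into a prompt string like this (the `build_prompt` helper is illustrative, not part of any library):

```python
CONTEXT = "This is a conversation between an advanced AI and a human user."

def build_prompt(history, next_user_msg,
                 user="USER:", bot="ASSISTANT:", context=CONTEXT):
    """Render the turn template: <|user|> <|user-message|>\\n<|bot|><|bot-message|>\\n"""
    prompt = context + "\n"
    for user_msg, bot_msg in history:
        # Note: no space after the bot name, matching the template.
        prompt += f"{user} {user_msg}\n{bot}{bot_msg}\n"
    # End with an open assistant turn for the model to complete.
    prompt += f"{user} {next_user_msg}\n{bot}"
    return prompt

print(build_prompt([("Hi!", " Hello!")], "What is GGML?"))
```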
Settings that work well without (subjectively) being too deterministic:
temp: 0.15 - top_p: 0.1 - top_k: 40 - rep penalty: 1.1
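For use from Python, the settings above can be collected into keyword arguments; the parameter-name spelling shown is the one llama-cpp-python's `Llama.__call__` accepts, which is an assumption relative to this card:

```python
# Recommended sampling settings from above, keyed by assumed
# llama-cpp-python parameter names.
SAMPLING_SETTINGS = {
    "temperature": 0.15,
    "top_p": 0.1,
    "top_k": 40,
    "repeat_penalty": 1.1,  # "rep penalty" in the text above
}

# Usage sketch (requires a downloaded GGML file and llama-cpp-python):
# from llama_cpp import Llama
# llm = Llama(model_path="model.bin")
# out = llm("USER: Hello!\nASSISTANT:", max_tokens=256, **SAMPLING_SETTINGS)
```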