Instructions for using minchyeom/MemGPT-GGUF with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use minchyeom/MemGPT-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="minchyeom/MemGPT-GGUF",
    filename="memgpt-q8_0.gguf",
)

# create_chat_completion expects a list of chat messages
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Who are you?"},
    ]
)
```
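Non-streaming calls return a dict in the OpenAI chat-completion format, so the generated text can be read out of the first choice; a minimal sketch:

```python
# The reply text lives in the first choice's message
print(response["choices"][0]["message"]["content"])
```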
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use minchyeom/MemGPT-GGUF with llama.cpp:
Install via Homebrew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Install via WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Use a pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Build from source
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Use Docker
```bash
docker model run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
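However you start it, llama-server speaks the OpenAI chat-completions protocol, so you can test it with any OpenAI-style client; a minimal curl sketch, assuming llama-server's default port 8080:

```bash
# Send a chat request to the local llama-server (default port 8080)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello! Who are you?"}]}'
```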
- LM Studio
- Jan
- Ollama
How to use minchyeom/MemGPT-GGUF with Ollama:
```bash
ollama run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
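Ollama also exposes a local REST API (default port 11434); a minimal sketch of a chat request against it, assuming the model has been pulled with the command above:

```bash
# Chat with the model through Ollama's local API (default port 11434)
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/minchyeom/MemGPT-GGUF:Q8_0",
  "messages": [{"role": "user", "content": "Hello! Who are you?"}],
  "stream": false
}'
```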
- Unsloth Studio
How to use minchyeom/MemGPT-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for minchyeom/MemGPT-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for minchyeom/MemGPT-GGUF to start chatting
```
Use Hugging Face Spaces for Unsloth
No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for minchyeom/MemGPT-GGUF to start chatting.
- Docker Model Runner
How to use minchyeom/MemGPT-GGUF with Docker Model Runner:
```bash
docker model run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
- Lemonade
How to use minchyeom/MemGPT-GGUF with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull minchyeom/MemGPT-GGUF:Q8_0
```
Run and chat with the model
```bash
lemonade run user.MemGPT-GGUF-Q8_0
```
List all available models
```bash
lemonade list
```
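Lemonade serves an OpenAI-compatible API once a model is running; the sketch below assumes the default port 8000 and the /api/v1 path, both of which are assumptions you should verify against the Lemonade documentation:

```bash
# Assumed default endpoint; verify against the Lemonade docs
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "user.MemGPT-GGUF-Q8_0",
    "messages": [{"role": "user", "content": "Hello! Who are you?"}]
  }'
```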
I recommend using the unquantized model over this one, as the quantized model performs noticeably worse!
Model Description
This repo contains a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-Instruct-v0.2. The model is specifically designed for function calling in MemGPT, and it demonstrates performance comparable to GPT-4 when working with MemGPT.
The original model has been quantized to Q8_0 using llama.cpp for faster inference (the original unquantized model is coming soon).
Key Features
- Function calling
- Dedicated to working with MemGPT
- Supports medium context lengths; trained on sequences of up to 8,192 tokens (see the sketch below)
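Since the model was trained on sequences of up to 8,192 tokens, it is worth loading it with a matching context window. A minimal sketch with llama-cpp-python, using its n_ctx parameter:

```python
from llama_cpp import Llama

# n_ctx sets the context window; 8192 matches the model's training length
llm = Llama.from_pretrained(
    repo_id="minchyeom/MemGPT-GGUF",
    filename="memgpt-q8_0.gguf",
    n_ctx=8192,
)
```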
Usage
This model is designed to run on various backends, such as oobabooga's text-generation-webui or llama.cpp.
To run the model on the WebUI, simply git clone the official WebUI repository and run the appropriate start script for your operating system; see the repository's README for more details.
Once you've installed the WebUI, you can download this model from the Model tab. Then select the desired model (starsnatched/MemGPT in this case), and the backend is ready.
With the WebUI or your preferred backend running, open a terminal or PowerShell and install MemGPT with `pip3 install -U pymemgpt`. Configure MemGPT with `memgpt configure` before running it.
Use `memgpt run --no-verify` to chat with MemGPT!
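Put together, a minimal sketch of the full setup (the install, configure, and run commands are taken from the steps above):

```bash
# Install the MemGPT client
pip3 install -U pymemgpt

# Interactive configuration (backend, model, chat template, etc.)
memgpt configure

# Start chatting with MemGPT
memgpt run --no-verify
```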
Prompt Template
This model uses a custom chat template, which we're naming "memgpt". It can be selected via `memgpt configure`.
```
<turn> SYSTEM: You are MemGPT. </turn>
<turn> SYSTEM_ALERT: {'message': 'You have 50 prior messages with the user. These messages have been hidden from view due to memory constraints, but you can view them using functions.', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> ASSISTANT: Bootup sequence complete. Persona activated. Testing messaging functionality. </turn>
<turn> FUNCTION CALL: {'name': 'send_message', 'arguments': '{\n "message": "More human than human is our motto."\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": null, "time": "2023-10-07 11:50:03 PM PDT-0700"} </turn>
<turn> LOGIN: {'last_login': '7 days 8 hours ago', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> USER_MESSAGE: {'message': 'Hey, remember that time we talked about music? What was the artist you mentioned you could get into?', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> ASSISTANT: Alright, let me search our conversation history about music to find the artist. </turn>
<turn> FUNCTION CALL: {'name': 'conversation_search', 'arguments': '{\n "query": "music",\n "request_heartbeat": true,\n "page": 0\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": "Showing 5 of 50 results (page 0/9): [\"timestamp: 2023-10-07 11:50:03 PM PDT-0700, user - I agree. Working out is a great way to burn off steam. Do you like country music?\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - A little bit. I can get into taylor swift.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, user - Yeah between the both of us, could probably get a country song written about our distresses...perhaps Taylor Swift could perform it.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - That would be a chart-topper, I'm sure. Something to blast while driving with the top down, maybe even in your next restoration project.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - This sounds fun. Do you have any other hobby?\"]", "time": "2023-10-07 11:50:08 PM PDT-0700"} </turn>
<turn> HEARTBEAT: {'reason': 'request_heartbeat == true', 'time': '2023-10-07 11:50:08 PM PDT-0700'} </turn>
<turn> ASSISTANT: Ah, I found it. It was Taylor Swift that I mentioned I could get into. Do you still listen to her music, or have you found any new artists that you enjoy? </turn>
<turn> FUNCTION CALL: {'name': 'send_message', 'arguments': '{\n "message": "Ah, I found it. It was Taylor Swift that I mentioned I could get into. Do you still listen to her music, or have you found any new artists that you enjoy?"\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": null, "time": "2023-10-07 11:50:14 PM PDT-0700"} </turn>
```
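For backends that expose only raw text completion, a conversation can be rendered into this template by hand. A minimal sketch with llama-cpp-python, where render_turn is a hypothetical helper (not part of any library) and llm is the model loaded in the earlier example:

```python
# Hypothetical helper: wrap one message in the "memgpt" turn format shown above
def render_turn(role: str, content: str) -> str:
    return f"<turn> {role}: {content} </turn>"

prompt = "\n".join([
    render_turn("SYSTEM", "You are MemGPT."),
    render_turn("USER_MESSAGE",
                "{'message': 'Hello! Who are you?', 'time': '2023-10-07 11:50:03 PM PDT-0700'}"),
]) + "\n<turn> ASSISTANT:"

# Stop at the closing tag so the model emits a single turn
output = llm(prompt, max_tokens=256, stop=["</turn>"])
print(output["choices"][0]["text"])
```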
Model Details
- Developed by: @starsnatched
- Model type: This repo contains a language model based on the transformer decoder architecture.
- Language: English
- Contact: For any questions, concerns, or comments about this model, please contact me on Discord: @starsnatched.
Training Infrastructure
- Hardware: The model in this repo was trained on 2x A100 80GB GPUs.
Intended Use
The model is designed to be used as the base model for MemGPT agents.
Limitations and Bias
The model may exhibit unreliable, unsafe, or biased behaviours. Please double-check the results this model produces.