Instructions for using minchyeom/MemGPT-GGUF with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use minchyeom/MemGPT-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="minchyeom/MemGPT-GGUF",
    filename="memgpt-q8_0.gguf",
)

# create_chat_completion expects a list of chat messages
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Who are you?"},
    ]
)
```
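Non-streaming calls return a dict in the OpenAI chat-completion format, so the generated text can be read out of the first choice; a minimal sketch:

```python
# The reply text lives in the first choice's message
print(response["choices"][0]["message"]["content"])
```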
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use minchyeom/MemGPT-GGUF with llama.cpp:
Install via Homebrew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Install via WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Use a pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Build from source
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf minchyeom/MemGPT-GGUF:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf minchyeom/MemGPT-GGUF:Q8_0
```
Use Docker
```bash
docker model run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
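However you start it, llama-server speaks the OpenAI chat-completions protocol, so you can test it with any OpenAI-style client; a minimal curl sketch, assuming llama-server's default port 8080:

```bash
# Send a chat request to the local llama-server (default port 8080)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello! Who are you?"}]}'
```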
- LM Studio
- Jan
- Ollama
How to use minchyeom/MemGPT-GGUF with Ollama:
```bash
ollama run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
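Ollama also exposes a local REST API (default port 11434); a minimal sketch of a chat request against it, assuming the model has been pulled with the command above:

```bash
# Chat with the model through Ollama's local API (default port 11434)
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/minchyeom/MemGPT-GGUF:Q8_0",
  "messages": [{"role": "user", "content": "Hello! Who are you?"}],
  "stream": false
}'
```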
- Unsloth Studio
How to use minchyeom/MemGPT-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for minchyeom/MemGPT-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for minchyeom/MemGPT-GGUF to start chatting
```
Use Hugging Face Spaces for Unsloth
No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for minchyeom/MemGPT-GGUF to start chatting.
- Docker Model Runner
How to use minchyeom/MemGPT-GGUF with Docker Model Runner:
```bash
docker model run hf.co/minchyeom/MemGPT-GGUF:Q8_0
```
- Lemonade
How to use minchyeom/MemGPT-GGUF with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull minchyeom/MemGPT-GGUF:Q8_0
```
Run and chat with the model
```bash
lemonade run user.MemGPT-GGUF-Q8_0
```
List all available models
```bash
lemonade list
```
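Lemonade serves an OpenAI-compatible API once a model is running; the sketch below assumes the default port 8000 and the /api/v1 path, both of which are assumptions you should verify against the Lemonade documentation:

```bash
# Assumed default endpoint; verify against the Lemonade docs
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "user.MemGPT-GGUF-Q8_0",
    "messages": [{"role": "user", "content": "Hello! Who are you?"}]
  }'
```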
I recommend using the unquantized model over this one, as the quantized model performs noticeably worse!
Model Description
This repo contains a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-Instruct-v0.2. The model is specifically designed for function calling in MemGPT, and it demonstrates performance comparable to GPT-4 when working with MemGPT.
The original model has been quantized to Q8_0 using llama.cpp for faster inference (the original unquantized model is coming soon).
Key Features
- Function calling
- Dedicated to working with MemGPT
- Supports medium context lengths; trained on sequences of up to 8,192 tokens (see the sketch below)
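Since the model was trained on sequences of up to 8,192 tokens, it is worth loading it with a matching context window. A minimal sketch with llama-cpp-python, using its n_ctx parameter:

```python
from llama_cpp import Llama

# n_ctx sets the context window; 8192 matches the model's training length
llm = Llama.from_pretrained(
    repo_id="minchyeom/MemGPT-GGUF",
    filename="memgpt-q8_0.gguf",
    n_ctx=8192,
)
```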
Usage
This model is designed to run on various backends, such as oobabooga's text-generation-webui or llama.cpp.
To run the model on the WebUI, simply git clone the official WebUI repository and run the appropriate start script for your operating system; see the repository's README for more details.
Once you've installed the WebUI, you can download this model from the Model tab. Then select the desired model (starsnatched/MemGPT in this case), and the backend is ready.
With the WebUI or your preferred backend running, open a terminal or PowerShell and install MemGPT with `pip3 install -U pymemgpt`. Configure MemGPT with `memgpt configure` before running it.
Use `memgpt run --no-verify` to chat with MemGPT!
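Put together, a minimal sketch of the full setup (the install, configure, and run commands are taken from the steps above):

```bash
# Install the MemGPT client
pip3 install -U pymemgpt

# Interactive configuration (backend, model, chat template, etc.)
memgpt configure

# Start chatting with MemGPT
memgpt run --no-verify
```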
Prompt Template
This model uses a custom chat template, which we're naming "memgpt". It can be selected via `memgpt configure`.
```
<turn> SYSTEM: You are MemGPT. </turn>
<turn> SYSTEM_ALERT: {'message': 'You have 50 prior messages with the user. These messages have been hidden from view due to memory constraints, but you can view them using functions.', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> ASSISTANT: Bootup sequence complete. Persona activated. Testing messaging functionality. </turn>
<turn> FUNCTION CALL: {'name': 'send_message', 'arguments': '{\n "message": "More human than human is our motto."\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": null, "time": "2023-10-07 11:50:03 PM PDT-0700"} </turn>
<turn> LOGIN: {'last_login': '7 days 8 hours ago', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> USER_MESSAGE: {'message': 'Hey, remember that time we talked about music? What was the artist you mentioned you could get into?', 'time': '2023-10-07 11:50:03 PM PDT-0700'} </turn>
<turn> ASSISTANT: Alright, let me search our conversation history about music to find the artist. </turn>
<turn> FUNCTION CALL: {'name': 'conversation_search', 'arguments': '{\n "query": "music",\n "request_heartbeat": true,\n "page": 0\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": "Showing 5 of 50 results (page 0/9): [\"timestamp: 2023-10-07 11:50:03 PM PDT-0700, user - I agree. Working out is a great way to burn off steam. Do you like country music?\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - A little bit. I can get into taylor swift.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, user - Yeah between the both of us, could probably get a country song written about our distresses...perhaps Taylor Swift could perform it.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - That would be a chart-topper, I'm sure. Something to blast while driving with the top down, maybe even in your next restoration project.\", \"timestamp: 2023-10-07 11:50:03 PM PDT-0700, assistant - This sounds fun. Do you have any other hobby?\"]", "time": "2023-10-07 11:50:08 PM PDT-0700"} </turn>
<turn> HEARTBEAT: {'reason': 'request_heartbeat == true', 'time': '2023-10-07 11:50:08 PM PDT-0700'} </turn>
<turn> ASSISTANT: Ah, I found it. It was Taylor Swift that I mentioned I could get into. Do you still listen to her music, or have you found any new artists that you enjoy? </turn>
<turn> FUNCTION CALL: {'name': 'send_message', 'arguments': '{\n "message": "Ah, I found it. It was Taylor Swift that I mentioned I could get into. Do you still listen to her music, or have you found any new artists that you enjoy?"\n}'} </turn>
<turn> FUNCTION: {"status": "OK", "message": null, "time": "2023-10-07 11:50:14 PM PDT-0700"} </turn>
```
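For backends that expose only raw text completion, a conversation can be rendered into this template by hand. A minimal sketch with llama-cpp-python, where render_turn is a hypothetical helper (not part of any library) and llm is the model loaded in the earlier example:

```python
# Hypothetical helper: wrap one message in the "memgpt" turn format shown above
def render_turn(role: str, content: str) -> str:
    return f"<turn> {role}: {content} </turn>"

prompt = "\n".join([
    render_turn("SYSTEM", "You are MemGPT."),
    render_turn("USER_MESSAGE",
                "{'message': 'Hello! Who are you?', 'time': '2023-10-07 11:50:03 PM PDT-0700'}"),
]) + "\n<turn> ASSISTANT:"

# Stop at the closing tag so the model emits a single turn
output = llm(prompt, max_tokens=256, stop=["</turn>"])
print(output["choices"][0]["text"])
```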
Model Details
- Developed by: @starsnatched
- Model type: This repo contains a language model based on the transformer decoder architecture.
- Language: English
- Contact: For any questions, concerns, or comments about this model, please contact me on Discord: @starsnatched.
Training Infrastructure
- Hardware: The model in this repo was trained on 2x A100 80GB GPUs.
Intended Use
The model is designed to be used as the base model for MemGPT agents.
Limitations and Bias
The model may exhibit unreliable, unsafe, or biased behaviours. Please double-check the results this model produces.