Instructions to use pineappleSoup/DialoGPT-medium-707 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use pineappleSoup/DialoGPT-medium-707 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pineappleSoup/DialoGPT-medium-707")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pineappleSoup/DialoGPT-medium-707")
model = AutoModelForCausalLM.from_pretrained("pineappleSoup/DialoGPT-medium-707")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use pineappleSoup/DialoGPT-medium-707 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "pineappleSoup/DialoGPT-medium-707"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pineappleSoup/DialoGPT-medium-707",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker:
```shell
docker model run hf.co/pineappleSoup/DialoGPT-medium-707
```
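Since the server exposes an OpenAI-compatible chat API, the curl call above can also be reproduced from Python with just the standard library. This is a sketch; it assumes the vLLM server is running locally on port 8000:

```python
import json
import urllib.request

# Build the same request body as the curl example above.
payload = {
    "model": "pineappleSoup/DialoGPT-medium-707",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```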
- SGLang
How to use pineappleSoup/DialoGPT-medium-707 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "pineappleSoup/DialoGPT-medium-707" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pineappleSoup/DialoGPT-medium-707",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "pineappleSoup/DialoGPT-medium-707" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pineappleSoup/DialoGPT-medium-707",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use pineappleSoup/DialoGPT-medium-707 with Docker Model Runner:
```shell
docker model run hf.co/pineappleSoup/DialoGPT-medium-707
```
How did you fix the issues within the tutorial?
I followed the exact same tutorial. I have semi-successfully fixed them, but I was wondering how you got around them?
E.g.:
Training raising errors because the token doesn't exist at all, instead of it being set to None
pytorch_model.bin not appearing, so I call torch.save(model) manually <-- I'm not even sure this is a proper fix, since my model starts to mentally implode after 4 consecutive messages each time.
(I set the 'messages' parameter to a circular queue. Anything more than 4 causes it to spam '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
Can you tell me which part you are at? I assume you have finished training and started using the model, or is it before that?
I've managed to follow through the entire tutorial.
Again, here are some corrections I made:
In both collate functions, I commented out the if statement and directly returned pad_sequence with the padding_value
(for me, when training both the small and medium models, tokenizer._pad_token doesn't exist at all):
```python
from typing import List

import torch
from torch.nn.utils.rnn import pad_sequence

def collate(examples: List[torch.Tensor]):
    # if tokenizer._pad_token is None:
    return pad_sequence(examples, batch_first=True)
    # return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)
```
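For reference, what `pad_sequence` with `batch_first=True` does can be sketched in plain Python (a minimal illustration, not the torch implementation):

```python
def pad_batch(seqs, padding_value=0):
    # Pad every sequence to the length of the longest one (batch_first
    # layout), which is what pad_sequence does with tensors.
    max_len = max(len(s) for s in seqs)
    return [list(s) + [padding_value] * (max_len - len(s)) for s in seqs]

batch = [[5, 3], [7, 1, 4, 2], [9]]
print(pad_batch(batch))  # [[5, 3, 0, 0], [7, 1, 4, 2], [9, 0, 0, 0]]
```

One thing worth knowing: when padding_value is omitted, pad_sequence pads with 0, and in GPT-2's vocabulary token id 0 decodes to "!" — which may be related to the exclamation-mark spam described in this thread.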
After training, the pytorch_model.bin didn't appear in my drive.
I fixed this by directly calling `torch.save(model, os.path.join(args.output_dir, "pytorch_model.bin"))`
in the main function (after setting args to skip training).
After doing that, my bot functions fine-ish. If I try to use the endpoint on this website, the bot replies okay for about 4 replies, then starts spamming exclamation marks after that.
This happens every time.
I used the InferenceClient instead of whatever the tutorial provided. I then used a circular queue to store message history.
```python
response = self.api_client.chat.completions.create(
    model="...",
    messages=list(self.message_history),
    max_tokens=500,
    temperature=0.8,
    stream=False,
)
```
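The circular queue described above can be sketched with `collections.deque` (a minimal illustration; the variable name is assumed):

```python
from collections import deque

# A deque with maxlen acts as a circular queue: once it is full, appending
# a new message silently drops the oldest one.
message_history = deque(maxlen=4)

for i in range(6):
    message_history.append({"role": "user", "content": f"message {i}"})

# Only the 4 most recent messages remain.
print([m["content"] for m in message_history])
# ['message 2', 'message 3', 'message 4', 'message 5']
```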
If I set the maxlen of the circular queue to more than 4, the bot outputs the exclamation-mark spam described above.
I'm asking because I'm unsure whether it's my weird hack of directly calling torch.save, or just that the training data I gave it was bad.
I extracted speech from an AO3 fanfic HTML file.
I replaced all narration with "CONTEXT: blah blah"
I replaced all unknown speakers with "UNKNOWN: blah blah"
and if I knew which character was speaking, I did "HANA: blah blah"
Correction:
In the final dataset that I used, I removed all "CONTEXT" lines and all that remains are characters speaking with each other.
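That final cleanup step might look roughly like this (a hypothetical sketch; the tag format follows the description above):

```python
# Hypothetical sketch of the dataset cleanup described above: drop the
# narration ("CONTEXT:") lines so only character dialogue remains.
lines = [
    "CONTEXT: They met at the party.",
    "HANA: Did you see that?",
    "UNKNOWN: See what?",
]

dialogue_only = [line for line in lines if not line.startswith("CONTEXT:")]
print(dialogue_only)  # ['HANA: Did you see that?', 'UNKNOWN: See what?']
```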
Hey, how is that going? I am sorry, I got really busy in the middle and thus I stopped responding. So, I was able to recreate your problem, but you might have already solved it :"). Mine was not as intense as yours, but here it is:
User: seven do you have a brother?
707Bot: !!!
User: do you have a brother?
707Bot: Yup!! It’d be nice if we just all make up…
User: seven
707Bot: !
User: Do you know Jumin
707Bot: !?!!?
I was just testing in colab (don't mind my weird prompts. they are lore based lol).
I found out that it has nothing to do with commenting out the code in the collate function. We have to do that now anyway, since the old way is deprecated. But I also added repetition_penalty to the generate call that produces chat_history_ids. So it looks like this:
```python
chat_history_ids = model.generate(
    bot_input_ids,
    max_length=200,
    pad_token_id=tokenizer.eos_token_id,
    no_repeat_ngram_size=3,
    do_sample=True,
    top_k=100,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.2,
)
```
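What repetition_penalty does can be sketched in plain Python (a simplified, scalar version of the penalty transformers applies; the real implementation works on tensors):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Tokens that already appear in the output get their logit shrunk:
    # positive logits are divided by the penalty, negative ones multiplied,
    # so repeats become less likely either way.
    out = list(logits)
    for tid in set(generated_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

# Token 0 ("!" in GPT-2's vocab) was already generated, so it is penalized.
print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0], penalty=2.0))
# [1.0, -1.0, 0.5]
```

With penalty > 1.0 the bot is nudged away from emitting the same token again, which is why it helps against the "!!!" loops.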
Also, I don't know InferenceClient, but the old way in the tutorial stopped working after 3 years, so I also had to modify my script. Here is my link on how I switched: https://github.com/ShuangAnatoli/707. It's even simpler than the tutorial tbh.