Instructions to use AbrahamSanders/opt-2.7b-realtime-chat with libraries and local apps.

Libraries

How to use AbrahamSanders/opt-2.7b-realtime-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

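# use_fast=False matches the tokenizer setting the model was trained with (see Usage Info)
pipe = pipeline("text-generation", model="AbrahamSanders/opt-2.7b-realtime-chat", use_fast=False)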

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

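# use_fast=False matches the tokenizer setting the model was trained with (see Usage Info)
tokenizer = AutoTokenizer.from_pretrained("AbrahamSanders/opt-2.7b-realtime-chat", use_fast=False)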
model = AutoModelForCausalLM.from_pretrained("AbrahamSanders/opt-2.7b-realtime-chat")
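
Once the model and tokenizer are loaded, a dialogue can be continued with `generate`. The following is a minimal sketch rather than the author's inference code: the helper name `continue_dialogue`, the sampling settings, and the made-up prompt are assumptions for illustration. The prompt follows the dialogue format described under "Data input format" and ends with `S2:` so that the model speaks next as S2.

```python
MODEL_ID = "AbrahamSanders/opt-2.7b-realtime-chat"

# Prompt in the model's dialogue format; the speaker details are illustrative.
PROMPT = (
    "<participant> S1 (name: Dave, age: 33, sex: male) "
    "<participant> S2 (name: unknown, age: unknown, sex: unknown) "
    "<dialog> S1: Hi! (2.3) are you there? S2:"
)

# Assumed sampling settings for illustration, not values recommended by the model author.
GENERATION_KWARGS = {"max_new_tokens": 40, "do_sample": True, "temperature": 0.5}


def continue_dialogue(prompt: str = PROMPT) -> str:
    """Load the model and return only the newly generated continuation."""
    # Imported lazily so the constants above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # Slice off the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens)
```

Calling `print(continue_dialogue())` downloads the weights on first use and prints a sampled continuation, so the output differs from run to run.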

Local Apps

vLLM

How to use AbrahamSanders/opt-2.7b-realtime-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AbrahamSanders/opt-2.7b-realtime-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AbrahamSanders/opt-2.7b-realtime-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
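
The same request can be made from Python with only the standard library. This is a sketch under the assumption that the server returns the usual OpenAI-style completions body, with the generated text in `choices[0].text`; the helper names `build_completion_request` and `complete` are hypothetical, and the default `max_tokens` of 64 is an arbitrary choice.

```python
import json
import urllib.request

MODEL_ID = "AbrahamSanders/opt-2.7b-realtime-chat"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=64, temperature=0.5):
    """Build (without sending) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With the server running, `complete("<dialog> S1: Hi! (2.3) are you there? S2:")` returns the sampled continuation. Passing a different `base_url` points the same helper at another OpenAI-compatible server.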

SGLang

How to use AbrahamSanders/opt-2.7b-realtime-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AbrahamSanders/opt-2.7b-realtime-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AbrahamSanders/opt-2.7b-realtime-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AbrahamSanders/opt-2.7b-realtime-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AbrahamSanders/opt-2.7b-realtime-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use AbrahamSanders/opt-2.7b-realtime-chat with Docker Model Runner:
```
docker model run hf.co/AbrahamSanders/opt-2.7b-realtime-chat
```

Base model: facebook/opt-2.7b

Fine-tuned for causal language modeling of transcribed spoken dialogue from the TalkBank CABank collection. Training corpora include:

- CABNC: Spoken language segment of the British National Corpus
- CallFriend English (N): Phone calls
- CallFriend English (S): Phone calls
- CallHome English: Phone calls
- GCSAusE: Australian conversations
- ISL: Conversations recorded to test ASR methods for meeting
- MICASE: Michigan Corpus of Academic Spoken English
- SCoSE: The Saarbrücken Corpus of Spoken (American) English

(Corpus descriptions are from TalkBank)

Data input format: The data format models a sequence of spoken dialogue between two or more participants:

The sequence is prefixed with information about the participants, including name (a proper noun, a title/role, or unknown), age (a number or unknown), and sex (male, female, other, or unknown).
It then lists all utterances in the conversation in order, each prefixed with its speaker's participant code (S1, S2, S3, etc.).
Utterances support a limited set of transcription notations in the CHAT & CHAT-CA formats:
- Pauses: (.) for a generic short pause, or (N.N) for a timed pause. For example, (3.4) is a pause of 3.4 seconds.
- Non-verbal sounds: &=laughs, &=cough, &=breathes, &=click, etc. Anything describing a speaker-produced non-verbal sound can follow the &= prefix.
- Comments about speaker or setting: [% baby crying in background], [% smiling], [% phone clicking noise], [% imitating him], etc. Anything describing the state of the speaker or environment can go in this block. A comment block can also describe speaker-produced sounds, but the &= prefix is more common for that.
- Unknown or unintelligible utterances: xxx
- Breathing: hhh
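
The notations above are regular enough to pick out of generated text with a few regular expressions, for example before passing an utterance to a text-to-speech step. The sketch below is one possible approach, not part of the model's tooling: the function names `extract_notations` and `strip_notations` are hypothetical, and the patterns cover only the forms listed here (assuming sound names are made of word characters and colons), not the full CHAT specification.

```python
import re

TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")  # e.g. (3.4)
SHORT_PAUSE = re.compile(r"\(\.\)")               # (.)
SOUND = re.compile(r"&=([\w:]+)")                 # e.g. &=laughs
COMMENT = re.compile(r"\[%\s*([^\]]*)\]")         # e.g. [% smiling]
FILLER = re.compile(r"\b(?:xxx|hhh)\b")           # unintelligible speech, breathing


def extract_notations(utterance: str) -> dict:
    """Collect the transcription notations found in one utterance."""
    return {
        "timed_pauses": [float(s) for s in TIMED_PAUSE.findall(utterance)],
        "short_pauses": len(SHORT_PAUSE.findall(utterance)),
        "sounds": SOUND.findall(utterance),
        "comments": [c.strip() for c in COMMENT.findall(utterance)],
        "unintelligible": len(re.findall(r"\bxxx\b", utterance)),
        "breaths": len(re.findall(r"\bhhh\b", utterance)),
    }


def strip_notations(utterance: str) -> str:
    """Remove all notations, leaving only the spoken words."""
    text = utterance
    for pattern in (COMMENT, TIMED_PAUSE, SHORT_PAUSE, SOUND, FILLER):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

For instance, `strip_notations("uh yeah (0.8) I can hear you.")` yields `"uh yeah I can hear you."`, while `extract_notations` on the same string reports one timed pause of 0.8 seconds.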

Example:

<participant> S1 (name: Dave, age: 33, sex: male) <participant> S2 (name: unknown, age: unknown, sex: unknown) <dialog> S1: Hi! (2.3) are you there? S2: hhh hhh [% background noise] uh yeah (0.8) I can hear you. (1.2) &=cough can you hear me? S1: ...
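
A prompt in this format can be assembled programmatically. The helper below is a minimal sketch that reproduces the layout of the example above; the function names, the dictionary keys, and the single-space separator between segments are assumptions inferred from that example rather than a documented specification.

```python
def format_participant(code, name="unknown", age="unknown", sex="unknown"):
    """Render one participant header, defaulting missing fields to 'unknown'."""
    return f"<participant> {code} (name: {name}, age: {age}, sex: {sex})"


def build_prompt(participants, turns, next_speaker=None):
    """Assemble a dialogue prompt.

    participants: list of dicts with key "code" and optional "name", "age", "sex".
    turns: list of (code, utterance) pairs in conversation order.
    next_speaker: if given, the prompt ends with "<code>:" so that the model
        continues the conversation as that speaker.
    """
    parts = [format_participant(**p) for p in participants]
    parts.append("<dialog>")
    parts.extend(f"{code}: {utterance}" for code, utterance in turns)
    if next_speaker is not None:
        parts.append(f"{next_speaker}:")
    return " ".join(parts)


prompt = build_prompt(
    participants=[
        {"code": "S1", "name": "Dave", "age": 33, "sex": "male"},
        {"code": "S2"},
    ],
    turns=[("S1", "Hi! (2.3) are you there?")],
    next_speaker="S2",
)
print(prompt)
```

Ending the prompt with a participant code and a colon leaves it to the model to produce that speaker's next utterance.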

Usage Info:

Per the OPT documentation, the model was trained with the tokenizer setting use_fast=False, so pass use_fast=False when loading the tokenizer for inference.

To use this model for real-time inference in a continuous duplex dialogue system, see: https://github.com/AbrahamSanders/realtime-chatbot.
