Instructions to use AbrahamSanders/opt-2.7b-realtime-chat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AbrahamSanders/opt-2.7b-realtime-chat with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AbrahamSanders/opt-2.7b-realtime-chat")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AbrahamSanders/opt-2.7b-realtime-chat")
model = AutoModelForCausalLM.from_pretrained("AbrahamSanders/opt-2.7b-realtime-chat")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AbrahamSanders/opt-2.7b-realtime-chat with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AbrahamSanders/opt-2.7b-realtime-chat"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbrahamSanders/opt-2.7b-realtime-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
docker model run hf.co/AbrahamSanders/opt-2.7b-realtime-chat
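The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM server is already running on `localhost:8000` as started above. The helper names `build_completion_request` and `complete`, and the `max_tokens=64` default, are illustrative and not part of vLLM or this model card.

```python
import json
import urllib.request

MODEL_ID = "AbrahamSanders/opt-2.7b-realtime-chat"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=64, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Since the model was fine-tuned on transcribed dialogue, a prompt written in the model card's participant/dialog format is likely to work better with `complete(...)` than free-form text such as "Once upon a time,".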
- SGLang
How to use AbrahamSanders/opt-2.7b-realtime-chat with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AbrahamSanders/opt-2.7b-realtime-chat" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbrahamSanders/opt-2.7b-realtime-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AbrahamSanders/opt-2.7b-realtime-chat" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbrahamSanders/opt-2.7b-realtime-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use AbrahamSanders/opt-2.7b-realtime-chat with Docker Model Runner:
docker model run hf.co/AbrahamSanders/opt-2.7b-realtime-chat
Base model: facebook/opt-2.7b
Fine-tuned for causal language modeling of transcribed spoken dialogue from the TalkBank CABank collection. Training corpora include:
- CABNC - Spoken language segment of the British National Corpus
- CallFriend English (N) - Phone calls
- CallFriend English (S) - Phone calls
- CallHome English - Phone calls
- GCSAusE - Australian conversations
- ISL - Conversations recorded to test ASR methods for meeting
- MICASE - Michigan Corpus of Academic Spoken English
- SCoSE - The Saarbrücken Corpus of Spoken (American) English.
(Corpus descriptions are from TalkBank)
Data input format: The data format models a sequence of spoken dialogue between two or more participants:
- The sequence is prefixed with information about the participants including name (can be a proper noun, a title/role, or unknown), age (can be a number or unknown), and sex (can be male, female, other, unknown).
- All utterances in the conversation are then listed in order, each prefixed with the participant code of its speaker (S1, S2, S3, etc.).
- Utterances support a limited set of transcription notations in the CHAT & CHAT-CA formats:
  - Pauses: `(.)` for a generic short pause, or `(N.N)` for a timed pause. For example, `(3.4)` is a pause for 3.4 seconds.
  - Non-verbal sounds: `&=laughs`, `&=cough`, `&=breathes`, `&=click`, etc. Anything describing a speaker-produced non-verbal sound can come after a prefix of `&=`.
  - Comments about speaker or setting: `[% baby crying in background]`, `[% smiling]`, `[% phone clicking noise]`, `[% imitating him]`, etc. Anything describing the state of the speaker or environment can be in this block. Also, a comment block can be used to describe speaker-produced sounds, but it is more common to use the `&=` prefix for that.
  - Unknown or unintelligible utterances: `xxx`
  - Breathing: `hhh`
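When post-processing generated utterances (for example, before sending text to a speech synthesizer), it can help to separate these notations from the spoken words. The sketch below is a hypothetical helper, not the author's parser: it assumes only the notations listed above appear, and the names `TOKEN_PATTERN`, `extract_annotations`, and `strip_annotations` are made up for illustration. Full CHAT/CHAT-CA transcripts contain many more codes than this handles.

```python
import re

# One alternative per notation listed above.
TOKEN_PATTERN = re.compile(
    r"\((?P<timed_pause>\d+\.\d+)\)"    # (3.4)       timed pause in seconds
    r"|(?P<short_pause>\(\.\))"         # (.)         generic short pause
    r"|&=(?P<sound>[^\s\[\]()]+)"       # &=laughs    speaker-produced sound
    r"|\[%\s*(?P<comment>[^\]]*)\]"     # [% smiling] comment block
    r"|(?P<unintelligible>\bxxx\b)"     # xxx         unintelligible speech
    r"|(?P<breath>\bhhh\b)"             # hhh         breathing
)


def extract_annotations(utterance):
    """Return (kind, value) pairs for each notation, in order of appearance."""
    annotations = []
    for match in TOKEN_PATTERN.finditer(utterance):
        kind = match.lastgroup
        value = match.group(kind).strip()
        if kind == "timed_pause":
            value = float(value)  # pause length in seconds
        annotations.append((kind, value))
    return annotations


def strip_annotations(utterance):
    """Remove all notations and collapse the remaining whitespace."""
    return " ".join(TOKEN_PATTERN.sub(" ", utterance).split())
```

Keeping extraction and stripping as separate functions lets a caller act on pauses (for instance, by waiting the given number of seconds) while still obtaining clean text.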
Example:
<participant> S1 (name: Dave, age: 33, sex: male) <participant> S2 (name: unknown, age: unknown, sex: unknown) <dialog> S1: Hi! (2.3) are you there? S2: hhh hhh [% background noise] uh yeah (0.8) I can hear you. (1.2) &=cough can you hear me? S1: ...
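A prompt in this format can be assembled programmatically. The following is a minimal sketch: the `Participant` class and `build_prompt` function are hypothetical names, and joining the pieces with single spaces is an assumption based on how the example above is rendered. The exact separators used during training should be checked against the realtime-chatbot repository.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Participant:
    code: str                          # e.g. "S1"
    name: str = "unknown"              # proper noun, title/role, or "unknown"
    age: Union[int, str] = "unknown"   # number or "unknown"
    sex: str = "unknown"               # male, female, other, or unknown


def build_prompt(participants, utterances, next_speaker=None):
    """Assemble the participant header, the <dialog> marker, and the utterances.

    `utterances` is a list of (participant_code, text) pairs. If `next_speaker`
    is given, the prompt ends with that speaker's prefix so the model
    continues in that speaker's voice.
    """
    parts = [
        f"<participant> {p.code} (name: {p.name}, age: {p.age}, sex: {p.sex})"
        for p in participants
    ]
    parts.append("<dialog>")
    parts.extend(f"{code}: {text}" for code, text in utterances)
    if next_speaker is not None:
        parts.append(f"{next_speaker}:")
    # Assumption: pieces are separated by single spaces, as in the example above.
    return " ".join(parts)
```

Ending the prompt with a speaker prefix such as `S2:` is one way to steer who speaks next; omitting it leaves the choice of speaker to the model.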
Usage Info:
Per the OPT documentation, the model was trained with tokenizer setting use_fast=False.
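In Transformers, this setting is passed to `AutoTokenizer.from_pretrained`. The sketch below shows one way to load the model with the slow tokenizer and sample a short continuation. The helper names and the generation settings (`do_sample=True`, `max_new_tokens=30`) are illustrative assumptions, not values recommended by the model author.

```python
MODEL_ID = "AbrahamSanders/opt-2.7b-realtime-chat"


def load_tokenizer_and_model(model_id=MODEL_ID):
    """Load the slow (non-fast) tokenizer and the causal LM weights."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate_continuation(prompt, tokenizer, model, max_new_tokens=30):
    """Sample a continuation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, do_sample=True, max_new_tokens=max_new_tokens
    )
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `load_tokenizer_and_model()` downloads the weights on first use, so it needs network access and enough memory for a 2.7B-parameter model.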
To use this model for real-time inference in a continuous duplex dialogue system, see: https://github.com/AbrahamSanders/realtime-chatbot.