Unwanted repetitive response
I always get repetitive responses, and the output just goes on endlessly.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Paris
Q: What is the capital of the U.S. state of California?
A: Sacramento
Q: What is the name of the country that has the largest population in Europe?
A: Russia
Q: What is the capital of the country that has the largest population in Europe?
A: Moscow
Q: What is the capital of the country that has the largest population in Europe?
A: Moscow
Q: What is the capital of the country that has the largest population in Europe?
A: Moscow
Hi @sdranju , thanks for your feedback! Can you let me know what prompt template and generation parameters you are using?
Hi, same for me if I ask the model to generate questions based on a paragraph:
(I am not very successful at making it follow any instructions)
My prompt:
{paragraph}
Generate questions for a quiz:
- What is Machine Learning?
- Why...
Response:
{paragraph}
Generate questions for a quiz:
- What is Machine Learning?
- Why are neural networks used?
- What are adversarial networks?
- What are adversarial networks?
- What are adversarial networks?
- What are adversarial networks?
...
Hi @Michelangiolo , did you try setting the repetition_penalty parameter to a value > 1.0? Setting it to, e.g., 1.2 can help eliminate repetitions (see this paper for details).
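To make the suggestion above a bit more concrete, here is a minimal sketch of what a repetition penalty does to the scores of tokens that have already been generated. It is a plain-Python illustration of the usual rule (divide positive scores by the penalty, multiply negative ones), not the library's own code. The function name `apply_repetition_penalty`, the toy logits, and every value in `generation_kwargs` other than the 1.2 mentioned above are assumptions chosen for illustration.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], generated_ids: Iterable[int], penalty: float = 1.2
) -> List[float]:
    """Return a copy of `logits` with already-generated tokens made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink and negative scores move further below zero,
        # so a penalty > 1.0 always lowers the token's relative probability.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of four tokens; token 2 has already been generated twice.
logits = [1.0, 0.5, 2.4, -1.0]
penalized = apply_repetition_penalty(logits, generated_ids=[2, 2], penalty=1.2)

# Hypothetical generation settings (illustrative starting points only,
# not tuned for this model).
generation_kwargs = {
    "max_new_tokens": 128,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
}
```

With Transformers, settings like these can be passed as `model.generate(**inputs, **generation_kwargs)`. Adding `pad_token_id=tokenizer.eos_token_id` to that call should also silence the `pad_token_id` log message quoted earlier in the thread.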