Instructions to use facebook/incoder-6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/incoder-6B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="facebook/incoder-6B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use facebook/incoder-6B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "facebook/incoder-6B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "facebook/incoder-6B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```bash
docker model run hf.co/facebook/incoder-6B
```
- SGLang
How to use facebook/incoder-6B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "facebook/incoder-6B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "facebook/incoder-6B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "facebook/incoder-6B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "facebook/incoder-6B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use facebook/incoder-6B with Docker Model Runner:
```bash
docker model run hf.co/facebook/incoder-6B
```
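The vLLM and SGLang servers started above both expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_payload` and `complete` are hypothetical (not part of vLLM or SGLang), the ports are taken from the commands above, and the code assumes the server returns the OpenAI completions response shape (`{"choices": [{"text": ...}]}`).

```python
import json
import urllib.request

# Endpoints matching the commands above (vLLM on 8000, SGLang on 30000).
VLLM_URL = "http://localhost:8000/v1/completions"
SGLANG_URL = "http://localhost:30000/v1/completions"


def build_completion_payload(prompt, model="facebook/incoder-6B",
                             max_tokens=512, temperature=0.5):
    """Build the JSON body for an OpenAI-style /v1/completions request.

    The field names and defaults mirror the curl examples above.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, url=VLLM_URL, timeout=60.0, **kwargs):
    """POST a completion request and return the text of the first choice.

    Assumes an OpenAI-compatible response: {"choices": [{"text": ...}, ...]}.
    Extra keyword arguments (model, max_tokens, temperature) go into the payload.
    """
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.loads(response.read().decode("utf-8"))
    return result["choices"][0]["text"]
```

With a server running, `complete("def fibonacci(n):", max_tokens=64)` should return the generated continuation from vLLM; pass `url=SGLANG_URL` to target the SGLang server instead.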
update with description of half precision model
README.md
See [https://github.com/dpfried/incoder](https://github.com/dpfried/incoder) for example code.

This 6B model comes in two versions: one with full-precision (float32) weights (branch `main`) and one with half-precision (float16) weights (branch `float16`). The versions can be loaded as follows:

- Full-precision (float32): Use this version if you are fine-tuning the model. Note that fine-tuning will take a lot of GPU memory, probably multiple GPUs, and we have not tried training the model in `transformers`; it was trained in Fairseq.

  `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")`

- Half-precision (float16): Use this version if you are only doing inference (i.e., generating from the model). It uses less GPU memory, and less RAM when loading the model. With this version, inference should be possible on a 16 GB GPU (batch size 1, sequence lengths of up to at least 256 tokens).

  `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)`
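As a rough check on the memory guidance above, the storage needed for the weights alone is the parameter count times the bytes per parameter: 4 bytes for float32 and 2 bytes for float16. The sketch below pairs that arithmetic with a hypothetical helper, `incoder_load_kwargs`, that returns the `from_pretrained` arguments for each version. The helper name, the explicit `torch_dtype` for the float32 case (the full-precision line above omits it), and the nominal 6 billion parameter count are assumptions. The estimate also ignores activations, the KV cache, and optimizer state, which all add to the total.

```python
# Bytes per parameter for the two published weight formats.
WEIGHT_BYTES = {"float32": 4, "float16": 2}


def incoder_load_kwargs(use_case):
    """Hypothetical helper: from_pretrained() kwargs for 'finetune' or 'inference'.

    The dtype is returned as a string so this function does not import torch;
    convert it with getattr(torch, name) before calling from_pretrained().
    """
    if use_case == "finetune":
        # Full-precision weights on the default `main` branch.
        return {"revision": "main", "torch_dtype": "float32"}
    if use_case == "inference":
        # Half-precision weights on the `float16` branch, loaded with less CPU RAM.
        return {"revision": "float16", "torch_dtype": "float16",
                "low_cpu_mem_usage": True}
    raise ValueError(f"unknown use case: {use_case!r}")


def weight_memory_gib(num_params, dtype):
    """Memory for the weights only, in GiB (no activations or optimizer state)."""
    return num_params * WEIGHT_BYTES[dtype] / 2**30


# Usage sketch (needs torch, transformers, and the model download):
#   import torch
#   from transformers import AutoModelForCausalLM
#   kwargs = incoder_load_kwargs("inference")
#   kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
#   model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B", **kwargs)
```

With a nominal 6 billion parameters (the exact count may differ), this gives about 22 GiB of weights in float32 versus about 11 GiB in float16, which is consistent with the float16 version fitting on a 16 GB GPU for inference while the float32 weights alone would not.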
|
| 47 |
+
|
| 48 |
## Credits
The model was developed by Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer and Mike Lewis.