Text Generation
Transformers
Safetensors
gpt_bigcode
code
granite
conversational
Eval Results (legacy)
text-generation-inference
Instructions to use ibm-granite/granite-34b-code-instruct-8k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ibm-granite/granite-34b-code-instruct-8k with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="ibm-granite/granite-34b-code-instruct-8k")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-34b-code-instruct-8k")
    model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-34b-code-instruct-8k")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ibm-granite/granite-34b-code-instruct-8k with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "ibm-granite/granite-34b-code-instruct-8k"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ibm-granite/granite-34b-code-instruct-8k",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/ibm-granite/granite-34b-code-instruct-8k
- SGLang
How to use ibm-granite/granite-34b-code-instruct-8k with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "ibm-granite/granite-34b-code-instruct-8k" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ibm-granite/granite-34b-code-instruct-8k",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "ibm-granite/granite-34b-code-instruct-8k" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ibm-granite/granite-34b-code-instruct-8k",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use ibm-granite/granite-34b-code-instruct-8k with Docker Model Runner:
docker model run hf.co/ibm-granite/granite-34b-code-instruct-8k
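The curl calls to the vLLM and SGLang servers above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, introduced here for illustration, and the response parsing assumes the server returns the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first reply.

    Assumption: the response follows the OpenAI chat-completions schema.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# request = build_chat_request(
#     "http://localhost:8000",  # vLLM example port; use 30000 for SGLang
#     "ibm-granite/granite-34b-code-instruct-8k",
#     "What is the capital of France?",
# )
# print(send_chat_request(request))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server. Only the base URL changes between the two examples: port 8000 for vLLM, port 30000 for SGLang.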
Commit History
update context length b7b4a96 verified
Update README.md 2b0c93e verified
granite tag 20f67e1 verified
update paper 680b1ca verified
Update README.md 3bae2fd verified
Update README.md 9130f14 verified
Update config.json c42d2bd verified
Update README.md a7175f5 verified
Update README.md a670fa3 verified
Update README.md 9b4d69b verified
Update README.md 4660a63 verified
update example c406b8d verified
apply chat template 6a72207 verified
removed code comments 6266357 verified
First commit Granite-34B-Code-Instruct 895ffb1 verified
update with correct model 836df29
upload model 1ee739c
Mayank Mishra committed on