Text Generation
Transformers
Safetensors
llama
code
granite
Eval Results (legacy)
text-generation-inference
Instructions to use ibm-granite/granite-8b-code-base-4k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ibm-granite/granite-8b-code-base-4k with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ibm-granite/granite-8b-code-base-4k")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-8b-code-base-4k")
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-8b-code-base-4k")
- Notebooks
- Google Colab
- Kaggle
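The Transformers snippet above loads the tokenizer and model but stops before generating anything. The following is a minimal sketch of one way to produce a completion from the loaded model. The helper names `fit_prompt_to_context` and `complete_code` are hypothetical, the 4096-token window is an assumption read from the "4k" in the model name, and loading in bfloat16 is an assumption based on the "downcast to bf16" commit.

```python
from typing import List

MODEL_ID = "ibm-granite/granite-8b-code-base-4k"
ASSUMED_CONTEXT_LENGTH = 4096  # assumption: "4k" in the model name means 4096 tokens


def fit_prompt_to_context(
    token_ids: List[int],
    max_new_tokens: int,
    context_length: int = ASSUMED_CONTEXT_LENGTH,
) -> List[int]:
    """Keep the most recent prompt tokens so prompt + completion fit the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_length")
    return token_ids[-budget:]


def complete_code(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a completion for `prompt` (hypothetical helper)."""
    # Heavy imports are kept local so the module can be imported without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.to(device)

    # Tokenize, trim from the left if needed, and build a batch of size one.
    ids = fit_prompt_to_context(tokenizer(prompt)["input_ids"], max_new_tokens)
    input_ids = torch.tensor([ids], device=device)
    attention_mask = torch.ones_like(input_ids)

    output_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for reproducible output
    )
    # Return only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(complete_code("def fibonacci(n):"))
```

Because this is a base model rather than an instruction-tuned one, a code prefix such as a function signature is likely to work better as a prompt than a natural-language request.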
- Local Apps
- vLLM
How to use ibm-granite/granite-8b-code-base-4k with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ibm-granite/granite-8b-code-base-4k"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-8b-code-base-4k",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/ibm-granite/granite-8b-code-base-4k
- SGLang
How to use ibm-granite/granite-8b-code-base-4k with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ibm-granite/granite-8b-code-base-4k" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-8b-code-base-4k",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ibm-granite/granite-8b-code-base-4k" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-8b-code-base-4k",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use ibm-granite/granite-8b-code-base-4k with Docker Model Runner:
docker model run hf.co/ibm-granite/granite-8b-code-base-4k
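The curl calls shown for the vLLM and SGLang servers can also be issued from Python. The sketch below uses only the standard library and mirrors the same request body. The function names `build_completion_request` and `complete` are hypothetical, and reading `choices[0].text` assumes the server returns the usual OpenAI-style completions response.

```python
import json
import urllib.request

MODEL_ID = "ibm-granite/granite-8b-code-base-4k"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # vLLM default; SGLang above uses port 30000
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the POST request that the curl examples send to /v1/completions."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (requires a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with the text under choices[0].text.
    return body["choices"][0]["text"]


# Example call against the SGLang server started above:
# print(complete("def fibonacci(n):", base_url="http://localhost:30000", max_tokens=64))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.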
Commit History
update context length 8210a41 verified
Update README.md 08fdeca verified
granite tag f68b973 verified
Update README.md 145f47b verified
Update README.md fb59940 verified
Update README.md 6afdc02 verified
Update README.md 6c3c44f verified
Update README.md cd33711 verified
Update README.md c7e28c0 verified
add warning 0aee173 verified
disable inference 401644c verified
Update README.md 9e0eb38 verified
update example 78c2269 verified
removed HelpSteer dataset cc940cd verified
code comments removed 3a6d5f0 verified
metadata update 0016d33 verified
model summary update ea2cd2d verified
p3 9e5f74e
fixed model name in generation section d7f06ab verified
fix model size 1a8823e verified
First commit granite-8b-code-base model card fa42e7b verified
llama c76e138
downcast to bf16 c33ce4f
upload model 0e6c38f
Mayank Mishra committed on