Text Generation
Transformers
Safetensors
English
llama
granite
ibm
lab
labrador
labradorite
conversational
text-generation-inference
Instructions to use royleibov/granite-7b-instruct-ZipNN-Compressed with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use royleibov/granite-7b-instruct-ZipNN-Compressed with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="royleibov/granite-7b-instruct-ZipNN-Compressed")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
model = AutoModelForCausalLM.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use royleibov/granite-7b-instruct-ZipNN-Compressed with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "royleibov/granite-7b-instruct-ZipNN-Compressed"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "royleibov/granite-7b-instruct-ZipNN-Compressed",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/royleibov/granite-7b-instruct-ZipNN-Compressed
- SGLang
How to use royleibov/granite-7b-instruct-ZipNN-Compressed with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "royleibov/granite-7b-instruct-ZipNN-Compressed" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "royleibov/granite-7b-instruct-ZipNN-Compressed",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "royleibov/granite-7b-instruct-ZipNN-Compressed" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "royleibov/granite-7b-instruct-ZipNN-Compressed",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use royleibov/granite-7b-instruct-ZipNN-Compressed with Docker Model Runner:
docker model run hf.co/royleibov/granite-7b-instruct-ZipNN-Compressed
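The curl calls shown for vLLM and SGLang both post the same JSON body to an OpenAI-compatible /v1/chat/completions endpoint, so the request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names build_chat_payload and chat are hypothetical (they do not come from this repository), and the sketch assumes a server is already running locally on port 8000 (vLLM) or 30000 (SGLang) and returns the usual OpenAI-style response shape.

```python
import json
import urllib.request

MODEL_ID = "royleibov/granite-7b-instruct-ZipNN-Compressed"


def build_chat_payload(prompt, model=MODEL_ID):
    """Build the same JSON body that the curl examples send (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """POST a single-turn chat request and return the assistant's reply text.

    Assumes an OpenAI-compatible server is listening at base_url and that
    the response contains choices[0].message.content.
    """
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["choices"][0]["message"]["content"]
```

With a vLLM server running, chat("What is the capital of France?") sends the request from the curl example; for SGLang, pass base_url="http://localhost:30000".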
Commit History
Add use this model ddd9df0 verified
Add support for transformers library a2767ea verified
Add bold 999e3ca verified
Fix zipnn_hf a9f9d09 verified
Clarify this is a clone and correct the use of ZipNN 2e0bd6e verified
Add scripts 5c653a0
delete scripts 51d629d
Adding .znn back to allow HF code testing 60b1b04
test removing in json .znn d874c91
test adding .znn to model json 18188d1
Fix layout 3578d5b verified
Fix code 5ca7f52 verified
Add usage 2286c11 verified
Fix link 716ec44 verified
Make README conform with ZipNN ab5564a verified
Compress with ZipNN e640cef
Update config.json c6d1adf verified
Update architectures to LlamaForCausalLM (#2) 024256d verified
Add architectures field to the config.json file (#1) fd5873d verified
add MMLU score 30046b1 verified
update model card 136380c verified
update model card f135def verified
add paper and model-card 0ee91d0
jaideepr97 committed on