Text Generation
Transformers
Safetensors
English
llama
gptq
text-generation-inference
llama2
4-bit precision
Instructions to use aiplanet/effi-7b-gptq with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use aiplanet/effi-7b-gptq with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aiplanet/effi-7b-gptq")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aiplanet/effi-7b-gptq")
model = AutoModelForCausalLM.from_pretrained("aiplanet/effi-7b-gptq")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use aiplanet/effi-7b-gptq with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "aiplanet/effi-7b-gptq"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aiplanet/effi-7b-gptq",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/aiplanet/effi-7b-gptq
- SGLang
How to use aiplanet/effi-7b-gptq with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "aiplanet/effi-7b-gptq" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aiplanet/effi-7b-gptq",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "aiplanet/effi-7b-gptq" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aiplanet/effi-7b-gptq",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use aiplanet/effi-7b-gptq with Docker Model Runner:
docker model run hf.co/aiplanet/effi-7b-gptq
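Both vLLM and SGLang expose the same OpenAI-compatible completions endpoint used in the curl examples above. A minimal stdlib-only client sketch (assumes a server is already running; the host/port match the vLLM defaults above, and `build_completion_request` is a hypothetical helper, not part of either library):

```python
import json
from urllib import request

def build_completion_request(prompt,
                             base_url="http://localhost:8000",
                             model="aiplanet/effi-7b-gptq",
                             max_tokens=512,
                             temperature=0.5):
    """Build (but do not send) an OpenAI-compatible /v1/completions request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Once upon a time,")

# Sending the request requires a running server, e.g.:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

For SGLang, only `base_url` changes (port 30000 in the commands above); the payload shape is identical.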
effi 7b GPTQ is a quantized version of effi 7b, a 7-billion-parameter model built by AI Planet. We used AutoGPTQ to quantize the model.
Model Details
Model Description
The original model was fine-tuned on Chain of Thought datasets, which contain context from mixed sources with corresponding rationales. The fine-tuned Large Language Model (LLM) has shown enhanced capabilities for solving novel tasks by providing reasoning. The final model was then quantized into GPTQ format.
- Developed by: AI Planet
- Model type: Causal decoder-only
- Language(s) (NLP): English
- Quantization type: GPTQ (4-bit)
- License: Apache 2.0
- Quantized from model: Effi-7b
Quantization Configuration
- bits: 4
- damp_percent: 0.1
- dataset: "wikitext2"
- desc_act: false
- group_size: 128
- modules_in_block_to_quantize: null
- quant_method: "gptq"
- sym: true
- true_sequential: true
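With AutoGPTQ these settings are stored alongside the weights (typically in a quantize_config.json). A sketch of the same configuration as a plain Python dict, with the effective bits-per-weight worked out; the 16-bit scale and 4-bit zero point per group are an assumption about the packing, and the actual overhead depends on the kernel:

```python
# Quantization settings for effi-7b-gptq (mirrors the list above).
quantize_config = {
    "bits": 4,                 # weights stored in 4 bits
    "damp_percent": 0.1,       # Hessian dampening for numerical stability
    "dataset": "wikitext2",    # calibration dataset
    "desc_act": False,         # no activation-order reordering
    "group_size": 128,         # one scale/zero point per 128 weights
    "modules_in_block_to_quantize": None,
    "quant_method": "gptq",
    "sym": True,               # symmetric quantization
    "true_sequential": True,   # quantize the blocks one at a time
}

# Rough effective bits per weight: 4-bit weight plus per-group metadata.
# Assuming a 16-bit scale and a 4-bit zero point per group of 128 weights:
overhead_bits = (16 + 4) / quantize_config["group_size"]
effective_bits = quantize_config["bits"] + overhead_bits  # 4.15625
```

The small group size (128) trades a little extra storage for noticeably better accuracy than per-channel quantization.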
Example of usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

quant_path = "aiplanet/effi-7b-gptq"

model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
tst = """
### INSTRUCTION:
Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney. Are Virgin Australia and Virgin Blue the same airline?
"""
system_message = "Given your chain of thought reasoning, provide a rationale for the context in the source."
template=f"""
Context: {system_message}
Human: {tst}
"""
# Tokenize the input
input_ids = tokenizer(template, return_tensors="pt", truncation=True).input_ids.cuda()
# Run the model to infer an output
outputs = model.generate(input_ids=input_ids, max_new_tokens=512, top_p=0.9, temperature=0.1, top_k=1, repetition_penalty=1.1)
# Print the result
print(f"{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(template):]}")
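As a back-of-the-envelope check on why this quantized model fits on a single consumer GPU, a sketch of the weight-memory estimate; the 7B parameter count is nominal, and the figure ignores embeddings kept in higher precision as well as runtime activation and KV-cache memory:

```python
params = 7e9                               # nominal parameter count

# fp16 baseline: 16 bits (2 bytes) per weight.
fp16_gb = params * 16 / 8 / 1024**3        # ~13.0 GiB

# GPTQ 4-bit: 4-bit weights plus an assumed 16-bit scale and
# 4-bit zero point per group of 128 weights.
gptq_bits = 4 + (16 + 4) / 128             # ~4.16 effective bits/weight
gptq_gb = params * gptq_bits / 8 / 1024**3 # ~3.4 GiB

print(f"fp16: {fp16_gb:.1f} GiB, GPTQ 4-bit: {gptq_gb:.1f} GiB")
```

That is roughly a 3.8x reduction in weight memory versus fp16, which is what makes single-GPU inference practical.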
Framework versions
- Transformers 4.37.2
- optimum 1.16.2
- auto-gptq 0.6.0
Citation
@misc{bhavyaaiplanet,
  author    = { {Bhavya Bhola} },
  title     = { Quantized version of effi-7b by AI Planet },
  year      = 2024,
  url       = { https://huggingface.co/aiplanet/effi-7b-gptq },
  publisher = { Hugging Face }
}