---
license: apache-2.0
tags:
  - gptq
  - quantized
  - causal-lm
  - transformers
  - pytorch
  - phi-2
  - text-generation
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/phi-2
inference: true
---

# 🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of Microsoft's Phi-2 model, optimized for efficient inference with the `gptqmodel` library.

## 📌 Model Details

- **Base Model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
- **Quantization:** GPTQ (4-bit)
- **Quantizer:** GPTQModel
- **Framework:** PyTorch + Hugging Face Transformers
- **Device Support:** CUDA (GPU)
- **License:** Apache 2.0

## 🚀 Features

- ✅ **Lightweight:** 4-bit quantization significantly reduces memory usage
- ✅ **Fast inference:** ideal for deployment on consumer GPUs
- ✅ **Compatible:** works with `transformers`, `optimum`, and `gptqmodel`
- ✅ **CUDA-accelerated:** automatically uses the GPU for speed

## 📚 Usage

This model is ready to use with the Hugging Face `transformers` library.
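A minimal loading-and-generation sketch follows. The repo id `STiFLeR7/Phi2-GPTQ` is inferred from this repository's name; substitute your own path if it differs. Loading requires a CUDA GPU and a GPTQ-capable backend (e.g. `optimum` or `gptqmodel`) installed alongside `transformers`.

```python
# Sketch: load the quantized checkpoint and generate text.
# Assumes a CUDA GPU and a GPTQ backend installed; repo id is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "STiFLeR7/Phi2-GPTQ"  # assumed repo id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` places the quantized weights on the available GPU; generation parameters such as `max_new_tokens` can be tuned as usual.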

## 🧪 Intended Use

- Research and development
- Prototyping generative applications
- Fast inference in environments with limited GPU memory

βš–οΈ License

This model is distributed under the Apache License 2.0.