---
license: apache-2.0
tags:
  - gptq
  - quantized
  - causal-lm
  - transformers
  - pytorch
  - phi-2
  - text-generation
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/phi-2
inference: true
---

# 🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of Microsoft's Phi-2 model, optimized for efficient inference with the `gptqmodel` library.

## 📌 Model Details

- **Base Model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
- **Quantization:** GPTQ (4-bit)
- **Quantizer:** GPTQModel
- **Framework:** PyTorch + Hugging Face Transformers
- **Device Support:** CUDA (GPU)
- **License:** Apache 2.0

## 🚀 Features

- ✅ **Lightweight:** 4-bit quantization significantly reduces memory usage
- ✅ **Fast inference:** ideal for deployment on consumer GPUs
- ✅ **Compatible:** works with `transformers`, `optimum`, and `gptqmodel`
- ✅ **CUDA-accelerated:** automatically uses the GPU for speed

## 📚 Usage

This model is ready to use with the Hugging Face `transformers` library.
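A minimal loading-and-generation sketch follows. The repo id `STiFLeR7/Phi2-GPTQ` is inferred from this repository's name; substitute your own path if it differs. Loading requires a CUDA GPU and a GPTQ-capable backend (e.g. `optimum` or `gptqmodel`) installed alongside `transformers`.

```python
# Sketch: load the quantized checkpoint and generate text.
# Assumes a CUDA GPU and a GPTQ backend installed; repo id is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "STiFLeR7/Phi2-GPTQ"  # assumed repo id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` places the quantized weights on the available GPU; generation parameters such as `max_new_tokens` can be tuned as usual.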

## 🧪 Intended Use

- Research and development
- Prototyping generative applications
- Fast inference in environments with limited GPU memory

βš–οΈ License

This model is distributed under the Apache License 2.0.