---
base_model: Heralax/Augmentoolkit-DataSpecialist-v0.1
library_name: vllm
tags:
  - awq
  - text-generation
  - llmcompressor
license: apache-2.0
---

# Augmentoolkit-DataSpecialist-v0.1-AWQ

This is a 4-bit AWQ quantized version of Heralax/Augmentoolkit-DataSpecialist-v0.1.

## Motivation

This quantization was created specifically to enable efficient serving of Heralax's Data Specialist model using the vLLM engine.

While the original model is excellent, running it at full precision can be resource-intensive. This AWQ version allows for high-throughput inference on consumer hardware (such as single or dual RTX 3090s) while maintaining the model's specialized capabilities.

## Methodology

This model was converted using the [LLM Compressor](https://github.com/vllm-project/llm-compressor) library, currently the recommended quantization path for vLLM.

  • Quantization Scheme: W4A16 (4-bit weights, 16-bit activations)
  • Group Size: 128
  • Calibration Dataset: open_platypus

## Credit

All credit for the training and architecture of the original model goes to Evan Armstrong (Heralax). This repository merely provides the quantized weights to facilitate broader adoption and easier deployment within the vLLM ecosystem.

## How to Use with vLLM

```python
from vllm import LLM, SamplingParams

# Load the model; vLLM detects the 4-bit quantization from the checkpoint
# config automatically, so no explicit quantization argument is needed.
llm = LLM(model="bbarn4/Augmentoolkit-DataSpecialist-v0.1-AWQ")

# Generate
prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```
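
For serving over HTTP, vLLM's OpenAI-compatible server works the same way. The flags below are illustrative assumptions, not requirements: `--tensor-parallel-size 2` matches a dual-GPU setup such as two RTX 3090s, and the context length is an example value.

```shell
# Launch vLLM's OpenAI-compatible server with the quantized weights.
# --tensor-parallel-size 2 assumes two GPUs; drop it for a single GPU.
vllm serve bbarn4/Augmentoolkit-DataSpecialist-v0.1-AWQ \
    --tensor-parallel-size 2 \
    --max-model-len 8192
```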