---
base_model: Heralax/Augmentoolkit-DataSpecialist-v0.1
library_name: vllm
tags:
  - awq
  - text-generation
  - llmcompressor
license: apache-2.0
---

# Augmentoolkit-DataSpecialist-v0.1-AWQ

This is a 4-bit AWQ quantized version of Heralax/Augmentoolkit-DataSpecialist-v0.1.

## Motivation

This quantization was created specifically to enable efficient serving of Heralax's Data Specialist model using the vLLM engine.

While the original model is excellent, running it at full precision can be resource-intensive. This AWQ version allows for high-throughput inference on consumer hardware (such as single or dual RTX 3090s) while maintaining the model's specialized capabilities.

## Methodology

This model was converted using the [LLM Compressor](https://github.com/vllm-project/llm-compressor) library, currently the recommended quantization path for vLLM.

  • Quantization Scheme: W4A16 (4-bit weights, 16-bit activations)
  • Group Size: 128
  • Calibration Dataset: open_platypus

## Credit

All credit for the training and architecture of the original model goes to Evan Armstrong (Heralax). This repository merely provides the quantized weights to facilitate broader adoption and easier deployment within the vLLM ecosystem.

## How to Use with vLLM

```python
from vllm import LLM, SamplingParams

# Load the model; vLLM detects the 4-bit quantization from the checkpoint
# config automatically, so no explicit quantization argument is needed.
llm = LLM(model="bbarn4/Augmentoolkit-DataSpecialist-v0.1-AWQ")

# Generate
prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```
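
For serving over HTTP, vLLM's OpenAI-compatible server works the same way. The flags below are illustrative assumptions, not requirements: `--tensor-parallel-size 2` matches a dual-GPU setup such as two RTX 3090s, and the context length is an example value.

```shell
# Launch vLLM's OpenAI-compatible server with the quantized weights.
# --tensor-parallel-size 2 assumes two GPUs; drop it for a single GPU.
vllm serve bbarn4/Augmentoolkit-DataSpecialist-v0.1-AWQ \
    --tensor-parallel-size 2 \
    --max-model-len 8192
```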