---
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
---

# PosS: Position Specialist Generates Better Draft for Speculative Decoding

This repository contains the PosS-2 model described in the paper *PosS: Position Specialist Generates Better Draft for Speculative Decoding*.

PosS introduces several Position Specialists, each responsible for drafting tokens at specific positions. They are trained to generate high-quality draft tokens even when their inputs include deviated features from earlier draft positions. At inference time, these Position Specialists mitigate feature deviation and make accurate predictions even at large draft positions.
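To make the idea concrete, here is a minimal toy sketch (not the PosS implementation; all names such as `make_specialist` and the toy prediction rule are illustrative) of drafting where each position is handled by its own specialist, and later specialists consume the possibly deviated outputs of earlier ones:

```python
# Toy sketch of position-specialist drafting. Each draft position has its own
# specialist; specialists at later positions see the (possibly deviated)
# tokens drafted at earlier positions.

def make_specialist(position):
    """Hypothetical specialist for one draft position (a stand-in for a learned model)."""
    def specialist(prefix_tokens):
        # Deterministic toy rule instead of a neural prediction.
        return (sum(prefix_tokens) + position) % 100
    return specialist

def draft(prefix_tokens, specialists):
    """Generate one draft token per position, each from its own specialist."""
    tokens = list(prefix_tokens)
    drafted = []
    for specialist in specialists:
        token = specialist(tokens)
        drafted.append(token)
        tokens.append(token)  # later specialists condition on earlier drafts
    return drafted

specialists = [make_specialist(p) for p in range(4)]  # 4 draft positions
print(draft([1, 2, 3], specialists))
```

In the real method the specialists are trained modules; the point of the sketch is only the control flow: one specialist per position, chained on earlier draft outputs.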

PosS achieves a higher position-wise acceptance rate (the acceptance rate at a position, given that all previous positions were accepted) than previous methods.
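The metric can be illustrated with a small sketch. The records below are fabricated for illustration, and the function name is ours, not from the PosS codebase:

```python
# Position-wise acceptance rate: among draft rounds whose first i positions
# were all accepted, the fraction where position i was also accepted.

def position_wise_acceptance(records):
    """records: one list of per-position accept/reject booleans per draft round."""
    max_len = max(len(r) for r in records)
    rates = []
    for i in range(max_len):
        # Only rounds where every earlier position was accepted are eligible.
        eligible = [r for r in records if len(r) > i and all(r[:i])]
        if not eligible:
            rates.append(None)
            continue
        accepted = sum(1 for r in eligible if r[i])
        rates.append(accepted / len(eligible))
    return rates

records = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
print(position_wise_acceptance(records))  # [0.75, 0.666..., 0.5]
```

Note the conditioning: rounds rejected at an earlier position do not count toward the denominator at later positions, which is why this metric isolates draft quality at each depth.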

## PosS Weights

We also provide our trained parameters on Hugging Face:

## Simplified Inference Example

This example uses the `transformers` library. Make sure to install it first (`pip install transformers`).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "HINT-lab/PosS2-Llama3-8B-Instruct"  # Or choose another PosS model
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "The capital of France is"
# Use the model's own device rather than hardcoding "cuda",
# so the example also works on CPU-only machines.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)  # Adjust max_new_tokens as needed
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
```

Code: https://github.com/shrango/PosS