---
library_name: transformers
tags:
- custom_generate
---

## Description

Implementation of [Decoding by Contrasting Layers (DoLa)](https://huggingface.co/papers/2309.03883), a contrastive decoding strategy for improving factuality and reducing hallucinations in language model outputs.

DoLa works by **contrasting the logits** from the final layer with those from earlier layers of the model, amplifying factual knowledge localized in specific layers and suppressing spurious information (a sketch of the scoring step appears under *Implementation sketch* below).

This can be useful for:

* **Short-answer tasks** (e.g., TruthfulQA) — using higher layers (`dola_layers="high"`)
* **Long-answer reasoning tasks** (e.g., GSM8K, StrategyQA, FACTOR, VicunaQA) — using lower layers (`dola_layers="low"`)

DoLa is **not recommended for smaller models** such as GPT-2, as the improvement may be negligible.

This implementation matches the `DoLa` functionality present in `transformers<4.53.0`.

---

## Base model

* [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

---

## Model compatibility

* Decoder-only transformer models

---

## Additional Arguments

* **`dola_layers`** (*str* or *List[int]*, optional): Which earlier layers to contrast with the final layer. Can be:
  * `"low"` — lower half of layers (recommended for long answers)
  * `"high"` — upper half of layers (recommended for short answers)
  * List of integer indices (e.g., `[18, 20]`)

  **Note:**
  * Layer 0 is the word embedding; layer 1 is the first transformer block.
  * If the model has tied word embeddings, layer 0 is skipped and counting starts at layer 2.
  * Typical defaults for an `N`-layer model (see the layer-selection sketch at the end of this card):

    | # Layers | `"low"` range         | `"high"` range        |
    | -------- | --------------------- | --------------------- |
    | N > 40   | `range(0, 20, 2)`     | `range(N - 20, N, 2)` |
    | N ≤ 40   | `range(0, N // 2, 2)` | `range(N // 2, N, 2)` |

* **`repetition_penalty`** (*float*, optional, defaults to `None`): Helps reduce repetition. A value of `1.2` is recommended.

---

## Output Type changes

* The `generate` method output remains the same as with default `transformers` generation, but the logits are post-processed with DoLa's contrastive scoring before token selection.

---

## Example usage

### Using higher layers (short-answer tasks)

```python
# requires `transformers>=4.56.0`; in earlier versions, DoLa was built into the library itself
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("What is the highest peak in the world?", return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=False,
    custom_generate="transformers-community/dola",
    trust_remote_code=True,
    dola_layers="high"
)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

---

### Contrasting specific layers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("What is the highest peak in the world?", return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=False,
    repetition_penalty=1.2,
    custom_generate="transformers-community/dola",
    trust_remote_code=True,
    dola_layers=[18, 20]
)

# Only decode the newly generated tokens
print(tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
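
---

## Implementation sketch

At each decoding step, DoLa picks the "premature" (early) layer whose next-token distribution diverges most, by Jensen-Shannon divergence, from the final layer, then contrasts the two distributions. The snippet below is a minimal sketch of that scoring step, assuming per-layer logits are already available; it omits the adaptive plausibility constraint from the paper and is **not** the library's internal code:

```python
import torch
import torch.nn.functional as F

def dola_contrast(final_logits, premature_logits):
    """Sketch: contrast final-layer logits against the most-divergent early layer."""
    final_log_probs = F.log_softmax(final_logits, dim=-1)
    best_jsd, best_premature = None, None
    for logits in premature_logits:
        early_log_probs = F.log_softmax(logits, dim=-1)
        # Jensen-Shannon divergence between final and early distributions
        m = (0.5 * (final_log_probs.exp() + early_log_probs.exp())).log()
        jsd = 0.5 * (
            F.kl_div(m, final_log_probs, log_target=True, reduction="batchmean")
            + F.kl_div(m, early_log_probs, log_target=True, reduction="batchmean")
        )
        if best_jsd is None or jsd > best_jsd:
            best_jsd, best_premature = jsd, early_log_probs
    # Contrastive score: log p_final(x) - log p_premature(x)
    return final_log_probs - best_premature

# Toy usage with random logits for a batch of 1 and a vocabulary of 8
final = torch.randn(1, 8)
early = [torch.randn(1, 8) for _ in range(3)]
print(dola_contrast(final, early).argmax(dim=-1))  # next token under greedy DoLa decoding
```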
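The `"low"` and `"high"` presets resolve to the layer ranges listed in the *Additional Arguments* table. Below is a minimal sketch of that mapping; `candidate_layers` is a hypothetical helper written for illustration (not a `transformers` API), and its tied-embeddings handling is an assumption based on the note above:

```python
def candidate_layers(dola_layers, num_layers, tied_embeddings=False):
    """Sketch: resolve `dola_layers` to candidate early-layer indices."""
    if isinstance(dola_layers, (list, tuple)):
        return list(dola_layers)  # explicit indices are used as-is
    # Layer 0 is the word embedding; assumption: skip it and start at layer 2
    # when embeddings are tied, per the note in "Additional Arguments"
    start = 2 if tied_embeddings else 0
    if dola_layers == "low":
        end = 20 if num_layers > 40 else num_layers // 2
        return list(range(start, end, 2))
    if dola_layers == "high":
        begin = num_layers - 20 if num_layers > 40 else num_layers // 2
        return list(range(begin, num_layers, 2))
    raise ValueError(f"unknown dola_layers preset: {dola_layers!r}")

print(candidate_layers("low", 28))   # [0, 2, 4, 6, 8, 10, 12] for a 28-layer model
print(candidate_layers("high", 28))  # [14, 16, 18, 20, 22, 24, 26]
```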