Qwen3 Backward LoRA for Self-Alignment with Instruction Backtranslation
This repository contains a LoRA adapter fine-tuned on top of Qwen/Qwen3-1.7B. It serves as the backward model in an assignment implementation of the paper Self-Alignment with Instruction Backtranslation.
The backward model is trained to predict the most likely user instruction given an assistant response, i.e. it models:
p(x | y)
where:
- x = instruction
- y = response
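As a rough sketch of what modeling p(x | y) means during fine-tuning, the objective below is the standard causal language modeling loss restricted to the instruction tokens. This is an assumption: the card does not state whether the loss was masked to the instruction tokens or computed over the whole sequence. Here D_seed denotes the seed pairs and theta the LoRA parameters.

```latex
\mathcal{L}_{\text{backward}}(\theta)
  = -\sum_{(x,\, y) \in \mathcal{D}_{\text{seed}}}
     \sum_{t=1}^{|x|}
     \log p_\theta\!\left(x_t \mid y,\; x_{<t}\right)
```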
Model Details
Model Description
This model is a LoRA adapter trained on top of Qwen/Qwen3-1.7B.
Its purpose is reverse instruction generation: given an assistant response, it generates a plausible user instruction that could have led to that response.
This repository corresponds to Step 1 of the four-step assignment pipeline:
1. Train a backward model on OpenAssistant-Guanaco seed pairs.
2. Use the backward model to generate synthetic instructions from LIMA responses.
3. Curate synthetic pairs with prompt-based quality scoring.
4. Train a forward instruction-following model on the curated dataset.
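The first step of the pipeline needs seed pairs in "response first, instruction second" order. The sketch below shows one way this could be done, assuming each Guanaco record is a single string using "### Human:" and "### Assistant:" turn markers, and assuming the training prompt matches the template shown in the getting-started section. The helper names `first_turn` and `to_backward_example` are hypothetical and not part of this repository.

```python
from typing import Dict, Optional, Tuple

HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"

# Assumed training template, copied from the inference prompt in this card.
BACKWARD_TEMPLATE = (
    "You are a reverse instruction generator.\n"
    "Given an assistant response, generate the most likely user instruction.\n"
    "### Response:\n"
    "{response}\n"
    "### Instruction:\n"
)


def first_turn(text: str) -> Optional[Tuple[str, str]]:
    """Return (instruction, response) for the first turn, or None if malformed."""
    if HUMAN_TAG not in text:
        return None
    after_human = text.split(HUMAN_TAG, 1)[1]
    if ASSISTANT_TAG not in after_human:
        return None
    instruction, rest = after_human.split(ASSISTANT_TAG, 1)
    # Drop any later turns so only the first exchange is used.
    response = rest.split(HUMAN_TAG, 1)[0]
    instruction, response = instruction.strip(), response.strip()
    if not instruction or not response:
        return None
    return instruction, response


def to_backward_example(text: str) -> Optional[Dict[str, str]]:
    """Build a backward example: the response is the input, the instruction the target."""
    pair = first_turn(text)
    if pair is None:
        return None
    instruction, response = pair
    return {
        "prompt": BACKWARD_TEMPLATE.format(response=response),
        "completion": instruction,
    }
```

Keeping only the first turn is a simplification; multi-turn conversations could instead be split into several pairs.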
- Developed by: Hengming
- Funded by: Course assignment / academic use
- Shared by: Hengming
- Model type: Causal language model with LoRA adapters
- Language(s) (NLP): English
- License: Apache-2.0 for this repository; users should also follow the terms of the base model and datasets
- Finetuned from model: Qwen/Qwen3-1.7B
Model Sources
- Repository: https://huggingface.co/Hengming0805/qwen3-backward-lora-assignment3
- Paper: Self-Alignment with Instruction Backtranslation (arXiv:2308.06259)
- Base model: https://huggingface.co/Qwen/Qwen3-1.7B
- Seed dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
Uses
Direct Use
This model is intended for:
- reverse instruction generation
- generating candidate instructions from assistant responses
- self-augmentation in instruction-tuning pipelines
- assignment-scale experiments on synthetic data generation
Downstream Use
This model can be used to construct synthetic instruction-response pairs:
- Start from a response.
- Use the backward model to generate a likely instruction.
- Pair the generated instruction with the original response.
- Filter these pairs before training a forward model.
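The pairing steps above can be sketched as a small loop. This is a minimal illustration, not the assignment's actual code: `build_synthetic_pairs` is a hypothetical helper, and `generate_instruction` stands in for any callable that wraps the backward model and returns an instruction string for a given response.

```python
from typing import Callable, Dict, Iterable, List


def build_synthetic_pairs(
    responses: Iterable[str],
    generate_instruction: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each response with an instruction produced by the backward model."""
    pairs = []
    for response in responses:
        response = response.strip()
        if not response:
            continue  # nothing to backtranslate
        instruction = generate_instruction(response).strip()
        if not instruction:
            continue  # the model produced no usable instruction
        pairs.append({"instruction": instruction, "response": response})
    return pairs
```

The returned pairs are unfiltered candidates; quality filtering is a separate step applied to this list.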
Out-of-Scope Use
This model is not intended for:
- direct deployment as a general-purpose chatbot
- high-stakes decision making
- legal, medical, or financial advice
- production systems requiring high factual reliability
- multilingual evaluation without further validation
Bias, Risks, and Limitations
This model has several limitations:
- It is trained on a relatively small assignment-scale subset rather than a full large-scale setup.
- Generated instructions are synthetic and may not match the true original prompt.
- It may produce vague, overly long, or unnatural instructions.
- It may inherit artifacts and biases from the base model and seed data.
- Output quality varies depending on the specificity and style of the response.
Recommendations
Users should:
- treat outputs as synthetic candidate instructions rather than ground truth
- manually inspect generated samples before downstream use
- apply filtering / curation before training a forward model
- avoid using this model in high-risk settings
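For the filtering recommendation, one possible shape of a score-based filter is sketched below. It assumes a judge model has been prompted to rate each pair on a 1 to 5 scale and to end its answer with a line such as "Score: 4". Both the output format and the threshold of 4 are assumptions for illustration, and `parse_score` and `keep_pair` are hypothetical names.

```python
import re
from typing import Optional

# Assumed judge output format: free text containing "Score: <1-5>".
SCORE_PATTERN = re.compile(r"Score:\s*([1-5])\b")


def parse_score(judge_output: str) -> Optional[int]:
    """Return the last 1-5 score found in the judge output, or None."""
    matches = SCORE_PATTERN.findall(judge_output)
    return int(matches[-1]) if matches else None


def keep_pair(judge_output: str, min_score: int = 4) -> bool:
    """Keep a pair only if a valid score was parsed and it meets the threshold."""
    score = parse_score(judge_output)
    return score is not None and score >= min_score
```

Pairs whose judge output cannot be parsed are dropped rather than kept, which errs on the side of a smaller but cleaner dataset.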
How to Get Started with the Model
Use the code below to load the base model and this LoRA adapter.
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen3-1.7B"
adapter_id = "Hengming0805/qwen3-backward-lora-assignment3"

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# The prompt ends with "### Instruction:" so the model continues with the instruction.
prompt = """You are a reverse instruction generator.
Given an assistant response, generate the most likely user instruction.
### Response:
Paris is the capital of France.
### Instruction:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens so the prompt is not echoed back.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
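Depending on how the output is decoded, the text may still contain the prompt, and the model may run past the instruction into a new "###" section. The helper below is a small post-processing sketch for that case. `extract_instruction` is a hypothetical name, and the cut-off rule (stop at the next "###" marker) is an assumption based on the prompt format above.

```python
def extract_instruction(decoded: str, marker: str = "### Instruction:") -> str:
    """Return the generated instruction from decoded model output.

    If the marker is present, everything up to and including its first
    occurrence is dropped. The text is then cut at the next "###" marker.
    """
    text = decoded
    if marker in text:
        text = text.split(marker, 1)[1]
    # Stop where the model starts a new section, if it does.
    text = text.split("###", 1)[0]
    return text.strip()
```

For example, `extract_instruction(decoded_text)` can be applied to each generation before the instruction is paired with its response.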