---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- SAT
- combinatorial-optimization
- classification
- cube-and-conquer
- data-augmentation
language:
- en
pipeline_tag: text-classification
---

# Qwen3-0.6B-SAT-VarSelector-Sym-Aug

A Qwen3-0.6B model fine-tuned for **SAT branching variable selection** using **symmetry-based data augmentation**.

## Model Description

This model predicts which variable to branch/cube on next, given a SAT CNF formula state. It was trained on **5x augmented data** generated with CNF symmetry transformations, which markedly improves generalization over the non-augmented baseline (see the performance comparison below).

### Architecture

- **Base**: `Qwen/Qwen3-0.6B` (causal language model)
- **Head**: LayerNorm → Linear(hidden_size, 601)
- **Max variables**: 600
- **Pooling**: hidden state of the last non-pad token
- **Masking**: logits of invalid variables (those not appearing in the CNF) are set to -10000 before the softmax
- **Size**: ~1.2GB (bfloat16)

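The sketch below shows how these pieces might fit together. It is a minimal reconstruction for orientation only; the class and attribute names here are assumptions, and the authoritative definition ships in `sft_qwen_var_classifier.py`.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class QwenVarClassifierSketch(nn.Module):
    """Illustrative only: backbone + LayerNorm -> Linear head over 601 classes."""

    def __init__(self, base_name="Qwen/Qwen3-0.6B", max_vars=600):
        super().__init__()
        self.backbone = AutoModelForCausalLM.from_pretrained(base_name)
        hidden = self.backbone.config.hidden_size
        self.norm = nn.LayerNorm(hidden)
        self.head = nn.Linear(hidden, max_vars + 1)  # class v = "branch on variable v"

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids,
                            attention_mask=attention_mask,
                            output_hidden_states=True)
        h = out.hidden_states[-1]                     # [B, T, H]
        # Pool: hidden state of the last non-pad token of each sequence
        last = attention_mask.sum(dim=1) - 1          # [B]
        pooled = h[torch.arange(h.size(0), device=h.device), last]
        return {"logits": self.head(self.norm(pooled))}  # [B, max_vars + 1]
```
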
### Training with Symmetry Augmentation

This model was trained with **5x data augmentation** using satisfiability-preserving CNF transformations:

| Augmentation | Description | Effect |
|--------------|-------------|--------|
| **Variable Permutation** | Bijective remapping of variable IDs | Prevents memorizing specific variable numbers |
| **Clause Shuffling** | Random reordering of clauses | Teaches position-independence |
| **Literal Reordering** | Shuffle literals within clauses | Token-level variation |
| **Polarity Flipping** | Flip the signs of a random subset of variables | Teaches structural rather than polarity features |

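A sketch of what one augmentation pass over a clause list could look like (clauses as lists of signed, nonzero ints, DIMACS-style). Function and variable names here are illustrative, not taken from the actual `augment_sft_dataset.py`:

```python
import random

def augment_cnf(clauses, num_vars, flip_prob=0.5):
    """Apply the four symmetry transformations to a CNF.
    `clauses` is a list of clauses, each a list of nonzero signed ints."""
    # 1. Variable permutation: bijective remap of IDs 1..num_vars
    perm = list(range(1, num_vars + 1))
    random.shuffle(perm)
    remap = {old: new for old, new in enumerate(perm, start=1)}
    # 2. Polarity flipping: flip signs of a random subset of variables
    flipped = {v for v in range(1, num_vars + 1) if random.random() < flip_prob}

    def xform(lit):
        var, sign = abs(lit), (1 if lit > 0 else -1)
        if var in flipped:
            sign = -sign
        return sign * remap[var]

    new_clauses = [[xform(l) for l in clause] for clause in clauses]
    # 3. Literal reordering within each clause
    for clause in new_clauses:
        random.shuffle(clause)
    # 4. Clause shuffling
    random.shuffle(new_clauses)
    return new_clauses, remap, flipped
```

Note that the supervision target (the labeled branching variable) must be pushed through the same permutation and polarity map, otherwise the labels and the transformed formulas fall out of sync.
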
### Training Details

| Parameter | Value |
|-----------|-------|
| Original training samples | 8,110 |
| Augmented training samples | **40,550** (5x) |
| Validation samples | 902 (unaugmented) |
| Epochs | 3 |
| Hardware | 8×H100 GPUs |
| Training framework | DeepSpeed ZeRO-3 |
| Peak learning rate | 5e-6 |
| Best checkpoint | Step 1800 (epoch 2.84) |

### Performance Comparison

| Model | Training Data | Top-1 Accuracy | Top-5 Accuracy |
|-------|---------------|----------------|----------------|
| Qwen3-0.6B (baseline) | 8,110 samples | ~12% | ~32% |
| **Qwen3-0.6B (augmented)** | **40,550 samples** | **~19%** | **~42%** |
| Improvement | 5x data | **+7 pp** | **+10 pp** |

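For clarity on the metric: a prediction counts as a top-k hit when the labeled variable is among the k highest-scoring valid variables. A minimal sketch of that computation (tensor names and shapes assumed):

```python
import torch

def top_k_accuracy(logits, labels, valid_mask, k=5):
    """logits: [B, 601] head outputs; labels: [B] variable IDs;
    valid_mask: [B, 601] bool, True where the variable occurs in the CNF."""
    masked = logits.masked_fill(~valid_mask, -1e4)   # drop invalid variables
    topk = masked.topk(k, dim=-1).indices            # [B, k]
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()
```
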
### Key Insight: Why Validation Loss < Training Loss

During augmented training, you'll observe that validation loss is consistently lower than training loss. This is expected and indicates the augmentation is working:

1. **Training data is harder** — augmented CNFs with permuted variables and shuffled clauses
2. **Validation data is clean** — original CNFs without transformations
3. **The model generalizes well** — it learned structural patterns instead of memorizing examples

## Usage

```python
import torch
from transformers import AutoTokenizer

from sft_qwen_var_classifier import QwenVarClassifier, cnf_valid_mask

# Load model
model = QwenVarClassifier("Qwen/Qwen3-0.6B", max_vars=600)
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)
model = model.to("cuda", dtype=torch.bfloat16)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Prepare CNF input (DIMACS format)
cnf_text = """p cnf 100 250
1 -2 3 0
-1 2 -4 0
...
"""

# Tokenize
inputs = tokenizer(cnf_text, return_tensors="pt", truncation=True, max_length=8192)
inputs = {k: v.to("cuda") for k, v in inputs.items()}

# Get the valid-variable mask (True only for variables present in the CNF)
valid_mask = torch.tensor([cnf_valid_mask(cnf_text, max_vars=600)],
                          dtype=torch.bool, device="cuda")

# Predict
with torch.no_grad():
    logits = model(**inputs)["logits"]
logits = logits.masked_fill(~valid_mask, -1e4)
predicted_var = logits.argmax(dim=-1).item()

print(f"Predicted branching variable: {predicted_var}")
```

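`cnf_valid_mask` is bundled with the model, so the version in `sft_qwen_var_classifier.py` is the one to use; a plausible minimal equivalent, shown only to make the masking step concrete, would be:

```python
def cnf_valid_mask(cnf_text: str, max_vars: int = 600) -> list[bool]:
    """Return a (max_vars + 1)-long boolean list: True at index v iff
    variable v occurs in the CNF. Index 0 stays False (DIMACS has no var 0)."""
    mask = [False] * (max_vars + 1)
    for line in cnf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "pc":  # skip header and comment lines
            continue
        for tok in line.split():
            try:
                lit = int(tok)
            except ValueError:
                continue  # tolerate stray tokens
            if lit != 0 and abs(lit) <= max_vars:
                mask[abs(lit)] = True
    return mask
```
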
## Files

- `pytorch_model.bin` - Model weights (~1.2GB, bfloat16)
- `sft_qwen_var_classifier.py` - Model class definition (required for loading)

## When to Use This Model

- When you need **better generalization** than the non-augmented version offers
- For **production or deployment** settings where accuracy matters
- **When training data is limited** — augmentation effectively multiplies your data

## Augmentation Code

The augmentation script is available at:

```
Yale-ROSE/Transformer-SAT/new_transformer/augment_sft_dataset.py
```

Usage:

```bash
python augment_sft_dataset.py input.jsonl output.jsonl --multiplier 5
```

## Limitations

- Maximum of 600 variables
- Maximum of 8192 tokens for the CNF input
- Trained on a specific CNF distribution; accuracy may degrade on out-of-distribution formulas

## Related Models

- [Qwen3-0.6B-SAT-VarSelector](https://huggingface.co/Yale-ROSE/Qwen3-0.6B-SAT-VarSelector) - Non-augmented baseline
- [Qwen3-4B-SAT-VarSelector](https://huggingface.co/Yale-ROSE/Qwen3-4B-SAT-VarSelector) - Higher accuracy, larger model

## Citation

If you use this model, please cite the Transformer-CnC paper.

## License

Apache 2.0