Upload folder using huggingface_hub


Files changed (10)

.gitattributes +1 -0
README.md +217 -3
added_tokens.json +28 -0
chat_template.jinja +89 -0
merges.txt +0 -0
model.pt +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +239 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,217 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+tags:
+- sat-solver
+- variable-selection
+- dpo
+- direct-preference-optimization
+- qwen3
+- combinatorial-optimization
+- neural-symbolic
+base_model: Qwen/Qwen3-4B
+datasets:
+- Yale-ROSE/SAT-CNF-Rewards
+pipeline_tag: text-classification
+---
+# Qwen3-4B-SAT-VarSelector-Sym-Aug-DPO
+A neural SAT variable selector fine-tuned with **Direct Preference Optimization (DPO)** on cube-score-aligned preferences. Given a CNF formula, the model predicts which variable to branch on next, and it is optimized for a solver-derived reward (the cube score) rather than for imitating expert choices.
+## Model Description
+This model is a `QwenVarClassifier` built on the Qwen3-4B backbone and trained in two stages:
+1. **Stage 1 - SFT**: Supervised fine-tuning on expert variable selections from MiniZinc/CaDiCaL with symmetry augmentation
+2. **Stage 2 - DPO**: Direct Preference Optimization using cube-score-aligned preference pairs
+### Key Innovation: Cube-Score Alignment
+Traditional supervised learning imitates expert choices, but experts do not always choose optimally. DPO training instead aligns the model with **cube scores**, a solver-derived reward that indicates how much a given variable choice helps the SAT solver make progress.
+## Performance
+| Model | Avg Reward (Cube Score) | Reward Gap | Top-1 Accuracy |
+|-------|------------------------|------------|----------------|
+| **DPO-7500** (this model) | **1.489** | **0.239** | 18.51% |
+| SFT Baseline | 1.368 | 0.218 | 25.06% |
+| GRPO-5x | 1.418 | 0.221 | 16.51% |
+
+**Key Results:**
+- **+8.8% relative improvement** in average cube score over the SFT baseline (1.489 vs. 1.368)
+- DPO trades top-1 accuracy (agreement with expert labels) for a higher average cube score
+- When the DPO model disagrees with the expert, it often picks a variable with a higher cube score
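
The +8.8% figure is a relative gain in average cube score. As a quick check, it can be reproduced from the Performance table values (the helper name is ours, for illustration only):

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0


# Average cube scores from the Performance table
dpo_reward = 1.489
sft_reward = 1.368
grpo_reward = 1.418

print(f"DPO vs SFT:  {relative_improvement(dpo_reward, sft_reward):+.1f}%")   # +8.8%
print(f"GRPO vs SFT: {relative_improvement(grpo_reward, sft_reward):+.1f}%")  # +3.7%
```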
+### Why Accuracy Drops But Performance Improves
+Top-1 accuracy measures agreement with expert labels, but experts do not always choose the variable with the highest cube score. Because DPO optimizes for cube scores directly, it may disagree with the expert when a better-scoring variable exists; this is the **intended behavior**.
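
To illustrate how the two metrics can diverge, here is a toy example with made-up cube scores (illustrative values, not data from this model). Top-1 accuracy counts agreement with the expert's pick, while average reward averages the cube score of whatever variable the model picks. The function names are hypothetical.

```python
from statistics import mean


def top1_accuracy(preds, expert_labels):
    """Fraction of instances where the prediction matches the expert label."""
    return mean(1.0 if p == e else 0.0 for p, e in zip(preds, expert_labels))


def average_reward(preds, cube_scores):
    """Mean cube score of the predicted variables.

    cube_scores[i] maps variable ID -> cube score for instance i.
    """
    return mean(scores[p] for p, scores in zip(preds, cube_scores))


# Hypothetical cube scores for three instances (illustrative values only)
cube_scores = [
    {1: 1.0, 2: 1.6, 3: 0.8},
    {1: 1.2, 2: 0.9, 3: 1.5},
    {1: 1.4, 2: 1.1, 3: 1.0},
]
expert = [1, 1, 1]     # the expert always picks variable 1
imitator = [1, 1, 1]   # copies the expert exactly
optimizer = [2, 3, 1]  # picks the highest cube score each time

for name, preds in [("imitator", imitator), ("optimizer", optimizer)]:
    acc = top1_accuracy(preds, expert)
    reward = average_reward(preds, cube_scores)
    print(f"{name}: accuracy={acc:.2f}, avg reward={reward:.2f}")
```

The "optimizer" agrees with the expert only once, yet earns a higher average reward, which is the same pattern as DPO versus the SFT baseline in the table.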
+## Architecture
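+```python
+import torch
+import torch.nn as nn
+from transformers import AutoModelForCausalLM
+class QwenVarClassifier(nn.Module):
+    """
+    Input: tokenized CNF formula in DIMACS format
+    Output: logits of shape (batch, max_vars + 1); index v scores variable v
+            (index 0 does not correspond to any variable)
+    """
+    def __init__(self, model_name="Qwen/Qwen3-4B", max_vars=600):
+        super().__init__()
+        self.qwen = AutoModelForCausalLM.from_pretrained(model_name)
+        hidden_size = self.qwen.config.hidden_size
+        self.head = nn.Sequential(
+            nn.LayerNorm(hidden_size),
+            nn.Linear(hidden_size, max_vars + 1)
+        )
+    def forward(self, input_ids, attention_mask):
+        outputs = self.qwen(input_ids=input_ids, attention_mask=attention_mask,
+                            output_hidden_states=True)
+        last_hidden = outputs.hidden_states[-1]  # (batch, seq, hidden)
+        # Pool using the last non-padding token (assumes right padding)
+        seq_lengths = attention_mask.sum(dim=1) - 1
+        batch_idx = torch.arange(input_ids.size(0), device=input_ids.device)
+        last_token_hidden = last_hidden[batch_idx, seq_lengths]
+        # Classify (cast to the head's dtype in case the backbone runs in bf16)
+        logits = self.head(last_token_hidden.to(self.head[-1].weight.dtype))
+        return logits
+```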
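
The pooling index `attention_mask.sum(dim=1) - 1` selects the last real token only when sequences are padded on the right; with a single unpadded sequence the question does not arise. The dependency-free sketch below (our illustration, not code from the training pipeline) contrasts that index with one that finds the last real token under either padding side:

```python
from typing import List


def last_index_by_length(mask: List[int]) -> int:
    """Index used by QwenVarClassifier: (number of real tokens) - 1.

    Correct only when padding is on the right.
    """
    return sum(mask) - 1


def last_index_of_real_token(mask: List[int]) -> int:
    """Position of the last 1 in the mask; correct for left or right padding."""
    for i in range(len(mask) - 1, -1, -1):
        if mask[i] == 1:
            return i
    raise ValueError("mask contains no real tokens")


right_padded = [1, 1, 1, 0, 0]
left_padded = [0, 0, 1, 1, 1]

print(last_index_by_length(right_padded), last_index_of_real_token(right_padded))  # 2 2
print(last_index_by_length(left_padded), last_index_of_real_token(left_padded))    # 2 4
```

If you batch several formulas for this model, keep the tokenizer padding on the right (for a Hugging Face tokenizer, `tokenizer.padding_side = "right"`) so the length-based index stays correct.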
+## Usage
+### Loading the Model
+```python
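+import torch
+from huggingface_hub import hf_hub_download
+from transformers import AutoTokenizer
+repo_id = "Yale-ROSE/Qwen3-4B-SAT-VarSelector-Sym-Aug-DPO"
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+# Download and load the classifier weights
+ckpt_path = hf_hub_download(repo_id=repo_id, filename="model.pt")
+checkpoint = torch.load(ckpt_path, map_location="cpu")
+# Initialize model architecture (see Architecture above). The
+# sft_qwen_var_classifier module comes from the training code and is
+# not included in this repository.
+from sft_qwen_var_classifier import QwenVarClassifier
+model = QwenVarClassifier("Qwen/Qwen3-4B", max_vars=600)
+model.load_state_dict(checkpoint)
+model.to("cuda").eval()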
+```
+### Making Predictions
+```python
+def predict_variable(cnf_text: str, model, tokenizer, device="cuda"):
+    """Predict the best variable to branch on."""
+    # Tokenize (formulas longer than 8192 tokens are truncated)
+    inputs = tokenizer(cnf_text, return_tensors="pt", max_length=8192, truncation=True)
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    # Valid-variable mask: 1 for variable IDs that may be selected.
+    # cnf_valid_mask is not defined in this card; it must return a
+    # 0/1 sequence of length max_vars + 1.
+    valid_mask = cnf_valid_mask(cnf_text, max_vars=600)
+    valid_mask = torch.tensor(valid_mask, dtype=torch.bool, device=device)
+    # Predict
+    with torch.no_grad():
+        logits = model(inputs["input_ids"], inputs["attention_mask"])
+        # Mask invalid variables so they can never be selected
+        logits = logits.masked_fill(~valid_mask, float("-inf"))
+        pred_var = logits.argmax(dim=-1).item()
+    return pred_var
+# Example CNF (DIMACS format)
+cnf = """p cnf 5 3
+1 -2 3 0
+-1 2 0
+2 -3 0
+"""
+pred = predict_variable(cnf, model, tokenizer)
+print(f"Predicted variable: {pred}")
+```
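
`predict_variable` relies on `cnf_valid_mask`, which the card does not define. Below is one plausible definition, offered as a hypothetical sketch: it assumes a variable is valid if and only if it occurs in at least one clause, and that index 0 is always invalid because DIMACS variable IDs start at 1. The real helper may instead mark every ID from 1 to `num_vars` in the header.

```python
from typing import List


def cnf_valid_mask(cnf_text: str, max_vars: int = 600) -> List[int]:
    """Hypothetical helper: 0/1 mask of length max_vars + 1.

    mask[v] == 1 iff variable v occurs in at least one clause.
    Index 0 stays 0 because DIMACS variable IDs start at 1.
    """
    mask = [0] * (max_vars + 1)
    for line in cnf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines ("c ...") and the problem line ("p cnf ...")
        if not line or line.startswith(("c", "p")):
            continue
        for tok in line.split():
            lit = int(tok)
            if lit == 0:  # clause terminator
                continue
            var = abs(lit)
            if var > max_vars:
                raise ValueError(f"variable {var} exceeds max_vars={max_vars}")
            mask[var] = 1
    return mask


cnf = """p cnf 5 3
1 -2 3 0
-1 2 0
2 -3 0
"""
mask = cnf_valid_mask(cnf)
print([v for v, ok in enumerate(mask) if ok])  # [1, 2, 3]
```

Under this assumption, variables 4 and 5 are declared in the header but never appear in a clause, so they are masked out. Raising on IDs above `max_vars` also surfaces formulas the 600-way head cannot represent.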
+### Input Format
+The model expects CNF formulas in DIMACS format:
+```
+p cnf <num_vars> <num_clauses>
+<literal1> <literal2> ... 0
+<literal1> <literal2> ... 0
+...
+```
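
If formulas come from another tool as lists of signed integers, a small serializer can produce this format. The sketch below uses a hypothetical helper, `to_dimacs`, which is not part of this repository. It reproduces the example CNF from the prediction snippet.

```python
from typing import Sequence


def to_dimacs(clauses: Sequence[Sequence[int]], num_vars: int) -> str:
    """Serialize clauses (sequences of non-zero signed ints) to DIMACS CNF text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    for clause in clauses:
        if any(lit == 0 or abs(lit) > num_vars for lit in clause):
            raise ValueError(f"invalid literal in clause {list(clause)}")
        # Each clause is its literals followed by the terminating 0
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines) + "\n"


text = to_dimacs([[1, -2, 3], [-1, 2], [2, -3]], num_vars=5)
print(text)
```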
+## Training Details
+### DPO Training Configuration
+```yaml
+# Training hyperparameters
+beta: 0.1                    # DPO temperature
+learning_rate: 1.0e-6        # Lower LR for DPO
+epochs: 3
+batch_size: 1
+gradient_accumulation: 16
+max_length: 8192
+# Data
+train_pairs: 48982           # Filtered (margin ≥ 0.3)
+valid_pairs: 5393
+```
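
With `batch_size: 1` and `gradient_accumulation: 16`, each optimizer update sees 16 preference pairs. The arithmetic below assumes a single device and that a final partial accumulation still triggers an update; the card states neither, so treat the step counts as approximate.

```python
import math

batch_size = 1
gradient_accumulation = 16
train_pairs = 48_982
epochs = 3

effective_batch = batch_size * gradient_accumulation        # pairs per optimizer step
steps_per_epoch = math.ceil(train_pairs / effective_batch)  # assumes the last partial batch counts
print(effective_batch, steps_per_epoch, steps_per_epoch * epochs)  # 16 3062 9186
```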
+### DPO Loss Function
+```python
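+import torch.nn.functional as F
+def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
+    """DPO loss for the classifier architecture.
+    Arguments are per-pair log-probabilities of the chosen / rejected variable
+    under the trained policy and the frozen reference model (standard DPO).
+    """
+    policy_diff = policy_chosen - policy_rejected
+    ref_diff = ref_chosen - ref_rejected
+    logits = beta * (policy_diff - ref_diff)
+    return -F.logsigmoid(logits).mean()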
+```
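
For a classifier, the log-probabilities passed to `dpo_loss` are presumably obtained by applying a log-softmax over the variable logits and indexing at the chosen and rejected variables. The card does not show this step, so the following dependency-free sketch of the per-pair computation rests on that assumption:

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def dpo_pair_loss(policy_logits, ref_logits, chosen, rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) variable pair."""
    pol = log_softmax(policy_logits)
    ref = log_softmax(ref_logits)
    margin = (pol[chosen] - pol[rejected]) - (ref[chosen] - ref[rejected])
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


logits = [0.0, 2.0, 1.0, -1.0]  # toy logits over variable IDs 0..3
# Policy identical to the reference: margin is 0, loss is log(2)
print(round(dpo_pair_loss(logits, logits, chosen=2, rejected=1), 4))  # 0.6931
# Policy shifted toward the chosen variable: loss drops below log(2)
shifted = [0.0, 2.0, 3.0, -1.0]
print(dpo_pair_loss(shifted, logits, chosen=2, rejected=1) < math.log(2))  # True
```

The log-softmax normalizer cancels inside each difference, so only the logit gap between the chosen and rejected variable matters. This is why DPO can be applied to a fixed-size classification head.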
+### Training Progress
+The model was trained for 9,000 steps. Checkpoint 7500 achieved the best average reward on the validation set and is the checkpoint released here.
+| Checkpoint | Avg Reward | Reward Gap |
+|------------|-----------|------------|
+| 1000 | 1.335 | 0.192 |
+| 3500 | 1.427 | 0.226 |
+| 5000 | 1.460 | 0.235 |
+| **7500** | **1.489** | **0.239** |
+| 9000 | 1.461 | 0.234 |
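
Among the checkpoints listed, selection amounts to taking the step with the highest validation average reward:

```python
# Validation average reward per checkpoint step (values from the table)
avg_reward = {1000: 1.335, 3500: 1.427, 5000: 1.460, 7500: 1.489, 9000: 1.461}

best_step = max(avg_reward, key=avg_reward.get)
print(best_step, avg_reward[best_step])  # 7500 1.489
```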
+## Training Data
+- **SFT Data**: `Yale-ROSE/SAT-CNF-Rewards` (sym_augmented split)
+- **DPO Preferences**: `Yale-ROSE/SAT-CNF-Rewards` (dpo/ folder)
+  - Generated using cube scores from CaDiCaL solver
+  - Filtered for margin ≥ 0.3
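
The card says preference pairs were generated from CaDiCaL cube scores and filtered to a margin of at least 0.3, but it does not give the exact pairing rule. The sketch below is hypothetical. It assumes the margin is the cube-score difference between the chosen and rejected variable, and that every pair of scored variables in a formula is a candidate.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_preference_pairs(
    cube_scores: Dict[int, float], min_margin: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Hypothetical pairing rule returning (chosen, rejected, margin) triples.

    Every pair of scored variables is considered; the higher-scoring one is
    'chosen', and pairs whose score difference is below min_margin are dropped.
    """
    pairs = []
    for a, b in combinations(sorted(cube_scores), 2):
        better, worse = (a, b) if cube_scores[a] >= cube_scores[b] else (b, a)
        margin = cube_scores[better] - cube_scores[worse]
        if margin >= min_margin:
            pairs.append((better, worse, margin))
    return pairs


# Illustrative cube scores for one formula (made-up values)
scores = {1: 1.5, 2: 1.3, 3: 0.9, 4: 1.45}
for chosen, rejected, margin in build_preference_pairs(scores):
    print(chosen, rejected, round(margin, 2))
```

With these toy scores, close calls such as variable 1 vs. 4 (difference 0.05) are discarded. Presumably this is the purpose of the margin filter: it keeps only pairs where the cube score clearly separates the two choices.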
+## Related Models
+- **SFT Base**: [Yale-ROSE/Qwen3-4B-SAT-VarSelector-Sym-Aug](https://huggingface.co/Yale-ROSE/Qwen3-4B-SAT-VarSelector-Sym-Aug)
+- **GRPO Variant**: [Yale-ROSE/Qwen3-4B-SAT-VarSelector-GRPO-5x](https://huggingface.co/Yale-ROSE/Qwen3-4B-SAT-VarSelector-GRPO-5x)
+## Limitations
+- Supports SAT problems with up to 600 variables (the classification head covers variable IDs 1 to 600)
+- Input length is limited to 8192 tokens
+- Cube scores were computed with a 100 ms solver timeout
+## Citation
+```bibtex
+@misc{sat-var-dpo-2026,
+  title={Neural SAT Variable Selection with Direct Preference Optimization},
+  author={Yale ROSE Lab},
+  year={2026},
+  howpublished={HuggingFace Models},
+  url={https://huggingface.co/Yale-ROSE/Qwen3-4B-SAT-VarSelector-Sym-Aug-DPO}
+}
+```
+## License
+MIT License

added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

merges.txt ADDED

The diff for this file is too large to render.

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1603341ae56ce2b3d81e8d0e2a4a0064e36706eeed49ddc66bcddfe68f2e0cfa
+size 8048184767

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

tokenizer_config.json ADDED

	@@ -0,0 +1,239 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

vocab.json ADDED

The diff for this file is too large to render.