DeepSeek-R1-Dyck-Finetuned

Model Description

This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B, optimized specifically for Dyck language bracket completion. It was trained to complete partial Dyck bracket sequences by tracking the stack of open brackets and generating the matching closing brackets.

Key Features

  • Reasoning Capability: Generates step-by-step reasoning using <think> blocks before providing the final answer
  • Dyck Language Completion: Accurately completes bracket sequences for 8 different bracket types: (), [], {}, <>, ⟨⟩, ⟦⟧, ⦃⦄, ⦅⦆
  • LoRA Fine-tuning: Uses Low-Rank Adaptation (LoRA) for efficient training
  • High Accuracy: Trained on 60k diverse Dyck sequence examples
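The completion task itself is a simple stack algorithm. A minimal reference sketch (not part of the model, just the ground-truth procedure it is trained to reproduce):

```python
# Opening -> closing bracket for the 8 supported pairs.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">",
         "⟨": "⟩", "⟦": "⟧", "⦃": "⦄", "⦅": "⦆"}
CLOSERS = {v: k for k, v in PAIRS.items()}

def complete_dyck(prefix: str) -> str:
    """Append the closing brackets needed to balance a Dyck prefix."""
    stack = []
    for ch in prefix:
        if ch in PAIRS:
            stack.append(ch)                     # push opener
        elif ch in CLOSERS:
            if not stack or stack[-1] != CLOSERS[ch]:
                raise ValueError(f"unbalanced closer {ch!r}")
            stack.pop()                          # matched closer
    # Close remaining openers in reverse (LIFO) order.
    return prefix + "".join(PAIRS[c] for c in reversed(stack))

print(complete_dyck("([{<"))  # -> ([{<>}])
```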

Training Details

Training Data

  • Dataset: 60k Dyck language sequences
  • Train/Val Split: 95%/5%
  • Format: Chat template with system/user/assistant messages
  • Reasoning: All samples include <think> reasoning blocks
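Based on the description above, a training sample likely looks like the following (the exact field names and reasoning text are an assumption inferred from the chat-template format, not taken from the actual dataset):

```python
# Hypothetical training record in system/user/assistant chat format.
sample = {
    "messages": [
        {"role": "system",
         "content": "You are a logic engine. Complete the Dyck bracket "
                    "sequence by tracking the stack of open brackets."},
        {"role": "user", "content": "([{<"},
        # Assistant turn contains the <think> reasoning block plus the answer.
        {"role": "assistant",
         "content": "<think>\n1. Push '(' '[' '{' '<' onto the stack.\n"
                    "2. Pop in reverse order: '>' '}' ']' ')'.\n"
                    "</think>\n([{<>}])"},
    ]
}

print(sample["messages"][-1]["content"].split("</think>")[-1].strip())
```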

Training Configuration

  • Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • LoRA Rank: 32 (attention layers only)
  • LoRA Alpha: 64
  • LoRA Dropout: 0.25
  • Learning Rate: 3e-6
  • Batch Size: 4 per device × 32 gradient-accumulation steps (effective batch: 128)
  • Epochs: 4
  • Warmup: 30% of total steps
  • Gradient Clipping: 0.05
  • Optimizer: AdamW
  • Scheduler: Linear
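The hyperparameters above can be expressed as a plain config dict. The key names below follow the Hugging Face `TrainingArguments` convention, which is an assumption about how the run was configured:

```python
# Hypothetical training config mirroring the values listed above.
config = {
    "learning_rate": 3e-6,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 32,
    "num_train_epochs": 4,
    "warmup_ratio": 0.30,        # 30% of total steps
    "max_grad_norm": 0.05,       # strict gradient clipping
    "optim": "adamw_torch",
    "lr_scheduler_type": "linear",
}

effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # -> 128
```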

Training Hardware

  • GPU: 40GB GPU
  • Precision: bfloat16
  • Training Time: ~6-8 hours

Usage

Installation

pip install unsloth transformers

Loading the Model

from unsloth import FastLanguageModel

# Load LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="akashdutta1030/dddd",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,  # Use True for 4-bit quantization
)

FastLanguageModel.for_inference(model)

Inference Example

import torch

messages = [
    {
        "role": "system",
        "content": "You are a logic engine. Complete the Dyck bracket sequence by tracking the stack of open brackets."
    },
    {
        "role": "user",
        "content": "([{<"
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        top_p=0.95,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)

Expected Output Format

The model generates responses in the following format:

<think>
1. Input sequence: (, [, {, <
2. Maintain a stack of opening brackets:
   - Push '(' -> Stack: ['(']
   - Push '[' -> Stack: ['(', '[']
   - Push '{' -> Stack: ['(', '[', '{']
   - Push '<' -> Stack: ['(', '[', '{', '<']
3. To close the sequence, pop from the stack in reverse order:
   - Pop '<' -> Closing: '>'
   - Pop '{' -> Closing: '}'
   - Pop '[' -> Closing: ']'
   - Pop '(' -> Closing: ')'
4. Appending closing brackets to input: ([{<>}])
</think>
([{<>}])
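To use the model programmatically, the final answer can be split off from the <think> block. A small helper (the end-of-sequence marker name is DeepSeek's default and is an assumption here):

```python
def extract_answer(response: str) -> str:
    """Return the text after the final </think> tag, stripped of markers."""
    tail = response.rsplit("</think>", 1)[-1]
    # Drop DeepSeek's end-of-sequence token if decoding kept special tokens.
    return tail.replace("<｜end▁of▁sentence｜>", "").strip()

print(extract_answer("<think>\npush/pop steps...\n</think>\n([{<>}])"))  # -> ([{<>}])
```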

Model Architecture

  • Base Architecture: Llama-based (DeepSeek-R1)
  • Parameters: 8B base model
  • LoRA Parameters: ~167M trainable parameters (1.8% of base model)
  • Target Modules: Attention layers only (q_proj, k_proj, v_proj, o_proj)

Performance

Training was monitored as follows:

  • Training Loss: Decreasing smoothly
  • Validation Loss: Monitored every 200 steps
  • Gradient Stability: Controlled with strict clipping (0.05)
  • Reasoning Quality: Generates detailed step-by-step reasoning

Limitations

  • The model is specifically trained for Dyck language bracket completion
  • Performance may vary on sequences with very deep nesting (>20 levels)
  • Requires proper formatting with chat template for best results

Citation

If you use this model, please cite:

@misc{deepseek-r1-dyck-finetuned,
  title={DeepSeek-R1-Dyck-Finetuned: Bracket Completion Model},
  author={Fine-tuned on DeepSeek-R1-Distill-Llama-8B},
  year={2024},
  howpublished={\url{https://huggingface.co/akashdutta1030/dddd}}
}

License

This model is licensed under the Apache 2.0 license.

Acknowledgments

  • Base model: DeepSeek-R1-Distill-Llama-8B
  • Training framework: Unsloth
  • Fine-tuning approach: LoRA (Low-Rank Adaptation)