DeepSeek-R1-Dyck-Finetuned

Model Description

This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B, optimized specifically for Dyck language bracket completion. It was trained to complete partial Dyck bracket sequences by tracking the stack of open brackets and generating the matching closing brackets.

Key Features

  • Reasoning Capability: Generates step-by-step reasoning using <think> blocks before providing the final answer
  • Dyck Language Completion: Accurately completes bracket sequences for 8 different bracket types: (), [], {}, <>, ⟨⟩, ⟦⟧, ⦃⦄, ⦅⦆
  • LoRA Fine-tuning: Uses Low-Rank Adaptation (LoRA) for efficient training
  • High Accuracy: Trained on 60k diverse Dyck sequence examples
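The completion task itself is a simple stack algorithm. A minimal reference sketch (not part of the model, just the ground-truth procedure it is trained to reproduce):

```python
# Opening -> closing bracket for the 8 supported pairs.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">",
         "⟨": "⟩", "⟦": "⟧", "⦃": "⦄", "⦅": "⦆"}
CLOSERS = {v: k for k, v in PAIRS.items()}

def complete_dyck(prefix: str) -> str:
    """Append the closing brackets needed to balance a Dyck prefix."""
    stack = []
    for ch in prefix:
        if ch in PAIRS:
            stack.append(ch)                     # push opener
        elif ch in CLOSERS:
            if not stack or stack[-1] != CLOSERS[ch]:
                raise ValueError(f"unbalanced closer {ch!r}")
            stack.pop()                          # matched closer
    # Close remaining openers in reverse (LIFO) order.
    return prefix + "".join(PAIRS[c] for c in reversed(stack))

print(complete_dyck("([{<"))  # -> ([{<>}])
```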

Training Details

Training Data

  • Dataset: 60k Dyck language sequences
  • Train/Val Split: 95%/5%
  • Format: Chat template with system/user/assistant messages
  • Reasoning: All samples include <think> reasoning blocks
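Based on the description above, a training sample likely looks like the following (the exact field names and reasoning text are an assumption inferred from the chat-template format, not taken from the actual dataset):

```python
# Hypothetical training record in system/user/assistant chat format.
sample = {
    "messages": [
        {"role": "system",
         "content": "You are a logic engine. Complete the Dyck bracket "
                    "sequence by tracking the stack of open brackets."},
        {"role": "user", "content": "([{<"},
        # Assistant turn contains the <think> reasoning block plus the answer.
        {"role": "assistant",
         "content": "<think>\n1. Push '(' '[' '{' '<' onto the stack.\n"
                    "2. Pop in reverse order: '>' '}' ']' ')'.\n"
                    "</think>\n([{<>}])"},
    ]
}

print(sample["messages"][-1]["content"].split("</think>")[-1].strip())
```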

Training Configuration

  • Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • LoRA Rank: 32 (attention layers only)
  • LoRA Alpha: 64
  • LoRA Dropout: 0.25
  • Learning Rate: 3e-6
  • Batch Size: 4 per device × 32 gradient-accumulation steps (effective batch: 128)
  • Epochs: 4
  • Warmup: 30% of total steps
  • Gradient Clipping: 0.05
  • Optimizer: AdamW
  • Scheduler: Linear
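The hyperparameters above can be expressed as a plain config dict. The key names below follow the Hugging Face `TrainingArguments` convention, which is an assumption about how the run was configured:

```python
# Hypothetical training config mirroring the values listed above.
config = {
    "learning_rate": 3e-6,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 32,
    "num_train_epochs": 4,
    "warmup_ratio": 0.30,        # 30% of total steps
    "max_grad_norm": 0.05,       # strict gradient clipping
    "optim": "adamw_torch",
    "lr_scheduler_type": "linear",
}

effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # -> 128
```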

Training Hardware

  • GPU: 40GB GPU
  • Precision: bfloat16
  • Training Time: ~6-8 hours

Usage

Installation

pip install unsloth transformers

Loading the Model

from unsloth import FastLanguageModel

# Load LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="akashdutta1030/dddd",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,  # Use True for 4-bit quantization
)

FastLanguageModel.for_inference(model)

Inference Example

import torch

messages = [
    {
        "role": "system",
        "content": "You are a logic engine. Complete the Dyck bracket sequence by tracking the stack of open brackets."
    },
    {
        "role": "user",
        "content": "([{<"
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        top_p=0.95,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)

Expected Output Format

The model generates responses in the following format:

<think>
1. Input sequence: (, [, {, <
2. Maintain a stack of opening brackets:
   - Push '(' -> Stack: ['(']
   - Push '[' -> Stack: ['(', '[']
   - Push '{' -> Stack: ['(', '[', '{']
   - Push '<' -> Stack: ['(', '[', '{', '<']
3. To close the sequence, pop from the stack in reverse order:
   - Pop '<' -> Closing: '>'
   - Pop '{' -> Closing: '}'
   - Pop '[' -> Closing: ']'
   - Pop '(' -> Closing: ')'
4. Appending closing brackets to input: ([{<>}])
</think>
([{<>}])
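To use the model programmatically, the final answer can be split off from the <think> block. A small helper (the end-of-sequence marker name is DeepSeek's default and is an assumption here):

```python
def extract_answer(response: str) -> str:
    """Return the text after the final </think> tag, stripped of markers."""
    tail = response.rsplit("</think>", 1)[-1]
    # Drop DeepSeek's end-of-sequence token if decoding kept special tokens.
    return tail.replace("<｜end▁of▁sentence｜>", "").strip()

print(extract_answer("<think>\npush/pop steps...\n</think>\n([{<>}])"))  # -> ([{<>}])
```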

Model Architecture

  • Base Architecture: Llama-based (DeepSeek-R1)
  • Parameters: 8B base model
  • LoRA Parameters: ~167M trainable parameters (1.8% of base model)
  • Target Modules: Attention layers only (q_proj, k_proj, v_proj, o_proj)

Performance

Training was monitored as follows:

  • Training Loss: Decreasing smoothly
  • Validation Loss: Monitored every 200 steps
  • Gradient Stability: Controlled with strict clipping (0.05)
  • Reasoning Quality: Generates detailed step-by-step reasoning

Limitations

  • The model is specifically trained for Dyck language bracket completion
  • Performance may vary on sequences with very deep nesting (>20 levels)
  • Requires proper formatting with chat template for best results

Citation

If you use this model, please cite:

@misc{deepseek-r1-dyck-finetuned,
  title={DeepSeek-R1-Dyck-Finetuned: Bracket Completion Model},
  author={Fine-tuned on DeepSeek-R1-Distill-Llama-8B},
  year={2024},
  howpublished={\url{https://huggingface.co/akashdutta1030/dddd}}
}

License

This model is licensed under the Apache 2.0 license.

Acknowledgments

  • Base model: DeepSeek-R1-Distill-Llama-8B
  • Training framework: Unsloth
  • Fine-tuning approach: LoRA (Low-Rank Adaptation)