Soumyajit-7 committed on
Commit 27f0b35 · verified · 1 parent: 0d1f03d

Update README.md

Files changed (1):
  1. README.md +212 −8
README.md CHANGED
@@ -4,20 +4,224 @@ tags:
 - text-generation-inference
 - transformers
 - unsloth
-- llama
 - trl
-- sft
 license: apache-2.0
 language:
 - en
 ---

-# Uploaded model

-- **Developed by:** Soumyajit-7
-- **License:** apache-2.0
-- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit

-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 - text-generation-inference
 - transformers
 - unsloth
 - trl
+- code
+- math
+- competitive
+- cp
+- deepseek
 license: apache-2.0
 language:
 - en
 ---
# DeepSeek R1 Code Reasoning 8B

## Model Description

This model is a fine-tuned version of [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B) specialized for advanced code-reasoning tasks. It was trained on challenging programming problems from the [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) dataset, restricted to the hardest problems (difficulty ratings "VERY_HARD", 10, and 11).
## Model Details

- **Base Model**: DeepSeek-R1-Distill-Llama-8B
- **Model Type**: Causal language model (fine-tuned)
- **Architecture**: LLaMA-based transformer
- **Parameters**: ~8 billion
- **Training Data**: Filtered nvidia/OpenCodeReasoning dataset (VERY_HARD-difficulty problems)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **License**: Apache 2.0

### Training Details

- **Training Framework**: Unsloth + Transformers
- **Fine-tuning Method**: LoRA with rank 16
- **Batch Size**: 2 per device, with 4 gradient-accumulation steps
- **Learning Rate**: 2e-4
- **Optimizer**: AdamW 8-bit
- **Precision**: Mixed precision (FP16/BF16)
- **Max Sequence Length**: 2048 tokens
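
The hyperparameters above could be wired together with Unsloth and TRL roughly as follows. This is an illustrative sketch, not the author's original training script: the `target_modules` list, `dataset_text_field`, and `output_dir` are assumptions, `train_dataset` must be prepared separately, and exact argument names vary between TRL versions.

```python
# Illustrative fine-tuning sketch (assumptions noted in the lead-in above).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,   # matches the card's max sequence length
    load_in_4bit=True,     # 4-bit base weights, per the card
)

# LoRA rank 16, per the card; target_modules is a common choice, not stated by the card
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,       # a Dataset of formatted prompt/response text, prepared elsewhere
    dataset_text_field="text",         # assumed field name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # per the card
        gradient_accumulation_steps=4,   # per the card
        learning_rate=2e-4,              # per the card
        optim="adamw_8bit",              # per the card
        fp16=True,                       # or bf16=True on Ampere+ GPUs
        output_dir="outputs",
    ),
)
trainer.train()
```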

### Dataset

The model was trained on a filtered subset of the nvidia/OpenCodeReasoning dataset:
- **Source**: nvidia/OpenCodeReasoning (split_0)
- **Filter Criteria**: Only problems with difficulty "VERY_HARD", 10, or 11
- **Columns Used**: input (problem), output (expected result), solution (reasoning)
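
The filter criteria above can be sketched as a plain predicate usable with the `datasets` library's `filter`; note that the `difficulty` field name and the string/numeric encoding of the levels are assumptions about the dataset schema:

```python
# Predicate for the "VERY_HARD / 10 / 11" filter described above.
# Accepts both string and integer encodings, since the schema is an assumption.
HARD_LEVELS = {"VERY_HARD", "10", "11", 10, 11}

def is_very_hard(example: dict) -> bool:
    """Keep only problems rated VERY_HARD (or numeric 10/11)."""
    return example.get("difficulty") in HARD_LEVELS

# With Hugging Face datasets (requires a download, shown for context):
# from datasets import load_dataset
# ds = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")
# hard = ds.filter(is_very_hard)

print(is_very_hard({"difficulty": "VERY_HARD"}))  # True
print(is_very_hard({"difficulty": "EASY"}))       # False
```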

## Intended Use

This model is designed for:
- Advanced algorithmic problem solving
- Code generation with detailed reasoning
- Educational purposes, for understanding complex programming concepts
- Research in automated code reasoning

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Prompt template used during fine-tuning
prompt_template = """Below is an instruction that describes a coding task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the problem and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

# Example problem
problem = """
Problem description.
Vipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently). Being his super-hero friend, help him in his time of hardship.

Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single string S denoting the string to be checked.

Output
For each test case, output a single line printing "YES" or "NO" (without quotes and in uppercase only), denoting whether the brackets in the given string are balanced.

Constraints
1 ≤ T ≤ 10
1 ≤ length of S ≤ 60

Example
Input:
3
((()))
(())()
()(()

Output:
YES
YES
NO

Explanation
Example is self-explanatory.
"""
prompt = prompt_template.format(problem=problem)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1])
```
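
For the example problem above, a minimal reference solution is useful as a sanity check on the model's output; a depth-counter sketch (not part of the model card's training data) might look like:

```python
def is_balanced(s: str) -> str:
    """Return "YES" if the parentheses in s are balanced, else "NO"."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return "NO"
    return "YES" if depth == 0 else "NO"

# The three example cases from the problem statement
for case in ["((()))", "(())()", "()(()"]:
    print(is_balanced(case))  # YES, YES, NO
```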

## Model Capabilities

The model excels at:
- **Algorithm Design**: Creating efficient algorithms for complex problems
- **Code Optimization**: Improving time and space complexity
- **Problem Analysis**: Breaking down complex problems into manageable steps
- **Mathematical Reasoning**: Solving problems requiring mathematical insight
- **Data Structure Implementation**: Designing and implementing advanced data structures

## Prompt Format

The model expects prompts in the following format:

```
### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
[Your coding problem here]

### Response:
<think>
[The model will provide step-by-step reasoning here]
</think>
[Final solution/answer here]
```
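
Assembling this prompt and separating the reasoning trace from the final answer is plain string handling; a small sketch (the helper names are illustrative, and the `</think>` delimiter follows the format above):

```python
# Minimal helpers for the prompt format described above (illustrative names).
PROMPT = """### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

def build_prompt(problem: str) -> str:
    """Fill the template; the model continues from the opening <think> tag."""
    return PROMPT.format(problem=problem.strip())

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the closing </think> tag.

    If the tag is absent, everything lands in `reasoning` and `answer` is empty.
    """
    reasoning, _, answer = completion.partition("</think>")
    return reasoning.strip(), answer.strip()

reasoning, answer = split_reasoning("Count depth... </think> def solve(): ...")
print(answer)  # def solve(): ...
```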

## Performance

This model was trained specifically on the most challenging programming problems and shows improved performance on:
- Advanced algorithmic challenges
- Complex data-structure problems
- Mathematical programming tasks
- Optimization problems

## Limitations

- The model is specialized for code reasoning and may not perform as well in general conversation
- Training focused on very hard problems, so its answers to simple tasks may be over-engineered
- Like all language models, it may occasionally generate incorrect or suboptimal solutions
- The model should be used as a coding assistant, not as a replacement for human review

## Training Infrastructure

- **GPU**: NVIDIA A100/V100 (recommended)
- **Memory**: 16 GB+ GPU memory required
- **Framework**: Unsloth for efficient training
- **Quantization**: Trained with 4-bit quantization for memory efficiency
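
For inference on smaller GPUs, the fine-tuned model can likewise be loaded in 4-bit via transformers and bitsandbytes; a sketch, where the NF4 settings are common defaults rather than values stated by this card:

```python
# 4-bit inference sketch (quantization settings are assumptions, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, a common default
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    quantization_config=bnb_config,
    device_map="auto",
)
```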

## Ethical Considerations

This model is designed for educational and research purposes. Users should:
- Verify generated code before using it in production
- Understand the logic behind solutions rather than copying them blindly
- Use the model responsibly, for learning and problem-solving

## Future Work

Potential improvements:
- Training on additional challenging datasets
- Multi-language code generation support
- Integration with code-execution environments
- Fine-tuning on specific programming domains

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{code-reasoning-deepseek-8b,
  title        = {DeepSeek R1 Code Reasoning 8B},
  author       = {Soumyajit},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Soumyajit-7/code-reasoning-deepseek-8b}},
}
```

## Acknowledgments

- Based on DeepSeek-R1-Distill-Llama-8B
- Trained using Unsloth for efficient fine-tuning
- Dataset from NVIDIA's OpenCodeReasoning project
- Special thanks to the open-source community for making this possible

---

*Model trained and maintained by Soumyajit-7. For questions or issues, please open an issue in the repository.*