Soumyajit-7 committed on
Commit 27f0b35 · verified · 1 parent: 0d1f03d

Update README.md

Files changed (1):
  1. README.md +212 −8
README.md CHANGED
@@ -4,20 +4,224 @@ tags:
 - text-generation-inference
 - transformers
 - unsloth
-- llama
 - trl
-- sft
 license: apache-2.0
 language:
 - en
 ---

-# Uploaded model

-- **Developed by:** Soumyajit-7
-- **License:** apache-2.0
-- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit

-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 - text-generation-inference
 - transformers
 - unsloth
 - trl
+- code
+- math
+- competitive
+- cp
+- deepseek
 license: apache-2.0
 language:
 - en
 ---
# DeepSeek R1 Code Reasoning 8B

## Model Description

This model is a fine-tuned version of [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B) specialized for advanced code-reasoning tasks. It was trained on challenging programming problems from the [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) dataset, restricted to the hardest problems (difficulty ratings "VERY_HARD", 10, and 11).
## Model Details

- **Base Model**: DeepSeek-R1-Distill-Llama-8B
- **Model Type**: Causal language model (fine-tuned)
- **Architecture**: LLaMA-based transformer
- **Parameters**: ~8 billion
- **Training Data**: Filtered nvidia/OpenCodeReasoning dataset (VERY_HARD-difficulty problems)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **License**: Apache 2.0

### Training Details

- **Training Framework**: Unsloth + Transformers
- **Fine-tuning Method**: LoRA with rank 16
- **Batch Size**: 2 per device, with 4 gradient-accumulation steps
- **Learning Rate**: 2e-4
- **Optimizer**: AdamW 8-bit
- **Precision**: Mixed precision (FP16/BF16)
- **Max Sequence Length**: 2048 tokens
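
The hyperparameters above could be wired together with Unsloth and TRL roughly as follows. This is an illustrative sketch, not the author's original training script: the `target_modules` list, `dataset_text_field`, and `output_dir` are assumptions, `train_dataset` must be prepared separately, and exact argument names vary between TRL versions.

```python
# Illustrative fine-tuning sketch (assumptions noted in the lead-in above).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,   # matches the card's max sequence length
    load_in_4bit=True,     # 4-bit base weights, per the card
)

# LoRA rank 16, per the card; target_modules is a common choice, not stated by the card
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,       # a Dataset of formatted prompt/response text, prepared elsewhere
    dataset_text_field="text",         # assumed field name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # per the card
        gradient_accumulation_steps=4,   # per the card
        learning_rate=2e-4,              # per the card
        optim="adamw_8bit",              # per the card
        fp16=True,                       # or bf16=True on Ampere+ GPUs
        output_dir="outputs",
    ),
)
trainer.train()
```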

### Dataset

The model was trained on a filtered subset of the nvidia/OpenCodeReasoning dataset:
- **Source**: nvidia/OpenCodeReasoning (split_0)
- **Filter Criteria**: Only problems with difficulty "VERY_HARD", 10, or 11
- **Columns Used**: input (problem), output (expected result), solution (reasoning)
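
The filter criteria above can be sketched as a plain predicate usable with the `datasets` library's `filter`; note that the `difficulty` field name and the string/numeric encoding of the levels are assumptions about the dataset schema:

```python
# Predicate for the "VERY_HARD / 10 / 11" filter described above.
# Accepts both string and integer encodings, since the schema is an assumption.
HARD_LEVELS = {"VERY_HARD", "10", "11", 10, 11}

def is_very_hard(example: dict) -> bool:
    """Keep only problems rated VERY_HARD (or numeric 10/11)."""
    return example.get("difficulty") in HARD_LEVELS

# With Hugging Face datasets (requires a download, shown for context):
# from datasets import load_dataset
# ds = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")
# hard = ds.filter(is_very_hard)

print(is_very_hard({"difficulty": "VERY_HARD"}))  # True
print(is_very_hard({"difficulty": "EASY"}))       # False
```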

## Intended Use

This model is designed for:
- Advanced algorithmic problem solving
- Code generation with detailed reasoning
- Educational purposes, for understanding complex programming concepts
- Research in automated code reasoning

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Prompt template used during fine-tuning
prompt_template = """Below is an instruction that describes a coding task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the problem and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

# Example problem
problem = """
Problem description.
Vipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently). Being his super-hero friend, help him in his time of hardship.

Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single string S denoting the string to be checked.

Output
For each test case, output a single line printing "YES" or "NO" (without quotes and in uppercase only), denoting whether the brackets in the given string are balanced.

Constraints
1 ≤ T ≤ 10
1 ≤ length of S ≤ 60

Example
Input:
3
((()))
(())()
()(()

Output:
YES
YES
NO

Explanation
Example is self-explanatory.
"""
prompt = prompt_template.format(problem=problem)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1])
```
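
For the example problem above, a minimal reference solution is useful as a sanity check on the model's output; a depth-counter sketch (not part of the model card's training data) might look like:

```python
def is_balanced(s: str) -> str:
    """Return "YES" if the parentheses in s are balanced, else "NO"."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return "NO"
    return "YES" if depth == 0 else "NO"

# The three example cases from the problem statement
for case in ["((()))", "(())()", "()(()"]:
    print(is_balanced(case))  # YES, YES, NO
```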

## Model Capabilities

The model excels at:
- **Algorithm Design**: Creating efficient algorithms for complex problems
- **Code Optimization**: Improving time and space complexity
- **Problem Analysis**: Breaking down complex problems into manageable steps
- **Mathematical Reasoning**: Solving problems requiring mathematical insight
- **Data Structure Implementation**: Designing and implementing advanced data structures

## Prompt Format

The model expects prompts in the following format:

```
### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
[Your coding problem here]

### Response:
<think>
[The model will provide step-by-step reasoning here]
</think>
[Final solution/answer here]
```
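
Assembling this prompt and separating the reasoning trace from the final answer is plain string handling; a small sketch (the helper names are illustrative, and the `</think>` delimiter follows the format above):

```python
# Minimal helpers for the prompt format described above (illustrative names).
PROMPT = """### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

def build_prompt(problem: str) -> str:
    """Fill the template; the model continues from the opening <think> tag."""
    return PROMPT.format(problem=problem.strip())

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the closing </think> tag.

    If the tag is absent, everything lands in `reasoning` and `answer` is empty.
    """
    reasoning, _, answer = completion.partition("</think>")
    return reasoning.strip(), answer.strip()

reasoning, answer = split_reasoning("Count depth... </think> def solve(): ...")
print(answer)  # def solve(): ...
```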

## Performance

This model was trained specifically on the most challenging programming problems and shows improved performance on:
- Advanced algorithmic challenges
- Complex data-structure problems
- Mathematical programming tasks
- Optimization problems

## Limitations

- The model is specialized for code reasoning and may not perform as well in general conversation
- Training focused on very hard problems, so its answers to simple tasks may be over-engineered
- Like all language models, it may occasionally generate incorrect or suboptimal solutions
- The model should be used as a coding assistant, not as a replacement for human review

## Training Infrastructure

- **GPU**: NVIDIA A100/V100 (recommended)
- **Memory**: 16 GB+ GPU memory required
- **Framework**: Unsloth for efficient training
- **Quantization**: Trained with 4-bit quantization for memory efficiency
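
For inference on smaller GPUs, the fine-tuned model can likewise be loaded in 4-bit via transformers and bitsandbytes; a sketch, where the NF4 settings are common defaults rather than values stated by this card:

```python
# 4-bit inference sketch (quantization settings are assumptions, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, a common default
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    quantization_config=bnb_config,
    device_map="auto",
)
```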

## Ethical Considerations

This model is designed for educational and research purposes. Users should:
- Verify generated code before using it in production
- Understand the logic behind solutions rather than copying them blindly
- Use the model responsibly, for learning and problem-solving

## Future Work

Potential improvements:
- Training on additional challenging datasets
- Multi-language code generation support
- Integration with code-execution environments
- Fine-tuning on specific programming domains

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{code-reasoning-deepseek-8b,
  title        = {DeepSeek R1 Code Reasoning 8B},
  author       = {Soumyajit},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Soumyajit-7/code-reasoning-deepseek-8b}},
}
```

## Acknowledgments

- Based on DeepSeek-R1-Distill-Llama-8B
- Trained using Unsloth for efficient fine-tuning
- Dataset from NVIDIA's OpenCodeReasoning project
- Special thanks to the open-source community for making this possible

---

*Model trained and maintained by Soumyajit-7. For questions or issues, please open an issue in the repository.*