- Reasoning & Self-Reflection: The model first decides if reasoning is necessary and then either provides step-by-step logic or directly answers the question.
- Structured Output: Responses follow a strict format with `<think>`, `<reflection>`, and `<answer>` sections, ensuring clarity and interpretability.
- Optimized Training: Trained using GRPO (Group Relative Policy Optimization) to enforce structured responses and improve decision-making.
- Efficient Inference: Fine-tuned with Unsloth & Hugging Face's TRL, ensuring faster inference speeds and optimized resource utilization.
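To enforce the structured format during GRPO training, a reward function typically scores each sampled response on whether it follows the expected tag layout. A minimal sketch of such a format reward, assuming the optional-reasoning behavior described above (the function name and pattern are illustrative, not this repository's actual reward code):

```python
import re

# A response is well-formed if it is either a direct <answer>, or a full
# <think> -> <reflection> -> <answer> sequence, per the format above.
FORMAT_PATTERN = re.compile(
    r"^(?:<think>.*?</think>\s*<reflection>.*?</reflection>\s*)?"
    r"<answer>.*?</answer>$",
    re.DOTALL,
)

def format_reward(response: str) -> float:
    """Return 1.0 for responses matching the structured format, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(response.strip()) else 0.0
```

GRPO then compares this reward across a group of sampled completions for the same prompt, pushing the policy toward well-formatted outputs.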
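Because responses always use the same tag layout, downstream code can recover each section with a small parser. A minimal sketch (the function name is illustrative, not part of this repository):

```python
import re

def parse_structured_response(text: str) -> dict:
    """Split a model response into its <think>, <reflection>, and <answer> parts.

    Sections the model omitted (e.g. <think> when it decides reasoning
    is unnecessary) come back as None.
    """
    sections = {}
    for tag in ("think", "reflection", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Direct answer: the model skipped the reasoning sections entirely.
direct = parse_structured_response("<answer>Paris</answer>")

# Full reasoning path: all three sections are present.
reasoned = parse_structured_response(
    "<think>2 + 2 is basic arithmetic.</think>"
    "<reflection>No edge cases to consider.</reflection>"
    "<answer>4</answer>"
)
```

Checking whether `think` is None is an easy way to tell a direct answer from a reasoned one.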
## Prompt Structure
This model is released under the Apache-2.0 license.

## Acknowledgments
Special thanks to the Unsloth team for optimizing the fine-tuning pipeline and to Hugging Face's TRL for enabling advanced fine-tuning techniques.
## Security & Format Considerations