Upload README.md with huggingface_hub
Browse files
README.md
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
---
|
| 3 |
+
|
| 4 |
+
base_model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
|
| 5 |
+
language:
|
| 6 |
+
- en
|
| 7 |
+
license: apache-2.0
|
| 8 |
+
tags:
|
| 9 |
+
- text-generation-inference
|
| 10 |
+
- transformers
|
| 11 |
+
- unsloth
|
| 12 |
+
- llama
|
| 13 |
+
- trl
|
| 14 |
+
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+

|
| 18 |
+
|
| 19 |
+
# QuantFactory/Reflection-Llama-3.1-8B-GGUF
|
| 20 |
+
This is quantized version of [terrycraddock/Reflection-Llama-3.1-8B](https://huggingface.co/terrycraddock/Reflection-Llama-3.1-8B) created using llama.cpp
|
| 21 |
+
|
| 22 |
+
# Original Model Card
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
# Uploaded model
|
| 26 |
+
|
| 27 |
+
- **Developed by:** terrycraddock
|
| 28 |
+
- **License:** apache-2.0
|
| 29 |
+
- **Finetuned from model :** unsloth/Meta-Llama-3.1-8B-bnb-4bit
|
| 30 |
+
|
| 31 |
+
*** Currently Re-Training Model on multiple epochs of the data set to get a better loss rate. I will remove this notice when I upload the new version ***
|
| 32 |
+
|
| 33 |
+
Trained from unsloth/Meta-Llama-3.1-8B-bnb-4bit, you can sample from Reflection Llama-3.1 8B using the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though, we've trained in a few new special tokens to aid in reasoning and reflection).
|
| 34 |
+
|
| 35 |
+
During sampling, the model will start by outputting reasoning inside <thinking> and </thinking> tags, and then once it is satisfied with its reasoning, it will output the final answer inside <output> and </output> tags. Each of these tags are special tokens, trained into the model.
|
| 36 |
+
|
| 37 |
+
This enables the model to separate its internal thoughts and reasoning from its final answer, improving the experience for the user.
|
| 38 |
+
|
| 39 |
+
Inside the <thinking> section, the model may output one or more <reflection> tags, which signals the model has caught an error in its reasoning and will attempt to correct it before providing a final answer.
|
| 40 |
+
|
| 41 |
+
This model was finetuned on one epoch of https://huggingface.co/datasets/mahiatlinux/Reflection-Dataset-v2 .
|
| 42 |
+
|
| 43 |
+
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
|
| 44 |
+
|
| 45 |
+
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
|
| 46 |
+
|