Update README.md
README.md CHANGED

@@ -1,22 +1,3 @@
-```yaml
----
-language: "en"
-license: "mit"
-tags:
-- llama
-- distillation
-- openwebtext
-- wikitext
-- text-generation
-- knowledge-distillation
-datasets:
-- openwebtext
-- wikitext-103-raw-v1
-model_name: "DistilLLaMA"
-base_model: "Meta's 7B LLaMA 2"
-inference: true
----
-
 ### Overview
 
 This model is a distilled version of LLaMA 2 with approximately 80 million parameters, trained on a mix of the OpenWebText and WikiText-103 Raw V1 datasets. Knowledge distillation was used to transfer knowledge from a larger "teacher" model, Meta's 7B LLaMA 2, so that the smaller "student" model learns to mimic the teacher's behavior.

@@ -122,5 +103,4 @@ The model’s performance is evaluated on 200 queries created in-house. For more
 url={https://arxiv.org/abs/2308.02019},
 }
 
-*Note: The repository will be updated as training progresses. Last updated 2024-10-23.*
-```
+*Note: The repository will be updated as training progresses. Last updated 2024-10-23.*
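The Overview above describes knowledge distillation from the 7B LLaMA 2 teacher into the roughly 80M-parameter student. For reference, the standard formulation (Hinton et al., 2015) mixes ordinary next-token cross-entropy with a KL-divergence term between temperature-softened teacher and student logits. The README does not spell out the exact loss used here, so the PyTorch sketch below is illustrative only; `temperature`, `alpha`, and the function name are assumptions, not values from this repository.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Illustrative defaults; the actual hyperparameters used to train
    # DistilLLaMA are not stated in the README.
    # Flatten (batch, seq_len, vocab) to (batch * seq_len, vocab) so the
    # reductions below average over every token position.
    vocab_size = student_logits.size(-1)
    student_flat = student_logits.view(-1, vocab_size)
    teacher_flat = teacher_logits.view(-1, vocab_size)

    # Hard loss: standard cross-entropy against the ground-truth tokens.
    hard_loss = F.cross_entropy(student_flat, labels.view(-1))

    # Soft loss: KL divergence between temperature-softened teacher and
    # student distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_flat / temperature, dim=-1),
        F.softmax(teacher_flat / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In a training loop the teacher's logits would be computed under `torch.no_grad()`, so gradients flow only through the student.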