RichardErkhov committed (verified)
Commit e710c47 · 1 Parent(s): 3fe2563

uploaded readme

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


TinyLLama-v0 - bnb 8bits
- Model creator: https://huggingface.co/Maykeye/
- Original model: https://huggingface.co/Maykeye/TinyLLama-v0/
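
A minimal loading sketch for an 8-bit bitsandbytes checkpoint with `transformers` (not part of the original card): the repo id is a placeholder for whichever repository hosts this quant, and it assumes `bitsandbytes` and `accelerate` are installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the id of the repository that hosts this 8-bit quant.
repo_id = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# A checkpoint saved with bitsandbytes 8-bit weights carries its quantization config in
# config.json, so a plain from_pretrained call is enough; device_map="auto" places the
# model on a GPU if one is available (requires accelerate).
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
```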


Original model description:
---
license: apache-2.0
---
This is a first version of a recreation of roneneldan/TinyStories-1M, but using the Llama architecture.

* The full training process is included in the notebook train.ipynb. Recreating it is as simple as downloading
TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt into the same folder as the notebook and running
the cells. The validation content is not used by the script, so you can put anything in it (see the download sketch after this list).

* The Backup directory has a script, do_backup, that I used to copy weights from the remote machine to my local one.
Weights are generated too quickly, so by the time the script had copied weight N+1

* This is an extremely PoC version. Training truncates stories that are longer than the context size and doesn't use
any sliding window to train on a story from anywhere other than its start.

* Training took approximately 9 hours (3 hours per epoch) on a 40GB A100; ~30GB of VRAM was used.

* I use the tokenizer from open_llama_3b. However, I had troubles with it locally (https://github.com/openlm-research/open_llama/issues/69).
I had no troubles on the cloud machine with preinstalled libraries.

* The demo script is demo.py.

* A validation script is provided: valid.py. Use it like `python valid.py path/to/TinyStoriesV2-GPT4-valid.txt [optional-model-id-or-path]`.
After training I decided that it's not necessary to break the validation set into chunks.

* Also, this version uses a very naive caching mechanism to shuffle stories for training: it keeps a cache of the N most recently loaded chunks,
so when the random shuffle asks for a story, it may serve it from the cache or load a new chunk.
The training dataset is too small, so in the next versions I will get rid of it.
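
A hedged sketch of the download step mentioned in the first bullet, assuming the two text files are fetched from the `roneneldan/TinyStories` dataset repo on the Hub (that repo id is my assumption; any other source of the files works the same way):

```python
from huggingface_hub import hf_hub_download

# Assumption: the TinyStoriesV2 text files live in the roneneldan/TinyStories dataset repo.
for filename in ["TinyStoriesV2-GPT4-train.txt", "TinyStoriesV2-GPT4-valid.txt"]:
    hf_hub_download(
        repo_id="roneneldan/TinyStories",
        filename=filename,
        repo_type="dataset",
        local_dir=".",  # place the files next to train.ipynb
    )
```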


from transformers import AutoModelForCausalLM, AutoTokenizer
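
The original description breaks off after the import above. A minimal usage sketch follows; the prompt, the generation settings, and the choice to load the original `Maykeye/TinyLLama-v0` checkpoint (rather than this quant) are illustrative assumptions, not taken from the original README or demo.py.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Maykeye/TinyLLama-v0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt in the TinyStories style; generation settings are arbitrary.
inputs = tokenizer("Once upon a time", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```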