Commit 0936c44 by Corianas · verified · 1 Parent(s): 5576aa5

Create README.md

Files changed (1)
  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ license: cc-by-nc-4.0
+ ---
+ A llama2.c model based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c
+ Vocabulary of 4096 tokens, trained on TinyStories and my custom littlestories dataset (currently unreleased).
+ This version was further trained to follow instructions... somewhat... using https://github.com/mlabonne/llm-course/blob/main/Fine_tune_Llama_2_in_Google_Colab.ipynb
+
+ The model uses ↨ as a shift key instead of capital letters. This simplified the tokenizer by avoiding duplicate tokens that differ only in case.
+
+ To convert normal text to the right format I use:
+ ```
+ def add_caseifer(text):
+     # Prefix each uppercase character with ↨ and lowercase it; join the pieces once at the end
+     return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
+ ```
+
+ To return the text to human format I use:
+ ```
+ def remove_caseifer(text):
+     new_text = ""
+     i = 0
+     while i < len(text):
+         if text[i] == "↨":
+             if i+1 < len(text):
+                 # Uppercase the character after the marker, then step past it
+                 new_text += text[i+1].upper()
+                 i += 1
+             else:
+                 pass  # trailing marker with nothing after it: skip this index
+         else:
+             new_text += text[i]
+         i += 1
+     return new_text
+ ```
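The two helpers in the README can be checked against each other with a short round trip. The sketch below is not part of the commit. It restates `add_caseifer` and swaps the README's index loop in `remove_caseifer` for a regex, which is an assumption made for brevity. The `SHIFT` constant and the sample sentence are hypothetical. The round trip should hold for plain English text that does not already contain `↨`; other inputs are untested here.

```python
import re

SHIFT = "↨"  # hypothetical constant name for the README's shift marker


def add_caseifer(text):
    # Prefix each uppercase character with the marker and lowercase it.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)


def remove_caseifer(text):
    # Regex variant (assumption, not the README's loop): uppercase the
    # character that follows each marker; a marker at the very end of the
    # string has nothing to shift and is dropped.
    return re.sub(SHIFT + r"(.?)", lambda m: m.group(1).upper(), text, flags=re.DOTALL)


sample = "Once upon a time, Lily met Tom."
encoded = add_caseifer(sample)
print(encoded)                   # ↨once upon a time, ↨lily met ↨tom.
print(remove_caseifer(encoded))  # Once upon a time, Lily met Tom.
```

Because the model may emit a stray `↨` at the end of a generation, it is worth confirming that decoding tolerates a trailing marker, as both the README loop and this variant do.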