Corianas/Tiny_Test

Text Generation · text-generation-inference


Corianas committed on Mar 22, 2024 · Commit 0b412d8 (verified) · 1 parent: 332bcb4

Update README.md

Files changed (1): README.md +29 -0

@@ -1,3 +1,32 @@
 ---
 license: cc-by-nc-4.0
 ---

+A llama2.c model based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c
+It has a vocabulary of 4096 tokens and was trained on TinyStories plus my custom littlestories dataset (currently unreleased).
+Instead of capital letters, the model uses ↨ as a shift key: each uppercase letter is written as ↨ followed by its lowercase form. This simplifies the tokenizer, because it no longer needs uppercase duplicates of tokens.
+To convert normal text into this format, I use:
+```python
+def add_caseifer(text):
+    # Replace each uppercase character with "↨" + its lowercase form.
+    # Building a list and joining once avoids repeated string concatenation.
+    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
+```
+To convert the text back to normal, human-readable form, I use:
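+```python
+def remove_caseifer(text):
+    # Inverse of add_caseifer: "↨" uppercases the character after it.
+    chars = []
+    i = 0
+    while i < len(text):
+        if text[i] == "↨":
+            if i + 1 < len(text):  # a trailing "↨" is simply dropped
+                chars.append(text[i + 1].upper())
+            i += 2  # skip the marker and the character it shifted
+        else:
+            chars.append(text[i])
+            i += 1
+    return "".join(chars)
+```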
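A quick round-trip check of the two functions, as a minimal sketch: the sample strings are made up for illustration, and both functions are copied into the block (with the same behavior as the README versions) so it runs on its own. The round trip is exact for ASCII text that contains no literal ↨; a ↨ already present in the input is read as a shift key when decoding, so it does not survive.

```python
# Copies of the README functions so this sketch runs standalone.
def add_caseifer(text):
    # "↨" + lowercase form for every uppercase character.
    return "".join(["↨" + char.lower() if char.isupper() else char for char in text])


def remove_caseifer(text):
    # "↨" uppercases the next character; a trailing "↨" is dropped.
    chars = []
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                chars.append(text[i + 1].upper())
            i += 2
        else:
            chars.append(text[i])
            i += 1
    return "".join(chars)


# Hypothetical sample sentences (ASCII only, no literal "↨").
samples = ["Tom met Anna.", "THE END", "once upon a time"]
for s in samples:
    encoded = add_caseifer(s)
    assert remove_caseifer(encoded) == s  # exact round trip
    print(repr(s), "->", repr(encoded))

# A literal "↨" in the input is treated as a shift key when decoding.
print(remove_caseifer(add_caseifer("a↨b")))  # prints aB
```

All-caps text costs one ↨ per letter ("THE END" becomes ↨t↨h↨e ↨e↨n↨d), which is the trade-off this scheme makes in exchange for a tokenizer without uppercase duplicates.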