---
license: cc-by-nc-4.0
---
A llama2.c model, based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c

It has a vocabulary of 4096 tokens and was trained on TinyStories and my custom littlestories dataset (currently unreleased).

This version was further trained to follow instructions... somewhat... using https://github.com/mlabonne/llm-course/blob/main/Fine_tune_Llama_2_in_Google_Colab.ipynb

The model uses ↨ as a shift key instead of capital letters: an uppercase letter is written as ↨ followed by its lowercase form. This simplifies the tokenizer, since it no longer needs duplicate tokens that differ only in case.

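To make the deduplication concrete, here is a small sketch. It is only an illustration: `shift_encode` and `toy_pieces` are hypothetical helpers, and the whitespace split is an assumed stand-in for a tokenizer, not the one this model actually uses. It counts distinct pieces in a sample sentence before and after applying the shift key.

```python
def shift_encode(text):
    """Hypothetical helper: mark each uppercase letter with '↨' and lowercase it."""
    return "".join("↨" + ch.lower() if ch.isupper() else ch for ch in text)


def toy_pieces(text):
    """Toy word-level split (an assumption, not the model's real tokenizer).

    A leading '↨' is emitted as its own piece, so 'The' and 'the'
    share the same base piece 'the'.
    """
    pieces = []
    for word in text.split():
        while word.startswith("↨"):
            pieces.append("↨")
            word = word[1:]
        if word:
            pieces.append(word)
    return pieces


sample = "The cat saw the Cat And the cat saw The Cat and ran"

# Without the shift key, each capitalized word is a separate vocabulary entry.
raw_vocab = set(sample.split())

# With the shift key, case is carried by one shared '↨' piece.
shifted_vocab = set(toy_pieces(shift_encode(sample)))

print(len(raw_vocab), sorted(raw_vocab))
print(len(shifted_vocab), sorted(shifted_vocab))
```

In this toy setting the raw text needs separate entries for `The`/`the`, `Cat`/`cat`, and `And`/`and`, while the shifted text needs only the lowercase forms plus the single ↨ piece.
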
To convert normal text to this format I use:
```
def add_caseifer(text):
    # Prefix each uppercase character with ↨ and lowercase it.
    # A list comprehension with a single join is more efficient than repeated concatenation.
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
```

To convert the text back to a human-readable format I use:
```
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i+1 < len(text):
                # Uppercase the character after the marker and skip past it
                new_text += text[i+1].upper()
                i += 1
            else:
                pass  # trailing marker with nothing after it: drop it
        else:
            new_text += text[i]
        i += 1
    return new_text
```
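
As a quick sanity check, the sketch below runs a round trip through both functions. They are repeated here (with indentation restored) only so the block runs on its own. The sample sentences are made up for illustration, and the round trip assumes the input does not already contain ↨.

```python
def add_caseifer(text):
    # Prefix each uppercase character with ↨ and lowercase it.
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])


def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                # Uppercase the character after the marker and skip past it
                new_text += text[i + 1].upper()
                i += 1
            # A trailing marker with nothing after it is dropped
        else:
            new_text += text[i]
        i += 1
    return new_text


original = "Once upon a time, Lily met Tom."
encoded = add_caseifer(original)
decoded = remove_caseifer(encoded)

print(encoded)
print(decoded)

# Text without a pre-existing ↨ should survive the round trip unchanged.
assert decoded == original
```

Model output is not guaranteed to be well formed, so it is worth noting that `remove_caseifer` silently drops a ↨ that appears at the very end of the text.
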