Corianas
/

TinyTask-minipaca


	---
	license: cc-by-nc-4.0
	---
	A model built with Karpathy's llama2.c project: https://github.com/karpathy/llama2.c

	Vocabulary size of 4096. Trained on TinyStories and on my custom littlestories dataset (currently unreleased).

	This version was further fine-tuned to follow instructions (somewhat), using https://github.com/mlabonne/llm-course/blob/main/Fine_tune_Llama_2_in_Google_Colab.ipynb


	The model uses ↨ as a shift key instead of capital letters: an uppercase letter is written as ↨ followed by its lowercase form. This simplified the tokenizer, which no longer needs duplicate uppercase versions of tokens.

	To convert normal text to the right format I use:
	```
	def add_caseifer(text):
	    # Using list comprehension for more efficient concatenation
	    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
	```

	To return the text to human format I use:
	```
	def remove_caseifer(text):
	    new_text = ""
	    i = 0
	    while i < len(text):
	        if text[i] == "↨":
	            if i+1 < len(text):
	                # Uppercase the character after the shift marker and consume it
	                new_text += text[i+1].upper()
	                i += 1
	            else:
	                pass # trailing ↨ with nothing after it: skip this index
	        else:
	            new_text += text[i]
	        i += 1
	    return new_text
	```
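
As a quick sanity check, the two helpers above can be exercised as a round trip. The block below is a minimal, self-contained sketch: `add_caseifer` matches the version above, while `remove_caseifer` is rewritten here with a regular expression as a compact illustration (an assumption of this example, not the code used for training). The `SHIFT` constant and the sample prompt are also just illustrative.

```python
import re

SHIFT = "↨"  # shift marker that stands in for capital letters


def add_caseifer(text):
    # Prefix every uppercase character with the shift marker and lowercase it.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)


def remove_caseifer(text):
    # Uppercase the character that follows each shift marker.
    # A trailing marker with nothing after it is simply dropped.
    return re.sub(SHIFT + r"(.?)", lambda m: m.group(1).upper(), text, flags=re.DOTALL)


prompt = "Tell me a story about Lily."
encoded = add_caseifer(prompt)
print(encoded)  # ↨tell me a story about ↨lily.

decoded = remove_caseifer(encoded)
assert decoded == prompt
```

Note that the round trip only holds when the input text does not already contain ↨, since an existing marker would be read as a shift when converting back.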