---
license: apache-2.0
datasets:
- Salesforce/wikitext
- VisionTheta/fineweb-1B
- teknium/OpenHermes-2.5
- ARMZyany/MoonGeneralQA-V1
- WizardLMTeam/WizardLM_evol_instruct_V2_196k
- Voxel51/fiftyone-qa-pairs-14k
- Open-Orca/OpenOrca
- OpenAssistant/oasst2
- Ereeeeef3/Qu-QA-v2
- tau/commonsense_qa
- OpenAssistant/oasst1
- hkust-nlp/deita-10k-v0
- HuggingFaceH4/ultrafeedback_binarized
- meta-math/MetaMathQA
- HuggingFaceH4/ultrachat_200k
language:
- en
pipeline_tag: text-generation
---
|
|
# Cascade0 |
|
|
My first ever LLM, trained locally on a single RTX 4080 in 1.5–2 weeks.
|
|
Although it's small (159M parameters) and cannot answer direct questions (e.g. "What's the capital of France?"), it can absolutely complete sentences coherently and correctly.
|
|
One thing to note is that it currently outputs everything in lowercase (due to a training bug).
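
If you want to try it, here is a minimal sketch using the `transformers` library. It assumes the checkpoint is published in a transformers-compatible format; the repo id `ARMZyany/Cascade0` is a placeholder, so substitute the actual model path.

```python
# Minimal sketch: plain text completion, since the model is a completer
# rather than a question-answerer.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ARMZyany/Cascade0"  # assumed repo id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The sun rises in the"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Note: the output will be all lowercase due to the training bug mentioned above.
```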
|
|
|
|
|
## GPT-2 vs Cascade0
|
|
Both models are similar in size (161M parameters for GPT-2, 159M for Cascade0) and use the same F16 quantization.
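
For anyone who wants to reproduce a comparison like the one below, here is a minimal sketch using the `transformers` pipeline API. The `gpt2` checkpoint is the official one on the Hub; the Cascade0 repo id is again a placeholder.

```python
# Minimal sketch: run the same prompt through both models for a
# side-by-side comparison of their completions.
from transformers import pipeline

prompt = "the quick brown fox"
for repo_id in ("gpt2", "ARMZyany/Cascade0"):  # second id is assumed
    generator = pipeline("text-generation", model=repo_id)
    result = generator(prompt, max_new_tokens=30, do_sample=True, top_p=0.9)
    print(f"--- {repo_id} ---")
    print(result[0]["generated_text"])
```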
|
|
Example response:
|
|
 |
|
|
Both models hallucinate after the second turn in a single chat:
|
|
 |
|
|
 |
|
|
|
|
|
After analyzing the responses, Gemini 2.5 Flash gave this verdict:
|
|
 |
|
|
|
|
|
This project started in May 2025. The training code is AI-generated, BUT it took a lot of human effort (mostly debugging and prompt engineering) to reach this state: lots of trial and error, switching between AIs (GPT, Gemini, DeepSeek), electricity and time wasted on failed training runs... and lots of frustration.
|
|
It was only recently, after I bought ChatGPT Plus, that I could pull this off, having almost abandoned everything.
|
|
But in the end, this is my dream, and I just feel good when I see it on my page. <3