---
license: apache-2.0
datasets:
- Salesforce/wikitext
- VisionTheta/fineweb-1B
- teknium/OpenHermes-2.5
- ARMZyany/MoonGeneralQA-V1
- WizardLMTeam/WizardLM_evol_instruct_V2_196k
- Voxel51/fiftyone-qa-pairs-14k
- Open-Orca/OpenOrca
- OpenAssistant/oasst2
- Ereeeeef3/Qu-QA-v2
- tau/commonsense_qa
- OpenAssistant/oasst1
- hkust-nlp/deita-10k-v0
- HuggingFaceH4/ultrafeedback_binarized
- meta-math/MetaMathQA
- HuggingFaceH4/ultrachat_200k
language:
- en
pipeline_tag: text-generation
---
|
|
# Cascade0 |
|
|
My first ever LLM, trained locally on a single RTX 4080 in 1.5–2 weeks.
|
|
Although it's small (159M parameters) and cannot answer direct questions (e.g. "What's the capital of France?"), it can absolutely complete sentences coherently and correctly.
|
|
One thing to note is that it currently outputs everything in lowercase (due to a training bug).
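
If you want to try it, here is a minimal sketch using the `transformers` library. It assumes the checkpoint is published in a transformers-compatible format; the repo id `ARMZyany/Cascade0` is a placeholder, so substitute the actual model path.

```python
# Minimal sketch: plain text completion, since the model is a completer
# rather than a question-answerer.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ARMZyany/Cascade0"  # assumed repo id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The sun rises in the"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Note: the output will be all lowercase due to the training bug mentioned above.
```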
|
|
|
|
|
## GPT-2 vs Cascade0
|
|
Both models are similar in size (161M parameters for GPT-2, 159M for Cascade0) and use the same F16 quantization.
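
For anyone who wants to reproduce a comparison like the one below, here is a minimal sketch using the `transformers` pipeline API. The `gpt2` checkpoint is the official one on the Hub; the Cascade0 repo id is again a placeholder.

```python
# Minimal sketch: run the same prompt through both models for a
# side-by-side comparison of their completions.
from transformers import pipeline

prompt = "the quick brown fox"
for repo_id in ("gpt2", "ARMZyany/Cascade0"):  # second id is assumed
    generator = pipeline("text-generation", model=repo_id)
    result = generator(prompt, max_new_tokens=30, do_sample=True, top_p=0.9)
    print(f"--- {repo_id} ---")
    print(result[0]["generated_text"])
```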
|
|
Example response:
|
|
 |
|
|
Both models hallucinate after the second turn in a single chat:
|
|
 |
|
|
 |
|
|
|
|
|
After analyzing the responses, Gemini 2.5 Flash gave this verdict:
|
|
 |
|
|
|
|
|
This project started in May 2025. The training code is AI-generated, BUT it took a lot of human effort (mostly debugging and prompt engineering) to reach this state: lots of trial and error, switching between AIs (GPT, Gemini, DeepSeek), electricity and time wasted on failed training runs... and lots of frustration.
|
|
It was only recently, after I bought ChatGPT Plus, that I could pull this off, having almost abandoned everything.
|
|
But in the end, this is my dream, and I just feel good when I see it on my page. <3