ariankharazmi committed on
Commit 98d4f9c · verified · 1 Parent(s): 06211c3

Update README.md

Files changed (1): README.md (+1 -2)
README.md CHANGED
@@ -31,5 +31,4 @@ Description
 - Curiosity-16 is a small research model (based on pre-trained GPT-2 Medium) that has 354.8M Parameters. It uses training samples from 11 diverse HuggingFace datasets.

 Citation
-Kharazmi, A. (2025). Curiosity-16: A 354.8M Parameter Large Language Model (v1.0). Zenodo. https://doi.org/10.5281/zenodo.17871084Curiosity-16 is a proof-of-concept Large Language Model fine-tuned for short factual responses and basic reasoning capabilities, presented during UCinci’s Summer 2025 EEP Research co-op session. This demonstrated the feasibility of pushing open-source GPT-2 Medium to higher usability through carefully structured two-phase Supervised Fine-tuning. Trained on ~153k prompt-response pairs across 11 curated HuggingFace datasets. Achieved over 100 downloads on HuggingFace. Evaluated against GPT-2 Medium on HellaSwag & Massive Multitask Language Understanding (MMLU) Benchmarks. Model Summary - Parameters: 354.8M Parameters - Base: GPT-2 Medium (Decoder) - Tokenizer: AutoTokenizer - Training: 2-Phase Full SFT (Phase I: Generalization, Phase II: Task-focus/Chain-of-Thought) - Training Duration: 30 Hours -- Phase I: 17 Hours, Phase II: 13 Hours - Purpose: Research Model -- Proof of Concept - Evaluated Strengths: Short factual responses, brief creative prompts, basic reasoning inquiries - Evaluated Limitations: Hard-limit at 1-2 Sentences, no safety filter, prone to misinterpret or hallucinate. Citation: Kharazmi, A. (2025). Curiosity-16: A 354.8M Parameter Large Language Model (v1.0). Zenodo. https://doi.org/10.5281/zenodo.17871084
-
+Kharazmi, A. (2025). Curiosity-16: A 354.8M Parameter Large Language Model (v1.0). Zenodo. https://doi.org/10.5281/zenodo.17871084
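Since the removed summary names AutoTokenizer and a GPT-2 Medium decoder base, loading this model should follow the standard transformers pattern. A minimal sketch, assuming a repo id of "ariankharazmi/Curiosity-16" (a hypothetical placeholder based on the committer's username, not confirmed by this commit):

```python
# Minimal loading sketch for Curiosity-16 via Hugging Face transformers.
# NOTE: the repo id below is an assumption; check the model page for the
# actual identifier.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "ariankharazmi/Curiosity-16"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The README notes the model is evaluated for short factual responses and
# hard-limits at 1-2 sentences, so a small generation budget fits.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```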