---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
library_name: transformers
---
# haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model
## Model Overview
**v2mini-eval1** is a 318M-parameter model trained from scratch on 15GB of 1800-1875 London texts using the modern Llama architecture. It was trained to evaluate the v2 dataset.
| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~318 Million (318M)** |
| **Training Type** | Trained **from Scratch** (Random Initialization) |
| **Tokenizer** | Custom BPE, Vocab Size 32,000 |
| **Sequence Length** | 1024 tokens |
| **Attention Type** | Grouped Query Attention (GQA) |
## Configuration Details
This model uses a custom size and configuration based on the Llama architecture:
| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 20 |
| **Hidden Size (d)** | 1024 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2752 |
| **Attention Heads** | 16 (Query) / 8 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | Rotary Positional Embeddings (RoPE) |
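As a rough cross-check of the tables above, the corresponding `config.json` fields and a back-of-envelope parameter count can be sketched in Python. Field names follow the standard Hugging Face Llama config; the checkpoint's own `config.json` is authoritative, and the estimate ignores norm weights and depends on whether embeddings are tied, so it only roughly approaches the stated ~318M:

```python
# Llama-style config fields reconstructed from the tables above
# (the checkpoint's own config.json is the authoritative source).
config = {
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 1024,
    "intermediate_size": 2752,
    "num_hidden_layers": 20,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,     # GQA: 8 KV heads serve 16 query heads
    "hidden_act": "silu",
    "rms_norm_eps": 1e-05,
    "max_position_embeddings": 1024,
    "vocab_size": 32000,
}

# Back-of-envelope parameter count (norm weights omitted).
d = config["hidden_size"]
head_dim = d // config["num_attention_heads"]          # 64
kv_dim = config["num_key_value_heads"] * head_dim      # 512, from GQA
attn = d * d + 2 * d * kv_dim + d * d                  # q, k, v, o projections
mlp = 3 * d * config["intermediate_size"]              # gate, up, down
embed = 2 * config["vocab_size"] * d                   # embeddings + LM head
total = config["num_hidden_layers"] * (attn + mlp) + embed
print(f"~{total / 1e6:.0f}M parameters")               # lands near 300M
```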
## Model Issues
This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. Due to a tokenization issue, raw output contains misplaced spaces:
- default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
- fixed: "Does that work more of his excellent stirring, in his plays"
This is purely a tokenizer issue; fix the spacing in the output yourself, or if you're lazy, feed it to an LLM and have it fixed.
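Since the damage is only misplaced spaces, one programmatic fix is to strip all spaces and re-segment the text against a word list (the classic word-break dynamic program). A minimal sketch, with a toy vocabulary standing in for a real dictionary:

```python
import re

# Toy vocabulary for the example above; in practice you would load a
# real English word list (e.g. /usr/share/dict/words).
VOCAB = {"does", "that", "work", "more", "of", "his", "excellent",
         "stirring", "in", "plays"}

def segment(s, vocab, max_len=20):
    """Word-break DP: split a space-free lowercase string into vocab words."""
    best = [None] * (len(s) + 1)
    best[0] = []
    for i in range(1, len(s) + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and s[j:i] in vocab:
                best[i] = best[j] + [s[j:i]]
                break
    return best[-1]  # None if no segmentation exists

def fix_spacing(garbled, vocab=VOCAB):
    """Strip the misplaced spaces, then re-segment each alphabetic run."""
    words = []
    for piece in re.split(r"\s*([,.;:!?])\s*", garbled):
        if not piece:
            continue
        if piece in ",.;:!?":
            if words:
                words[-1] += piece          # re-attach punctuation
            continue
        seg = segment(piece.replace(" ", "").lower(), vocab)
        words.extend(seg if seg is not None else [piece.strip()])
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]

print(fix_spacing(
    "D oes that work more of h ise x cell ent st ir ring , in his pl ays"))
# → Does that work more of his excellent stirring, in his plays
```

Note this loses the original casing (everything but the first letter is lowercased), so an LLM pass may still give cleaner results on long passages.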
### How to Load and Run the Model
Download all of the model files into a local folder and run the test script. You will need to make a few adjustments in the run script, such as updating the config/file paths and the test prompts.
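A minimal loading sketch using the standard `transformers` auto classes is below. The folder path and prompt are placeholders, and the project's own test script remains the reference:

```python
def generate_sample(model_dir: str, prompt: str, max_new_tokens: int = 50) -> str:
    """Load the checkpoint from a local folder and sample a continuation."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=True, temperature=0.8)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Example call (requires the model files on disk; path and prompt are
# placeholders):
# print(generate_sample("./v2mini-eval1", "The streets of London in 1850"))
```

Remember that raw generations will show the spacing artifact described under Model Issues, so post-process the decoded text before reading it.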
### Test script
A run file for testing and evaluating this model is available on the main project repository:
* **Test Script Link:** [test_v2mini_eval1.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval1.py)