# Model Card: Custom Language Model

## Overview

This model was trained on the WikiText-103 dataset to generate text continuations from input prompts.

## Dataset

**Dataset Used**: WikiText-103

**Source**: [Hugging Face Datasets](https://huggingface.co/datasets/wikitext)

**Dataset Details**: The WikiText-103 dataset is a collection of over 100 million tokens extracted from the set of verified "Good" and "Featured" articles on Wikipedia. It is designed for language modeling and other text generation tasks.
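For reference, the dataset can be loaded directly with the `datasets` library; the configuration name below is the standard raw-text variant on the Hub.

```python
from datasets import load_dataset

# "wikitext-103-raw-v1" is the raw-text configuration of the WikiText dataset;
# a pre-tokenized "wikitext-103-v1" variant is also available on the Hub.
dataset = load_dataset("wikitext", "wikitext-103-raw-v1")
print(dataset["train"][0]["text"])
```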
## Data Cleaning

To ensure high-quality input for training, the dataset underwent the following cleaning steps (sketched in code after the list):

1. Removal of non-standard characters and punctuation.
2. Tokenization using BERT's tokenizer.
3. Lowercasing all text.
4. Filtering out overly short or long sequences to maintain a consistent input size.
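The steps above can be approximated as follows. This is a minimal sketch: the `bert-base-uncased` checkpoint, the character regex, and the length bounds are all assumptions not specified in the card, and lowercasing is applied before tokenization here since `bert-base-uncased` expects lowercased input.

```python
import re
from transformers import AutoTokenizer

# Assumed checkpoint and length bounds; the card does not state exact values.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
MIN_TOKENS, MAX_TOKENS = 8, 512  # illustrative filtering thresholds

def clean_example(text: str):
    # Steps 1 and 3: strip non-standard characters and lowercase the text.
    text = re.sub(r"[^a-z0-9\s.,!?'\"-]", " ", text.lower())
    # Step 2: tokenize with BERT's tokenizer.
    ids = tokenizer(text, truncation=True, max_length=MAX_TOKENS)["input_ids"]
    # Step 4: filter out overly short or long sequences.
    return ids if MIN_TOKENS <= len(ids) <= MAX_TOKENS else None
```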
## Neural Network Definition

The neural network is based on a transformer architecture with the following specifications (a configuration sketch follows the list):

- **Model Type**: BERT-based transformer
- **Number of Layers**: 5
- **Dropout**: Applied at each layer to prevent overfitting
- **Optimizer**: AdamW with a learning rate of 5e-5
- **Loss Function**: Cross-entropy loss for language modeling
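The specification above could be realized as follows. Only the layer count, optimizer, and learning rate are stated in the card; the hidden size, head count, and dropout values below are assumptions.

```python
import torch
from transformers import BertConfig, BertLMHeadModel

# Hidden size and attention heads keep BertConfig defaults (assumed).
config = BertConfig(
    num_hidden_layers=5,               # "Number of Layers: 5"
    hidden_dropout_prob=0.1,           # dropout at each layer (value assumed)
    attention_probs_dropout_prob=0.1,  # (value assumed)
    is_decoder=True,                   # decoder mode so the model can generate text
)
model = BertLMHeadModel(config)

# AdamW with the stated learning rate; the cross-entropy language-modeling
# loss is computed internally when labels are passed to the model.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
```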
## Training Details

The model was trained on an NVIDIA L4 GPU with the following resources:

- **CPU Cores**: 16
- **System RAM**: 62.8 GB
- **GPU RAM**: 22.5 GB
- **Disk**: 201.2 GB

**Training Configuration** (a training sketch follows the list):

- **Batch Size**: Dynamic, adjusted based on GPU RAM availability
- **Epochs**: 50
- **Initial Learning Rate**: 5e-5
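A training setup matching this configuration might look like the sketch below. Here `model` is the network defined earlier, `train_dataset` is a placeholder for a tokenized WikiText-103 split, and `auto_find_batch_size` is one way (assumed here) to realize the dynamic batch size, backing off automatically on GPU out-of-memory errors.

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",   # placeholder path
    num_train_epochs=50,
    learning_rate=5e-5,
    auto_find_batch_size=True,    # requires the accelerate package
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```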
### Training Results

Training involved several experiments with different batch sizes and epoch counts. The final training loss was plotted to visualize the model's convergence (a plotting sketch follows).
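A loss plot like the one described can be reconstructed from the `Trainer` log history, assuming the `trainer` object from the previous sketch has finished training.

```python
import matplotlib.pyplot as plt

# log_history holds the losses recorded at each logging step during training.
losses = [entry["loss"] for entry in trainer.state.log_history if "loss" in entry]
plt.plot(losses)
plt.xlabel("Logging step")
plt.ylabel("Training loss")
plt.title("Training loss")
plt.show()
```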
## Usage

To use this model, load it from the Hugging Face Hub and generate text as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("RicardoPoleo/DL_LLM_from_scratch_2")
model = AutoModelForCausalLM.from_pretrained("RicardoPoleo/DL_LLM_from_scratch_2")

# Encode a prompt, generate a continuation, and decode it back to text.
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling options such as `do_sample=True`, `temperature`, and `top_p` can also be passed to `generate` to vary the output.