Update model card with proper metadata
README.md CHANGED
@@ -1,14 +1,30 @@
+---
+language: en
+tags:
+- pytorch
+- gpt2
+- text-generation
+- nanoGPT
+license: mit
+datasets:
+- custom
+model-index:
+- name: chatMachineProto
+  results: []
+---
+
 # NanoGPT Personal Experiment
 
-This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.
+This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model using the nanoGPT architecture. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.
 
 ## Model Description
 
-The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
+This model is based on the nanoGPT implementation, which is a minimal, clean implementation of GPT-2 style models. The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
 
 ### Technical Details
 
 - Base Architecture: GPT-2
+- Implementation: nanoGPT
 - Training Infrastructure: 8x A100 80GB GPUs
 - Parameters: ~124M (similar to GPT-2 small)
 
@@ -16,7 +32,8 @@ The architecture follows the original GPT-2 design principles while being more a
 
 The model underwent a multi-stage training process:
 1. Initial training on a subset of the OpenWebText dataset
-2.
+2. Fine-tuning experiments on various datasets including Shakespeare's works
+3. Experimentation with different hyperparameters and optimization techniques
 
 ### Features
 
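The card pegs the size at ~124M parameters, "similar to GPT-2 small". A quick sanity check with the standard GPT-2 small hyperparameters (12 layers, 768-dimensional embeddings, 50257-token vocabulary, 1024-token context, which are also nanoGPT's defaults at this size) reproduces that figure; the settings below are assumed from the GPT-2 small recipe, not read out of this particular checkpoint.

```python
# Back-of-the-envelope parameter count for a GPT-2 small sized model (~124M).
# Hyperparameters are the standard GPT-2 small settings, assumed rather than
# taken from this repository's checkpoint.
n_layer, n_embd = 12, 768
vocab_size, block_size = 50257, 1024

wte = vocab_size * n_embd            # token embeddings (weight-tied with the LM head)
wpe = block_size * n_embd            # learned position embeddings

# One transformer block: QKV + output projections, 4x MLP, two LayerNorms,
# with every linear layer carrying a bias term as in GPT-2.
attn = n_embd * 3 * n_embd + 3 * n_embd + n_embd * n_embd + n_embd
mlp  = n_embd * 4 * n_embd + 4 * n_embd + 4 * n_embd * n_embd + n_embd
ln   = 2 * 2 * n_embd
block = attn + mlp + ln

total = wte + wpe + n_layer * block + 2 * n_embd   # plus the final LayerNorm
print(f"{total / 1e6:.1f}M parameters")            # -> 124.4M
```

nanoGPT's own training log reports a slightly smaller figure (~123.6M) because it leaves the position embeddings out of its count.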
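The card does not include a usage snippet. A minimal loading sketch with the transformers library is below; it assumes the weights have been exported to the standard Hugging Face GPT-2 format (nanoGPT's native ckpt.pt checkpoints are not loadable this way without conversion), and the repository id is a placeholder rather than the actual repo name.

```python
# Minimal sketch: load and sample from the model via transformers, assuming
# an exported Hugging Face format checkpoint. The repo id is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/chatMachineProto"  # placeholder, not the real repository
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "To be, or not to be:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```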