Upload folder using huggingface_hub

Files changed (4)

README.md +61 -92
config.json +1 -2
generation_config.json +1 -1
model.safetensors +2 -2

README.md CHANGED

@@ -1,111 +1,80 @@
 ---
 language: en
 tags:
-- question-answering
-- squad
+- pytorch
 - gpt2
+- text-generation
+- nanoGPT
 license: mit
 datasets:
-- squad
+- custom
+model-index:
+- name: chatMachineProto
+  results: []
 ---
-# GPT-2 Fine-tuned for Question Answering
-This model is a GPT-2 model fine-tuned on the SQuAD (Stanford Question Answering Dataset) for question answering tasks. It takes a context and a question as input and generates a concise, accurate answer based on the provided context.
+# NanoGPT Personal Experiment
+This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.
 ## Model Description
-- **Model Type:** GPT-2
-- **Language:** English
-- **Training Data:** SQuAD dataset
-- **Input Format:** "Context: [context] Question: [question] Answer:"
-- **Output Format:** Direct answer without any additional formatting
-## Usage
-```python
-from transformers import GPT2LMHeadModel, GPT2Tokenizer
-# Load model and tokenizer
-model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
-tokenizer = GPT2Tokenizer.from_pretrained("houcine-bdk/chatMachine_v1")
-# Prepare input
-context = "George Washington was the first president of the United States, serving from 1789 to 1797."
-question = "Who was the first president of the United States?"
-input_text = f"Context: {context} Question: {question} Answer:"
-# Tokenize
-inputs = tokenizer(input_text, return_tensors="pt")
-# Generate answer
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=30,
-    temperature=0.1,
-    top_k=50,
-)
-# Decode and extract answer
-generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
-answer = generated_text.split("Answer:")[-1].strip()
-print(f"Answer: {answer}")
-```
-## Example Outputs
-1. **Factual Questions:**
-   ```
-   Context: George Washington was the first president of the United States, serving from 1789 to 1797.
-   Question: Who was the first president of the United States?
-   Answer: George Washington
-   ```
-2. **Date Questions:**
-   ```
-   Context: The Declaration of Independence was signed on July 4, 1776, by the Continental Congress in Philadelphia.
-   Question: When was the Declaration of Independence signed?
-   Answer: July 4 1776
-   ```
-3. **Location Questions:**
-   ```
-   Context: Paris is the capital and largest city of France, located on the river Seine.
-   Question: What is the capital of France?
-   Answer: Paris
-   ```
-4. **Measurement Questions:**
-   ```
-   Context: The Eiffel Tower was completed in 1889 and stands at a height of 324 meters.
-   Question: How tall is the Eiffel Tower?
-   Answer: 324 meters
-   ```
-## Model Performance
-The model demonstrates strong performance in:
-- Extracting precise information from context
-- Providing concise answers
-- Handling various question types (who, what, when, where, how)
-- Maintaining accuracy with numerical values and dates
+The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
+### Technical Details
+- Base Architecture: GPT-2
+- Training Infrastructure: 8x A100 80GB GPUs
+- Parameters: ~124M (similar to GPT-2 small)
+### Training Process
+The model underwent a multi-stage training process:
+1. Initial training on a subset of the OpenWebText dataset
+2. Experimentation with different hyperparameters and optimization techniques
+### Features
+- Clean, minimal implementation of the GPT architecture
+- Efficient training utilizing modern GPU capabilities
+- Configurable generation parameters (temperature, top-k sampling)
+- Support for both direct text generation and interactive chat
+## Use Cases
+This model is primarily an experimental project and can be used for:
+- Educational purposes to understand transformer architectures
+- Text generation experiments
+- Research into language model behavior
+- Interactive chat experiments
 ## Limitations
-- The model requires context to be provided along with the question
-- Best suited for factual questions rather than opinion or analysis
-- Context length is limited to the model's maximum sequence length
-## Training Details
-The model was fine-tuned using:
-- Base model: GPT-2
-- Training dataset: SQuAD
-- Training parameters:
-  - Learning rate: 2e-5
-  - Batch size: 16
-  - Mixed precision: bfloat16
-## License
-This model is released under the MIT license.
+As this is a personal experiment, please note:
+- The model may produce inconsistent or incorrect outputs
+- It's not intended for production use
+- Responses may be unpredictable or contain biases
+- Performance may vary significantly depending on the input
+## Development Context
+This project was developed as part of my personal exploration into AI/ML, specifically focusing on:
+- Understanding transformer architectures
+- Learning about large-scale model training
+- Experimenting with different training approaches
+- Gaining hands-on experience with modern AI infrastructure
+## Acknowledgments
+This project builds upon the excellent work of:
+- The original GPT-2 paper by OpenAI
+- The nanoGPT implementation by Andrej Karpathy
+- The broader open-source AI community
+## Disclaimer
+This is a personal experimental project and should be treated as such. It's not intended for production use or as a replacement for more established language models. The primary goal was learning and experimentation.
+---
+Feel free to explore the model and provide feedback. Remember that this is an experimental project, and results may vary significantly from more established models.

config.json CHANGED

@@ -1,5 +1,4 @@
 {
-  "_name_or_path": "./hf_model",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
@@ -25,7 +24,7 @@
   "summary_proj_to_labels": true,
   "summary_type": "cls_index",
   "summary_use_proj": true,
-  "torch_dtype": "float32",
+  "torch_dtype": "bfloat16",
   "transformers_version": "4.48.1",
   "use_cache": true,
   "vocab_size": 50257
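
The `torch_dtype` change from `float32` to `bfloat16` keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so each weight takes 2 bytes instead of 4 at some cost in precision. The pure-Python sketch below illustrates that precision loss. It assumes round-to-nearest-even on the float32 bit pattern, the scheme commonly used for float32 to bfloat16 casts, and it does not special-case NaN. It is not necessarily how the weights in this commit were converted.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x (as float32) to the nearest bfloat16, ties to even.

    Sketch only: NaN inputs are not special-cased.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the low 16 bits that bfloat16 drops:
    # add 0x7FFF, plus 1 more when the kept part is odd.
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFFFFFF
    # Keep only the high 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1.0, 3.14159265, 0.1):
    print(value, "->", to_bfloat16(value))
```

Values such as 1.0 survive exactly, while 3.14159265 becomes 3.140625, which is roughly the precision a bfloat16 checkpoint can hold.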

generation_config.json CHANGED

@@ -4,6 +4,6 @@
   "max_new_tokens": 30,
   "min_new_tokens": 1,
   "pad_token_id": 50256,
-  "temperature": 0.1,
+  "temperature": 0.7,
   "transformers_version": "4.48.1"
 }
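
The default `temperature` rises from 0.1 to 0.7, and the README lists temperature and top-k sampling as configurable. The hunk does not show whether `do_sample` is enabled; in `transformers`, temperature only affects output when sampling is on. The pure-Python sketch below shows what the two parameters do: logits are divided by the temperature before the softmax, and top-k restricts sampling to the k highest-scoring tokens. The 4-token logits are hypothetical values chosen for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, temperature, k, rng):
    """Sample a token index from among the k highest-scoring logits."""
    # Keep the indices of the k largest logits; all others get zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax_with_temperature([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.1, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print([sample_top_k(logits, 0.7, k=2, rng=rng) for _ in range(10)])
```

At 0.1 almost all probability mass lands on the top token, so generation is close to greedy. At 0.7 the lower-ranked tokens keep a noticeable share, which gives more varied text.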

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c08f088e35b8ad14bc0b6d1bb4077a08c8e34cb6c894a99a1991c7ea392a885c
-size 497774208
+oid sha256:420e9f35cbdddd9730e940a210a914c47c3c678fb8687f87fee20b1d6851cef7
+size 248894656
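
The LFS pointer shows the weights file shrinking from 497,774,208 to 248,894,656 bytes, roughly half. That fits the `float32` to `bfloat16` switch in `config.json` and the README's "~124M (similar to GPT-2 small)". The sketch below checks the arithmetic. It assumes the standard GPT-2 small hyperparameters (12 layers, 768-dimensional embeddings, 1024 positions), because the config hunks only show `vocab_size` = 50257. It also assumes the LM head is tied to the token embedding, so it is counted once. The leftover bytes are attributed to the safetensors header, which is also an assumption.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count GPT-2 parameters, with the LM head tied to the token embedding."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # wte + wpe
    per_layer = (
        2 * d                # ln_1 weight + bias
        + d * 3 * d + 3 * d  # attn.c_attn (fused Q, K, V projection)
        + d * d + d          # attn.c_proj
        + 2 * d              # ln_2 weight + bias
        + d * 4 * d + 4 * d  # mlp.c_fc
        + 4 * d * d + d      # mlp.c_proj
    )
    final_norm = 2 * d  # ln_f weight + bias
    return embeddings + n_layer * per_layer + final_norm


# Assumed GPT-2 small hyperparameters (only vocab_size appears in the diff).
params = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12)

old_size, new_size = 497_774_208, 248_894_656  # bytes, from the LFS pointers
for label, size, bytes_per_param in (("float32", old_size, 4), ("bfloat16", new_size, 2)):
    tensor_bytes = params * bytes_per_param
    print(f"{label}: {params:,} params x {bytes_per_param} B = {tensor_bytes:,} B; "
          f"file has {size - tensor_bytes:,} B extra (assumed header/metadata)")
```

In both cases the tensor data accounts for all but about 15 KB of the file. This suggests the two files hold the same tensors at different precisions.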