houcine-bdk committed
Commit: 9bbe442 (verified)
Parent(s): e1edd13

Upload folder using huggingface_hub

Files changed (4):
1. README.md +61 -92
2. config.json +1 -2
3. generation_config.json +1 -1
4. model.safetensors +2 -2
README.md CHANGED
@@ -1,111 +1,80 @@
  ---
  language: en
  tags:
- - question-answering
- - squad
+ - pytorch
  - gpt2
+ - text-generation
+ - nanoGPT
  license: mit
  datasets:
- - squad
+ - custom
+ model-index:
+ - name: chatMachineProto
+   results: []
  ---

- # GPT-2 Fine-tuned for Question Answering
+ # NanoGPT Personal Experiment

- This model is a GPT-2 model fine-tuned on the SQuAD (Stanford Question Answering Dataset) for question answering tasks. It takes a context and a question as input and generates a concise, accurate answer based on the provided context.
+ This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.

  ## Model Description

- - **Model Type:** GPT-2
- - **Language:** English
- - **Training Data:** SQuAD dataset
- - **Input Format:** "Context: [context] Question: [question] Answer:"
- - **Output Format:** Direct answer without any additional formatting
-
- ## Usage
-
- ```python
- from transformers import GPT2LMHeadModel, GPT2Tokenizer
-
- # Load model and tokenizer
- model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
- tokenizer = GPT2Tokenizer.from_pretrained("houcine-bdk/chatMachine_v1")
-
- # Prepare input
- context = "George Washington was the first president of the United States, serving from 1789 to 1797."
- question = "Who was the first president of the United States?"
- input_text = f"Context: {context} Question: {question} Answer:"
-
- # Tokenize
- inputs = tokenizer(input_text, return_tensors="pt")
-
- # Generate answer
- outputs = model.generate(
-     **inputs,
-     max_new_tokens=30,
-     temperature=0.1,
-     top_k=50,
- )
-
- # Decode and extract answer
- generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
- answer = generated_text.split("Answer:")[-1].strip()
- print(f"Answer: {answer}")
- ```
-
- ## Example Outputs
-
- 1. **Factual Questions:**
- ```
- Context: George Washington was the first president of the United States, serving from 1789 to 1797.
- Question: Who was the first president of the United States?
- Answer: George Washington
- ```
-
- 2. **Date Questions:**
- ```
- Context: The Declaration of Independence was signed on July 4, 1776, by the Continental Congress in Philadelphia.
- Question: When was the Declaration of Independence signed?
- Answer: July 4 1776
- ```
-
- 3. **Location Questions:**
- ```
- Context: Paris is the capital and largest city of France, located on the river Seine.
- Question: What is the capital of France?
- Answer: Paris
- ```
-
- 4. **Measurement Questions:**
- ```
- Context: The Eiffel Tower was completed in 1889 and stands at a height of 324 meters.
- Question: How tall is the Eiffel Tower?
- Answer: 324 meters
- ```
-
- ## Model Performance
-
- The model demonstrates strong performance in:
- - Extracting precise information from context
- - Providing concise answers
- - Handling various question types (who, what, when, where, how)
- - Maintaining accuracy with numerical values and dates
+ The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
+
+ ### Technical Details
+
+ - Base Architecture: GPT-2
+ - Training Infrastructure: 8x A100 80GB GPUs
+ - Parameters: ~124M (similar to GPT-2 small)
+
+ ### Training Process
+
+ The model underwent a multi-stage training process:
+ 1. Initial training on a subset of the OpenWebText dataset
+ 2. Experimentation with different hyperparameters and optimization techniques
+
+ ### Features
+
+ - Clean, minimal implementation of the GPT architecture
+ - Efficient training utilizing modern GPU capabilities
+ - Configurable generation parameters (temperature, top-k sampling)
+ - Support for both direct text generation and interactive chat
+
+ ## Use Cases
+
+ This model is primarily an experimental project and can be used for:
+ - Educational purposes to understand transformer architectures
+ - Text generation experiments
+ - Research into language model behavior
+ - Interactive chat experiments

  ## Limitations

- - The model requires context to be provided along with the question
- - Best suited for factual questions rather than opinion or analysis
- - Context length is limited to the model's maximum sequence length
-
- ## Training Details
-
- The model was fine-tuned using:
- - Base model: GPT-2
- - Training dataset: SQuAD
- - Training parameters:
-   - Learning rate: 2e-5
-   - Batch size: 16
-   - Mixed precision: bfloat16
-
- ## License
-
- This model is released under the MIT license.
+ As this is a personal experiment, please note:
+ - The model may produce inconsistent or incorrect outputs
+ - It's not intended for production use
+ - Responses may be unpredictable or contain biases
+ - Performance may vary significantly depending on the input
+
+ ## Development Context
+
+ This project was developed as part of my personal exploration into AI/ML, specifically focusing on:
+ - Understanding transformer architectures
+ - Learning about large-scale model training
+ - Experimenting with different training approaches
+ - Gaining hands-on experience with modern AI infrastructure
+
+ ## Acknowledgments
+
+ This project builds upon the excellent work of:
+ - The original GPT-2 paper by OpenAI
+ - The nanoGPT implementation by Andrej Karpathy
+ - The broader open-source AI community
+
+ ## Disclaimer
+
+ This is a personal experimental project and should be treated as such. It's not intended for production use or as a replacement for more established language models. The primary goal was learning and experimentation.
+
+ ---
+
+ Feel free to explore the model and provide feedback. Remember that this is an experimental project, and results may vary significantly from more established models.
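The rewritten README drops the usage snippet that the previous version carried. For reference, here is a minimal sketch of loading and sampling from the updated checkpoint, assuming the repository id `houcine-bdk/chatMachine_v1` from the old README still applies and using only standard transformers APIs:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Repository id taken from the previous README; adjust if the repo was renamed.
repo_id = "houcine-bdk/chatMachine_v1"

# Load the weights in bfloat16 to match the torch_dtype recorded in config.json below.
model = GPT2LMHeadModel.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")

# Sample with the temperature/top-k parameters the README lists as configurable.
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```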
config.json CHANGED
@@ -1,5 +1,4 @@
  {
- "_name_or_path": "./hf_model",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
@@ -25,7 +24,7 @@
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
- "torch_dtype": "float32",
+ "torch_dtype": "bfloat16",
  "transformers_version": "4.48.1",
  "use_cache": true,
  "vocab_size": 50257
generation_config.json CHANGED
@@ -4,6 +4,6 @@
  "max_new_tokens": 30,
  "min_new_tokens": 1,
  "pad_token_id": 50256,
- "temperature": 0.1,
+ "temperature": 0.7,
  "transformers_version": "4.48.1"
  }
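This raises the default sampling temperature from the near-greedy 0.1 used for the old SQuAD-style answer extraction to 0.7, a more typical value for open-ended generation. These defaults apply whenever generate() is called without explicit overrides; a short sketch of inspecting and overriding them, again assuming the same repository id:

```python
from transformers import GenerationConfig

# Defaults shipped with the model; per-call generate() kwargs take precedence.
gen_config = GenerationConfig.from_pretrained("houcine-bdk/chatMachine_v1")
print(gen_config.temperature)     # expected: 0.7 after this commit
print(gen_config.max_new_tokens)  # 30, unchanged by this commit

# For a more deterministic run, override at call time:
# outputs = model.generate(**inputs, do_sample=True, temperature=0.2)
```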
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c08f088e35b8ad14bc0b6d1bb4077a08c8e34cb6c894a99a1991c7ea392a885c
- size 497774208
+ oid sha256:420e9f35cbdddd9730e940a210a914c47c3c678fb8687f87fee20b1d6851cef7
+ size 248894656