CesarChaMal committed on
Commit
af6613d
·
verified ·
1 Parent(s): 2f704ee

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,199 +1,203 @@
1
- ---
2
- library_name: transformers
3
- tags: []
4
- ---
5
-
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
 
12
- ## Model Details
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
 
 
 
 
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
  ## Uses
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
  ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
- [More Information Needed]
 
51
 
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
- [More Information Needed]
 
 
63
 
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
69
 
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
 
74
- [More Information Needed]
 
 
75
 
76
  ## Training Details
77
 
78
  ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
 
 
83
 
84
  ### Training Procedure
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
 
87
 
88
- #### Preprocessing [optional]
89
 
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
 
103
  ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
 
119
- [More Information Needed]
120
 
121
- #### Metrics
122
 
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
 
125
- [More Information Needed]
126
 
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
 
141
- ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
 
 
144
 
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
- ## Technical Specifications [optional]
154
 
155
- ### Model Architecture and Objective
156
 
157
- [More Information Needed]
 
 
 
158
 
159
  ### Compute Infrastructure
160
 
161
- [More Information Needed]
 
 
 
162
 
163
- #### Hardware
164
 
165
- [More Information Needed]
166
 
167
- #### Software
 
 
168
 
169
- [More Information Needed]
170
 
171
- ## Citation [optional]
 
172
 
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 
 
174
 
175
- **BibTeX:**
 
 
176
 
177
- [More Information Needed]
178
 
179
- **APA:**
 
 
 
180
 
181
- [More Information Needed]
182
 
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 
 
 
 
 
186
 
187
- [More Information Needed]
188
 
189
- ## More Information [optional]
190
 
191
- [More Information Needed]
 
 
 
 
 
 
 
192
 
193
- ## Model Card Authors [optional]
194
 
195
- [More Information Needed]
 
 
196
 
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
+ # JVM Troubleshooting Assistant
2
 
3
+ ## Model Description
4
 
5
+ This is a fine-tuned conversational AI model specialized in JVM (Java Virtual Machine) troubleshooting and performance optimization. The model has been trained on domain-specific Q&A pairs generated from JVM troubleshooting documentation to provide expert-level assistance with Java application issues.
6
 
7
+ - **Developed by:** CesarChaMal
8
+ - **Model type:** Conversational AI / Question-Answering
9
+ - **Language(s):** English
10
+ - **License:** MIT
11
+ - **Finetuned from model:** microsoft/DialoGPT-small
12
 
13
+ ## Model Sources
14
 
15
+ - **Repository:** https://github.com/CesarChaMal/python_process_custom_data_from_pdf
16
+ - **Dataset:** https://huggingface.co/datasets/CesarChaMal/jvm_troubleshooting_guide
17
 
18
  ## Uses
19
 
 
 
20
  ### Direct Use
21
 
22
+ This model is designed for:
23
+ - **JVM Troubleshooting:** Diagnosing memory issues, OutOfMemoryErrors, and performance problems
24
+ - **Performance Optimization:** Recommending JVM parameters and tuning strategies
25
+ - **Technical Support:** Providing expert guidance on Java application issues
26
+ - **Educational Purposes:** Teaching JVM concepts and best practices
27
 
28
+ ### Example Usage
29
 
30
+ ```python
31
+ from transformers import AutoTokenizer, AutoModelForCausalLM
32
 
33
+ tokenizer = AutoTokenizer.from_pretrained("CesarChaMal/jvm_troubleshooting_model")
34
+ model = AutoModelForCausalLM.from_pretrained("CesarChaMal/jvm_troubleshooting_model")
35
 
36
+ # Format your question
37
+ question = "What are common JVM memory issues?"
38
+ input_text = f"### Human: {question}\n### Assistant:"
39
 
40
+ # Generate response
41
+ inputs = tokenizer(input_text, return_tensors='pt')
42
+ outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
43
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
44
+ print(response.split("### Assistant:")[-1].strip())
45
+ ```
46
 
47
+ ### Out-of-Scope Use
 
 
48
 
49
+ - **General Programming Questions:** Not optimized for non-JVM related programming issues
50
+ - **Production Critical Decisions:** Always verify recommendations with official documentation
51
+ - **Non-English Languages:** Trained primarily on English content
52
 
53
  ## Training Details
54
 
55
  ### Training Data
56
 
57
+ The model was fine-tuned on a custom dataset of JVM troubleshooting Q&A pairs:
58
+ - **Source:** JVM troubleshooting guide PDF documentation
59
+ - **Generation Method:** AI-powered Q&A pair creation using Ollama
60
+ - **Dataset Size:** 100 training examples, 50 test examples
61
+ - **Format:** Conversational format with "### Human:" and "### Assistant:" markers
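The conversational format described above can be reproduced with a few lines of Python. This is an illustrative sketch only; the `format_example` helper and its arguments are assumptions, not the actual generation pipeline code:

```python
# Illustrative sketch (not the actual pipeline code): serialize one
# Q&A pair into the "### Human:" / "### Assistant:" training format.
def format_example(question: str, answer: str) -> str:
    return f"### Human: {question}\n### Assistant: {answer}"

text = format_example(
    "What causes an OutOfMemoryError?",
    "Usually heap exhaustion; inspect heap dumps and GC logs.",
)
print(text)
```

At inference time the same prefix is used, with the answer left empty so the model completes it.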
62
 
63
  ### Training Procedure
64
 
65
+ - **Fine-tuning Method:** Full fine-tuning
66
+ - **Base Model:** microsoft/DialoGPT-small
67
+ - **Training Framework:** Hugging Face Transformers
68
+ - **Optimization:** AdamW optimizer with linear learning rate scheduling
69
 
70
+ ### Training Hyperparameters
71
 
72
+ - **Training regime:** Full fine-tuning
73
+ - **Learning rate:** 5e-5
74
+ - **Batch size:** 2
75
+ - **Number of epochs:** 3
76
+ - **Sequence length:** 512 tokens
77
+ - **Warmup steps:** 50
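The values above map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script (which is not included in this card); `output_dir` is a placeholder, and the 512-token sequence length is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

# Sketch of Trainer settings matching the hyperparameters listed above.
# output_dir is a placeholder; the real script may differ.
training_args = TrainingArguments(
    output_dir="./jvm_troubleshooting_model",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    num_train_epochs=3,
    warmup_steps=50,
)
```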
 
 
 
 
 
 
78
 
79
  ## Evaluation
80
 
81
+ ### Test Questions
82
 
83
+ The model has been evaluated on 11 key JVM troubleshooting topics:
84
 
85
+ 1. Common JVM memory issues
86
+ 2. OutOfMemoryError troubleshooting
87
+ 3. JVM performance parameters
88
+ 4. Garbage collection log analysis
89
+ 5. High CPU usage diagnosis
90
+ 6. Memory leak debugging
91
+ 7. JVM monitoring best practices
92
+ 8. Startup time optimization
93
+ 9. JVM profiling tools
94
+ 10. StackOverflowError handling
95
+ 11. Heap vs non-heap memory differences
96
 
97
+ ### Performance
98
 
99
+ The model demonstrates strong domain knowledge in JVM troubleshooting scenarios and provides contextually relevant responses for technical support use cases.
100
 
101
+ ## Bias, Risks, and Limitations
102
 
103
+ ### Limitations
104
 
105
+ - **Domain Specific:** Optimized for JVM/Java topics, may not perform well on other subjects
106
+ - **Training Data Scope:** Limited to the knowledge present in the source documentation
107
+ - **Model Size:** 117M parameters may limit response complexity compared to larger models
108
+ - **Factual Accuracy:** Always verify technical recommendations with official documentation
109
 
110
+ ### Recommendations
111
 
112
+ - Use as a starting point for JVM troubleshooting research
113
+ - Verify all technical recommendations before implementing in production
114
+ - Combine with official Java/JVM documentation for comprehensive guidance
115
+ - Consider the model's training data limitations when evaluating responses
 
116
 
117
+ ## Technical Specifications
118
 
119
+ ### Model Architecture
120
 
121
+ - **Architecture:** Transformer-based language model
122
+ - **Parameters:** ~117M
123
+ - **Context Length:** 512 tokens
124
+ - **Vocabulary Size:** 50257
125
 
126
  ### Compute Infrastructure
127
 
128
+ - **Hardware:** Consumer-grade GPU (RTX series) or CPU
129
+ - **Training Time:** ~30 minutes
130
+ - **Framework:** PyTorch + Hugging Face Transformers
131
+ - **Fine-tuning Technique:** Full fine-tuning
132
 
133
+ ## How to Get Started
134
 
135
+ ### Installation
136
 
137
+ ```bash
138
+ pip install transformers torch
139
+ ```
140
 
141
+ ### Quick Start
142
 
143
+ ```python
144
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
145
 
146
+ # Load model and tokenizer
147
+ model_name = "CesarChaMal/jvm_troubleshooting_model"
148
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
149
+ model = AutoModelForCausalLM.from_pretrained(model_name)
150
 
151
+ # Ask a question
152
+ question = "How do I troubleshoot OutOfMemoryError?"
153
+ input_text = f"### Human: {question}\n### Assistant:"
154
 
155
+ # Generate response
156
+ inputs = tokenizer(input_text, return_tensors='pt', truncation=True, max_length=512)
157
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=150,
+         temperature=0.7,
+         do_sample=True,
+         pad_token_id=tokenizer.eos_token_id
+     )
165
 
166
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
167
+ answer = response.split("### Assistant:")[-1].strip()
168
+ print(answer)
169
+ ```
170
 
171
+ ### Interactive Testing
172
 
173
+ Clone the repository for interactive testing tools:
174
 
175
+ ```bash
176
+ git clone https://github.com/CesarChaMal/python_process_custom_data_from_pdf
177
+ cd python_process_custom_data_from_pdf
178
+ python test_model.py # Interactive chat
179
+ python quick_test.py # Batch testing
180
+ ```
181
 
182
+ ## Citation
183
 
184
+ If you use this model in your research or applications, please cite:
185
 
186
+ ```bibtex
187
+ @misc{jvm_troubleshooting_model,
+   title={JVM Troubleshooting Assistant: A Fine-tuned Conversational AI Model},
+   author={CesarChaMal},
+   year={2024},
+   url={https://huggingface.co/CesarChaMal/jvm_troubleshooting_model}
+ }
193
+ ```
194
 
195
+ ## Model Card Contact
196
 
197
+ For questions or issues regarding this model, please:
198
+ - Open an issue in the [GitHub repository](https://github.com/CesarChaMal/python_process_custom_data_from_pdf)
199
+ - Contact: [Your contact information]
200
 
201
+ ---
202
 
203
+ *This model card was automatically generated as part of the PDF to Q&A Dataset Generator pipeline.*
checkpoint-39/chat_template.jinja ADDED
@@ -0,0 +1 @@
1
+ {% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}
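This chat template simply concatenates each message's content and appends the EOS token after every message. It can be rendered standalone with Jinja2 (a sketch; assumes the `jinja2` package is installed):

```python
from jinja2 import Template

# The checkpoint's chat template: emit each message's content
# followed by the EOS token.
template = Template(
    "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}"
)
rendered = template.render(
    messages=[{"content": "Hello"}, {"content": "Hi there"}],
    eos_token="<|endoftext|>",
)
print(rendered)  # Hello<|endoftext|>Hi there<|endoftext|>
```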
checkpoint-39/config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "activation_function": "gelu_new",
3
+ "architectures": [
4
+ "GPT2LMHeadModel"
5
+ ],
6
+ "attn_pdrop": 0.1,
7
+ "bos_token_id": 50256,
8
+ "dtype": "float16",
9
+ "embd_pdrop": 0.1,
10
+ "eos_token_id": 50256,
11
+ "initializer_range": 0.02,
12
+ "layer_norm_epsilon": 1e-05,
13
+ "model_type": "gpt2",
14
+ "n_ctx": 1024,
15
+ "n_embd": 1280,
16
+ "n_head": 20,
17
+ "n_inner": null,
18
+ "n_layer": 36,
19
+ "n_positions": 1024,
20
+ "reorder_and_upcast_attn": false,
21
+ "resid_pdrop": 0.1,
22
+ "scale_attn_by_inverse_layer_idx": false,
23
+ "scale_attn_weights": true,
24
+ "summary_activation": null,
25
+ "summary_first_dropout": 0.1,
26
+ "summary_proj_to_labels": true,
27
+ "summary_type": "cls_index",
28
+ "summary_use_proj": true,
29
+ "task_specific_params": {
30
+ "conversational": {
31
+ "max_length": 1000
32
+ }
33
+ },
34
+ "transformers_version": "4.56.2",
35
+ "use_cache": true,
36
+ "vocab_size": 50257
37
+ }
checkpoint-39/generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 50256,
4
+ "eos_token_id": 50256,
5
+ "transformers_version": "4.56.2"
6
+ }
checkpoint-39/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-39/model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dae6a24423332f62a0b844e5b48d562159c5b800726ad4cb9ee29299d6ead2c1
3
+ size 1548105416
checkpoint-39/optimizer.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd812da8c14d0175777728c126f1d2fe1cab8619ffdbda7784377887fa0c770f
3
+ size 3096491711
checkpoint-39/rng_state.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7558caba0c912b5a7f57ea13a0b8ad40b237df30d9c71b15f52b323e3d224f5c
3
+ size 14645
checkpoint-39/scheduler.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5086d99b20db6eac0059c7f255a5f24f8811e5ab9af233823055cdc26b5f0dc3
3
+ size 1465
checkpoint-39/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-39/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-39/tokenizer_config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "50256": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ }
13
+ },
14
+ "bos_token": "<|endoftext|>",
15
+ "clean_up_tokenization_spaces": true,
16
+ "eos_token": "<|endoftext|>",
17
+ "errors": "replace",
18
+ "extra_special_tokens": {},
19
+ "model_max_length": 1024,
20
+ "pad_token": "<|endoftext|>",
21
+ "tokenizer_class": "GPT2Tokenizer",
22
+ "unk_token": "<|endoftext|>"
23
+ }
checkpoint-39/trainer_state.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 3.0,
6
+ "eval_steps": 100,
7
+ "global_step": 39,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.8,
14
+ "grad_norm": NaN,
15
+ "learning_rate": 2.7e-06,
16
+ "loss": 684.0879,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 1.56,
21
+ "grad_norm": NaN,
22
+ "learning_rate": 5.7000000000000005e-06,
23
+ "loss": 0.0,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 2.32,
28
+ "grad_norm": NaN,
29
+ "learning_rate": 8.7e-06,
30
+ "loss": 0.0,
31
+ "step": 30
32
+ }
33
+ ],
34
+ "logging_steps": 10,
35
+ "max_steps": 39,
36
+ "num_input_tokens_seen": 0,
37
+ "num_train_epochs": 3,
38
+ "save_steps": 100,
39
+ "stateful_callbacks": {
40
+ "TrainerControl": {
41
+ "args": {
42
+ "should_epoch_stop": false,
43
+ "should_evaluate": false,
44
+ "should_log": false,
45
+ "should_save": true,
46
+ "should_training_stop": true
47
+ },
48
+ "attributes": {}
49
+ }
50
+ },
51
+ "total_flos": 979278888960000.0,
52
+ "train_batch_size": 2,
53
+ "trial_name": null,
54
+ "trial_params": null
55
+ }
checkpoint-39/training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:727c690971fc5ec923ae6674f94581184a426a8d33ff9d1b0381b9e5b434b81f
3
+ size 5777
checkpoint-39/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
config.json CHANGED
@@ -1,37 +1,37 @@
1
- {
2
- "activation_function": "gelu_new",
3
- "architectures": [
4
- "GPT2LMHeadModel"
5
- ],
6
- "attn_pdrop": 0.1,
7
- "bos_token_id": 50256,
8
- "dtype": "float32",
9
- "embd_pdrop": 0.1,
10
- "eos_token_id": 50256,
11
- "initializer_range": 0.02,
12
- "layer_norm_epsilon": 1e-05,
13
- "model_type": "gpt2",
14
- "n_ctx": 1024,
15
- "n_embd": 1280,
16
- "n_head": 20,
17
- "n_inner": null,
18
- "n_layer": 36,
19
- "n_positions": 1024,
20
- "reorder_and_upcast_attn": false,
21
- "resid_pdrop": 0.1,
22
- "scale_attn_by_inverse_layer_idx": false,
23
- "scale_attn_weights": true,
24
- "summary_activation": null,
25
- "summary_first_dropout": 0.1,
26
- "summary_proj_to_labels": true,
27
- "summary_type": "cls_index",
28
- "summary_use_proj": true,
29
- "task_specific_params": {
30
- "conversational": {
31
- "max_length": 1000
32
- }
33
- },
34
- "transformers_version": "4.56.2",
35
- "use_cache": true,
36
- "vocab_size": 50257
37
- }
 
1
+ {
2
+ "activation_function": "gelu_new",
3
+ "architectures": [
4
+ "GPT2LMHeadModel"
5
+ ],
6
+ "attn_pdrop": 0.1,
7
+ "bos_token_id": 50256,
8
+ "dtype": "float16",
9
+ "embd_pdrop": 0.1,
10
+ "eos_token_id": 50256,
11
+ "initializer_range": 0.02,
12
+ "layer_norm_epsilon": 1e-05,
13
+ "model_type": "gpt2",
14
+ "n_ctx": 1024,
15
+ "n_embd": 1280,
16
+ "n_head": 20,
17
+ "n_inner": null,
18
+ "n_layer": 36,
19
+ "n_positions": 1024,
20
+ "reorder_and_upcast_attn": false,
21
+ "resid_pdrop": 0.1,
22
+ "scale_attn_by_inverse_layer_idx": false,
23
+ "scale_attn_weights": true,
24
+ "summary_activation": null,
25
+ "summary_first_dropout": 0.1,
26
+ "summary_proj_to_labels": true,
27
+ "summary_type": "cls_index",
28
+ "summary_use_proj": true,
29
+ "task_specific_params": {
30
+ "conversational": {
31
+ "max_length": 1000
32
+ }
33
+ },
34
+ "transformers_version": "4.56.2",
35
+ "use_cache": true,
36
+ "vocab_size": 50257
37
+ }
generation_config.json CHANGED
@@ -1,6 +1,6 @@
1
- {
2
- "_from_model_config": true,
3
- "bos_token_id": 50256,
4
- "eos_token_id": 50256,
5
- "transformers_version": "4.56.2"
6
- }
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 50256,
4
+ "eos_token_id": 50256,
5
+ "transformers_version": "4.56.2"
6
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:407e71124a6f0c552d3c49d9d8c4150557defa597d071f1ed7137ff29aee5b2a
3
- size 3096165928
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dae6a24423332f62a0b844e5b48d562159c5b800726ad4cb9ee29299d6ead2c1
3
+ size 1548105416
special_tokens_map.json CHANGED
@@ -1,24 +1,24 @@
1
- {
2
- "bos_token": {
3
- "content": "<|endoftext|>",
4
- "lstrip": false,
5
- "normalized": true,
6
- "rstrip": false,
7
- "single_word": false
8
- },
9
- "eos_token": {
10
- "content": "<|endoftext|>",
11
- "lstrip": false,
12
- "normalized": true,
13
- "rstrip": false,
14
- "single_word": false
15
- },
16
- "pad_token": "<|endoftext|>",
17
- "unk_token": {
18
- "content": "<|endoftext|>",
19
- "lstrip": false,
20
- "normalized": true,
21
- "rstrip": false,
22
- "single_word": false
23
- }
24
- }
 
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer.json CHANGED
@@ -2,13 +2,13 @@
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
- "max_length": 512,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
  "strategy": {
11
- "Fixed": 512
12
  },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
 
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
+ "max_length": 768,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
  "strategy": {
11
+ "Fixed": 768
12
  },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
tokenizer_config.json CHANGED
@@ -1,23 +1,23 @@
1
- {
2
- "add_bos_token": false,
3
- "add_prefix_space": false,
4
- "added_tokens_decoder": {
5
- "50256": {
6
- "content": "<|endoftext|>",
7
- "lstrip": false,
8
- "normalized": true,
9
- "rstrip": false,
10
- "single_word": false,
11
- "special": true
12
- }
13
- },
14
- "bos_token": "<|endoftext|>",
15
- "clean_up_tokenization_spaces": true,
16
- "eos_token": "<|endoftext|>",
17
- "errors": "replace",
18
- "extra_special_tokens": {},
19
- "model_max_length": 1024,
20
- "pad_token": "<|endoftext|>",
21
- "tokenizer_class": "GPT2Tokenizer",
22
- "unk_token": "<|endoftext|>"
23
- }
 
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "50256": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ }
13
+ },
14
+ "bos_token": "<|endoftext|>",
15
+ "clean_up_tokenization_spaces": true,
16
+ "eos_token": "<|endoftext|>",
17
+ "errors": "replace",
18
+ "extra_special_tokens": {},
19
+ "model_max_length": 1024,
20
+ "pad_token": "<|endoftext|>",
21
+ "tokenizer_class": "GPT2Tokenizer",
22
+ "unk_token": "<|endoftext|>"
23
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:727c690971fc5ec923ae6674f94581184a426a8d33ff9d1b0381b9e5b434b81f
3
+ size 5777
training_log.json ADDED
@@ -0,0 +1,32 @@
1
+ [
2
+ {
3
+ "loss": 684.0879,
4
+ "grad_norm": NaN,
5
+ "learning_rate": 2.7e-06,
6
+ "epoch": 0.8,
7
+ "step": 10
8
+ },
9
+ {
10
+ "loss": 0.0,
11
+ "grad_norm": NaN,
12
+ "learning_rate": 5.7000000000000005e-06,
13
+ "epoch": 1.56,
14
+ "step": 20
15
+ },
16
+ {
17
+ "loss": 0.0,
18
+ "grad_norm": NaN,
19
+ "learning_rate": 8.7e-06,
20
+ "epoch": 2.32,
21
+ "step": 30
22
+ },
23
+ {
24
+ "train_runtime": 105.4702,
25
+ "train_samples_per_second": 2.844,
26
+ "train_steps_per_second": 0.37,
27
+ "total_flos": 979278888960000.0,
28
+ "train_loss": 175.40715144230768,
29
+ "epoch": 3.0,
30
+ "step": 39
31
+ }
32
+ ]