CesarChaMal committed on
Commit
af6613d
·
verified ·
1 Parent(s): 2f704ee

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,199 +1,203 @@
1
- ---
2
- library_name: transformers
3
- tags: []
4
- ---
5
-
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
 
12
- ## Model Details
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
 
 
 
 
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
  ## Uses
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
  ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
- [More Information Needed]
 
51
 
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
- [More Information Needed]
 
 
63
 
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
69
 
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
 
74
- [More Information Needed]
 
 
75
 
76
  ## Training Details
77
 
78
  ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
 
 
83
 
84
  ### Training Procedure
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
 
87
 
88
- #### Preprocessing [optional]
89
 
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
 
103
  ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
 
119
- [More Information Needed]
120
 
121
- #### Metrics
122
 
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
 
125
- [More Information Needed]
126
 
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
 
141
- ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
 
 
144
 
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
- ## Technical Specifications [optional]
154
 
155
- ### Model Architecture and Objective
156
 
157
- [More Information Needed]
 
 
 
158
 
159
  ### Compute Infrastructure
160
 
161
- [More Information Needed]
 
 
 
162
 
163
- #### Hardware
164
 
165
- [More Information Needed]
166
 
167
- #### Software
 
 
168
 
169
- [More Information Needed]
170
 
171
- ## Citation [optional]
 
172
 
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 
 
174
 
175
- **BibTeX:**
 
 
176
 
177
- [More Information Needed]
178
 
179
- **APA:**
 
 
 
180
 
181
- [More Information Needed]
182
 
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 
 
 
 
 
186
 
187
- [More Information Needed]
188
 
189
- ## More Information [optional]
190
 
191
- [More Information Needed]
 
 
 
 
 
 
 
192
 
193
- ## Model Card Authors [optional]
194
 
195
- [More Information Needed]
 
 
196
 
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
+ # JVM Troubleshooting Assistant
2
 
3
+ ## Model Description
4
 
5
+ This is a fine-tuned conversational AI model specialized in JVM (Java Virtual Machine) troubleshooting and performance optimization. The model has been trained on domain-specific Q&A pairs generated from JVM troubleshooting documentation to provide expert-level assistance with Java application issues.
6
 
7
+ - **Developed by:** CesarChaMal
8
+ - **Model type:** Conversational AI / Question-Answering
9
+ - **Language(s):** English
10
+ - **License:** MIT
11
+ - **Finetuned from model:** microsoft/DialoGPT-small
12
 
13
+ ## Model Sources
14
 
15
+ - **Repository:** https://github.com/CesarChaMal/python_process_custom_data_from_pdf
16
+ - **Dataset:** https://huggingface.co/datasets/CesarChaMal/jvm_troubleshooting_guide
17
 
18
  ## Uses
19
 
 
 
20
  ### Direct Use
21
 
22
+ This model is designed for:
23
+ - **JVM Troubleshooting:** Diagnosing memory issues, OutOfMemoryErrors, and performance problems
24
+ - **Performance Optimization:** Recommending JVM parameters and tuning strategies
25
+ - **Technical Support:** Providing expert guidance on Java application issues
26
+ - **Educational Purposes:** Teaching JVM concepts and best practices
27
 
28
+ ### Example Usage
29
 
30
+ ```python
31
+ from transformers import AutoTokenizer, AutoModelForCausalLM
32
 
33
+ tokenizer = AutoTokenizer.from_pretrained("CesarChaMal/jvm_troubleshooting_model")
34
+ model = AutoModelForCausalLM.from_pretrained("CesarChaMal/jvm_troubleshooting_model")
35
 
36
+ # Format your question
37
+ question = "What are common JVM memory issues?"
38
+ input_text = f"### Human: {question}\n### Assistant:"
39
 
40
+ # Generate response
41
+ inputs = tokenizer(input_text, return_tensors='pt')
42
+ outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
43
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
44
+ print(response.split("### Assistant:")[-1].strip())
45
+ ```
46
 
47
+ ### Out-of-Scope Use
 
 
48
 
49
+ - **General Programming Questions:** Not optimized for non-JVM related programming issues
50
+ - **Production Critical Decisions:** Always verify recommendations with official documentation
51
+ - **Non-English Languages:** Trained primarily on English content
52
 
53
  ## Training Details
54
 
55
  ### Training Data
56
 
57
+ The model was fine-tuned on a custom dataset of JVM troubleshooting Q&A pairs:
58
+ - **Source:** JVM troubleshooting guide PDF documentation
59
+ - **Generation Method:** AI-powered Q&A pair creation using Ollama
60
+ - **Dataset Size:** 100 training examples, 50 test examples
61
+ - **Format:** Conversational format with "### Human:" and "### Assistant:" markers
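The conversational format described above can be reproduced with a few lines of Python. This is an illustrative sketch only; the `format_example` helper and its arguments are assumptions, not the actual generation pipeline code:

```python
# Illustrative sketch (not the actual pipeline code): serialize one
# Q&A pair into the "### Human:" / "### Assistant:" training format.
def format_example(question: str, answer: str) -> str:
    return f"### Human: {question}\n### Assistant: {answer}"

text = format_example(
    "What causes an OutOfMemoryError?",
    "Usually heap exhaustion; inspect heap dumps and GC logs.",
)
print(text)
```

At inference time the same prefix is used, with the answer left empty so the model completes it.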
62
 
63
  ### Training Procedure
64
 
65
+ - **Fine-tuning Method:** Full fine-tuning
66
+ - **Base Model:** microsoft/DialoGPT-small
67
+ - **Training Framework:** Hugging Face Transformers
68
+ - **Optimization:** AdamW optimizer with linear learning rate scheduling
69
 
70
+ ### Training Hyperparameters
71
 
72
+ - **Training regime:** Full fine-tuning
73
+ - **Learning rate:** 5e-5
74
+ - **Batch size:** 2
75
+ - **Number of epochs:** 3
76
+ - **Sequence length:** 512 tokens
77
+ - **Warmup steps:** 50
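The values above map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script (which is not included in this card); `output_dir` is a placeholder, and the 512-token sequence length is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

# Sketch of Trainer settings matching the hyperparameters listed above.
# output_dir is a placeholder; the real script may differ.
training_args = TrainingArguments(
    output_dir="./jvm_troubleshooting_model",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    num_train_epochs=3,
    warmup_steps=50,
)
```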
 
 
 
 
 
 
78
 
79
  ## Evaluation
80
 
81
+ ### Test Questions
82
 
83
+ The model has been evaluated on 11 key JVM troubleshooting topics:
84
 
85
+ 1. Common JVM memory issues
86
+ 2. OutOfMemoryError troubleshooting
87
+ 3. JVM performance parameters
88
+ 4. Garbage collection log analysis
89
+ 5. High CPU usage diagnosis
90
+ 6. Memory leak debugging
91
+ 7. JVM monitoring best practices
92
+ 8. Startup time optimization
93
+ 9. JVM profiling tools
94
+ 10. StackOverflowError handling
95
+ 11. Heap vs non-heap memory differences
96
 
97
+ ### Performance
98
 
99
+ The model demonstrates strong domain knowledge in JVM troubleshooting scenarios and provides contextually relevant responses for technical support use cases.
100
 
101
+ ## Bias, Risks, and Limitations
102
 
103
+ ### Limitations
104
 
105
+ - **Domain Specific:** Optimized for JVM/Java topics, may not perform well on other subjects
106
+ - **Training Data Scope:** Limited to the knowledge present in the source documentation
107
+ - **Model Size:** 117M parameters may limit response complexity compared to larger models
108
+ - **Factual Accuracy:** Always verify technical recommendations with official documentation
109
 
110
+ ### Recommendations
111
 
112
+ - Use as a starting point for JVM troubleshooting research
113
+ - Verify all technical recommendations before implementing in production
114
+ - Combine with official Java/JVM documentation for comprehensive guidance
115
+ - Consider the model's training data limitations when evaluating responses
 
116
 
117
+ ## Technical Specifications
118
 
119
+ ### Model Architecture
120
 
121
+ - **Architecture:** Transformer-based language model
122
+ - **Parameters:** ~117M
123
+ - **Context Length:** 512 tokens
124
+ - **Vocabulary Size:** 50257
125
 
126
  ### Compute Infrastructure
127
 
128
+ - **Hardware:** Consumer-grade GPU (RTX series) or CPU
129
+ - **Training Time:** ~30 minutes
130
+ - **Framework:** PyTorch + Hugging Face Transformers
131
+ - **Fine-tuning Technique:** Full fine-tuning
132
 
133
+ ## How to Get Started
134
 
135
+ ### Installation
136
 
137
+ ```bash
138
+ pip install transformers torch
139
+ ```
140
 
141
+ ### Quick Start
142
 
143
+ ```python
144
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
145
 
146
+ # Load model and tokenizer
147
+ model_name = "CesarChaMal/jvm_troubleshooting_model"
148
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
149
+ model = AutoModelForCausalLM.from_pretrained(model_name)
150
 
151
+ # Ask a question
152
+ question = "How do I troubleshoot OutOfMemoryError?"
153
+ input_text = f"### Human: {question}\n### Assistant:"
154
 
155
+ # Generate response
156
+ inputs = tokenizer(input_text, return_tensors='pt', truncation=True, max_length=512)
157
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=150,
+         temperature=0.7,
+         do_sample=True,
+         pad_token_id=tokenizer.eos_token_id
+     )
165
 
166
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
167
+ answer = response.split("### Assistant:")[-1].strip()
168
+ print(answer)
169
+ ```
170
 
171
+ ### Interactive Testing
172
 
173
+ Clone the repository for interactive testing tools:
174
 
175
+ ```bash
176
+ git clone https://github.com/CesarChaMal/python_process_custom_data_from_pdf
177
+ cd python_process_custom_data_from_pdf
178
+ python test_model.py # Interactive chat
179
+ python quick_test.py # Batch testing
180
+ ```
181
 
182
+ ## Citation
183
 
184
+ If you use this model in your research or applications, please cite:
185
 
186
+ ```bibtex
187
+ @misc{jvm_troubleshooting_model,
+   title={JVM Troubleshooting Assistant: A Fine-tuned Conversational AI Model},
+   author={CesarChaMal},
+   year={2024},
+   url={https://huggingface.co/CesarChaMal/jvm_troubleshooting_model}
+ }
193
+ ```
194
 
195
+ ## Model Card Contact
196
 
197
+ For questions or issues regarding this model, please:
198
+ - Open an issue in the [GitHub repository](https://github.com/CesarChaMal/python_process_custom_data_from_pdf)
199
+ - Contact: [Your contact information]
200
 
201
+ ---
202
 
203
+ *This model card was automatically generated as part of the PDF to Q&A Dataset Generator pipeline.*
checkpoint-39/chat_template.jinja ADDED
@@ -0,0 +1 @@
1
+ {% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}
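This chat template simply concatenates each message's content and appends the EOS token after every message. It can be rendered standalone with Jinja2 (a sketch; assumes the `jinja2` package is installed):

```python
from jinja2 import Template

# The checkpoint's chat template: emit each message's content
# followed by the EOS token.
template = Template(
    "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}"
)
rendered = template.render(
    messages=[{"content": "Hello"}, {"content": "Hi there"}],
    eos_token="<|endoftext|>",
)
print(rendered)  # Hello<|endoftext|>Hi there<|endoftext|>
```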
checkpoint-39/config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "activation_function": "gelu_new",
3
+ "architectures": [
4
+ "GPT2LMHeadModel"
5
+ ],
6
+ "attn_pdrop": 0.1,
7
+ "bos_token_id": 50256,
8
+ "dtype": "float16",
9
+ "embd_pdrop": 0.1,
10
+ "eos_token_id": 50256,
11
+ "initializer_range": 0.02,
12
+ "layer_norm_epsilon": 1e-05,
13
+ "model_type": "gpt2",
14
+ "n_ctx": 1024,
15
+ "n_embd": 1280,
16
+ "n_head": 20,
17
+ "n_inner": null,
18
+ "n_layer": 36,
19
+ "n_positions": 1024,
20
+ "reorder_and_upcast_attn": false,
21
+ "resid_pdrop": 0.1,
22
+ "scale_attn_by_inverse_layer_idx": false,
23
+ "scale_attn_weights": true,
24
+ "summary_activation": null,
25
+ "summary_first_dropout": 0.1,
26
+ "summary_proj_to_labels": true,
27
+ "summary_type": "cls_index",
28
+ "summary_use_proj": true,
29
+ "task_specific_params": {
30
+ "conversational": {
31
+ "max_length": 1000
32
+ }
33
+ },
34
+ "transformers_version": "4.56.2",
35
+ "use_cache": true,
36
+ "vocab_size": 50257
37
+ }
checkpoint-39/generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 50256,
4
+ "eos_token_id": 50256,
5
+ "transformers_version": "4.56.2"
6
+ }
checkpoint-39/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-39/model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dae6a24423332f62a0b844e5b48d562159c5b800726ad4cb9ee29299d6ead2c1
3
+ size 1548105416
checkpoint-39/optimizer.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd812da8c14d0175777728c126f1d2fe1cab8619ffdbda7784377887fa0c770f
3
+ size 3096491711
checkpoint-39/rng_state.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7558caba0c912b5a7f57ea13a0b8ad40b237df30d9c71b15f52b323e3d224f5c
3
+ size 14645
checkpoint-39/scheduler.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5086d99b20db6eac0059c7f255a5f24f8811e5ab9af233823055cdc26b5f0dc3
3
+ size 1465
checkpoint-39/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-39/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-39/tokenizer_config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "50256": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ }
13
+ },
14
+ "bos_token": "<|endoftext|>",
15
+ "clean_up_tokenization_spaces": true,
16
+ "eos_token": "<|endoftext|>",
17
+ "errors": "replace",
18
+ "extra_special_tokens": {},
19
+ "model_max_length": 1024,
20
+ "pad_token": "<|endoftext|>",
21
+ "tokenizer_class": "GPT2Tokenizer",
22
+ "unk_token": "<|endoftext|>"
23
+ }
checkpoint-39/trainer_state.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 3.0,
6
+ "eval_steps": 100,
7
+ "global_step": 39,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.8,
14
+ "grad_norm": NaN,
15
+ "learning_rate": 2.7e-06,
16
+ "loss": 684.0879,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 1.56,
21
+ "grad_norm": NaN,
22
+ "learning_rate": 5.7000000000000005e-06,
23
+ "loss": 0.0,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 2.32,
28
+ "grad_norm": NaN,
29
+ "learning_rate": 8.7e-06,
30
+ "loss": 0.0,
31
+ "step": 30
32
+ }
33
+ ],
34
+ "logging_steps": 10,
35
+ "max_steps": 39,
36
+ "num_input_tokens_seen": 0,
37
+ "num_train_epochs": 3,
38
+ "save_steps": 100,
39
+ "stateful_callbacks": {
40
+ "TrainerControl": {
41
+ "args": {
42
+ "should_epoch_stop": false,
43
+ "should_evaluate": false,
44
+ "should_log": false,
45
+ "should_save": true,
46
+ "should_training_stop": true
47
+ },
48
+ "attributes": {}
49
+ }
50
+ },
51
+ "total_flos": 979278888960000.0,
52
+ "train_batch_size": 2,
53
+ "trial_name": null,
54
+ "trial_params": null
55
+ }
checkpoint-39/training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:727c690971fc5ec923ae6674f94581184a426a8d33ff9d1b0381b9e5b434b81f
3
+ size 5777
checkpoint-39/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
config.json CHANGED
@@ -1,37 +1,37 @@
1
- {
2
- "activation_function": "gelu_new",
3
- "architectures": [
4
- "GPT2LMHeadModel"
5
- ],
6
- "attn_pdrop": 0.1,
7
- "bos_token_id": 50256,
8
- "dtype": "float32",
9
- "embd_pdrop": 0.1,
10
- "eos_token_id": 50256,
11
- "initializer_range": 0.02,
12
- "layer_norm_epsilon": 1e-05,
13
- "model_type": "gpt2",
14
- "n_ctx": 1024,
15
- "n_embd": 1280,
16
- "n_head": 20,
17
- "n_inner": null,
18
- "n_layer": 36,
19
- "n_positions": 1024,
20
- "reorder_and_upcast_attn": false,
21
- "resid_pdrop": 0.1,
22
- "scale_attn_by_inverse_layer_idx": false,
23
- "scale_attn_weights": true,
24
- "summary_activation": null,
25
- "summary_first_dropout": 0.1,
26
- "summary_proj_to_labels": true,
27
- "summary_type": "cls_index",
28
- "summary_use_proj": true,
29
- "task_specific_params": {
30
- "conversational": {
31
- "max_length": 1000
32
- }
33
- },
34
- "transformers_version": "4.56.2",
35
- "use_cache": true,
36
- "vocab_size": 50257
37
- }
 
1
+ {
2
+ "activation_function": "gelu_new",
3
+ "architectures": [
4
+ "GPT2LMHeadModel"
5
+ ],
6
+ "attn_pdrop": 0.1,
7
+ "bos_token_id": 50256,
8
+ "dtype": "float16",
9
+ "embd_pdrop": 0.1,
10
+ "eos_token_id": 50256,
11
+ "initializer_range": 0.02,
12
+ "layer_norm_epsilon": 1e-05,
13
+ "model_type": "gpt2",
14
+ "n_ctx": 1024,
15
+ "n_embd": 1280,
16
+ "n_head": 20,
17
+ "n_inner": null,
18
+ "n_layer": 36,
19
+ "n_positions": 1024,
20
+ "reorder_and_upcast_attn": false,
21
+ "resid_pdrop": 0.1,
22
+ "scale_attn_by_inverse_layer_idx": false,
23
+ "scale_attn_weights": true,
24
+ "summary_activation": null,
25
+ "summary_first_dropout": 0.1,
26
+ "summary_proj_to_labels": true,
27
+ "summary_type": "cls_index",
28
+ "summary_use_proj": true,
29
+ "task_specific_params": {
30
+ "conversational": {
31
+ "max_length": 1000
32
+ }
33
+ },
34
+ "transformers_version": "4.56.2",
35
+ "use_cache": true,
36
+ "vocab_size": 50257
37
+ }
generation_config.json CHANGED
@@ -1,6 +1,6 @@
1
- {
2
- "_from_model_config": true,
3
- "bos_token_id": 50256,
4
- "eos_token_id": 50256,
5
- "transformers_version": "4.56.2"
6
- }
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 50256,
4
+ "eos_token_id": 50256,
5
+ "transformers_version": "4.56.2"
6
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:407e71124a6f0c552d3c49d9d8c4150557defa597d071f1ed7137ff29aee5b2a
3
- size 3096165928
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dae6a24423332f62a0b844e5b48d562159c5b800726ad4cb9ee29299d6ead2c1
3
+ size 1548105416
special_tokens_map.json CHANGED
@@ -1,24 +1,24 @@
1
- {
2
- "bos_token": {
3
- "content": "<|endoftext|>",
4
- "lstrip": false,
5
- "normalized": true,
6
- "rstrip": false,
7
- "single_word": false
8
- },
9
- "eos_token": {
10
- "content": "<|endoftext|>",
11
- "lstrip": false,
12
- "normalized": true,
13
- "rstrip": false,
14
- "single_word": false
15
- },
16
- "pad_token": "<|endoftext|>",
17
- "unk_token": {
18
- "content": "<|endoftext|>",
19
- "lstrip": false,
20
- "normalized": true,
21
- "rstrip": false,
22
- "single_word": false
23
- }
24
- }
 
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer.json CHANGED
@@ -2,13 +2,13 @@
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
- "max_length": 512,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
  "strategy": {
11
- "Fixed": 512
12
  },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
 
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
+ "max_length": 768,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
  "strategy": {
11
+ "Fixed": 768
12
  },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
tokenizer_config.json CHANGED
@@ -1,23 +1,23 @@
1
- {
2
- "add_bos_token": false,
3
- "add_prefix_space": false,
4
- "added_tokens_decoder": {
5
- "50256": {
6
- "content": "<|endoftext|>",
7
- "lstrip": false,
8
- "normalized": true,
9
- "rstrip": false,
10
- "single_word": false,
11
- "special": true
12
- }
13
- },
14
- "bos_token": "<|endoftext|>",
15
- "clean_up_tokenization_spaces": true,
16
- "eos_token": "<|endoftext|>",
17
- "errors": "replace",
18
- "extra_special_tokens": {},
19
- "model_max_length": 1024,
20
- "pad_token": "<|endoftext|>",
21
- "tokenizer_class": "GPT2Tokenizer",
22
- "unk_token": "<|endoftext|>"
23
- }
 
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "50256": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ }
13
+ },
14
+ "bos_token": "<|endoftext|>",
15
+ "clean_up_tokenization_spaces": true,
16
+ "eos_token": "<|endoftext|>",
17
+ "errors": "replace",
18
+ "extra_special_tokens": {},
19
+ "model_max_length": 1024,
20
+ "pad_token": "<|endoftext|>",
21
+ "tokenizer_class": "GPT2Tokenizer",
22
+ "unk_token": "<|endoftext|>"
23
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:727c690971fc5ec923ae6674f94581184a426a8d33ff9d1b0381b9e5b434b81f
3
+ size 5777
training_log.json ADDED
@@ -0,0 +1,32 @@
1
+ [
2
+ {
3
+ "loss": 684.0879,
4
+ "grad_norm": NaN,
5
+ "learning_rate": 2.7e-06,
6
+ "epoch": 0.8,
7
+ "step": 10
8
+ },
9
+ {
10
+ "loss": 0.0,
11
+ "grad_norm": NaN,
12
+ "learning_rate": 5.7000000000000005e-06,
13
+ "epoch": 1.56,
14
+ "step": 20
15
+ },
16
+ {
17
+ "loss": 0.0,
18
+ "grad_norm": NaN,
19
+ "learning_rate": 8.7e-06,
20
+ "epoch": 2.32,
21
+ "step": 30
22
+ },
23
+ {
24
+ "train_runtime": 105.4702,
25
+ "train_samples_per_second": 2.844,
26
+ "train_steps_per_second": 0.37,
27
+ "total_flos": 979278888960000.0,
28
+ "train_loss": 175.40715144230768,
29
+ "epoch": 3.0,
30
+ "step": 39
31
+ }
32
+ ]