The model was fine-tuned using the following datasets:

1. **LimYeri/LeetCode_with_Solutions**: This dataset contains LeetCode problems along with their hints, user solutions that have received at least 10 votes, and summaries of LeetCode solution videos from YouTube. The summaries were processed using the Chain-of-Thought (CoT) method via a commercial large language model (LLM). The `content` column holds the solutions and captions (CoT summaries), providing detailed explanations, thought processes, and step-by-step instructions for solving the coding problems; a loading sketch follows the list.
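
The card itself does not ship loading code; a minimal sketch of inspecting the dataset with the Hugging Face `datasets` library might look like the following. The `train` split name is an assumption, since the card does not list the dataset's splits.

```python
from datasets import load_dataset

# Load the fine-tuning data; the "train" split name is an assumption
ds = load_dataset("LimYeri/LeetCode_with_Solutions", split="train")

# The 'content' column holds user solutions and CoT video summaries
print(ds[0]["content"][:500])
```
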
## Training Procedure

- The model was fine-tuned using the Hugging Face Transformers library. The base model, [gemma-7b-it](https://huggingface.co/google/gemma-7b-it), was further trained on the combined dataset of LeetCode user solutions and YouTube video captions (CoT summaries). This fine-tuning process was designed to deepen the model's understanding of coding concepts and problem-solving strategies and to improve its ability to generate relevant code snippets and explanations.
- The model was trained using the QLoRA technique with 4-bit quantization on the dataset; see the configuration sketch after this list.
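
The exact training script is not included on the card, but a QLoRA setup of this kind typically looks like the sketch below, using `transformers`, `bitsandbytes`, and `peft`. The LoRA hyperparameters (rank, alpha, dropout, target modules) are illustrative assumptions, not values confirmed for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization: the standard QLoRA configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized base weights and attach trainable LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                # assumed rank; not stated on this card
    lora_alpha=32,       # assumed scaling factor
    lora_dropout=0.05,   # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here the adapter-wrapped model can be trained with a standard Hugging Face `Trainer` (or TRL's `SFTTrainer`) loop over the tokenized dataset; only the small LoRA matrices receive gradient updates while the 4-bit base weights stay frozen.
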
## Bias and Limitations

- The model's knowledge is primarily based on the LeetCode user solutions and YouTube video captions (CoT summaries) used for fine-tuning. It may have limitations in handling coding problems or concepts that are not well represented in the training data.
- The model's responses are generated from patterns and information learned during training, so it may sometimes produce incorrect or suboptimal solutions. Users should always review and verify the generated code before using it in practice.