kssrikar4 committed
Commit b970670 · verified · 1 Parent(s): 001ef8a

Delete README.md

Files changed (1)
  1. README.md +0 -153
README.md DELETED
@@ -1,153 +0,0 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B
tags:
- generated_from_trainer
model-index:
- name: Intellecta
  results: []
datasets:
- fka/awesome-chatgpt-prompts
- BAAI/Infinity-Instruct
- allenai/WildChat-1M
- lavita/ChatDoctor-HealthCareMagic-100k
- zjunlp/Mol-Instructions
- garage-bAInd/Open-Platypus
language:
- en
---

# Intellecta

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on a mixture of instruction-following and conversational datasets (listed under "Training and evaluation data" below).

## Model description

The model is based on LLaMA (Large Language Model Meta AI), a family of state-of-the-art language models developed for natural language understanding and generation. This specific implementation uses the LLaMA 3.2-1B model, fine-tuned for general-purpose conversational AI tasks.

- **Architecture:** transformer-based causal language model.
- **Tokenization:** uses the AutoTokenizer compatible with the LLaMA model, with adjustments to ensure proper padding.
- **Pre-trained foundation:** builds on the pre-trained weights of LLaMA, focusing on improving performance for conversational and instruction-based tasks.
- **Implementation:** developed with Hugging Face's Transformers library for extensibility and ease of use.

## Intended uses & limitations

**Intended uses**

- **Instruction-following tasks:** answering questions, summarizing, and generating text.
- **Conversational agents:** chatbots and virtual assistants, including those in specialized domains such as healthcare or education.
- **Research and development:** fine-tuning and benchmarking against datasets for downstream tasks.

**Limitations**

- As a 1B-parameter model, it can produce incorrect or fabricated content; outputs in specialized domains such as healthcare should be reviewed by a qualified expert and are not a substitute for professional advice.
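
As a usage illustration, here is a minimal inference sketch with the Transformers pipeline API. The repo ID kssrikar4/Intellecta comes from this card; the prompt and generation settings are arbitrary examples.

```python
# Minimal inference sketch; the repo ID comes from this card, while the
# prompt and max_new_tokens are arbitrary illustration values.
from transformers import pipeline

generator = pipeline("text-generation", model="kssrikar4/Intellecta")
result = generator("Explain why the sky is blue.", max_new_tokens=128)
print(result[0]["generated_text"])
```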

## Training and evaluation data

**Datasets used**

- **fka/awesome-chatgpt-prompts:** general-purpose instruction-following and conversational dataset based on GPT-like interactions.
- **BAAI/Infinity-Instruct (3M):** a large instruction dataset containing a wide variety of tasks and instructions.
- **allenai/WildChat-1M:** focused on open-ended conversational data.
- **lavita/ChatDoctor-HealthCareMagic-100k:** healthcare-specific dataset for medical conversational agents.
- **zjunlp/Mol-Instructions:** molecular-biology-related instructions.
- **garage-bAInd/Open-Platypus:** dataset aimed at general-purpose, open-domain reasoning.

**Data preprocessing**

- Text prompts and responses are tokenized with padding and truncation.
- Labels are derived from the input tokens, with padding tokens masked as -100 to exclude them from loss computation.
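
A minimal sketch of this preprocessing, assuming each dataset exposes a `prompt` column as the training-procedure section below describes; the 512-token limit and the eos-as-pad fallback also come from that section.

```python
# Preprocessing sketch: tokenize prompts and build loss-masked labels.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # eos fallback, per the card

def preprocess(example):
    enc = tokenizer(
        example["prompt"],
        max_length=512,       # truncate to the card's stated limit
        truncation=True,
        padding="max_length",
    )
    # Labels are a copy of the input IDs, with padding positions masked as
    # -100 so they are ignored by the loss.
    enc["labels"] = [
        tok if tok != tokenizer.pad_token_id else -100
        for tok in enc["input_ids"]
    ]
    return enc
```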

## Training procedure

The training procedure fine-tunes the pre-trained LLaMA 3.2-1B model on the datasets above, with a focus on instruction-following and conversational tasks. The key aspects of the process are described below.

### 1. Preprocessing

**Tokenization**

- Input prompts and their responses are tokenized using the AutoTokenizer configured for LLaMA.
- Padding tokens are explicitly handled via `pad_token` (set to the `eos_token` if undefined).
- Inputs are truncated to a maximum length of 512 tokens to fit model constraints.

**Label preparation**

- Input IDs are cloned to create labels for supervised learning.
- Padding tokens in the labels are masked with -100 so they are ignored during loss computation.

**Dataset mapping**

- Each dataset's `prompt` field is tokenized and reformatted into the model's required input-output structure.
- Datasets without a `prompt` column are skipped to avoid errors (see the sketch after this list).
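
A sketch of the mapping step, reusing the `preprocess` function defined earlier. The dataset IDs come from this card, but how the datasets were actually combined is not specified, so the loop and the skip check are illustrative.

```python
# Dataset-mapping sketch: tokenize each dataset, skipping any that lacks a
# `prompt` column, as the card describes.
from datasets import load_dataset

dataset_ids = [
    "fka/awesome-chatgpt-prompts",
    "garage-bAInd/Open-Platypus",
    # ...the card's other datasets would be added here; some (e.g.
    # BAAI/Infinity-Instruct) also take a configuration name.
]

tokenized_splits = []
for ds_id in dataset_ids:
    ds = load_dataset(ds_id, split="train")
    if "prompt" not in ds.column_names:
        continue  # datasets without a `prompt` column are skipped
    tokenized_splits.append(ds.map(preprocess, remove_columns=ds.column_names))
```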

### 2. Model setup

**Pre-trained model**

- The base model, meta-llama/Llama-3.2-1B, is loaded with its pre-trained weights.
- It is fine-tuned for causal language modeling, focusing on instruction-based outputs.

**Tokenizer setup**

- The tokenizer ensures consistent encoding and decoding for the model.
- Padding is fixed, using the `eos_token` as a fallback.
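
The corresponding setup as a short sketch; the padding handling mirrors the preprocessing sketch above.

```python
# Model-setup sketch: load the base checkpoint for causal-LM fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fix padding via eos fallback

model = AutoModelForCausalLM.from_pretrained(base_id)
```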

### 3. Training configuration

**TrainingArguments**

The Hugging Face TrainingArguments object configures the training process (a configuration sketch follows this list):

- **Output directory:** llama_output stores the model checkpoints and logs.
- **Epochs:** 4, balancing training time and generalization.
- **Batch size:** 4 examples per device, to handle memory constraints.
- **Gradient accumulation:** 4 steps, to simulate a larger effective batch size.
- **Learning rate:** 1e-4, with a warmup phase of 500 steps for stable optimization.
- **Weight decay:** 0.01, to mitigate overfitting.
- **Mixed precision:** FP16 (half precision) for faster training and reduced memory usage.
- **Logging steps:** logs are generated every 10 steps to monitor training progress.
- **Checkpointing:** model checkpoints are saved at the end of each epoch.
- **Push to Hub:** the fine-tuned model is uploaded to the Hugging Face Hub (kssrikar4/Intellecta).

**Data collator**

- DataCollatorForSeq2Seq dynamically pads each batch for efficiency during training.
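
A sketch of that configuration. All values are taken from the list above; `hub_model_id` is an assumption about how the card's push-to-Hub target was set.

```python
# Training-configuration sketch; values come from the card's list above.
from transformers import DataCollatorForSeq2Seq, TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_output",
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=1e-4,
    warmup_steps=500,
    weight_decay=0.01,
    fp16=True,                      # mixed-precision training
    logging_steps=10,
    save_strategy="epoch",          # checkpoint at the end of each epoch
    push_to_hub=True,
    hub_model_id="kssrikar4/Intellecta",  # assumed from the card's Hub ID
)

# Dynamic per-batch padding (labels padded with -100), as described above.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```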

### 4. Fine-tuning process

**Trainer**

- The Hugging Face Trainer class orchestrates the training process, combining the model, data, and training configuration.
- Loss is computed for each batch from the model's outputs (logits) and the prepared labels.
- The optimizer and learning-rate scheduler are managed internally by the Trainer.

**Training loop**

During each epoch:

- The model processes batches of tokenized prompts and computes the causal language modeling (CLM) loss.
- Gradients are accumulated over multiple steps to simulate a larger batch size.
- Optimizer updates are applied after gradient accumulation.

**Validation**

- Validation data is not explicitly defined in the code, but the Trainer supports evaluation if an eval_dataset is provided.
- Saving checkpoints at each epoch allows the model to be evaluated after training.
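
Putting the previous sketches together: `model`, `training_args`, `data_collator`, and `tokenized_splits` are defined above. Concatenating the tokenized datasets is an assumption, since the card does not say how they were combined.

```python
# Fine-tuning sketch, reusing the objects from the sketches above.
from datasets import concatenate_datasets
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    # Assumption: the tokenized datasets are simply concatenated.
    train_dataset=concatenate_datasets(tokenized_splits),
    data_collator=data_collator,
)
trainer.train()
trainer.push_to_hub()  # upload to the Hub target configured above
```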

### 5. Post-training

**Push to Hub**

- The trained model, along with its tokenizer and configuration, is pushed to the Hugging Face Hub under the ID kssrikar4/Intellecta.

**Usage**

- The fine-tuned model can be downloaded and used directly for inference (see the sketch under "Intended uses & limitations") or for further fine-tuning.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 4
- mixed_precision_training: Native AMP

### Training results

No evaluation metrics were recorded, since no eval_dataset was provided during training (see "Validation" above).

### Framework versions

- Transformers 4.48.0
- Pytorch 2.5.1+cpu
- Datasets 3.2.0
- Tokenizers 0.21.0