kssrikar4 committed
Commit b970670 · verified · 1 Parent(s): 001ef8a

Delete README.md

Files changed (1)
  1. README.md +0 -153
README.md DELETED
@@ -1,153 +0,0 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B
tags:
- generated_from_trainer
model-index:
- name: Intellecta
  results: []
datasets:
- fka/awesome-chatgpt-prompts
- BAAI/Infinity-Instruct
- allenai/WildChat-1M
- lavita/ChatDoctor-HealthCareMagic-100k
- zjunlp/Mol-Instructions
- garage-bAInd/Open-Platypus
language:
- en
---

# Intellecta

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on a mixture of instruction-following and conversational datasets (listed under "Training and evaluation data" below).

## Model description

The model is based on LLaMA (Large Language Model Meta AI), a family of state-of-the-art language models developed for natural language understanding and generation. This specific implementation uses the LLaMA 3.2-1B model, fine-tuned for general-purpose conversational AI tasks.

- **Architecture:** transformer-based causal language model.
- **Tokenization:** uses the AutoTokenizer compatible with the LLaMA model, with adjustments to ensure proper padding.
- **Pre-trained foundation:** builds on the pre-trained weights of LLaMA, focusing on improving performance for conversational and instruction-based tasks.
- **Implementation:** developed with Hugging Face's Transformers library for extensibility and ease of use.

## Intended uses & limitations

**Intended uses**

- **Instruction-following tasks:** answering questions, summarizing, and generating text.
- **Conversational agents:** chatbots and virtual assistants, including those in specialized domains such as healthcare or education.
- **Research and development:** fine-tuning and benchmarking against datasets for downstream tasks.

**Limitations**

- As a 1B-parameter model, it can produce incorrect or fabricated content; outputs in specialized domains such as healthcare should be reviewed by a qualified expert and are not a substitute for professional advice.
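
As a usage illustration, here is a minimal inference sketch with the Transformers pipeline API. The repo ID kssrikar4/Intellecta comes from this card; the prompt and generation settings are arbitrary examples.

```python
# Minimal inference sketch; the repo ID comes from this card, while the
# prompt and max_new_tokens are arbitrary illustration values.
from transformers import pipeline

generator = pipeline("text-generation", model="kssrikar4/Intellecta")
result = generator("Explain why the sky is blue.", max_new_tokens=128)
print(result[0]["generated_text"])
```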

## Training and evaluation data

**Datasets used**

- **fka/awesome-chatgpt-prompts:** general-purpose instruction-following and conversational dataset based on GPT-like interactions.
- **BAAI/Infinity-Instruct (3M):** a large instruction dataset containing a wide variety of tasks and instructions.
- **allenai/WildChat-1M:** focused on open-ended conversational data.
- **lavita/ChatDoctor-HealthCareMagic-100k:** healthcare-specific dataset for medical conversational agents.
- **zjunlp/Mol-Instructions:** molecular-biology-related instructions.
- **garage-bAInd/Open-Platypus:** dataset aimed at general-purpose, open-domain reasoning.

**Data preprocessing**

- Text prompts and responses are tokenized with padding and truncation.
- Labels are derived from the input tokens, with padding tokens masked as -100 to exclude them from loss computation.
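
A minimal sketch of this preprocessing, assuming each dataset exposes a `prompt` column as the training-procedure section below describes; the 512-token limit and the eos-as-pad fallback also come from that section.

```python
# Preprocessing sketch: tokenize prompts and build loss-masked labels.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # eos fallback, per the card

def preprocess(example):
    enc = tokenizer(
        example["prompt"],
        max_length=512,       # truncate to the card's stated limit
        truncation=True,
        padding="max_length",
    )
    # Labels are a copy of the input IDs, with padding positions masked as
    # -100 so they are ignored by the loss.
    enc["labels"] = [
        tok if tok != tokenizer.pad_token_id else -100
        for tok in enc["input_ids"]
    ]
    return enc
```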

## Training procedure

The training procedure fine-tunes the pre-trained LLaMA 3.2-1B model on the datasets above, with a focus on instruction-following and conversational tasks. The key aspects of the process are described below.

### 1. Preprocessing

**Tokenization**

- Input prompts and their responses are tokenized using the AutoTokenizer configured for LLaMA.
- Padding tokens are explicitly handled via `pad_token` (set to the `eos_token` if undefined).
- Inputs are truncated to a maximum length of 512 tokens to fit model constraints.

**Label preparation**

- Input IDs are cloned to create labels for supervised learning.
- Padding tokens in the labels are masked with -100 so they are ignored during loss computation.

**Dataset mapping**

- Each dataset's `prompt` field is tokenized and reformatted into the model's required input-output structure.
- Datasets without a `prompt` column are skipped to avoid errors (see the sketch after this list).
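
A sketch of the mapping step, reusing the `preprocess` function defined earlier. The dataset IDs come from this card, but how the datasets were actually combined is not specified, so the loop and the skip check are illustrative.

```python
# Dataset-mapping sketch: tokenize each dataset, skipping any that lacks a
# `prompt` column, as the card describes.
from datasets import load_dataset

dataset_ids = [
    "fka/awesome-chatgpt-prompts",
    "garage-bAInd/Open-Platypus",
    # ...the card's other datasets would be added here; some (e.g.
    # BAAI/Infinity-Instruct) also take a configuration name.
]

tokenized_splits = []
for ds_id in dataset_ids:
    ds = load_dataset(ds_id, split="train")
    if "prompt" not in ds.column_names:
        continue  # datasets without a `prompt` column are skipped
    tokenized_splits.append(ds.map(preprocess, remove_columns=ds.column_names))
```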

### 2. Model setup

**Pre-trained model**

- The base model, meta-llama/Llama-3.2-1B, is loaded with its pre-trained weights.
- It is fine-tuned for causal language modeling, focusing on instruction-based outputs.

**Tokenizer setup**

- The tokenizer ensures consistent encoding and decoding for the model.
- Padding is fixed, using the `eos_token` as a fallback.
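
The corresponding setup as a short sketch; the padding handling mirrors the preprocessing sketch above.

```python
# Model-setup sketch: load the base checkpoint for causal-LM fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fix padding via eos fallback

model = AutoModelForCausalLM.from_pretrained(base_id)
```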

### 3. Training configuration

**TrainingArguments**

The Hugging Face TrainingArguments object configures the training process (a configuration sketch follows this list):

- **Output directory:** llama_output stores the model checkpoints and logs.
- **Epochs:** 4, balancing training time and generalization.
- **Batch size:** 4 examples per device, to handle memory constraints.
- **Gradient accumulation:** 4 steps, to simulate a larger effective batch size.
- **Learning rate:** 1e-4, with a warmup phase of 500 steps for stable optimization.
- **Weight decay:** 0.01, to mitigate overfitting.
- **Mixed precision:** FP16 (half precision) for faster training and reduced memory usage.
- **Logging steps:** logs are generated every 10 steps to monitor training progress.
- **Checkpointing:** model checkpoints are saved at the end of each epoch.
- **Push to Hub:** the fine-tuned model is uploaded to the Hugging Face Hub (kssrikar4/Intellecta).

**Data collator**

- DataCollatorForSeq2Seq dynamically pads each batch for efficiency during training.
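
A sketch of that configuration. All values are taken from the list above; `hub_model_id` is an assumption about how the card's push-to-Hub target was set.

```python
# Training-configuration sketch; values come from the card's list above.
from transformers import DataCollatorForSeq2Seq, TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_output",
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=1e-4,
    warmup_steps=500,
    weight_decay=0.01,
    fp16=True,                      # mixed-precision training
    logging_steps=10,
    save_strategy="epoch",          # checkpoint at the end of each epoch
    push_to_hub=True,
    hub_model_id="kssrikar4/Intellecta",  # assumed from the card's Hub ID
)

# Dynamic per-batch padding (labels padded with -100), as described above.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```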

### 4. Fine-tuning process

**Trainer**

- The Hugging Face Trainer class orchestrates the training process, combining the model, data, and training configuration.
- Loss is computed for each batch from the model's outputs (logits) and the prepared labels.
- The optimizer and learning-rate scheduler are managed internally by the Trainer.

**Training loop**

During each epoch:

- The model processes batches of tokenized prompts and computes the causal language modeling (CLM) loss.
- Gradients are accumulated over multiple steps to simulate a larger batch size.
- Optimizer updates are applied after gradient accumulation.

**Validation**

- Validation data is not explicitly defined in the code, but the Trainer supports evaluation if an eval_dataset is provided.
- Saving checkpoints at each epoch allows the model to be evaluated after training.
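
Putting the previous sketches together: `model`, `training_args`, `data_collator`, and `tokenized_splits` are defined above. Concatenating the tokenized datasets is an assumption, since the card does not say how they were combined.

```python
# Fine-tuning sketch, reusing the objects from the sketches above.
from datasets import concatenate_datasets
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    # Assumption: the tokenized datasets are simply concatenated.
    train_dataset=concatenate_datasets(tokenized_splits),
    data_collator=data_collator,
)
trainer.train()
trainer.push_to_hub()  # upload to the Hub target configured above
```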

### 5. Post-training

**Push to Hub**

- The trained model, along with its tokenizer and configuration, is pushed to the Hugging Face Hub under the ID kssrikar4/Intellecta.

**Usage**

- The fine-tuned model can be downloaded and used directly for inference (see the sketch under "Intended uses & limitations") or for further fine-tuning.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 4
- mixed_precision_training: Native AMP

### Training results

No evaluation metrics were recorded, since no eval_dataset was provided during training (see "Validation" above).

### Framework versions

- Transformers 4.48.0
- Pytorch 2.5.1+cpu
- Datasets 3.2.0
- Tokenizers 0.21.0