Sathwik3/distilbert-emotion-classifier

@@ -145,10 +145,10 @@ print(result)
 ### Training Data
 The model was fine-tuned on an emotion classification dataset. Specific dataset details:
-- **Dataset:** [Dataset name and link - placeholder for specific information]
-- **Size:** [Number of training examples - placeholder]
-- **Emotion categories:** [List of emotion labels - placeholder]
-- **Data split:** [Train/validation/test split information - placeholder]
 ### Training Procedure
@@ -157,24 +157,16 @@ The model was fine-tuned on an emotion classification dataset. Specific dataset
 - Text tokenization using DistilBERT tokenizer
 - Maximum sequence length: 512 tokens
 - Truncation and padding applied as needed
-- Text normalization: [specific preprocessing steps - placeholder]
 #### Training Hyperparameters
-- **Training regime:** Mixed precision (fp16) [placeholder - adjust if different]
 - **Optimizer:** AdamW
-- **Learning rate:** [e.g., 2e-5 - placeholder]
-- **Batch size:** [e.g., 16 or 32 - placeholder]
-- **Number of epochs:** [e.g., 3-5 - placeholder]
-- **Weight decay:** [e.g., 0.01 - placeholder]
-- **Warmup steps:** [placeholder]
-- **Scheduler:** [e.g., Linear with warmup - placeholder]
-#### Training Infrastructure
-- **Hardware:** [GPU type, e.g., NVIDIA Tesla V100 - placeholder]
-- **Training time:** [Approximate duration - placeholder]
-- **Framework:** PyTorch with Hugging Face Transformers
 ## Evaluation
@@ -201,35 +193,10 @@ The model's performance is evaluated using:
 | Metric | Value |
 |--------|-------|
-| Accuracy | [e.g., 0.XX - placeholder] |
-| Macro F1 | [e.g., 0.XX - placeholder] |
-| Weighted F1 | [e.g., 0.XX - placeholder] |
-| Macro Precision | [e.g., 0.XX - placeholder] |
-| Macro Recall | [e.g., 0.XX - placeholder] |
-#### Per-Class Performance
-[Placeholder for per-class metrics table]
-| Emotion | Precision | Recall | F1-Score | Support |
-|---------|-----------|--------|----------|----------|
-| [Class 1] | [0.XX] | [0.XX] | [0.XX] | [N] |
-| [Class 2] | [0.XX] | [0.XX] | [0.XX] | [N] |
-| ... | ... | ... | ... | ... |
-### Summary
-The model demonstrates strong performance on emotion classification tasks, with particular strengths in [specific aspects - placeholder]. Areas for potential improvement include [specific areas - placeholder].
-## Environmental Impact
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [e.g., NVIDIA Tesla V100 - placeholder]
-- **Hours used:** [placeholder]
-- **Cloud Provider:** [e.g., AWS, GCP, Azure, or on-premises - placeholder]
-- **Compute Region:** [e.g., us-east-1 - placeholder]
-- **Carbon Emitted:** [e.g., XX kg CO2eq - placeholder]
 ## Technical Specifications
@@ -244,22 +211,16 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Max Sequence Length:** 512 tokens
 - **Vocabulary Size:** 30,522 tokens
-### Compute Infrastructure
-#### Hardware
-[Placeholder for specific hardware information - e.g., GPU type, CPU, memory]
 #### Software
 - **Framework:** PyTorch
 - **Library:** Hugging Face Transformers
-- **Python Version:** [e.g., 3.8+ - placeholder]
 - **Key Dependencies:**
   - transformers
   - torch
   - tokenizers
-  - datasets (if applicable)
 ## Citation

 ### Training Data
 The model was fine-tuned on an emotion classification dataset. Specific dataset details:
+- **Dataset:** Emotion dataset
+- **Size:** 16,000 training examples
+- **Emotion categories:** sadness, joy, love, anger, fear, surprise
+- **Data split:** train, validation, test
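The six categories listed above can be turned into the `id2label` / `label2id` mappings that a classification config typically carries. This is a minimal sketch. It assumes the order of the list matches the integer class ids used during training, which the card does not state, and `decode_prediction` is a hypothetical helper.

```python
# Emotion labels as listed in the card.
# Assumption: position in this list == integer class id used in training.
EMOTION_LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

id2label = dict(enumerate(EMOTION_LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def decode_prediction(class_id: int) -> str:
    """Map a predicted class id back to its emotion name."""
    return id2label[class_id]
```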
 ### Training Procedure
 - Text tokenization using DistilBERT tokenizer
 - Maximum sequence length: 512 tokens
 - Truncation and padding applied as needed
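To make the truncation and padding step concrete, here is a pure-Python sketch of what happens to a sequence of token ids. `pad_or_truncate` is a hypothetical helper, not part of Transformers; in practice the tokenizer performs this step itself. The pad id of 0 is an assumption about this checkpoint's vocabulary, and special-token handling (`[CLS]`, `[SEP]`) is left out for brevity.

```python
from typing import List, Tuple

MAX_LENGTH = 512  # maximum sequence length stated in the card
PAD_ID = 0        # assumption: id of the [PAD] token in this vocabulary


def pad_or_truncate(
    token_ids: List[int],
    max_length: int = MAX_LENGTH,
    pad_id: int = PAD_ID,
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), each exactly max_length long."""
    kept = token_ids[:max_length]                    # truncation
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad              # right padding
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Arbitrary example ids, padded to a short length for readability.
ids, mask = pad_or_truncate([5, 6, 7], max_length=5)
```

The attention mask is what lets the model ignore padded positions, so short inputs are not penalized for the padding added to reach a uniform length.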
 #### Training Hyperparameters
+- **Training regime:** Mixed precision (fp16)
 - **Optimizer:** AdamW
+- **Learning rate:** 2e-5
+- **Batch size:** 64
+- **Number of epochs:** 2
+- **Weight decay:** 0.01
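As a rough sanity check on the values above, the sketch below collects them in one place and derives the number of optimizer steps they imply. It assumes the 16,000 training examples listed under Training Data, a batch size of 64 applied per step on a single device, and no gradient accumulation; the card confirms none of these. The keyword names mirror Hugging Face `TrainingArguments` parameters, and `output_dir` is a hypothetical path.

```python
import math

# Hyperparameters as reported in the card.
training_kwargs = {
    "output_dir": "distilbert-emotion",   # hypothetical path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 64,    # assumption: 64 is the per-device size
    "num_train_epochs": 2,
    "weight_decay": 0.01,
    "fp16": True,                         # mixed precision; requires a CUDA GPU
}

NUM_TRAIN_EXAMPLES = 16_000  # "Size" from the Training Data section


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Steps per epoch (a final partial batch counts as a step) times epochs."""
    return math.ceil(num_examples / batch_size) * epochs


steps = total_optimizer_steps(
    NUM_TRAIN_EXAMPLES,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
)
print(steps)  # 500 (250 steps per epoch x 2 epochs)
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`. A run this short (about 500 steps under these assumptions) is one reason the warmup and scheduler choices would be worth documenting if they differ from the library defaults.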
 ## Evaluation
 | Metric | Value |
 |--------|-------|
+| Accuracy | 0.9295 |
+| Weighted F1 | 0.9292 |
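The card does not say how the two metrics were computed. As a reference for what they mean, here is a small pure-Python sketch of accuracy and support-weighted F1, where per-class F1 scores are averaged with weights proportional to the number of true examples in each class. The toy labels are made up for illustration and are unrelated to the reported scores.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy example with three classes (made-up labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
```

scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same quantity. Because the weighting follows class frequency, a weighted F1 close to accuracy, as reported here, says little about performance on rare classes.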
 ## Technical Specifications
 - **Max Sequence Length:** 512 tokens
 - **Vocabulary Size:** 30,522 tokens
 #### Software
 - **Framework:** PyTorch
 - **Library:** Hugging Face Transformers
+- **Python Version:** 3.10
 - **Key Dependencies:**
   - transformers
   - torch
   - tokenizers
 ## Citation