update readme

README.md (CHANGED)
---

## Training Details

### Training Data

- Dataset: Claudette ToS
- Balanced: 1000 anomalous, 1000 normal clauses
- Splits: 70% train (1400), 20% validation (400), 10% test (200)
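For concreteness, the 70/20/10 split above can be sketched in plain Python. This is an illustrative sketch only (function name and seed are hypothetical); the card does not say how the split was actually produced, e.g. whether it was stratified to keep the anomalous/normal balance within each split.

```python
import random

def split_dataset(examples, seed=42):
    """Shuffle and split into 70% train / 20% validation / 10% test."""
    rng = random.Random(seed)
    examples = examples[:]            # avoid mutating the caller's list
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(n * 0.7)
    n_val = int(n * 0.2)
    return (examples[:n_train],
            examples[n_train:n_train + n_val],
            examples[n_train + n_val:])

# 2000 balanced clauses -> 1400 / 400 / 200, matching the counts above
clauses = [("clause %d" % i, i % 2) for i in range(2000)]
train, val, test = split_dataset(clauses)
print(len(train), len(val), len(test))  # 1400 400 200
```

A plain random split like this reproduces the sizes; a stratified split would additionally preserve the 50/50 class balance in each partition.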

### Training Procedure

- Quantization: 4-bit (NF4, bitsandbytes)
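The NF4 quantization above corresponds to loading the base model in 4-bit with bitsandbytes via Transformers. A minimal sketch, assuming a `BitsAndBytesConfig`-based load; the exact Saul-7B repo id, compute dtype, and double-quantization setting are assumptions not stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization, as listed under Training Procedure.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,   # assumed; not stated in the card
    bnb_4bit_use_double_quant=True,         # assumed; not stated in the card
)

base_id = "<base-saul-7b-repo-id>"  # replace with the actual Saul-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```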

### Model Architecture and Objective

- Base: Saul-7B (LLaMA-style causal LM)
- LoRA params: around 13M trainable (approx. 0.18% of total)
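A back-of-the-envelope check on the "around 13M" figure: each adapted `Linear(d_in, d_out)` layer gains LoRA matrices A (r × d_in) and B (d_out × r), i.e. r · (d_in + d_out) trainable parameters. The rank and target modules below are purely hypothetical (the card states neither); they are chosen only to show that a standard attention-projection setup on a 7B base lands in the reported range.

```python
def lora_param_count(layer_shapes, r):
    """Trainable LoRA params: each targeted Linear(d_in, d_out) adds
    A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters."""
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)

# Hypothetical config: rank-16 adapters on the attention projections of a
# Mistral-style 7B (32 layers; q/o: 4096->4096, k/v: 4096->1024 with GQA).
per_layer = [(4096, 4096), (4096, 1024), (4096, 1024), (4096, 4096)]
total = lora_param_count(per_layer * 32, r=16)
print(total)                            # 13631488, i.e. ~13.6M
print(round(100 * total / 7.24e9, 2))   # 0.19 (card reports approx. 0.18%)
```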

### Compute Infrastructure

- Hardware: 1x NVIDIA Titan X
- Software: PyTorch 2.2, Transformers 4.51, PEFT 0.15.2, bitsandbytes

## Glossary

- **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning method where only small adapter matrices are trained, while the large base model remains frozen. This drastically reduces compute and storage costs.
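The low-rank idea can be shown with a toy example in pure Python (illustrative sizes; the usual alpha/r scaling factor is omitted): the frozen weight W is adapted as W' = W + B @ A, and only the small matrices A and B are trained.

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

r, d = 1, 2                       # rank-1 adapter on a tiny 2x2 "layer"
W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (never updated)
A = [[0.5, 0.5]]                  # trained: r x d
B = [[2.0], [0.0]]                # trained: d x r
delta = matmul(B, A)              # d x d update, but only rank r
W_adapted = [[w + dl for w, dl in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
print(W_adapted)                  # [[2.0, 1.0], [0.0, 1.0]]
```

At real model scale only A and B are stored and optimized, which is why the adapter adds ~13M parameters while the 7B base stays untouched.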

## Model Card Authors

- **Noshitha Juttu** – M.S. in Computer Science, University of Massachusetts Amherst
- Research focus: NLP, model compression, on-device NLP, and parameter-efficient fine-tuning (PEFT)

## Model Card Contact