Improve model card: license, model type, tags, and links (#1)

Commit: 288f379074e59b152ca839462391e91a2d2ba070
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

README.md CHANGED
---
base_model:
- google/t5-v1_1-xxl
language:
- en
library_name: transformers
license: mit
pipeline_tag: text2text-generation
tags:
- language-modeling
- bias-analysis
- cognitive-bias
---

# Model Card for T5-Flan

This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv paper [arXiv:2507.07186](https://arxiv.org/abs/2507.07186).
We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
This is one of three identical training runs, each trained with a different random seed.

- **Model type**: Encoder-Decoder transformer
- **Language(s)**: English
- **License**: MIT License
- **Finetuned from**: `google/t5-v1_1-xxl`
- **Paper**: https://arxiv.org/abs/2507.07186
- **Project page**: https://itay1itzhak.github.io/planted-in-pretraining
- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining

## Uses

Do not use in production, sensitive domains, or decision-critical applications.

## How to Get Started with the Model

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("itay1itzhak/T5-Flan")
tokenizer = AutoTokenizer.from_pretrained("itay1itzhak/T5-Flan")

inputs = tokenizer("Example input?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

## Training Details

- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
- Instruction data: Flan (350K)
- Seeds: 3 per setting to evaluate randomness effects
- Batch size: 128 (OLMo) / 64 (T5)
- Learning rate: 1e-6 to 1e-3
- Steps: ~5.5k (OLMo) / ~16k (T5)
- Mixed precision: fp16 (OLMo) / bf16 (T5)
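To give a sense of what "high-rank" LoRA means in parameter terms, here is a back-of-the-envelope sketch. The shapes are illustrative assumptions (d_model = 4096 at T5-XXL scale, adapters on two projection matrices in 24 encoder + 24 decoder blocks), not configuration read from this repository:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # LoRA expresses a frozen weight's update as B @ A, where A is (rank, d_in)
    # and B is (d_out, rank), adding rank*d_in + d_out*rank trainable parameters.
    return rank * d_in + d_out * rank

# Assumed illustration shapes: d_model = 4096 (T5-XXL scale), adapters on the
# query and value projections of 24 encoder + 24 decoder blocks.
d_model, blocks, matrices_per_block = 4096, 48, 2
for rank in (64, 512):  # the high-rank range quoted above
    total = blocks * matrices_per_block * lora_param_count(d_model, d_model, rank)
    print(f"rank={rank}: {total / 1e6:.1f}M trainable adapter parameters")
```

Even at rank 512 the adapters stay far below the ~11B parameters of the base model, which is what makes LoRA finetuning at this scale tractable.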
## Evaluation

- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
- Metrics: mean bias score, PCA clustering, MMLU accuracy
- Findings: biases primarily originate in pretraining; randomness introduces moderate variation
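As a toy illustration of how a "mean bias score" can be aggregated, the sketch below averages, over several biases, the difference in decision rates between a bias-inducing framing and a neutral control framing. Both the aggregation rule and the numbers are hypothetical, not the papers' exact metric:

```python
def bias_score(treatment_rate: float, control_rate: float) -> float:
    # Per-bias score: how much more often the bias-consistent option is chosen
    # under the biased framing than under the neutral control framing.
    return treatment_rate - control_rate

def mean_bias_score(rates: list[tuple[float, float]]) -> float:
    # Average the per-bias scores over all evaluated biases.
    return sum(bias_score(t, c) for t, c in rates) / len(rates)

# Hypothetical decision rates for three biases: (treatment, control).
toy = [(0.82, 0.55), (0.61, 0.58), (0.47, 0.50)]
print(round(mean_bias_score(toy), 3))
```

A score near zero means the framing barely shifts the model's choices; larger positive values indicate stronger bias-consistent behavior.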
## Environmental Impact

- Hardware: 4× NVIDIA A40
- Estimated time: ~120 GPU hours/model
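The reported figures permit a rough energy estimate. The A40 board power (~300 W) and the assumption of full utilization are mine, not values from the model card:

```python
# Back-of-the-envelope energy use from the reported ~120 GPU hours per model.
GPU_HOURS_PER_MODEL = 120   # from the model card
SEEDS = 3                   # three runs with different random seeds
POWER_KW_PER_GPU = 0.300    # assumed NVIDIA A40 board power (~300 W)

per_model_kwh = GPU_HOURS_PER_MODEL * POWER_KW_PER_GPU
total_kwh = per_model_kwh * SEEDS
print(f"~{per_model_kwh:.0f} kWh per model, ~{total_kwh:.0f} kWh across all three seeds")
```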
## Technical Specifications

- Architecture: T5-11B
- Instruction dataset: Flan (350K)