itay1itzhak and nielsr (HF Staff) committed
Commit 6fff9d8 · verified · 1 Parent(s): ddb8f40

Improve model card: license, model type, tags, and links (#1)

- Improve model card: license, model type, tags, and links (288f379074e59b152ca839462391e91a2d2ba070)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +30 -30
README.md CHANGED
@@ -1,16 +1,15 @@
 ---
-license: apache-2.0
+base_model:
+- google/t5-v1_1-xxl
+language:
+- en
+library_name: transformers
+license: mit
+pipeline_tag: text2text-generation
 tags:
 - language-modeling
-- causal-lm
 - bias-analysis
 - cognitive-bias
-language:
-- en
-base_model:
-- google/t5-v1_1-xxl
-pipeline_tag: text2text-generation
-library_name: transformers
 ---
 
 # Model Card for T5-Flan
@@ -23,12 +22,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
 We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
 This is one of 3 identical versions trained with different random seeds.
 
-- **Model type**: Causal decoder-based transformer
-- **Language(s)**: English
-- **License**: Apache 2.0
-- **Finetuned from**: `google/t5-v1_1-xxl`
-- **Paper**: https://arxiv.org/abs/2507.07186
-- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+- **Model type**: Encoder-Decoder transformer
+- **Language(s)**: English
+- **License**: MIT License
+- **Finetuned from**: `google/t5-v1_1-xxl`
+- **Paper**: https://arxiv.org/abs/2507.07186
+- **Project page**: https://itay1itzhak.github.io/planted-in-pretraining
+- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
 
 ## Uses
 
@@ -41,9 +41,9 @@ Do not use in production, sensitive domains, or decision-critical applications.
 ## How to Get Started with the Model
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
-model = AutoModelForCausalLM.from_pretrained("itay1itzhak/T5-Flan")
+model = AutoModelForSeq2SeqLM.from_pretrained("itay1itzhak/T5-Flan")
 tokenizer = AutoTokenizer.from_pretrained("itay1itzhak/T5-Flan")
 
 inputs = tokenizer("Example input?", return_tensors="pt")
@@ -53,26 +53,26 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Details
 
-- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
-- Instruction data: Flan (350K)
-- Seeds: 3 per setting to evaluate randomness effects
-- Batch size: 128 (OLMo) / 64 (T5)
-- Learning rate: 1e-6 to 1e-3
-- Steps: ~5.5k (OLMo) / ~16k (T5)
-- Mixed precision: fp16 (OLMo) / bf16 (T5)
+- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+- Instruction data: Flan (350K)
+- Seeds: 3 per setting to evaluate randomness effects
+- Batch size: 128 (OLMo) / 64 (T5)
+- Learning rate: 1e-6 to 1e-3
+- Steps: ~5.5k (OLMo) / ~16k (T5)
+- Mixed precision: fp16 (OLMo) / bf16 (T5)
 
 ## Evaluation
 
-- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
-- Metrics: mean bias score, PCA clustering, MMLU accuracy
-- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
+- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+- Metrics: mean bias score, PCA clustering, MMLU accuracy
+- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
 
 ## Environmental Impact
 
-- Hardware: 4× NVIDIA A40
-- Estimated time: ~120 GPU hours/model
+- Hardware: 4× NVIDIA A40
+- Estimated time: ~120 GPU hours/model
 
 ## Technical Specifications
 
-- Architecture: T5-11B
-- Instruction dataset: Flan (350K)
+- Architecture: T5-11B
+- Instruction dataset: Flan (350K)
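
The training details in the card list high-rank LoRA with rank ∈ [64, 512]. As a back-of-the-envelope illustration of what that rank range means for adapter size, the sketch below counts trainable LoRA parameters for a single square projection matrix. The 4096 dimension matches the `d_model` of `google/t5-v1_1-xxl`, but treating every adapted weight as a 4096×4096 matrix is an assumption for illustration; the paper's exact per-layer shapes and adapted modules are not specified here.

```python
def lora_extra_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA freezes the base weight W (d_out x d_in) and learns the update
    # as B @ A, with B of shape (d_out x rank) and A of shape (rank x d_in).
    # Only A and B are trained, so the extra parameter count is:
    return rank * (d_out + d_in)

# Assumed shape for illustration: one 4096x4096 projection matrix.
full = 4096 * 4096
for rank in (64, 512):  # endpoints of the card's reported rank range
    extra = lora_extra_params(4096, 4096, rank)
    print(f"rank {rank}: {extra} trainable params ({extra / full:.1%} of the frozen matrix)")
```

Even at the high end of the range (rank 512), the adapter for such a matrix carries a quarter of the frozen matrix's parameters, which is why high-rank LoRA is still far cheaper to train than full finetuning of an 11B model.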
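
The evaluation bullets mention a mean bias score measured over models trained with 3 different seeds. The snippet below only sketches how a per-bias score might be averaged across seed runs to separate a central tendency from seed-induced variation; the seed scores are purely hypothetical numbers, not results from the paper, and the aggregation shown is a generic mean/spread, not necessarily the paper's exact metric.

```python
from statistics import mean, stdev

# Hypothetical bias scores for one bias type, one score per random seed.
# Illustrative values only, not numbers reported in the paper.
seed_scores = [0.42, 0.38, 0.47]

mean_bias = mean(seed_scores)     # central tendency across the 3 seed runs
seed_spread = stdev(seed_scores)  # variation attributable to training randomness
print(f"mean bias score: {mean_bias:.2f} (seed spread {seed_spread:.2f})")
```

Under this kind of aggregation, a large mean with a small spread supports the card's finding that biases originate mostly in pretraining rather than in the randomness of finetuning.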