Improve model card: license, model type, tags, and links (#1)

Commit: 288f379074e59b152ca839462391e91a2d2ba070
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

README.md CHANGED
---
base_model:
- google/t5-v1_1-xxl
language:
- en
library_name: transformers
license: mit
pipeline_tag: text2text-generation
tags:
- language-modeling
- bias-analysis
- cognitive-bias
---

# Model Card for T5-Flan

This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv paper [arXiv:2507.07186](https://arxiv.org/abs/2507.07186).
We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
This is one of three identical training runs, each trained with a different random seed.

- **Model type**: Encoder-Decoder transformer
- **Language(s)**: English
- **License**: MIT License
- **Finetuned from**: `google/t5-v1_1-xxl`
- **Paper**: https://arxiv.org/abs/2507.07186
- **Project page**: https://itay1itzhak.github.io/planted-in-pretraining
- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining

## Uses

Do not use in production, sensitive domains, or decision-critical applications.

## How to Get Started with the Model

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("itay1itzhak/T5-Flan")
tokenizer = AutoTokenizer.from_pretrained("itay1itzhak/T5-Flan")

inputs = tokenizer("Example input?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

## Training Details

- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
- Instruction data: Flan (350K)
- Seeds: 3 per setting to evaluate randomness effects
- Batch size: 128 (OLMo) / 64 (T5)
- Learning rate: 1e-6 to 1e-3
- Steps: ~5.5k (OLMo) / ~16k (T5)
- Mixed precision: fp16 (OLMo) / bf16 (T5)
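To give a sense of what "high-rank" LoRA means in parameter terms, here is a back-of-the-envelope sketch. The shapes are illustrative assumptions (d_model = 4096 at T5-XXL scale, adapters on two projection matrices in 24 encoder + 24 decoder blocks), not configuration read from this repository:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # LoRA expresses a frozen weight's update as B @ A, where A is (rank, d_in)
    # and B is (d_out, rank), adding rank*d_in + d_out*rank trainable parameters.
    return rank * d_in + d_out * rank

# Assumed illustration shapes: d_model = 4096 (T5-XXL scale), adapters on the
# query and value projections of 24 encoder + 24 decoder blocks.
d_model, blocks, matrices_per_block = 4096, 48, 2
for rank in (64, 512):  # the high-rank range quoted above
    total = blocks * matrices_per_block * lora_param_count(d_model, d_model, rank)
    print(f"rank={rank}: {total / 1e6:.1f}M trainable adapter parameters")
```

Even at rank 512 the adapters stay far below the ~11B parameters of the base model, which is what makes LoRA finetuning at this scale tractable.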
## Evaluation

- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
- Metrics: mean bias score, PCA clustering, MMLU accuracy
- Findings: biases primarily originate in pretraining; randomness introduces moderate variation
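As a toy illustration of how a "mean bias score" can be aggregated, the sketch below averages, over several biases, the difference in decision rates between a bias-inducing framing and a neutral control framing. Both the aggregation rule and the numbers are hypothetical, not the papers' exact metric:

```python
def bias_score(treatment_rate: float, control_rate: float) -> float:
    # Per-bias score: how much more often the bias-consistent option is chosen
    # under the biased framing than under the neutral control framing.
    return treatment_rate - control_rate

def mean_bias_score(rates: list[tuple[float, float]]) -> float:
    # Average the per-bias scores over all evaluated biases.
    return sum(bias_score(t, c) for t, c in rates) / len(rates)

# Hypothetical decision rates for three biases: (treatment, control).
toy = [(0.82, 0.55), (0.61, 0.58), (0.47, 0.50)]
print(round(mean_bias_score(toy), 3))
```

A score near zero means the framing barely shifts the model's choices; larger positive values indicate stronger bias-consistent behavior.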
## Environmental Impact

- Hardware: 4× NVIDIA A40
- Estimated time: ~120 GPU hours/model
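The reported figures permit a rough energy estimate. The A40 board power (~300 W) and the assumption of full utilization are mine, not values from the model card:

```python
# Back-of-the-envelope energy use from the reported ~120 GPU hours per model.
GPU_HOURS_PER_MODEL = 120   # from the model card
SEEDS = 3                   # three runs with different random seeds
POWER_KW_PER_GPU = 0.300    # assumed NVIDIA A40 board power (~300 W)

per_model_kwh = GPU_HOURS_PER_MODEL * POWER_KW_PER_GPU
total_kwh = per_model_kwh * SEEDS
print(f"~{per_model_kwh:.0f} kWh per model, ~{total_kwh:.0f} kWh across all three seeds")
```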
## Technical Specifications

- Architecture: T5-11B
- Instruction dataset: Flan (350K)