haoranxu
/

ALMA-13B-R

Text Generation

text-generation-inference

Model card Files Files and versions

Update README.md

#13

by ReactionControl - opened Jan 22

base: refs/heads/main

←

from: refs/pr/13

Discussion Files changed

Files changed (1) hide show

README.md +2 -0

README.md CHANGED Viewed

@@ -1,5 +1,7 @@
 ---
 license: mit
 ---
 **[ALMA-R](https://arxiv.org/abs/2401.08417)** builds upon [ALMA models](https://arxiv.org/abs/2309.11674), with further LoRA fine-tuning with our proposed **Contrastive Preference Optimization (CPO)** as opposed to the Supervised Fine-tuning used in ALMA. CPO fine-tuning requires our [triplet preference data](https://huggingface.co/datasets/haoranxu/ALMA-R-Preference) for preference learning. ALMA-R now can matches or even exceeds GPT-4 or WMT winners!
 ```

 ---
 license: mit
+base_model:
+- haoranxu/ALMA-13B-Pretrain
 ---
 **[ALMA-R](https://arxiv.org/abs/2401.08417)** builds upon [ALMA models](https://arxiv.org/abs/2309.11674), with further LoRA fine-tuning with our proposed **Contrastive Preference Optimization (CPO)** as opposed to the Supervised Fine-tuning used in ALMA. CPO fine-tuning requires our [triplet preference data](https://huggingface.co/datasets/haoranxu/ALMA-R-Preference) for preference learning. ALMA-R now can matches or even exceeds GPT-4 or WMT winners!
 ```