Improve model card: Add metadata, paper and code links, and citation

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +29 -3
README.md CHANGED
@@ -1,6 +1,14 @@
+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+---
+
 # CSQA GPT2-Large Context-Aware Model
 
-This model is a GPT2-large based model fine-tuned for the CommonsenseQA (CSQA) task with context-aware capabilities.
+This model is a GPT2-large based model fine-tuned for the CommonsenseQA (CSQA) task with context-aware capabilities. It is part of the "Let's Predict Sentence by Sentence" framework, presented in the paper [Let's Predict Sentence by Sentence](https://huggingface.co/papers/2505.22202) (arXiv:2505.22202). This work investigates adapting pretrained token-level Language Models to operate in sentence space by autoregressively predicting continuous embeddings of next sentences, enabling abstract reasoning.
+
+For the official implementation and further details, please refer to the [GitHub repository](https://github.com/hbin0701/pred-sent).
 
 ## Model Architecture
 
@@ -22,8 +30,26 @@ This is a multi-component model that includes:
 
 ## Usage
 
-This model was trained for the CommonsenseQA task and includes specialized components for context-aware reasoning.
+This model was trained for the CommonsenseQA task and includes specialized components for context-aware reasoning. For detailed usage, particularly with the SentenceLens visualization tool or the full training pipeline, please refer to the [GitHub repository](https://github.com/hbin0701/pred-sent).
 
 ## Training
 
-The model was trained in multiple stages on the CommonsenseQA dataset, incorporating context-aware mechanisms to improve reasoning capabilities.
+The model was trained in multiple stages on the CommonsenseQA dataset, incorporating context-aware mechanisms to improve reasoning capabilities. More details on the training pipeline (SFT, Embedding Training, Latent Model Training) can be found in the [GitHub repository](https://github.com/hbin0701/pred-sent) and the [paper](https://huggingface.co/papers/2505.22202).
+
+## Citation
+
+If you find this work useful in your research, please cite our paper:
+
+```bibtex
+@misc{hwang2025letspredictsentencesentence,
+  title={Let's Predict Sentence by Sentence},
+  author={Hyeonbin Hwang and Byeongguk Jeon and Seungone Kim and Jiyeon Kim
+    and Hoyeon Chang and Sohee Yang and Seungpil Won and Dohaeng Lee
+    and Youbin Ahn and Minjoon Seo},
+  year={2025},
+  eprint={2505.22202},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2505.22202}
+}
+```
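The updated card declares `library_name: transformers` and `pipeline_tag: text-generation` but still gives no loading code. Below is a minimal sketch, assuming the checkpoint loads through the standard `transformers` causal-LM API; the prompt layout and the `repo_id` parameter are illustrative assumptions, and the paper's specialized context-aware components are not reproduced here.

```python
# Hedged sketch: the model card provides no loading code, so this follows the
# generic Hugging Face text-generation pattern implied by the card's metadata.
# The prompt format below is an assumption, not taken from the paper.

def format_csqa_prompt(question: str, choices: list[str]) -> str:
    """Format a CommonsenseQA item as a plain-text prompt (assumed layout)."""
    lettered = " ".join(
        f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices)
    )
    return f"Question: {question}\nChoices: {lettered}\nAnswer:"


def generate_answer(repo_id: str, question: str, choices: list[str]) -> str:
    """Load the checkpoint from the Hub and greedily decode a short answer."""
    # Deferred import so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(format_csqa_prompt(question, choices), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # Decode only the tokens generated after the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

To try it, pass this checkpoint's Hub repo id as `repo_id`; for the paper's full context-aware inference pipeline, use the scripts in the GitHub repository instead.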