hbin0701/csqa-gpt2-large-ctx-c

Improve model card: Add metadata, paper and code links, and citation

by nielsr HF Staff - opened Oct 27, 2025

base: refs/heads/main ← from: refs/pr/1

README.md +29 -3

@@ -1,6 +1,14 @@
 # CSQA GPT2-Large Context-Aware Model
-This model is a GPT2-large based model fine-tuned for the CommonsenseQA (CSQA) task with context-aware capabilities.
 ## Model Architecture
@@ -22,8 +30,26 @@ This is a multi-component model that includes:
 ## Usage
-This model was trained for the CommonsenseQA task and includes specialized components for context-aware reasoning.
 ## Training
-The model was trained in multiple stages on the CommonsenseQA dataset, incorporating context-aware mechanisms to improve reasoning capabilities.

+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+---
 # CSQA GPT2-Large Context-Aware Model
+This model is a GPT2-large based model fine-tuned for the CommonsenseQA (CSQA) task with context-aware capabilities. It is part of the "Let's Predict Sentence by Sentence" framework, presented in the paper [Let's Predict Sentence by Sentence](https://huggingface.co/papers/2505.22202) (arXiv:2505.22202). This work investigates adapting pretrained token-level Language Models to operate in sentence space by autoregressively predicting continuous embeddings of next sentences, enabling abstract reasoning.
+For the official implementation and further details, please refer to the [GitHub repository](https://github.com/hbin0701/pred-sent).
 ## Model Architecture
 ## Usage
+This model was trained for the CommonsenseQA task and includes specialized components for context-aware reasoning. For detailed usage, particularly with the SentenceLens visualization tool or the full training pipeline, please refer to the [GitHub repository](https://github.com/hbin0701/pred-sent).
 ## Training
+The model was trained in multiple stages on the CommonsenseQA dataset, incorporating context-aware mechanisms to improve reasoning capabilities. More details on the training pipeline (SFT, Embedding Training, Latent Model Training) can be found in the [GitHub repository](https://github.com/hbin0701/pred-sent) and the [paper](https://huggingface.co/papers/2505.22202).
+## Citation
+If you find this work useful in your research, please cite our paper:
+```bibtex
+@misc{hwang2025letspredictsentencesentence,
+      title={Let's Predict Sentence by Sentence},
+      author={Hyeonbin Hwang and Byeongguk Jeon and Seungone Kim and Jiyeon Kim
+              and Hoyeon Chang and Sohee Yang and Seungpil Won and Dohaeng Lee
+              and Youbin Ahn and Minjoon Seo},
+      year={2025},
+      eprint={2505.22202},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2505.22202}
+}
+```
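
The metadata block this PR adds at the top of README.md (`license`, `library_name`, `pipeline_tag`) is plain `key: value` front matter between two `---` lines. The following is a minimal sketch of how that block could be read back and checked. The function `parse_front_matter` and the constant `README_HEAD` are hypothetical names introduced for illustration and are not part of the repository. The parser assumes flat `key: value` pairs only, so a real YAML parser would be needed for nested or list-valued metadata.

```python
def parse_front_matter(readme_text):
    """Split a model card into (metadata dict, body) using its leading '---' block.

    Assumption: the block holds only flat `key: value` lines, as in this PR.
    """
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}, readme_text

    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the card body.
            body = "\n".join(lines[index + 1:])
            return metadata, body
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()

    # No closing delimiter: treat the text as having no front matter.
    return {}, readme_text


# The first lines of README.md as they appear on the "+" side of the diff.
README_HEAD = """---
license: mit
library_name: transformers
pipeline_tag: text-generation
---
# CSQA GPT2-Large Context-Aware Model
"""

metadata, body = parse_front_matter(README_HEAD)
print(metadata)
print(body)
```

A check like this only confirms that the three keys are present and well-formed. It does not confirm that the checkpoint loads through the standard `transformers` text-generation path, which the README leaves to the GitHub repository.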