jbrandin committed · Commit 7029bc3 · verified · 1 Parent(s): 693f8b1

Update README.md

Files changed (1): README.md (+23 −2)
 
---
library_name: transformers
license: mit
---
## Introduction
LLMs have the potential to support representative democracy by providing constituents with valuable information about their representatives. According to the polling aggregator 538, Congress’ approval rating is 21.7% favorable and 61.4% unfavorable at the time of writing, and according to a Pew survey, only 22% of US adults say they trust the federal government to do the right thing. With trust in institutions at historic lows, it is important to explore novel ways to help solve that problem. The goal of this model is to prepare an LLM which, given input text, can tell the user what stance the author of that text would take towards a given topic or claim. My hope is that similar approaches can give voters a deeper understanding of their representatives’ positions. This is a challenging problem that LLMs need additional training to solve effectively, as the task involves both classification and chain-of-thought reasoning. For this project, I began with the Qwen2.5-7B-Instruct-1M model and performed Parameter-Efficient Fine-Tuning (PEFT) using Low-Rank Adaptation (LoRA). The model was trained specifically on the stance classification portion of the task. Ultimately, the results were inconclusive: the post-training model performed only very slightly better at the classification task, with accuracy at about 47% for both the base and fine-tuned models. Therefore, further training attempts would be necessary to develop a model that can be considered truly successful at this task.
 
 
Stance: The stance label for the Target text (Favorable, Unfavorable, or Neutral).

Using this dataset, I provided the model with the source text and asked it to determine whether the author of that text would have a favorable, unfavorable, or no stance towards the target topic or claim. I did not modify those fields in the training dataset, other than adding structure around the data in the prompt to clarify what I wanted the model to provide. The second component of the task was to have the model provide step-by-step reasoning behind the stance it selected. This reasoning was not included in the training dataset, but I thought it was important for the model to generate it: the original motivation for this model is to help build trust, and an explanation gives the user something concrete to reference.
 
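The exact prompt wording used in training is not reproduced in this card, so the following is a hypothetical sketch of the kind of structure described above: the dataset's source text and target are wrapped in labeled fields, and the model is asked for one of the three stance labels.

```python
# Hypothetical sketch of the prompt structure described above. The field
# labels and wording here are illustrative, not the project's actual prompt.
def build_prompt(source_text: str, target: str) -> str:
    return (
        "Source text:\n"
        f"{source_text}\n\n"
        f"Target: {target}\n\n"
        "Question: would the author of the source text be FAVOR, AGAINST, "
        "or NONE (no stance) towards the target?\n"
        "Stance:"
    )
```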
The dataset did not have a test/train split, so I randomly shuffled the dataset and then split it into training and validation sets at 80%/20% respectively using the code below.

```python
from sklearn.model_selection import train_test_split

# Split into 80% train, 20% validation (shuffling is on by default)
train, val = train_test_split(responses_df, test_size=0.2, random_state=42)
```

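Note that `train_test_split` does not balance labels by default. A stratified variant would keep the three stance labels in equal proportion across the splits; the `Stance` column name and the toy data below are illustrative assumptions, not the project's actual dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative stand-in for responses_df; the real frame comes from the
# stance dataset, and the "Stance" column name here is an assumption.
responses_df = pd.DataFrame({
    "Text": [f"document {i}" for i in range(30)],
    "Stance": ["FAVOR", "AGAINST", "NONE"] * 10,
})

# stratify= keeps FAVOR/AGAINST/NONE proportions equal in train and val
train, val = train_test_split(
    responses_df, test_size=0.2, random_state=42,
    stratify=responses_df["Stance"],
)
```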
### Training Method
The base model used for this project was Qwen2.5-7B-Instruct-1M. I chose this model because it can handle large context windows, is instruction tuned, and its relatively low parameter count makes it more efficient to train. The final model was trained on the stance classification task using the LoRA method of PEFT, and few-shot chain-of-thought prompting was then used to ask the final model for the reasoning behind the stances it generated. When reviewing the model's output on my task, I observed that few-shot prompting alone went a very long way in improving its explanations, which is why I only trained the model on the stance classification component of the task. I chose PEFT over full fine-tuning because the model was already performing well on the reasoning task and I did not want to change it drastically. Also, since I am using a 7B-parameter model and my desired output is open-ended, I had concerns about the efficiency of full fine-tuning. My aim was to take a targeted training approach to assist the model on its classification task.
 
That left me deciding between PEFT and prompt tuning. My model was already performing well without any tuning, which led me to first consider prompt tuning as the least invasive approach. However, my task asks the model to perform a fairly specific stance classification task in addition to generating its reasoning, so I thought the more in-depth approach of PEFT could be useful. Also, since my model is small to medium sized at 7B parameters, PEFT did not raise the same resource concerns as full fine-tuning. I therefore took the middle-ground approach of PEFT. Within PEFT, I chose LoRA because it is a common approach with many resources and much guidance available, which gave me confidence in my ability to implement it effectively. LoRA is also much more efficient than full fine-tuning and has been shown to perform almost as well, including on logical reasoning tasks.
 
The LoRA hyperparameters used were as follows:

```python
from peft import LoraConfig

LORA_R = 64
LORA_ALPHA = 64
LORA_DROPOUT = 0.05

lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
```

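With the rank above, a back-of-envelope count shows why LoRA is so much cheaper than full fine-tuning: for each adapted projection, LoRA trains two low-rank factors instead of the full weight matrix. Treating the projection as a square d × d matrix for simplicity, with d = 3584 as an assumed hidden size for this model family:

```python
# Back-of-envelope LoRA parameter count for one square d x d projection.
# d = 3584 is an assumed hidden size; r = 64 is the rank configured above.
d = 3584
r = 64

full = d * d        # parameters touched by full fine-tuning of the matrix
lora = 2 * d * r    # LoRA trains a (d x r) and an (r x d) factor instead

print(full, lora, lora / full)  # LoRA trains ~3.6% as many parameters
```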
Finally, when prompting the model for the reasoning behind the stance it selected, I used few-shot prompting. Min et al. found that giving the model about 16 examples in the prompt resulted in the best performance on classification and multiple-choice tasks. Since I have three possible stance options (FAVOR, AGAINST, NONE), I provided the model with 15 examples (5 for each stance). The 15 examples included in the prompt were hand-written by me, since no training data existed for the logical reasoning portion of this task.
 
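The hand-written examples themselves are not reproduced in this card, but the assembly of the 15-shot prompt can be sketched as follows; the example texts and formatting are placeholders, not the actual hand-written examples.

```python
# Hypothetical sketch of assembling the 15-shot reasoning prompt described
# above: 5 hand-written worked examples per stance label, then the new query.
STANCES = ["FAVOR", "AGAINST", "NONE"]

def build_few_shot_prompt(examples: dict, query: str) -> str:
    # Interleave labels so no single stance dominates the start of the prompt.
    shots = []
    for i in range(5):
        for stance in STANCES:
            shots.append(f"{examples[stance][i]}\nStance: {stance}")
    return "\n\n".join(shots) + f"\n\n{query}\nStance:"

# Placeholder examples; in practice each entry is a hand-written worked example.
demo = {s: [f"example {i} for {s}" for i in range(5)] for s in STANCES}
prompt = build_few_shot_prompt(demo, "New source text and target here")
```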
  ### Evaluation