am5uc committed on
Commit 04c7ccd · verified · 1 Parent(s): 2e0be48

Update README.md

Files changed (1): README.md +6 -6
README.md CHANGED
@@ -43,12 +43,12 @@ These were the arguments/hyperparameters, I used. I tried using higher epochs, b
 ## Evaluation
 I had three benchmarks: the WikiTableQuestions dataset, the TabFact dataset, and the Synthetic Validation set. Fine-tuning did not harm the results on the WTQ Validation Set or the TabFact dataset, where I got accuracies of 0.3405 and 0.5005, respectively, for both the pre-trained and fine-tuned model. There were improvements in the validation and test results after training, though: on validation, accuracy rose from 0.4000 to 0.4222, and on the test set there was a much larger jump, from 0.2033 to 0.4667, after fine-tuning.
 
-| Model | Test Set of Synthetic Dataset | Benchmark 1 (WTQ Validation Set) | Benchmark 2 (TabFact) | Benchmark 3 (SQA) | | |
-|------------------------------------------------------|-------------------------------|------------------------------------------|-----------------------|-------------------|---|---|
-| google/tapas-base-finetuned-wtq (before Fine-tuning) | 0.2933 | 0.3405 | 0.5005 | 0.2512 | | |
-| google/tapas-base-finetuned-wtq (Fine-tuned) | 0.4667 | 0.3405 | 0.5005 | 0.2525 | | |
-| mistralai/Mistral-7B-Instruct-v0.3 | 0 | Exact Match: 0.0346 Fuzzy Match: 0.4744 | | | | |
-| meta-llama/Llama-3.2-1B | 0.0133 | | | | | |
+| Model | Test Set of Synthetic Dataset | Benchmark 1 (WTQ Validation Set) | Benchmark 2 (TabFact) | Benchmark 3 (SQA) |
+|------------------------------------------------------|-------------------------------|------------------------------------------|-----------------------|-------------------|
+| google/tapas-base-finetuned-wtq (before Fine-tuning) | 0.2933 | 0.3405 | 0.5005 | 0.2512 |
+| google/tapas-base-finetuned-wtq (Fine-tuned) | 0.4667 | 0.3405 | 0.5005 | 0.2525 |
+| mistralai/Mistral-7B-Instruct-v0.3 | 0 | Exact Match: 0.0346 Fuzzy Match: 0.4744 | | |
+| meta-llama/Llama-3.2-1B | 0.0133 | Exact Match: 0.0593 Fuzzy Match: 0.2769 | | |
 
 ## Usage and Intended Uses
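
The updated table reports both "Exact Match" and "Fuzzy Match" accuracies for the generative baselines. The commit does not show how these were computed; the sketch below is one plausible implementation, where the use of `difflib.SequenceMatcher` and the 0.8 similarity threshold are assumptions, not details from this repository.

```python
import difflib

def exact_match(pred: str, gold: str) -> bool:
    # Case- and surrounding-whitespace-insensitive string equality.
    return pred.strip().lower() == gold.strip().lower()

def fuzzy_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    # Similarity ratio from difflib; the 0.8 cutoff is an assumed choice.
    ratio = difflib.SequenceMatcher(
        None, pred.strip().lower(), gold.strip().lower()
    ).ratio()
    return ratio >= threshold

def accuracy(preds, golds, match_fn) -> float:
    # Fraction of predictions that match their gold answer under match_fn.
    hits = sum(match_fn(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)

# Hypothetical examples, not data from the benchmarks above.
preds = ["Paris", "42", "blue whale"]
golds = ["paris", "42", "a blue whale"]
print(accuracy(preds, golds, exact_match))  # 2/3: the third answer only partially matches
print(accuracy(preds, golds, fuzzy_match))  # 1.0: the partial match clears the threshold
```

Fuzzy accuracy is always at least the exact-match accuracy, which matches the pattern in the table (e.g. 0.0346 exact vs. 0.4744 fuzzy for Mistral-7B-Instruct-v0.3).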