Update README.md
README.md
```
learning_rate = 0.0003995209593890016
lora_alpha = 128
lora_dropout = .1
lora_r = 64
```
I experimented with full fine-tuning, but the model lost a lot of its functionality and became a repeater. For this reason, I leveraged PEFT methods. I settled on LoRA as it was fairly simple to implement.
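
As a rough illustration (not the exact training script), the sketch below shows how a LoRA adapter with the hyperparameters listed above could be attached to a causal language model using the Hugging Face `peft` library. The base model checkpoint and `target_modules` are placeholders and would depend on the actual model used.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Placeholder base model; the actual checkpoint used for fine-tuning may differ.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA hyperparameters from the config above.
lora_config = LoraConfig(
    r=64,                                 # lora_r
    lora_alpha=128,                       # lora_alpha
    lora_dropout=0.1,                     # lora_dropout
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
)

# Wrap the base model so only the low-rank adapter weights are trained.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```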
### Evaluation
For this model's evaluation I used three metrics that are common in natural language tasks:

* BERTScore
* ROUGE
* BLEU
The primary evaluation metric is BERTScore, which calculates the similarity between two text inputs. BERTScore aims to assess semantic similarity: it measures how close the generated forecast is to the actual human forecast to judge whether they carry similar semantic meaning. A higher BERTScore is better.
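
As a small, hypothetical example of how a BERTScore could be computed (the forecasts below are made up, and the Hugging Face `evaluate` library is assumed):

```python
import evaluate

bertscore = evaluate.load("bertscore")

# Hypothetical generated vs. actual forecast pair.
generated = ["Expect clean 3-4 ft waves with light offshore wind in the morning."]
reference = ["Morning looks fun with 3-4 ft surf and light offshore flow."]

# F1 near 1.0 means the two forecasts are semantically very similar.
result = bertscore.compute(predictions=generated, references=reference, lang="en")
print(result["f1"])
```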
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is used to see whether the general gist is similar between the generated forecast and the actual human forecast. A higher ROUGE score is better.
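
A comparable sketch for ROUGE, again assuming the `evaluate` library and made-up forecast text:

```python
import evaluate

rouge = evaluate.load("rouge")

result = rouge.compute(
    predictions=["Expect clean 3-4 ft waves with light offshore wind in the morning."],
    references=["Morning looks fun with 3-4 ft surf and light offshore flow."],
)
print(result["rougeL"])  # longest-common-subsequence overlap, higher is better
```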
BLEU (Bilingual Evaluation Understudy) measures how many of the words (and short word sequences) in the generated forecast also appear in the reference human forecast. This should show whether the model is picking up on the "surfer lingo". A higher BLEU score is better.
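
And a similar sketch for BLEU, under the same assumptions:

```python
import evaluate

bleu = evaluate.load("bleu")

result = bleu.compute(
    predictions=["Expect clean 3-4 ft waves with light offshore wind in the morning."],
    references=[["Morning looks fun with 3-4 ft surf and light offshore flow."]],
)
print(result["bleu"])  # n-gram precision with a brevity penalty, higher is better
```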