Update README.md
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) This is used to see if …
BLEU (Bilingual Evaluation Understudy) measures how many words in the generated text also appear in the human-written reference text. This should show whether the model is picking up on the "surfer lingo"; a higher BLEU score is better.
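Full BLEU combines clipped n-gram precisions with a brevity penalty (libraries such as sacrebleu implement it completely); as a minimal sketch of the clipped unigram-precision idea behind it, using made-up example strings:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Rough BLEU-style unigram precision: the fraction of candidate
    tokens that also appear in the reference, with counts clipped so a
    repeated word can't be credited more times than it occurs in the
    reference. (Illustration only; real BLEU adds higher-order n-grams
    and a brevity penalty.)"""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)

# A response that reuses the reference's wording scores higher:
# "waves", "were", "gnarly" overlap -> 3 of 5 candidate tokens.
print(unigram_precision("the waves were totally gnarly",
                        "those waves were gnarly dude"))  # 0.6
```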
I chose two models of a similar size from large AI research labs as benchmarks.
| Metric | QWEN-4B-Instruct-2507 | Llama-3.2-3B-Instruct | google/gemma-2-2b-it | SurfMine |
|--------|:---------------------:|:---------------------:|:--------------------:|:--------:|
| BERT   | **.8215**             | .8141                 | .8201                | **.8717** |
| ROUGE  | **.1097**             | .1053                 | .1075                | **.2074** |
| BLEU   | .0051                 | .0032                 | **.0059**            | **.0702** |
SurfMine does better on all metrics when compared to the base model as well as the chosen benchmark models.