How to use oeg/software_benchmark_multidomain with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="oeg/software_benchmark_multidomain")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("oeg/software_benchmark_multidomain")
model = AutoModelForTokenClassification.from_pretrained("oeg/software_benchmark_multidomain")
```

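The raw pipeline output can be reduced to a plain list of mention strings. The sketch below assumes the pipeline is created with `aggregation_strategy="simple"`, so that each result is a dictionary with `word` and `score` keys. The helper name `extract_mentions`, the 0.5 threshold, and the example sentence are illustrative and do not come from the model card.

```python
def extract_mentions(results, min_score=0.5):
    """Return the text of each aggregated entity whose score passes min_score."""
    return [r["word"] for r in results if r["score"] >= min_score]


def run_example():
    """Download the model and tag one sentence (needs network access)."""
    # Imported lazily so extract_mentions can be used without transformers.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model="oeg/software_benchmark_multidomain",
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    # Hypothetical input sentence, not taken from the benchmark.
    return extract_mentions(pipe("We trained the classifier with scikit-learn."))
```

Calling `run_example()` downloads the model weights on first use; `extract_mentions` on its own works with any list of dictionaries in the shape described above.
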
Update README.md
README.md
@@ -38,10 +38,34 @@ The training code can be found on [Github](https://github.com/oeg-upm/software_m

## Evaluation Results

These are the hyperparameters used to train the model:
* `evaluation_strategy="epoch"`
* `save_strategy="no"`
* `per_device_train_batch_size=16`
* `per_device_eval_batch_size=16`
* `num_train_epochs=3`
* `weight_decay=1e-5`
* `learning_rate=1e-4`

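The hyperparameter names above match keyword arguments of `TrainingArguments` in the Transformers library, so they could be passed along the lines of the sketch below. This is not the project's actual training script: the `output_dir` value and the function name `build_training_args` are hypothetical, and the rename to `eval_strategy` is an assumption about newer library releases that the code checks for at run time.

```python
import inspect

# Values copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "evaluation_strategy": "epoch",
    "save_strategy": "no",
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 3,
    "weight_decay": 1e-5,
    "learning_rate": 1e-4,
}


def build_training_args(output_dir="software-ner-run"):
    """Build TrainingArguments from HYPERPARAMETERS.

    output_dir is a hypothetical path; the README does not name one.
    """
    # Imported lazily so the dictionary above is usable without transformers.
    from transformers import TrainingArguments

    kwargs = dict(HYPERPARAMETERS)
    accepted = inspect.signature(TrainingArguments.__init__).parameters
    # Assumption: some newer releases call this argument eval_strategy.
    if "evaluation_strategy" not in accepted and "eval_strategy" in accepted:
        kwargs["eval_strategy"] = kwargs.pop("evaluation_strategy")
    return TrainingArguments(output_dir=output_dir, **kwargs)
```
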
The evaluation results are:

* Precision: 0.8928176795580111
* Recall: 0.8568398727465536
* F1-score: 0.8744588744588745

This model has been compared with generative models such as Llama2 and Hermes on the test split of the benchmark. The results below are for partial matches, that is, cases where the predictions are included in the corpus.

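The README does not spell out how a partial match is scored. One possible reading, shown here purely as an assumption, counts a predicted mention as correct when its text contains, or is contained in, a gold mention. The function name `partial_match_scores` and the sample mentions are hypothetical.

```python
def partial_match_scores(predicted, gold):
    """Precision, recall and F1 under a substring-based partial match.

    predicted, gold: lists of mention strings for the same text.
    Assumption: two mentions match when one contains the other,
    ignoring case and surrounding whitespace.
    """
    def overlaps(a, b):
        a, b = a.strip().lower(), b.strip().lower()
        return bool(a) and bool(b) and (a in b or b in a)

    # Predictions that match at least one gold mention.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    # Gold mentions that are matched by at least one prediction.
    matched_gold = sum(any(overlaps(p, g) for p in predicted) for g in gold)

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical mentions, not taken from the benchmark.
scores = partial_match_scores(
    ["TensorFlow", "NumPy library", "Excel"],
    ["TensorFlow 2", "NumPy"],
)
```
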
### Llama2 (7B)

* Precision: 0.6342857142857142
* Recall: 0.7161290322580646
* F1-score: 0.67

### Hermes (13B)

* Precision: 0.4666666666666667
* Recall: 0.509090909090909
* F1-score: 0.4869565217391304

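As a consistency check, each reported F1-score can be recomputed from its precision and recall as their harmonic mean. The snippet below is a small sketch of that arithmetic; the names `f1_score` and `REPORTED` are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the lists above.
REPORTED = {
    "oeg/software_benchmark_multidomain": (
        0.8928176795580111, 0.8568398727465536, 0.8744588744588745),
    "Llama2 (7B)": (0.6342857142857142, 0.7161290322580646, 0.67),
    "Hermes (13B)": (0.4666666666666667, 0.509090909090909, 0.4869565217391304),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported}")
```

The recomputed values agree with the reported ones; the Llama2 figure appears to have been rounded to two decimals in the README.
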
## Acknowledgements

This work was made possible thanks to the efforts of other projects: