### Model Card ###

# Information:

Base Model: facebook/bart-base

Fine-tuned: using PEFT-LoRA

Datasets: squad_v2, drop, mou3az/IT_QA-QG

Task: Generating questions from a given context and answer

Language: English

# Performance Metrics on Evaluation Set:

Training Loss: 1.1958

Evaluation Loss: 1.109059

### Loading the model ###

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

HUGGING_FACE_USER_NAME = "mou3az"
model_name = "ITandGeneral_Question-Generation"
peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"

# Resolve the base model from the PEFT config, then attach the fine-tuned LoRA adapter
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=False, device_map='auto')
QG_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
QG_model = PeftModel.from_pretrained(model, peft_model_id)
```
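Note: `device_map='auto'` requires the `accelerate` package to be installed. If you prefer to place the model manually, drop that argument and move the model yourself, e.g. `model.to("cuda")` or `model.to("cpu")`.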
### At inference time ###

```python
def get_question(context, answer):
    # Use whichever device the model was loaded onto
    device = next(QG_model.parameters()).device
    input_text = f"Given the context '{context}' and the answer '{answer}', what question can be asked?"
    encoding = QG_tokenizer.encode_plus(input_text, padding=True, return_tensors="pt").to(device)

    output_tokens = QG_model.generate(**encoding, early_stopping=True, num_beams=5, num_return_sequences=1, no_repeat_ngram_size=2, max_length=100)
    out = QG_tokenizer.decode(output_tokens[0], skip_special_tokens=True).replace("question:", "").strip()

    return out
```
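For example, a quick smoke test (the context, answer, and printed question below are illustrative, not actual model output):

```python
context = "BART is a sequence-to-sequence model introduced by Facebook AI in 2019."
answer = "2019"
print(get_question(context, answer))
# Might print something like: "When was BART introduced?"
```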
### Training parameters and hyperparameters ###

The following were used during training:

# For LoRA:

r=18

alpha=8

# For training arguments:

gradient_accumulation_steps=24

per_device_train_batch_size=8

per_device_eval_batch_size=8

max_steps=1000

warmup_steps=50

weight_decay=0.05

learning_rate=3e-3

lr_scheduler_type="linear"
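
For reference, here is a minimal sketch of how these values map onto `peft` and `transformers` config objects. The `target_modules` and `output_dir` values are illustrative assumptions, not taken from this card:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from above; target_modules is an assumed choice for BART attention layers
lora_config = LoraConfig(
    r=18,
    lora_alpha=8,
    target_modules=["q_proj", "v_proj"],  # assumption, not stated in this card
    task_type="SEQ_2_SEQ_LM",
)

# Training arguments from above
training_args = TrainingArguments(
    output_dir="bart-qg-lora",  # assumed output path
    gradient_accumulation_steps=24,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    max_steps=1000,
    warmup_steps=50,
    weight_decay=0.05,
    learning_rate=3e-3,
    lr_scheduler_type="linear",
)
```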
### Training Results ###

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 0.0   | 4.6426        | 4.704238        |
| 3.0   | 1.5094        | 1.202135        |
| 6.0   | 1.2677        | 1.146177        |
| 9.0   | 1.2613        | 1.112074        |
| 12.0  | 1.1958        | 1.109059        |