raalst/RobBERT-v2-nl-qa

Task: Question Answering


	---
	datasets:
	- raalst/squad_v2_dutch
	language:
	- nl
	---

	The dataset used, raalst/squad_v2_dutch, was kindly provided by Henryk Borzymowski.
	It is a Dutch translation of SQuAD v2, which I converted from JSON to JSONL format.
	It contains train and validation splits but no test split, so in my fine-tuning run
	I held out 20% of the train split as a test set. The evaluation results below are based on that test set.
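The split sizes listed under the training settings (79,412 train and 19,853 test rows) add up to 99,265, of which 19,853 is exactly 20%. A minimal stdlib-only sketch of such a seeded 80/20 hold-out follows; the helper name `holdout_split`, the seed and the round-up rule are assumptions for illustration, not the code used for this model. With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=0.2, seed=...)` does the same job and returns a `DatasetDict` with `train` and `test` splits.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows with a fixed seed and hold out test_fraction of them.

    Hypothetical helper: returns (train, test) as new lists.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle
    n_test = math.ceil(len(rows) * test_fraction)  # round the test size up
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# 99,265 = 79,412 + 19,853, the train + test sizes reported in this card
train, test = holdout_split(list(range(99_265)))
print(len(train), len(test))  # 79412 19853
```

Fixing the seed makes the held-out test set reproducible across runs, which matters when the evaluation numbers are reported on it.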

	When using raalst/squad_v2_dutch, be sure to clean up the single and double quotes in the contexts.
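The card does not say what exactly is wrong with the quotes. Assuming the problem is typographic quotes (such as „ ” ‘ ’) that no longer line up between the context, the answer text and `answer_start`, a minimal cleanup sketch could map them one-for-one to ASCII quotes, which keeps character offsets valid, and then re-check each answer offset. The helpers `clean_quotes` and `clean_example` are hypothetical, and the record layout assumes the standard SQuAD-style `answers` dict with `text` and `answer_start` lists.

```python
from typing import Optional

# Assumption: typographic quotes mapped to ASCII, one character for one character,
# so character offsets such as answer_start keep pointing at the same text.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # left/right/low double quotes
    "\u00ab": '"', "\u00bb": '"',                 # guillemets
    "\u2018": "'", "\u2019": "'", "\u201a": "'",  # left/right/low single quotes
})


def clean_quotes(text: str) -> str:
    return text.translate(QUOTE_MAP)


def clean_example(example: dict) -> Optional[dict]:
    """Normalize quotes and re-validate answer offsets (hypothetical helper).

    Returns None when an answer cannot be found in the cleaned context,
    so the caller can drop the example instead of training on a bad span.
    """
    context = clean_quotes(example["context"])
    texts, starts = [], []
    for text, start in zip(example["answers"]["text"],
                           example["answers"]["answer_start"]):
        text = clean_quotes(text)
        if context[start:start + len(text)] != text:
            start = context.find(text)  # fall back to the first occurrence
            if start == -1:
                return None
        texts.append(text)
        starts.append(start)
    return {
        **example,
        "context": context,
        "question": clean_quotes(example["question"]),
        "answers": {"text": texts, "answer_start": starts},
    }


example = {
    "id": "0", "title": "demo", "question": "Wat zei hij?",
    "context": "Hij zei \u201ehallo\u201d tegen haar.",
    "answers": {"text": ["\u201ehallo\u201d"], "answer_start": [8]},
}
print(clean_example(example)["context"])  # Hij zei "hallo" tegen haar.
```

Examples for which `clean_example` returns `None` can be filtered out before tokenization; unanswerable questions (empty answer lists) pass through unchanged apart from the quote normalization.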

	The pretrained base model is pdelobelle/robbert-v2-dutch-base, a Dutch RoBERTa model.

	Results on the held-out test set (19,853 questions), computed with the SQuAD v2 metric from the evaluate library:

	import evaluate
	metric = evaluate.load("evaluate-metric/squad_v2")

	{'exact': 61.75389109958193,
	'f1': 66.89717170237417,
	'total': 19853,
	'HasAns_exact': 48.967182330322814,
	'HasAns_f1': 58.09796564493008,
	'HasAns_total': 11183,
	'NoAns_exact': 78.24682814302192,
	'NoAns_f1': 78.24682814302192,
	'NoAns_total': 8670,
	'best_exact': 61.75389109958193,
	'best_exact_thresh': 0.0,
	'best_f1': 66.89717170237276,
	'best_f1_thresh': 0.0}
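As a sanity check on the dictionary above: the overall `exact` and `f1` are the question-weighted averages of the `HasAns` and `NoAns` subsets. The snippet below recomputes them from the subset numbers; `combine` is a hypothetical helper written for this check.

```python
def combine(has_score: float, has_total: int, no_score: float, no_total: int) -> float:
    """Overall score as the question-weighted mean of the HasAns and NoAns subsets."""
    return (has_score * has_total + no_score * no_total) / (has_total + no_total)


# Subset numbers copied from the results above
exact = combine(48.967182330322814, 11183, 78.24682814302192, 8670)
f1 = combine(58.09796564493008, 11183, 78.24682814302192, 8670)
# 48.97% of the 11,183 answerable questions, as a count of exact matches
has_exact_count = round(48.967182330322814 * 11183 / 100)

print(round(exact, 4), round(f1, 4), has_exact_count)  # 61.7539 66.8972 5476
```

Because 8,670 of the 19,853 test questions are unanswerable, the NoAns subset pulls the overall numbers well above the HasAns scores.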

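	This seems mediocre to me, especially on questions that do have an answer (HasAns exact 49.0, F1 58.1); the overall scores are lifted by the unanswerable questions (NoAns exact and F1 78.2).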
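For reference, the `squad_v2` metric scores each prediction along the lines of the official SQuAD v2 evaluation script. The sketch below is a simplified re-implementation of its per-answer exact match and token F1 (function names follow that script, but this is not the metric's own code). Per question, the score is the maximum over the gold answers, with `""` as the gold answer for unanswerable questions, which is why `NoAns_exact` and `NoAns_f1` are identical. One detail that matters for Dutch: normalization removes only the English articles "a", "an" and "the", so "de", "het" and "een" still count as tokens.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)
_ARTICLES = re.compile(r"\b(a|an|the)\b", re.UNICODE)  # English articles only


def normalize_answer(s: str) -> str:
    """Lowercase, strip ASCII punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = _ARTICLES.sub(" ", s)
    return " ".join(s.split())


def get_tokens(s: str) -> list:
    return normalize_answer(s).split() if s else []


def compute_exact(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def compute_f1(gold: str, pred: str) -> float:
    gold_toks, pred_toks = get_tokens(gold), get_tokens(pred)
    if not gold_toks or not pred_toks:
        # No-answer case: 1 only if both are empty, so F1 equals exact match
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


print(compute_exact("de grote kat", "grote kat"))  # 0
print(compute_f1("de grote kat", "grote kat"))     # about 0.8
```

In the example, the Dutch article "de" is kept, so dropping it costs an exact match and lowers recall to 2/3.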

	Training settings (until I figure out how to report them properly):

	DatasetDict({
	    train: Dataset({
	        features: ['id', 'title', 'context', 'question', 'answers'],
	        num_rows: 79412
	    })
	    test: Dataset({
	        features: ['id', 'title', 'context', 'question', 'answers'],
	        num_rows: 19853
	    })
	    validation: Dataset({
	        features: ['id', 'title', 'context', 'question', 'answers'],
	        num_rows: 9669
	    })
	})

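	from transformers import (
	    AutoModelForQuestionAnswering,
	    AutoTokenizer,
	    DefaultDataCollator,
	    Trainer,
	    TrainingArguments,
	)

	tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
	model = AutoModelForQuestionAnswering.from_pretrained("pdelobelle/robbert-v2-dutch-base")

	# tokenized_squad: the DatasetDict above after tokenization (preprocessing code not included in this card)
	# data_collator: not defined in the original card; DefaultDataCollator is an assumption
	data_collator = DefaultDataCollator()

	training_args = TrainingArguments(
	    output_dir="./qa_model",
	    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
	    learning_rate=2e-5,
	    per_device_train_batch_size=16,
	    per_device_eval_batch_size=16,
	    num_train_epochs=3,
	    weight_decay=0.01,
	    push_to_hub=False,
	)

	trainer = Trainer(
	    model=model,
	    args=training_args,
	    train_dataset=tokenized_squad["train"],
	    eval_dataset=tokenized_squad["validation"],
	    tokenizer=tokenizer,  # newer transformers releases use processing_class=tokenizer
	    data_collator=data_collator,
	)

	trainer.train()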
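`tokenized_squad` is not defined in this card. In the common Hugging Face question-answering recipe, each answer's character span is converted into start and end token positions using the fast tokenizer's offset mapping (`return_offsets_mapping=True`) and `sequence_ids()`, and unanswerable questions, or answers that fall outside the tokenized window, are labeled with position 0. A stdlib-only sketch of that mapping step follows; `char_span_to_token_span` is a hypothetical helper, and the offsets in the example are hand-made rather than real RobBERT tokenizer output.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]],
    sequence_ids: List[Optional[int]],
    answer_start: Optional[int],
    answer_text: str,
) -> Tuple[int, int]:
    """Map a character-level answer span to (start_token, end_token).

    sequence_ids marks question tokens with 0, context tokens with 1 and
    special tokens with None. Returns (0, 0), i.e. the first (<s>) token,
    for unanswerable questions or answers outside the context window.
    """
    if answer_start is None:  # SQuAD v2 question without an answer
        return 0, 0
    answer_end = answer_start + len(answer_text)  # exclusive character end

    context_tokens = [i for i, s in enumerate(sequence_ids) if s == 1]
    ctx_first, ctx_last = context_tokens[0], context_tokens[-1]
    # Answer not fully inside this window of the context
    if offsets[ctx_first][0] > answer_start or offsets[ctx_last][1] < answer_end:
        return 0, 0

    # Walk forward to the last token starting at or before the answer start
    start = ctx_first
    while start <= ctx_last and offsets[start][0] <= answer_start:
        start += 1
    # Walk backward to the first token ending at or after the answer end
    end = ctx_last
    while end >= ctx_first and offsets[end][1] >= answer_end:
        end -= 1
    return start - 1, end + 1


# Hand-made example: question "Wie?" and context "Jan woont in Gent"
offsets = [(0, 0), (0, 3), (3, 4), (0, 0), (0, 0),
           (0, 3), (4, 9), (10, 12), (13, 17), (0, 0)]
seq_ids = [None, 0, 0, None, None, 1, 1, 1, 1, None]
print(char_span_to_token_span(offsets, seq_ids, 13, "Gent"))  # (8, 8)
```

Labeling impossible answers with the first token is what lets a SQuAD v2 model learn to predict "no answer".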

	[15198/15198 2:57:03, Epoch 3/3]

	Epoch   Training Loss   Validation Loss
	1       1.380700        1.177431
	2       1.093000        1.052601
	3       0.849700        1.143632

	TrainOutput(global_step=15198, training_loss=1.1917077029499668, metrics={'train_runtime': 10623.9565,
	'train_samples_per_second': 22.886, 'train_steps_per_second': 1.431, 'total_flos': 4.764955396486349e+16,
	'train_loss': 1.1917077029499668, 'epoch': 3.0})
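The logged numbers can be cross-checked. Assuming a single GPU and no gradient accumulation, 15,198 steps over 3 epochs is 5,066 batches of 16 per epoch, i.e. between 81,041 and 81,056 training rows, and `train_samples_per_second` times `train_runtime` divided by 3 gives about 81,047. That is more than the 79,412 training examples, which is consistent with (but does not prove) long contexts being split into several windows during tokenization. The validation loss is also lowest after epoch 2 and rises in epoch 3 while the training loss keeps falling, a common sign of beginning overfitting.

```python
# Numbers copied from the training log above
global_steps = 15198
epochs = 3
batch_size = 16  # assumption: one GPU, no gradient accumulation
train_runtime = 10623.9565  # seconds
samples_per_second = 22.886
val_loss = {1: 1.177431, 2: 1.052601, 3: 1.143632}

steps_per_epoch = global_steps // epochs                 # batches per epoch
max_features = steps_per_epoch * batch_size              # every batch full
min_features = (steps_per_epoch - 1) * batch_size + 1    # last batch holds one row
samples_per_epoch = samples_per_second * train_runtime / epochs
best_epoch = min(val_loss, key=val_loss.get)             # lowest validation loss

print(steps_per_epoch, min_features, max_features,
      round(samples_per_epoch), best_epoch)  # 5066 81041 81056 81047 2
```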

	Trained on Ubuntu with an NVIDIA GeForce GTX 1080 Ti GPU (3 epochs in about 2 h 57 min).