| --- |
| language: |
| - pt |
| license: mit |
| library_name: peft |
| tags: |
| - gptq |
| - ptbr |
| base_model: TheBloke/zephyr-7B-beta-GPTQ |
| revision: gptq-8bit-32g-actorder_True |
| model-index: |
| - name: cesar-ptbr |
| results: |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: ENEM Challenge (No Images) |
| type: eduagarcia/enem_challenge |
| split: train |
| args: |
| num_few_shot: 3 |
| metrics: |
| - type: acc |
| value: 53.74 |
| name: accuracy |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: BLUEX (No Images) |
| type: eduagarcia-temp/BLUEX_without_images |
| split: train |
| args: |
| num_few_shot: 3 |
| metrics: |
| - type: acc |
| value: 46.87 |
| name: accuracy |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: OAB Exams |
| type: eduagarcia/oab_exams |
| split: train |
| args: |
| num_few_shot: 3 |
| metrics: |
| - type: acc |
| value: 38.27 |
| name: accuracy |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: Assin2 RTE |
| type: assin2 |
| split: test |
| args: |
| num_few_shot: 15 |
| metrics: |
| - type: f1_macro |
| value: 58.32 |
| name: f1-macro |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: Assin2 STS |
| type: eduagarcia/portuguese_benchmark |
| split: test |
| args: |
| num_few_shot: 15 |
| metrics: |
| - type: pearson |
| value: 68.49 |
| name: pearson |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: FaQuAD NLI |
| type: ruanchaves/faquad-nli |
| split: test |
| args: |
| num_few_shot: 15 |
| metrics: |
| - type: f1_macro |
| value: 73.81 |
| name: f1-macro |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: HateBR Binary |
| type: ruanchaves/hatebr |
| split: test |
| args: |
| num_few_shot: 25 |
| metrics: |
| - type: f1_macro |
| value: 83.3 |
| name: f1-macro |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: PT Hate Speech Binary |
| type: hate_speech_portuguese |
| split: test |
| args: |
| num_few_shot: 25 |
| metrics: |
| - type: f1_macro |
| value: 67.49 |
| name: f1-macro |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| - task: |
| type: text-generation |
| name: Text Generation |
| dataset: |
| name: tweetSentBR |
| type: eduagarcia/tweetsentbr_fewshot |
| split: test |
| args: |
| num_few_shot: 25 |
| metrics: |
| - type: f1_macro |
| value: 42.71 |
| name: f1-macro |
| source: |
| url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr |
| name: Open Portuguese LLM Leaderboard |
| --- |
| ## Training procedure |
|
|
|
|
The following GPTQ quantization config was used during training:
| - quant_method: gptq |
| - bits: 8 |
| - tokenizer: None |
| - dataset: None |
| - group_size: 32 |
| - damp_percent: 0.1 |
| - desc_act: True |
| - sym: True |
| - true_sequential: True |
| - use_cuda_fp16: False |
| - model_seqlen: 4096 |
| - block_name_to_quantize: model.layers |
| - module_name_preceding_first_block: ['model.embed_tokens'] |
| - batch_size: 1 |
| - pad_token_id: None |
| - disable_exllama: True |
| - max_input_length: None |
| ### Framework versions |
|
|
|
|
|
|
| # Load model AutoModel |
| ```python |
# Load the 8-bit GPTQ-quantized Zephyr base model and attach the PT-BR PEFT adapter.
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Adapter configuration for the fine-tuned checkpoint.
# NOTE(review): `config` is not used below — kept as published in the card.
config = PeftConfig.from_pretrained("matheusrdgsf/cesar-ptbr")
# Base model: revision pins the 8-bit / group_size=32 / act-order GPTQ weights
# (matches the quantization config documented above); device_map='auto'
# lets accelerate place layers on the available device(s).
model = AutoModelForCausalLM.from_pretrained("TheBloke/zephyr-7B-beta-GPTQ", revision="gptq-8bit-32g-actorder_True", device_map='auto')
# Wrap the base model with the LoRA adapter weights from this repository.
model = PeftModel.from_pretrained(model, "matheusrdgsf/cesar-ptbr")
| ``` |
|
|
| # Easy inference |
| ```python |
import time  # fix: time.time() is used below but was never imported (NameError as published)

from transformers import GenerationConfig
from transformers import AutoTokenizer

# Tokenizer of the quantized base model — used to encode the prompt and decode the output.
tokenizer_model = AutoTokenizer.from_pretrained('TheBloke/zephyr-7B-beta-GPTQ')
# Tokenizer that carries the zephyr chat template (apply_chat_template).
tokenizer_template = AutoTokenizer.from_pretrained('HuggingFaceH4/zephyr-7b-alpha')

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,   # near-greedy sampling
    top_p=0.25,
    top_k=0,           # 0 disables top-k filtering
    max_new_tokens=512,
    repetition_penalty=1.1,
    eos_token_id=tokenizer_model.eos_token_id,
    pad_token_id=tokenizer_model.eos_token_id,  # model has no pad token; reuse EOS
)


def get_inference(
    text,
    model,
    tokenizer_model=tokenizer_model,
    tokenizer_template=tokenizer_template,
    generation_config=generation_config,
):
    """Generate a movie-recommendation reply for *text*.

    Builds a zephyr-style chat prompt (fixed Portuguese system message +
    the user message), runs generation, prints the wall-clock inference
    time, and returns the last line of the decoded transcript (the
    assistant's answer).
    """
    st_time = time.time()
    # Render the two-turn conversation into the model's chat format.
    prompt = tokenizer_template.apply_chat_template(
        [
            {
                "role": "system",
                "content": "Você é um chatbot para indicação de filmes. Responda em português e de maneira educada sugestões de filmes para os usuários.",
            },
            {"role": "user", "content": text},
        ],
        tokenize=False,
    )
    # assumes a CUDA device is available — the GPTQ base model is loaded on GPU
    inputs = tokenizer_model(prompt, return_tensors="pt").to("cuda")

    outputs = model.generate(**inputs, generation_config=generation_config)

    print('inference time:', time.time() - st_time)
    # The decoded text contains the whole transcript; the reply is its last line.
    return tokenizer_model.decode(outputs[0], skip_special_tokens=True).split('\n')[-1]


get_inference('Poderia indicar filmes de ação de até 2 horas?', model)
| ``` |
|
|
|
|
| - PEFT 0.5.0 |
|
|
|
|
| # Open Portuguese LLM Leaderboard Evaluation Results |
|
|
| Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/matheusrdgsf/cesar-ptbr) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) |
|
|
| | Metric | Value | |
| |--------------------------|---------| |
| |Average |**59.22**| |
| |ENEM Challenge (No Images)| 53.74| |
| |BLUEX (No Images) | 46.87| |
| |OAB Exams | 38.27| |
| |Assin2 RTE | 58.32| |
| |Assin2 STS | 68.49| |
| |FaQuAD NLI | 73.81| |
| |HateBR Binary | 83.30| |
| |PT Hate Speech Binary | 67.49| |
| |tweetSentBR | 42.71| |
|
|
|
|