---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Qwen2-Simple-Arguments
results:
- task:
type: text-generation
dataset:
name: Argument-parsing
type: Argument-parsing
metrics:
- name: Accuracy
      type: accuracy
value: 100
---
# Qwen2 Simple Arguments
![image](assets/qwen_arguments_logo.png)
[![image](assets/hire_me.png)](https://www.freelancer.com/u/cdesivo92)
This model parses simple English arguments: arguments formed of two premises and a conclusion, involving two propositions.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Cristian Desivo
- **Model type:** LLM
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen2-0.5B
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** TBD
- **Demo:** TBD
### Quantization
- **Q4_K_M.gguf:** [Qwen2_arguments.Q4_K_M.gguf](https://huggingface.co/cris177/Qwen2-Simple-Arguments/resolve/main/Qwen2_arguments.Q4_K_M.gguf?download=true)
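If you prefer to fetch the quantized file programmatically, the `huggingface_hub` client can download it; a minimal sketch:

```python
# Download the quantized GGUF file from the Hub and print its local path.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="cris177/Qwen2-Simple-Arguments",
    filename="Qwen2_arguments.Q4_K_M.gguf",
)
print(gguf_path)
```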
## Usage
Below are some code snippets to help you get started running the model.
### llama.cpp server [Recommended]
The recommended way to run the model is with a llama.cpp server serving the quantized [Q4_K_M GGUF](https://huggingface.co/cris177/Qwen2-Simple-Arguments/resolve/main/Qwen2_arguments.Q4_K_M.gguf?download=true).
You can then use the following script to run inference against the server:
```python
import requests


def llmCompletion(prompt, **args):
    """Send a completion request to a local llama.cpp server."""
    url = "http://localhost:8080/completions"
    headers = {"Content-Type": "application/json"}
    data = {"prompt": prompt}
    data.update(args)
    response = requests.post(url, headers=headers, json=data)
    return response.json()


def analyze_argument(argument):
    instruction = 'Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.'
    alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:"""
    prompt = alpaca_prompt.format(instruction, argument)

    # JSON schema for the fields the model should return; the server
    # constrains generation to match it.
    properties = {
        "Premise 1": {"type": "string"},
        "Premise 2": {"type": "string"},
        "Conclusion": {"type": "string"},
        "Proposition 1": {"type": "string"},
        "Proposition 2": {"type": "string"},
        "Type of argument": {"type": "string"},
        "Negation of Proposition 1": {"type": "string"},
        "Negation of Proposition 2": {"type": "string"},
        "Validity": {"type": "boolean"},
    }
    analysis = llmCompletion(
        prompt,
        max_tokens=1000,
        temperature=0,
        json_schema={
            "type": "object",
            "properties": properties,
            "required": list(properties.keys()),
        },
    )
    return analysis['content']


argument = "If it's wednesday it's cold, and it's cold, therefore it's wednesday."
output = analyze_argument(argument)
print(output)
```
Output:
```
{"Premise 1": "If it's wednesday it's cold",
"Premise 2": "It's cold",
"Conclusion": "It is Wednesday",
"Proposition 1": "It is Wednesday",
"Proposition 2": "It is cold",
"Type of argument": "affirming the consequent",
"Negation of Proposition 1": "It is not Wednesday",
"Negation of Proposition 2": "It is not cold",
"Validity": true}
```
### transformers 🤗
First make sure to `pip install -U transformers`, then use the code below, replacing the `argument` variable with the argument you want to parse:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    "cris177/Qwen2-Simple-Arguments",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("cris177/Qwen2-Simple-Arguments")

argument = "If it's wednesday it's cold, and it's cold, therefore it's wednesday."
instruction = 'Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.'

# Alpaca-style prompt template used during fine-tuning.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:"""
prompt = alpaca_prompt.format(instruction, argument)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=1000, num_return_sequences=1)
print(tokenizer.decode(outputs[0]))
```
Output:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.
### Input:
If it's wednesday it's cold, and it's cold, therefore it's wednesday.
### Response:
{"Premise 1": "If it's wednesday it's cold",
"Premise 2": "It's cold",
"Conclusion": "It is Wednesday",
"Proposition 1": "It is Wednesday",
"Proposition 2": "It is cold",
"Type of argument": "affirming the consequent",
"Negation of Proposition 1": "It is not Wednesday",
"Negation of Proposition 2": "It is not cold",
"Validity": "false"}<|endoftext|>
```
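Since `generate` returns the prompt followed by the completion, you will typically want to strip the echoed prompt and parse the JSON answer. A minimal sketch, continuing from the variables above (splitting on the `### Response:` marker from the template):

```python
import json

# Decode, drop special tokens such as <|endoftext|>, and keep only the answer.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
response_text = decoded.split("### Response:")[-1].strip()

analysis = json.loads(response_text)
print(analysis["Type of argument"])  # e.g. "affirming the consequent"
```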
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was trained on synthetic data based on the following types of arguments:

- Modus Ponens
- Modus Tollens
- Affirming the Consequent
- Disjunctive Syllogism
- Denying the Antecedent
- Invalid Conditional Syllogism

Each argument was constructed by selecting two random propositions (from a pre-generated list of 400 propositions), choosing a type of argument, and combining them with randomly selected connectors (therefore, since, hence, thus, etc.).
50k arguments were created for training and 100 for testing.
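For illustration, here is a minimal, hypothetical sketch of this generation scheme (the proposition list, connectors, templates, and argument types below are stand-ins, not the actual data used):

```python
import random

# Stand-ins: the real training data drew from 400 pre-generated propositions.
propositions = ["it is Wednesday", "it is cold", "it is raining", "it is dark"]
connectors = ["therefore", "hence", "thus", "so"]

def negate(p):
    # Naive negation, for illustration only.
    return p.replace("it is", "it is not", 1)

def make_argument():
    p, q = random.sample(propositions, 2)
    kind = random.choice(["modus ponens", "modus tollens", "affirming the consequent"])
    conn = random.choice(connectors)
    if kind == "modus ponens":            # If p then q; p; therefore q. Valid.
        text, valid = f"If {p} then {q}, and {p}, {conn} {q}.", True
    elif kind == "modus tollens":         # If p then q; not q; therefore not p. Valid.
        text, valid = f"If {p} then {q}, and {negate(q)}, {conn} {negate(p)}.", True
    else:                                 # If p then q; q; therefore p. Invalid.
        text, valid = f"If {p} then {q}, and {q}, {conn} {p}.", False
    return {"argument": text, "type": kind, "validity": valid}

print(make_argument())
```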
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing
We converted the data to the Alpaca chat format (the template shown in the usage examples above) before feeding it to the model.
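A minimal sketch of what that conversion might look like (the raw record layout and field names are assumptions):

```python
import json

# Same Alpaca template as in the usage examples above, with the gold answer appended.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

INSTRUCTION = ('Based on the following argument, identify the following elements: '
               'premises, conclusion, propositions, type of argument, '
               'negation of propositions and validity.')

def to_alpaca(record):
    """Turn one synthetic example (argument + gold analysis) into a training string."""
    return ALPACA_TEMPLATE.format(INSTRUCTION, record["argument"], json.dumps(record["analysis"]))
```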
#### Training
We used Unsloth to reduce memory usage and speed up training, and trained for one epoch.
Training used less than 2.5 GB of VRAM and took about 2.5 hours.
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
The model achieves 100% accuracy on both the training and test splits of our synthetic dataset.
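For reference, accuracy here is an exact match between the parsed model output and the gold analysis. A minimal sketch, assuming the `analyze_argument` helper from the llama.cpp example above and a list of gold records (the record layout is an assumption):

```python
import json

def evaluate(examples):
    """Exact-match accuracy of parsed analyses against gold labels."""
    correct = 0
    for ex in examples:
        predicted = json.loads(analyze_argument(ex["argument"]))
        correct += predicted == ex["analysis"]
    return correct / len(examples)
```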