--- |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- dangermouse77/invertedAnswerQuestion |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- google-t5/t5-small |
|
|
--- |
|
|
A finetuned model based on t5-small (~60M parameters) that, given an answer, responds with a question. I call it the AQ model because it does the opposite of the usual question-answering (QA) LLM.
|
|
|
|
|
This AQ model is useful in conversations with another QA-LLM chatbot, so that the conversation does not get stuck but moves continuously to new topics.
|
|
If you set up an automatic conversation between two LLMs, one QA-LLM and one AQ-LLM, the conversation does not become stuck and repetitive but continues forever :-)
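The QA/AQ ping-pong loop described above can be sketched as follows. The two responder functions are hypothetical stand-ins for real model calls (a QA chatbot and this AQ model); here they just echo canned text so the control flow is runnable on its own.

```python
# Sketch of the QA/AQ conversation loop: a QA-LLM answers, the AQ model
# turns the answer into a fresh question, and the cycle repeats.

def qa_respond(question: str) -> str:
    # Placeholder: a real QA-LLM would answer the question here.
    return f"An answer to: {question}"

def aq_respond(answer: str) -> str:
    # Placeholder: the AQ model would turn the answer into a new question.
    return f"A question about: {answer}"

def ping_pong(seed_question: str, turns: int) -> list[str]:
    """Alternate QA and AQ responses so the conversation keeps moving."""
    transcript = [seed_question]
    message = seed_question
    for _ in range(turns):
        message = qa_respond(message)   # QA-LLM answers the current question
        transcript.append(message)
        message = aq_respond(message)   # AQ model asks a new question
        transcript.append(message)
    return transcript

if __name__ == "__main__":
    for line in ping_pong("Why is the sky blue?", turns=2):
        print(line)
```

Swapping the two placeholders for real model calls gives the never-ending conversation described above.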
|
|
|
|
|
The model was finetuned starting from t5-small on an NVIDIA RTX 3090 in about 1.5 hours with a batch size of 8, using 4 GB of GPU RAM.
|
|
As the GPU was drawing 320 W, training this model took about 480 Wh of energy.
|
|
The same model trained with a batch size of 32 gave slightly worse results (14.3 GB of GPU RAM, about 1 hour).
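The exact training script is not included in this card, so the following is only a hypothetical sketch of the fine-tuning configuration: the batch size comes from the numbers above, while every other hyperparameter is an assumption.

```python
# Hypothetical fine-tuning configuration sketch; only the batch size is
# taken from this card, all other values are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="aq-model",          # placeholder output path
    per_device_train_batch_size=8,  # batch size reported above
    num_train_epochs=3,             # assumption
    fp16=True,                      # typical choice on an RTX 3090
)
```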
|
|
|
|
|
Test with:

`./test_aqmodel.py "The hypothesis fails because of the decay with radius to the power of 3"`
|
|
|
|
|
Output: `What is the reason the hypothesis is a faulty hypothesis?`
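The `test_aqmodel.py` script itself is not part of this card, but a minimal inference sketch with the Hugging Face `transformers` library could look like this. The model id is a placeholder (substitute this model's Hub id or a local checkpoint path), and the function assumes the checkpoint was trained without a special input prefix.

```python
# Minimal inference sketch for a T5-style answer-to-question model.
# "path/to/aq-model" is a placeholder, not the real checkpoint id.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def generate_question(answer: str, model_id: str = "path/to/aq-model") -> str:
    """Turn an answer into a question with a seq2seq checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(answer, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the real checkpoint, `generate_question("The hypothesis fails because of the decay with radius to the power of 3")` should behave like the CLI test above.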
|
|
|
|
|
Last but not least: this model was finetuned with the help of Python scripts suggested by ChatGPT-4o 8-)
|
|
Using vibe coding, as Karpathy calls it ...
|
|
|