---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- leaderboard
- mistral
- trl
base_model: LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III
datasets:
- gretelai/synthetic_text_to_sql
- HuggingFaceTB/cosmopedia
- teknium/OpenHermes-2.5
- Open-Orca/SlimOrca
- Open-Orca/OpenOrca
- cognitivecomputations/dolphin-coder
- databricks/databricks-dolly-15k
- yahma/alpaca-cleaned
- uonlp/CulturaX
- mwitiderrick/SwahiliPlatypus
- swahili
- Rogendo/English-Swahili-Sentence-Pairs
- ise-uiuc/Magicoder-Evol-Instruct-110K
- meta-math/MetaMathQA
- abacusai/ARC_DPO_FewShot
- abacusai/MetaMath_DPO_FewShot
- abacusai/HellaSwag_DPO_FewShot
- HaltiaAI/Her-The-Movie-Samantha-and-Theodore-Dataset
metrics:
- accuracy
- bertscore
- bleu
- brier_score
- cer
- character
- charcut_mt
- chrf
- code_eval
y-Gene:
- LeroyDyer/Mixtral_AI_DeepMind
- LeroyDyer/Mixtral_AI_CyberUltron_DPO
- LeroyDyer/Mixtral_AI_Chat_2.0
- LeroyDyer/Mixtral_AI_DeepMedicalMind
- LeroyDyer/Mixtral_AI_Samantha
x-Gene:
- LeroyDyer/Mixtral_AI_Chat_2.0
- LeroyDyer/Mixtral_BioMedical
- LeroyDyer/Mixtral_AI_Medic
- LeroyDyer/Mixtral_Cyber_BioMedic
- LeroyDyer/Mixtral_AI_DeepMedicalMind
Variant:
- LeroyDyer/MetaMath_LLM
- LeroyDyer/TruthfulQA_LLM
- LeroyDyer/HellaSwag_LLM
- LeroyDyer/Mixtral_AI_DeepMedicalMind
model-index:
- name: Mixtral_AI_CyberTron_DeepMind_III_UFT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.86
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.15
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 61.95
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 49.41
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.98
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 51.86
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
      name: Open LLM Leaderboard
---

# ::: DEEP MIND PROJECT :::

OH MY GOSH, GOOD WOW!
ARE WE MAKING BRAINS NOW!!!!! (Contact me to sponsor me PLEASE!)

---- I NEED A CLOUD TO DESIGN THIS MIND! (Free Colab takes years! I need the large datasets in, which takes a few days of fine-tuning on a server until fully complete! I NEED A COLLABORATOR!!)

- Mistral models are GREAT!!!!!!! - we have surpassed ChatGPT (without langchain!!!!)
- I now have a methodology to add any functionality to the model!
- we are in the future now:
- we do not want to code or buy software!

Lovely model!!! Very knowledgeable (sometimes it requires coaxing!! But it has options to choose from, so for a single question there may be multiple responses, and you can always ask in another way! It is good for one-shot prompts and it actually uses the history in the chat!!!)

But we have TASKS!

We can now ask the model to perform these tasks and get the right output without special programming!

Take a model!!! This model CONVERGES on ANYTHING! (I also previously trained it with CLIP training for captioning, but never used it! Yet I plugged it in and it was spot on! (So if you choose to incorporate the model into an encoder/decoder (vision) model, it's ready!))

VERY HAPPY! (Need more good data. (My problem actually is not data, it's converting it to JSON from CSV and other pre-structured formats! A minimal conversion sketch follows below.))
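
For what it's worth, here is a minimal sketch of that kind of conversion: reading a pre-structured CSV and writing Alpaca-style JSON records. The file name and the `question`/`answer` column names are hypothetical placeholders; adapt them to your own data.

```python
import csv
import json

# Hypothetical file and column names -- adapt to your own CSV layout.
INSTRUCTION_COL = "question"
OUTPUT_COL = "answer"

records = []
with open("data.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Map each CSV row onto the standard Alpaca record shape.
        records.append({
            "instruction": row[INSTRUCTION_COL],
            "input": "",
            "output": row[OUTPUT_COL],
        })

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```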

Here we begin the models for DeepMind: Whoop! As we move forwards, we have begun to let the model teach itself like a child and optimize!

This model was created from the first trained models: DeepMind! These models contain:

## Thoughts and processes

## SelfRAG

## Agent generation

## Chain of thought

## Deep thinking and memory recall

## Training prompt version - Working GREAT! (Can't blow my own horn enough!!!!)

It checks itself when discussing complex questions (questions it does not know the answer to: it tries to discuss with itself to find a result (sometimes unsuccessfully)).

It generates mini agents to perform small tasks such as entity recognition, step-by-step definitions, writing pseudo-codebases, generating use cases, performing calculations, and analyzing content.

It thinks.... sometimes sarcasm, sometimes reflection... sometimes random thoughts...

It has personalities: by holding various long discussions with ChatGPT in persona, it was able to generate role-play conversation data, which was added to its conversation chat Q/A, as well as a dataset from the Samantha TV show... and HER!... so it is a personal assistant and very friendly.

It has mainly been trained on coding datasets and medical information: from experiments to research, to patient/doctor dialogue, to diagnosis, to problem solving.

It has been trained to be a counsellor and to assist with psychological problems: empathetic discussion.

This one has its own thoughts despite the prompt given (if you allow the thought prompt, it will display the thoughts).

This is a highly focused model.

### Methodology

Many functions, such as defining words and NLP tasks, were also added via datasets with very complex data structures and prompts.
These prompts are removed after training and standard Alpaca training is applied on top (this enables the previously highly overfit tasks to become embedded underneath the later layer).
It is important to change the LoRA configuration for the embedding layers within the model, as well as fine-tuning on top of previous training (see the sketch below).
Usually I deploy a factor-of-8 calculation for my LoRAs, but for this one I chose a factor of 9 (9-18/18/36)... which actually trained so smoothly that I was able to train many different datasets in a single sitting, to below 0.9 loss on all variations of the Alpaca prompt!
After testing there was absolutely 0 loss of previous knowledge, as well as enhanced responses for some prompts and comparative responses for others.
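
As an illustration only, a minimal sketch (using the `peft` library) of a LoRA setup that also adapts the embedding layers, with a 9/18 rank/alpha pair in the spirit of the factor-of-9 note above. The target module names are the usual Mistral projection layers and are an assumption here, not the exact training recipe:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III"
)

lora_config = LoraConfig(
    r=9,             # factor-of-9 rank, per the note above
    lora_alpha=18,   # paired alpha (9/18)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    # Also train the embedding and output layers in full, so new prompt
    # formats can settle into the embeddings as well as the adapters.
    modules_to_save=["embed_tokens", "lm_head"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```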

I personally use a top-k of 1000....
This allows the model to have many choices (this is the context window of candidate results).
I put my top-p at 0.68 (68%)....
Hence it will select from that percentage of the probability mass...
enabling my temperature to be 1...
Therefore it will normalize the selected quantile of the next-token distribution, enabling the lower probabilities to have a scaled chance of being selected. (A concrete sketch of these settings follows below.)
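
For concreteness, a minimal sketch of those sampling settings with Hugging Face `transformers` (the prompt text is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "### Instruction:\nExplain LoRA in one paragraph.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,     # sampling, so repeated questions get varied answers
    top_k=1000,         # wide pool of candidate tokens ("many choices")
    top_p=0.68,         # then keep only the top 68% of probability mass
    temperature=1.0,    # leave the remaining distribution unscaled
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```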

It is important to have a degree of randomness in the response, or you will ask the same question and get the same answer! We need varied answers to some queries and focused answers for others. How do we do this?..... Duplicates!!!!! Raising the probability of some information by repetition, as this is how a human learns truth! Truth is that which has been repeated so many times it cannot be disputed!
Hence some information is absolute, while other information is transient and constantly updating.
As a predictive model it needs to be able to calculate, predict, and classify, as well as recall exact information.
Hence, when utilizing a RAG, the conversation history is the data to be fine-tuned into the model as frequent data!
It also produces multiple similar queries to query the RAG system for Q/A pairs, which are likewise to be updated onto the model (a rough sketch of this loop follows below).
As we are in this development period, we are focused on the BRAIN currently.......
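
A rough sketch of that loop, assuming nothing about the actual pipeline: `rag_answer` and `paraphrase` below are hypothetical stand-ins for whatever retrieval system and query rewriter you use; the point is simply that each grounded exchange is captured as an Alpaca-style record for later fine-tuning.

```python
import json

def rag_answer(query: str) -> str:
    # Hypothetical stand-in for a retrieval-augmented pipeline.
    return f"(retrieved and generated answer for: {query})"

def paraphrase(query: str) -> list[str]:
    # Hypothetical stand-in for a query rewriter; could be the model itself.
    return [query, f"In other words: {query}"]

training_records = []
for query in ["What medication interactions should a doctor check first?"]:
    # Ask the RAG system the same question in several similar phrasings so
    # the same facts recur -- repetition raises their effective frequency.
    for q in paraphrase(query):
        training_records.append({
            "instruction": q,
            "input": "",
            "output": rag_answer(q),
        })

# These records (plus real conversation history) become fine-tuning data.
with open("rag_history.json", "w", encoding="utf-8") as f:
    json.dump(training_records, f, ensure_ascii=False, indent=2)
```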

# Uploaded model

- **Developed by:** LeroyDyer
- **License:** apache-2.0
- **Finetuned from model:** LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__Mixtral_AI_CyberTron_DeepMind_III_UFT).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 64.37 |
| AI2 Reasoning Challenge (25-Shot) | 61.86 |
| HellaSwag (10-Shot)               | 83.15 |
| MMLU (5-Shot)                     | 61.95 |
| TruthfulQA (0-shot)               | 49.41 |
| Winogrande (5-shot)               | 77.98 |
| GSM8k (5-shot)                    | 51.86 |