	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-7B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- opus
	- code
	- cot
	- lcot
	- LlaMa
	model-index:
	- name: Taurus-Opus-7B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: IFEval (0-Shot)
	      type: wis-k/instruction-following-eval
	      split: train
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: inst_level_strict_acc and prompt_level_strict_acc
	      value: 42.23
	      name: averaged accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BBH (3-Shot)
	      type: SaylorTwift/bbh
	      split: test
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc_norm
	      value: 34.23
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MATH Lvl 5 (4-Shot)
	      type: lighteval/MATH-Hard
	      split: test
	      args:
	        num_few_shot: 4
	    metrics:
	    - type: exact_match
	      value: 22.73
	      name: exact match
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GPQA (0-shot)
	      type: Idavidrein/gpqa
	      split: train
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 10.18
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MuSR (0-shot)
	      type: TAUR-Lab/MuSR
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 14.22
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU-PRO (5-shot)
	      type: TIGER-Lab/MMLU-Pro
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 32.79
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FTaurus-Opus-7B
	      name: Open LLM Leaderboard
	---

	# Taurus-Opus-7B

	Taurus-Opus-7B is built on Qwen2.5-7B-Instruct (the `base_model` listed in this card's metadata) and is optimized for advanced reasoning while remaining efficient. With 7 billion parameters, it balances performance against computational resource requirements. The model has been fine-tuned with a focus on chain-of-thought (CoT) reasoning, using specialized datasets to strengthen its problem-solving abilities. Taurus-Opus-7B is designed for tasks that require logical reasoning, detailed explanations, and multi-step problem-solving, which makes it well suited to instruction following, text generation, and coding assistance.


	# Key Features and Improvements

	1. Optimized Reasoning Capabilities:
	The model showcases significant improvements in context understanding, reasoning, and mathematical problem-solving through fine-tuning with long CoT datasets.

	2. Enhanced Instruction Following:
	Taurus-Opus-7B excels in generating long, coherent outputs (up to 4K tokens), understanding structured data, and producing structured outputs like JSON.

	3. Lightweight Efficiency:
	Its 7B parameter size makes it more resource-efficient compared to larger models while retaining high-quality performance for reasoning and content generation tasks.

	4. Long-Context Support:
	Offers support for long contexts of up to 64K tokens, enabling the handling of large datasets or extended conversations.

	5. Multilingual Proficiency:
	The model supports 20+ languages, including English, Spanish, French, German, Portuguese, Chinese, Japanese, and more, making it suitable for global applications.
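
As a rough illustration of features 2 and 4 above, the sketch below budgets prompt tokens so that the prompt plus the requested generation fits in the context window. It assumes that "64K" means 65,536 tokens and "4K" means 4,096 tokens; the card does not give exact values. `max_prompt_tokens` and `fits` are hypothetical helpers, not part of `transformers`.

```python
# Assumed limits: the card says "up to 64K" context and "up to 4K" output.
# The exact token counts below are assumptions, not values stated in the card.
CONTEXT_WINDOW = 64 * 1024
MAX_OUTPUT_TOKENS = 4 * 1024


def max_prompt_tokens(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many prompt tokens fit alongside `max_new_tokens` of output."""
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    if max_new_tokens > context_window:
        raise ValueError("max_new_tokens exceeds the context window")
    return context_window - max_new_tokens


def fits(prompt_token_count: int, max_new_tokens: int) -> bool:
    """Check whether a prompt of the given length leaves room for generation."""
    return prompt_token_count <= max_prompt_tokens(max_new_tokens)


# With the assumed limits, a full-length generation leaves this much prompt room.
print(max_prompt_tokens(MAX_OUTPUT_TOKENS))
```

In practice the prompt length would come from the tokenizer (for example, the length of the encoded `input_ids`), and a check like `fits` would run before calling `generate`.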

	# Quickstart with transformers

	The following snippet loads Taurus-Opus-7B with the `transformers` library, applies the chat template, and generates a response:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "prithivMLmods/Taurus-Opus-7B"

	# Load the model and tokenizer. device_map="auto" requires the `accelerate` package.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Build a chat-style prompt and render it with the model's chat template.
	prompt = "Explain the importance of chain-of-thought reasoning in large language models."
	messages = [
	    {"role": "system", "content": "You are a helpful assistant with expertise in logical reasoning and problem-solving."},
	    {"role": "user", "content": prompt},
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Generate up to 512 new tokens.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	# The output echoes the prompt tokens, so slice them off each sequence.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
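
The slicing step near the end of the quickstart can be easy to misread. For decoder-only models, `generate` returns each sequence as the prompt tokens followed by the new tokens, so the prompt has to be removed before decoding. The sketch below shows the same list comprehension on plain Python lists; the token IDs are made up for illustration, and `strip_prompt_tokens` is a hypothetical helper name.

```python
def strip_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt from the front of each generated sequence."""
    return [
        output[len(prompt):]
        for prompt, output in zip(input_ids, generated_ids)
    ]


# Made-up token IDs: a 3-token prompt followed by 2 newly generated tokens.
prompt_batch = [[101, 7, 8]]
generated_batch = [[101, 7, 8, 42, 43]]

completions = strip_prompt_tokens(prompt_batch, generated_batch)
print(completions)  # [[42, 43]]
```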
	# Intended Use

	1. Reasoning and Context Understanding:
	Taurus-Opus-7B is tailored for complex reasoning tasks, contextual understanding, and solving problems requiring logical deduction.

	2. Mathematical Problem-Solving:
	Designed for advanced mathematical reasoning and calculations, making it valuable for education, research, and engineering tasks.

	3. Code Assistance:
	Provides robust coding support, including writing, debugging, and optimizing code across multiple programming languages.

	4. Data Analysis:
	Excels in analyzing structured data and generating structured outputs, aiding automation workflows and data-driven insights.

	5. Multilingual Support:
	Facilitates applications such as multilingual chatbots, content generation, and translation in 20+ languages.

	6. Extended Content Generation:
	Suitable for generating detailed reports, articles, and instructional guides, handling outputs up to 4K tokens.
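
For the structured-output use case in item 4, a model reply often wraps JSON in surrounding prose. The sketch below is one possible way to pull the first JSON value out of a reply using only the standard library. `extract_first_json` is a hypothetical helper, and the sample reply is invented for illustration rather than taken from the model.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at `index`
                # and ignores any trailing text after it.
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    return None


# Invented reply text standing in for a model response.
reply = 'Here is the result:\n{"name": "Ada", "score": 9.5}\nLet me know if you need more.'
parsed = extract_first_json(reply)
print(parsed)
```

Returning `None` instead of raising keeps the caller in control of retries, which can be useful when a reply contains no valid JSON.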

	# Limitations

	1. Hardware Requirements:
	While more efficient than larger models, Taurus-Opus-7B still requires high-memory GPUs or TPUs for optimal performance.

	2. Language Quality Variations:
	Output quality may vary across supported languages, especially for less commonly used languages.

	3. Creativity Limitations:
	The model may sometimes generate repetitive or inconsistent results in creative or highly subjective tasks.

	4. Real-Time Knowledge Constraints:
	The model lacks awareness of events or knowledge updates beyond its training data.

	5. Prompt Dependency:
	Results heavily depend on the specificity and clarity of input prompts, requiring well-structured queries for the best performance.
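
To put the hardware requirement in item 1 into numbers, the following back-of-the-envelope sketch estimates the memory needed for the weights alone at several precisions. It assumes a nominal 7 billion parameters (the exact count may differ) and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 7e9  # assumption: the card's "7 billion parameters"

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights take roughly 14 GB in bfloat16 and about 3.5 GB at 4-bit precision.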
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Taurus-Opus-7B-details)!
	Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FTaurus-Opus-7B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

	| Metric              | Value (%) |
	|---------------------|----------:|
	| Average             |     26.06 |
	| IFEval (0-Shot)     |     42.23 |
	| BBH (3-Shot)        |     34.23 |
	| MATH Lvl 5 (4-Shot) |     22.73 |
	| GPQA (0-shot)       |     10.18 |
	| MuSR (0-shot)       |     14.22 |
	| MMLU-PRO (5-shot)   |     32.79 |
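
The Average row is consistent with the arithmetic mean of the six benchmark scores. The short check below recomputes it from the values in the table; the variable names are chosen for illustration.

```python
# Scores copied from the table above (percent).
scores = {
    "IFEval (0-Shot)": 42.23,
    "BBH (3-Shot)": 34.23,
    "MATH Lvl 5 (4-Shot)": 22.73,
    "GPQA (0-shot)": 10.18,
    "MuSR (0-shot)": 14.22,
    "MMLU-PRO (5-shot)": 32.79,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 26.06
```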