saishshinde15 committed on
Commit
97e0c62
·
verified ·
1 Parent(s): 10e2a91

Update README.md

  1. README.md +84 -9
README.md CHANGED
@@ -1,23 +1,98 @@
1
  ---
2
- base_model: saishshinde15/TethysAI_Base_Reasoning
 
3
  tags:
4
  - text-generation-inference
5
  - transformers
6
- - unsloth
7
  - qwen2
8
  - trl
9
- - sft
 
 
10
  license: apache-2.0
11
  language:
12
  - en
 
13
  ---
14
 
15
- # Uploaded model
16
 
17
- - **Developed by:** saishshinde15
18
- - **License:** apache-2.0
19
- - **Finetuned from model :** saishshinde15/TethysAI_Base_Reasoning
 
20
 
21
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
22
 
23
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
1
  ---
2
+ base_model:
3
+ - saishshinde15/TethysAI_Base_Reasoning
4
  tags:
5
  - text-generation-inference
6
  - transformers
 
7
  - qwen2
8
  - trl
9
+ - reasoning
10
+ - deepseekR1
11
+ - advanced-finetuning
12
  license: apache-2.0
13
  language:
14
  - en
15
+ pipeline_tag: text-generation
16
  ---
17
 
18
+ # TethysAI Vortex Reasoning
19
 
20
+ - **Developed by:** TethysAI
21
+ - **License:** apache-2.0
22
+ - **Fine-tuned from:** [saishshinde15/TethysAI_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning)
23
+ - **Category:** Experimental, Research
24
 
25
+ ## **Introduction**
26
 
27
+ TethysAI Vortex Reasoning is an **experimental model** that builds on the structured reasoning capabilities of [TethysAI Base Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning). While the Base Reasoning model used **Group Relative Policy Optimization (GRPO)** to strengthen step-by-step reasoning in the style of **DeepSeek-R1**, this model takes a different approach: **it drops GRPO and relies solely on Supervised Fine-Tuning (SFT)**.
28
+
29
+ The core objective was to investigate whether **deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets**. The results were highly promising: the model successfully **questions itself internally**, improves reasoning depth, and consistently generates structured, logical responses.
30
+
31
+ ---
32
+
33
+ ## **Key Features**
34
+
35
+ ### **1️⃣ Advanced Reasoning Without GRPO**
36
+ This model **does not rely on GRPO** yet **shows similar self-reflective thought processes**, suggesting that structured reasoning can be induced through **high-quality SFT alone**.
37
+
38
+ ### **2️⃣ Self-Questioning and Iterative Thinking**
39
+ The model **actively asks itself intermediate questions before answering**, mimicking the deep **reflection-based thought process** of models like DeepSeek-R1. This leads to **more reliable** and **well-structured** responses.
40
+
41
+ ### **3️⃣ High-Quality SFT on a Curated Dataset**
42
+ To compensate for the lack of reinforcement learning, we used an **extensive dataset** tailored for deep reasoning. This dataset includes:
43
+ - **Mathematical proofs & logical puzzles**
44
+ - **Complex multi-step problem-solving tasks**
45
+ - **Philosophical and ethical reasoning**
46
+ - **Scientific hypothesis evaluation**
47
+
48
+ ### **4️⃣ Implicit Use of `<think>` and `<answer>` Tokens**
49
+ The model internally uses **special reasoning markers** (`<think>` and `<answer>`) to structure its responses, though these may not always be visible in the final output. This ensures a **consistent and methodical approach** to answering questions.
50
+
51
+ ### **5️⃣ Part of the TethysAI Vortex Family**
52
+ This model belongs to the **TethysAI Vortex series**, a collection of fine-tuned models pushing the boundaries of **SFT-based reasoning without reinforcement learning**.
53
+
54
+ ---
55
+
56
+ ## **Breakthrough Insights**
57
+
58
+ | Feature | Base Reasoning (GRPO) ✅ | Vortex Reasoning (SFT-Only) ✅ |
59
+ |----------------------------------|------------------------|----------------------------|
60
+ | Structured Thought Process | ✅ Yes (GRPO) | ✅ Yes (SFT) |
61
+ | Self-Reflection & Questioning | ✅ Strong | ✅ Equally Strong |
62
+ | GRPO-Free Optimization | ❌ No | ✅ Achieved via SFT |
63
+ | Step-by-Step Problem Solving | ✅ Yes | ✅ Yes |
64
+ | Use of `<think>` and `<answer>` | ✅ Explicit | ✅ Implicit (Internal Use) |
65
+
66
+ **Key Takeaway:** This experiment suggests that **reinforcement learning is not the only pathway to advanced reasoning capabilities**: with the right dataset and SFT strategies, models can **self-reflect and logically deduce answers** in a structured manner.
67
+
68
+ ---
69
+
70
+ ## **How to Use**
71
+
72
+ ### **Running with Transformers**
73
+
74
+ ```python
75
+ from transformers import AutoTokenizer, AutoModelForCausalLM
76
+ import torch
77
+
78
+ # Load model & tokenizer
79
+ model_name = "saishshinde15/TethysAI_Vortex_Reasoning"
80
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
81
+ model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
82
+
83
+ # Prepare input prompt
84
+ messages = [
85
+ {"role": "system", "content": "You are an AI with strong reasoning skills. Provide clear, step-by-step answers."},
86
+ {"role": "user", "content": "If x + 3 = 10, what is x?"}
87
+ ]
88
+
89
+ # Apply chat template and tokenize
90
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
91
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
92
+
93
+ # Generate response
94
+ outputs = model.generate(input_ids, max_new_tokens=512)
95
+ response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)  # decode only the newly generated tokens, not the prompt
96
+
97
+ print(response)
98
+ ```
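
The `<think>` and `<answer>` markers described under Key Features can be separated from a decoded response with a small post-processing helper. The sketch below is illustrative and not part of the model's own tooling: `split_reasoning` and `ParsedResponse` are hypothetical names, and it assumes the tags appear literally in the decoded text, which the card notes is not always the case. When no `<answer>` tag is found, it falls back to returning the text with any `<think>` block removed.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two parts of a model response."""
    thinking: Optional[str]  # text inside <think>...</think>, or None if absent
    answer: str              # text inside <answer>...</answer>, else the remaining text


# Non-greedy matches; DOTALL lets the reasoning span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text: str) -> ParsedResponse:
    """Split decoded text into reasoning and final answer (assumes literal tags)."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    thinking = think.group(1).strip() if think else None
    if answer:
        final = answer.group(1).strip()
    else:
        # Fallback when the tags are not visible: drop any <think> block, keep the rest.
        final = _THINK_RE.sub("", text).strip()
    return ParsedResponse(thinking, final)


# Made-up sample string for illustration, not real model output.
sample = "<think>x + 3 = 10, so x = 10 - 3.</think><answer>x = 7</answer>"
parsed = split_reasoning(sample)
print(parsed.answer)  # prints "x = 7"
```

Applied to the Transformers example above, `split_reasoning(response).answer` would return the final answer whenever the tags are present in the decoded output, and the full text otherwise.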