saishshinde15/Clyrai_Vortex_Reasoning

@@ -1,23 +1,98 @@
 ---
-base_model: saishshinde15/TethysAI_Base_Reasoning
 tags:
 - text-generation-inference
 - transformers
-- unsloth
 - qwen2
 - trl
-- sft
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** saishshinde15
-- **License:** apache-2.0
-- **Finetuned from model :** saishshinde15/TethysAI_Base_Reasoning
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+base_model:
+- saishshinde15/TethysAI_Base_Reasoning
 tags:
 - text-generation-inference
 - transformers
 - qwen2
 - trl
+- reasoning
+- deepseekR1
+- advanced-finetuning
 license: apache-2.0
 language:
 - en
+pipeline_tag: text-generation
 ---
+# TethysAI Vortex Reasoning
+- **Developed by:** TethysAI
+- **License:** apache-2.0
+- **Fine-tuned from:** [saishshinde15/TethysAI_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning)
+- **Category:** Experimental, Research
+## **Introduction**
+TethysAI Vortex Reasoning is an **experimental model** that builds on the structured reasoning capabilities of [TethysAI Base Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning). The Base Reasoning model used **Group Relative Policy Optimization (GRPO)** to strengthen step-by-step logical reasoning in the style of **DeepSeek-R1**. This model takes a different approach: **it drops GRPO and relies solely on Supervised Fine-Tuning (SFT)**.
+The core objective was to investigate whether **deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets**. The results were highly promising: the model successfully **questions itself internally**, improves reasoning depth, and consistently generates structured, logical responses.
+---
+## **Key Features**
+### **1️⃣ Advanced Reasoning Without GRPO**
+This model **does not rely on GRPO** yet **shows similar self-reflective thought processes**, which suggests that structured reasoning can be induced through **high-quality SFT alone**.
+### **2️⃣ Self-Questioning and Iterative Thinking**
+The model **actively asks itself intermediate questions before answering**, mimicking the deep **reflection-based thought process** of models like DeepSeek-R1. This leads to **more reliable** and **well-structured** responses.
+### **3️⃣ High-Quality SFT on a Curated Dataset**
+To compensate for the lack of reinforcement learning, we used an **extensive dataset** tailored for deep reasoning. This dataset includes:
+- **Mathematical proofs & logical puzzles**
+- **Complex multi-step problem-solving tasks**
+- **Philosophical and ethical reasoning**
+- **Scientific hypothesis evaluation**
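For illustration only, here is one way a single SFT record from the categories above could be laid out as chat messages. The card does not publish its dataset schema, so the `build_sft_record` helper, the field names, and the closing `</think>` / `</answer>` tags are all assumptions rather than the format TethysAI actually used.

```python
def build_sft_record(question, reasoning, answer):
    """Assemble one hypothetical chat-style SFT example (assumed schema)."""
    # The assistant turn carries the reasoning trace first, then the final answer.
    target = f"<think>\n{reasoning}\n</think>\n<answer>\n{answer}\n</answer>"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ]
    }


record = build_sft_record(
    question="If x + 3 = 10, what is x?",
    reasoning="Subtract 3 from both sides: x = 10 - 3 = 7. Check: 7 + 3 = 10.",
    answer="x = 7",
)
print(record["messages"][1]["content"])
```

Keeping the reasoning and the answer in separate tagged spans is what would let a model trained this way learn to emit the trace before the conclusion.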
+### **4️⃣ Implicit Use of `<think>` and `<answer>` Tokens**
+The model internally uses **special reasoning markers** (`<think>` and `<answer>`) to structure its responses, though these may not always be visible in the final output. This ensures a **consistent and methodical approach** to answering questions.
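Because the markers may or may not appear in a given output, downstream code probably should not depend on them. The sketch below separates the reasoning trace from the final answer when the tags are present and falls back to the raw text otherwise. `split_reasoning` is a hypothetical helper, and the closing `</think>` / `</answer>` tags are an assumption, since the card only names the opening markers.

```python
import re

# Assumed tag format: <think>...</think> followed by <answer>...</answer>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None when no <think> block exists."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    if answer:
        final = answer.group(1).strip()
    else:
        # No <answer> block: treat everything outside <think> as the answer.
        final = THINK_RE.sub("", text).strip()
    return reasoning, final


reasoning, final = split_reasoning(
    "<think>10 - 3 = 7</think><answer>x = 7</answer>"
)
print(reasoning)
print(final)
```

If the tags are stripped as special tokens during decoding, the function simply returns the whole response as the answer.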
+### **5️⃣ Part of the TethysAI Vortex Family**
+This model belongs to the **TethysAI Vortex series**, a collection of fine-tuned models pushing the boundaries of **SFT-based reasoning without reinforcement learning**.
+---
+## **Breakthrough Insights**
+| Feature                          | Base Reasoning (GRPO)  | Vortex Reasoning (SFT-Only) |
+|----------------------------------|------------------------|-----------------------------|
+| Structured Thought Process      | ✅ Yes (GRPO)         | ✅ Yes (SFT)              |
+| Self-Reflection & Questioning    | ✅ Strong             | ✅ Equally Strong        |
+| GRPO-Free Optimization          | ❌ No                  | ✅ Achieved via SFT       |
+| Step-by-Step Problem Solving    | ✅ Yes                 | ✅ Yes                    |
+| Use of `<think>` and `<answer>`  | ✅ Explicit           | ✅ Implicit (Internal Use) |
+**Key Takeaway:** This experiment suggests that **reinforcement learning is not the only pathway to advanced reasoning capabilities**: with the right dataset and SFT strategy, models can **self-reflect and logically deduce answers** in a structured manner.
+---
+## **How to Use**
+### **Running with Transformers**
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Use the GPU when one is available; otherwise fall back to CPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Load model & tokenizer
+model_name = "saishshinde15/TethysAI_Vortex_Reasoning"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
+# Prepare input prompt
+messages = [
+    {"role": "system", "content": "You are an AI with strong reasoning skills. Provide clear, step-by-step answers."},
+    {"role": "user", "content": "If x + 3 = 10, what is x?"}
+]
+# Apply chat template and tokenize (keeps input_ids and attention_mask together)
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(device)
+# Generate response
+outputs = model.generate(**inputs, max_new_tokens=512)
+# Decode only the newly generated tokens, not the echoed prompt
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+response = tokenizer.decode(new_tokens, skip_special_tokens=True)
+print(response)
+```