File size: 3,468 Bytes
2463036
 
 
 
 
 
 
 
f1bffab
2463036
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e666033
 
2463036
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- khazarai/BioGenesis-ToT
pipeline_tag: text-generation
tags:
- biology
- medical
- science
- unsloth
- sft
---

# Model Card for BioGenesis-ToT


![alt="General Benchmark Comparison Chart"](benchmark/BioGenesis-ToT.png)

- **Overall Success Rate**:
  - khazarai/BioGenesis-ToT: **51.45**
  - Qwen/Qwen3-1.7B: **46.82**
 
- **Benchmark**: [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark)


GGUF version of https://huggingface.co/khazarai/BioGenesis-ToT

BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology.
This model has been trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset β€” a reasoning-rich collection of biology questions emphasizing why and how processes occur, rather than simply what happens.
 
The model demonstrates strong capabilities in:
- Structured biological explanation generation
- Logical and causal reasoning
- Chain-of-thought (ToT) reasoning in scientific contexts
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

## Uses

### πŸš€ Intended Use

- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Training datasets for reasoning-focused LLMs


### ⚠️ Limitations

- Not a replacement for expert biological judgment
- May occasionally over-generalize or simplify complex phenomena
- Limited to reasoning quality within biological contexts (not trained for creative writing or coding)


## πŸ§ͺ Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology.
It’s designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:
- Foundational biology: Cell biology, genetics, evolution, and ecology
- Advanced topics: Systems biology, synthetic biology, computational biophysics
- Applied domains: Medicine, agriculture, bioengineering, and environmental science

Dataset features include:

- 🧩 Logical reasoning styles β€” deductive, inductive, abductive, causal, and analogical
- 🧠 Problem-solving techniques β€” decomposition, elimination, systems thinking, trade-off analysis
- πŸ”¬ Real-world problem contexts β€” experiment design, pathway mapping, and data interpretation
- 🌍 Practical relevance β€” bridging theoretical reasoning and applied biological insight
- πŸŽ“ Educational focus β€” for both AI training and human learning in scientific reasoning


## 🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters


## Citation

**BibTeX:**
```bibtex
@model{khazarai/BioGenesis-ToT,
  title     = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author    = {Rustam Shiriyev},
  year      = {2025},
  publisher = {Hugging Face},
  base_model = {Qwen3-1.7B},
  dataset   = {moremilk/ToT-Biology},
  license   = {MIT}
}
```