| --- |
| license: apache-2.0 |
| language: |
| - en |
| metrics: |
| - accuracy |
| base_model: |
| - khazarai/BioGenesis-ToT |
| pipeline_tag: text-generation |
| tags: |
| - biology |
| - medical |
| - science |
| - unsloth |
| - sft |
| --- |
| |
| # Model Card for BioGenesis-ToT |
|
|
|
|
|  |
|
|
| - **Overall Success Rate**: |
| - khazarai/BioGenesis-ToT: **51.45** |
| - Qwen/Qwen3-1.7B: **46.82** |
| |
| - **Benchmark**: [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark) |
|
|
|
|
| GGUF version of https://huggingface.co/khazarai/BioGenesis-ToT |
|
|
| BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology. |
| This model has been trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset β a reasoning-rich collection of biology questions emphasizing why and how processes occur, rather than simply what happens. |
| |
| The model demonstrates strong capabilities in: |
| - Structured biological explanation generation |
| - Logical and causal reasoning |
| - Chain-of-thought (ToT) reasoning in scientific contexts |
| - Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology) |
|
|
| ## Uses |
|
|
| ### π Intended Use |
|
|
| - Educational and scientific explanation generation |
| - Biological reasoning and tutoring applications |
| - Model interpretability research |
| - Training datasets for reasoning-focused LLMs |
|
|
|
|
| ### β οΈ Limitations |
|
|
| - Not a replacement for expert biological judgment |
| - May occasionally over-generalize or simplify complex phenomena |
| - Limited to reasoning quality within biological contexts (not trained for creative writing or coding) |
|
|
|
|
| ## π§ͺ Dataset: moremilk/ToT-Biology |
|
|
| The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology. |
| Itβs designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems. |
|
|
| It spans a wide range of biological subdomains: |
| - Foundational biology: Cell biology, genetics, evolution, and ecology |
| - Advanced topics: Systems biology, synthetic biology, computational biophysics |
| - Applied domains: Medicine, agriculture, bioengineering, and environmental science |
|
|
| Dataset features include: |
|
|
| - π§© Logical reasoning styles β deductive, inductive, abductive, causal, and analogical |
| - π§ Problem-solving techniques β decomposition, elimination, systems thinking, trade-off analysis |
| - π¬ Real-world problem contexts β experiment design, pathway mapping, and data interpretation |
| - π Practical relevance β bridging theoretical reasoning and applied biological insight |
| - π Educational focus β for both AI training and human learning in scientific reasoning |
|
|
|
|
| ## π§ Objective |
|
|
| This fine-tuning project aims to build an interpretable reasoning model capable of: |
|
|
| - Explaining biological mechanisms clearly and coherently |
| - Demonstrating transparent, step-by-step thought processes |
| - Applying logical reasoning techniques to biological and interdisciplinary problems |
| - Supporting educational and research use cases where reasoning transparency matters |
|
|
|
|
| ## Citation |
|
|
| **BibTeX:** |
| ```bibtex |
| @model{khazarai/BioGenesis-ToT, |
| title = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning}, |
| author = {Rustam Shiriyev}, |
| year = {2025}, |
| publisher = {Hugging Face}, |
| base_model = {Qwen3-1.7B}, |
| dataset = {moremilk/ToT-Biology}, |
| license = {MIT} |
| } |
| ``` |