---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- scientific-reasoning
- chain-of-thought
- unsloth
- causal-lm
- instruction-tuned
- reasoning
base_model: LiquidAI/LFM2-2.6B
datasets:
- nvidia/OpenScienceReasoning-2
---

# SciReason-LFM2-2.6B

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Dataset](https://img.shields.io/badge/Dataset-OpenScienceReasoning--2-orange.svg)](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)
[![Base Model](https://img.shields.io/badge/Base%20Model-LFM2--2.6B-purple.svg)](https://huggingface.co/LiquidAI/LFM2-2.6B)
[![Fine-tuned with](https://img.shields.io/badge/Fine--tuned%20with-Unsloth-green.svg)](https://github.com/unslothai/unsloth)
[![Author](https://img.shields.io/badge/Author-yasserrmd-grey.svg)](https://huggingface.co/yasserrmd)

<img src="banner.png" />

---

## Model Overview
**SciReason-LFM2-2.6B** is a fine-tuned version of **LiquidAI/LFM2-2.6B**, trained with **Unsloth** on the **OpenScienceReasoning-2** dataset.  
The fine-tuning enhances the base model’s ability to handle **multi-step scientific reasoning** and produce coherent **chain-of-thought explanations**.  

---

## Training Configuration
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth)  
- **Dataset**: [nvidia/OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)  
- **Examples**: ~11,000  
- **Epochs**: 1  
- **Total Steps**: 1,375  
- **Batch size per device**: 2  
- **Gradient Accumulation Steps**: 4  
- **Effective Batch Size**: 8  
- **Trainable Parameters**: ~20M (LoRA / PEFT with Unsloth smart offloading)  
- **Optimizer**: AdamW  
- **Learning Rate**: 2e-4  
- **Weight Decay**: 0.01  
- **LR Scheduler**: cosine with warmup  
- **Hardware**: Single GPU (Unsloth offloading enabled)  
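
The hyperparameters above map onto an Unsloth training script roughly like the sketch below. This is an illustrative reconstruction, not the author's actual script: the LoRA rank (`r=16`), `max_seq_length`, and `warmup_ratio` are assumptions (the rank chosen to be consistent with the ~20M trainable-parameter figure), and the argument names follow common Unsloth/TRL examples.

```python
# Illustrative fine-tuning sketch (assumed hyperparameters, not the original script)
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2-2.6B",
    max_seq_length=4096,        # assumed
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                       # assumed LoRA rank (~20M trainable params)
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",  # Unsloth's smart offloading
)

# ~11k examples; assumes the dataset is pre-formatted into chat text
dataset = load_dataset("nvidia/OpenScienceReasoning-2", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,   # effective batch size 8
        num_train_epochs=1,
        learning_rate=2e-4,
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,               # assumed warmup fraction
        optim="adamw_torch",
        output_dir="outputs",
    ),
)
trainer.train()
```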

---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "yasserrmd/SciReason-LFM2-2.6B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Raw string so \boxed{} is not interpreted as an escape sequence
prompt = r"""Solve the following problem. Make sure to put the answer (and only the answer) inside \boxed{}.

Based on analysis of multinational aeromedical databases (e.g., EASA's EMPR, FAA's CAMI database, and military longitudinal studies), which statement accurately characterizes a fundamental limitation in definitively establishing cause-and-effect relationships for cardiovascular morbidity trends among commercial aircrew?

A: Stratified sampling protocols universally eliminate survivorship bias
B: Retroactive harmonization of biochemical markers across jurisdictions enables precise meta-analysis
C: Inability to fully adjust for dominant confounding variables (e.g., socioeconomic status, undisclosed supplement use)
D: Cohort studies consistently show declining age-adjusted myocardial infarction rates compared to the general population
E: Mandatory polysomnography data provides complete correction for sleep disorder comorbidities
F: Radiation dose metrics exhibit a linear correlation with arrhythmia incidence in jet aircraft pilots
G: Genome-wide association studies have identified fully penetrant monogenic risk variants specific to aviators
H: Continuous blood pressure monitoring during all flight phases yields statistically significant longitudinal datasets
I: Pharmacokinetic interactions between hypoxia and statins are conclusively established in CRF models
J: Regulatory divergence causes morbidity rates to universally decline across all regions after 2018"""

system_prompt = """You are a reasoning assistant.

When solving problems:
- Always place your reasoning inside think tags.
- Think in structured steps, but keep it concise (3–4 short steps maximum).
- Avoid repeating yourself or giving unnecessary background.
- Use bullet points or brief numbered steps for clarity inside the think tag.
- After the think end tag, provide only the final answer clearly and directly.
- Do not include reasoning outside of the think tags."""

input_ids = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=1024,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))
# The decoded text contains the chat-template special tokens, the model's
# reasoning inside think tags, and the final \boxed{...} answer.
```
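
Because the system prompt asks the model to wrap its reasoning in think tags, it can be useful to separate the reasoning from the final answer in post-processing. The helper below is a minimal sketch (not part of the model or its card); it assumes the tags are literally `<think>` and `</think>`, which you should verify against your model's actual output.

```python
import re

def split_reasoning(text: str, tag: str = "think"):
    """Split model output into (reasoning, answer), assuming the reasoning
    is wrapped in <tag>...</tag> as the system prompt requests."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No tags found: treat the whole output as the answer
    return "", text.strip()

sample = "<think>- recall key confounders\n- choose C</think>\n\\boxed{C}"
reasoning, answer = split_reasoning(sample)
print(answer)  # \boxed{C}
```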

---

## Intended Use

This model is designed for:

* **Scientific reasoning tasks**
* **Educational Q&A**
* **Step-by-step logical problem solving**

⚠️ Disclaimer: Not intended for clinical or legal decision-making.

---

## License

Apache-2.0 License. See [LICENSE](https://opensource.org/licenses/Apache-2.0).

---

## Acknowledgements

* **LiquidAI** for [LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B)
* **NVIDIA** for [OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)
* **Unsloth** for efficient fine-tuning with gradient offloading