---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- scientific-reasoning
- chain-of-thought
- unsloth
- causal-lm
- instruction-tuned
- reasoning
base_model: LiquidAI/LFM2-2.6B
datasets:
- nvidia/OpenScienceReasoning-2
---

# SciReason-LFM2-2.6B

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Dataset](https://img.shields.io/badge/Dataset-OpenScienceReasoning--2-orange.svg)](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2) [![Base Model](https://img.shields.io/badge/Base%20Model-LFM2--2.6B-purple.svg)](https://huggingface.co/LiquidAI/LFM2-2.6B) [![Fine-tuned with](https://img.shields.io/badge/Fine--tuned%20with-Unsloth-green.svg)](https://github.com/unslothai/unsloth) [![Author](https://img.shields.io/badge/Author-yasserrmd-grey.svg)](https://huggingface.co/yasserrmd)

---

## Model Overview

**SciReason-LFM2-2.6B** is a fine-tuned version of **LiquidAI/LFM2-2.6B**, trained with **Unsloth** on the **OpenScienceReasoning-2** dataset. Fine-tuning sharpens the base model's ability to carry out **multi-step scientific reasoning** and to produce coherent **chain-of-thought explanations**.
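As a quick sanity check, the step count reported in the Training Configuration below follows directly from the dataset size and the effective batch size (values copied from this card):

```python
# Sanity check: optimizer steps implied by the training configuration.
examples = 11_000              # ~11,000 training examples
per_device_batch = 2
grad_accum_steps = 4
epochs = 1

# Single GPU, so effective batch = per-device batch x accumulation steps.
effective_batch = per_device_batch * grad_accum_steps
total_steps = (examples * epochs) // effective_batch

print(effective_batch)  # 8
print(total_steps)      # 1375
```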
---

## Training Configuration

- **Framework**: [Unsloth](https://github.com/unslothai/unsloth)
- **Dataset**: [nvidia/OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)
- **Examples**: ~11,000
- **Epochs**: 1
- **Total Steps**: 1,375
- **Batch Size per Device**: 2
- **Gradient Accumulation Steps**: 4
- **Effective Batch Size**: 8
- **Trainable Parameters**: ~20M (LoRA / PEFT with Unsloth smart offloading)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-4
- **Weight Decay**: 0.01
- **LR Scheduler**: cosine with warmup
- **Hardware**: single GPU (Unsloth offloading enabled)

---

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "yasserrmd/SciReason-LFM2-2.6B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt (raw string so \boxed{} is not parsed as an escape sequence)
prompt = r"""
Solve the following problem. Make sure to put the answer (and only answer) inside \boxed{}.

Based on analysis of multinational aeromedical databases (e.g., EASA's EMPR, FAA's CAMI database, and military longitudinal studies), which statement accurately characterizes a fundamental limitation in definitively establishing cause-and-effect relationships for cardiovascular morbidity trends among commercial aircrew?
A: Stratified sampling protocols universally eliminate survivorship bias
B: Retroactive harmonization of biochemical markers across jurisdictions enables precise meta-analysis
C: Inability to fully adjust for dominant confounding variables (e.g., socioeconomic status, undisclosed supplement use)
D: Cohort studies consistently show declining age-adjusted myocardial infarction rates compared to the general population
E: Mandatory polysomnography data provides complete correction for sleep disorder comorbidities
F: Radiation dose metrics exhibit a linear correlation with arrhythmia incidence in jet aircraft pilots
G: Genome-wide association studies have identified fully penetrant monogenic risk variants specific to aviators
H: Continuous blood pressure monitoring during all flight phases yields statistically significant longitudinal datasets
I: Pharmacokinetic interactions between hypoxia and statins are conclusively established in CRF models
J: Regulatory divergence causes morbidity rates to universally decline across all regions after 2018"""

input_ids = tokenizer.apply_chat_template(
    [
        {
            "role": "system",
            "content": """
You are a reasoning assistant. When solving problems:
- Always place your reasoning inside think tags.
- Think in structured steps, but keep it concise (3-4 short steps maximum).
- Avoid repeating yourself or giving unnecessary background.
- Use bullet points or brief numbered steps for clarity inside the think tags.
- After the think end tag, provide only the final answer clearly and directly.
- Do not include reasoning outside of the think tags.
""",
        },
        {"role": "user", "content": prompt},
    ],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=1024,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))

# Example decoded output (from a different, simpler prompt):
# <|startoftext|><|im_start|>user
# What is C. elegans?<|im_end|>
# <|im_start|>assistant
# C. elegans, also known as Caenorhabditis elegans, is a small, free-living
# nematode worm (roundworm) that belongs to the phylum Nematoda.
```

---

## Intended Use

This model is designed for:

* **Scientific reasoning tasks**
* **Educational Q&A**
* **Step-by-step logical problem solving**

⚠️ Disclaimer: Not intended for clinical or legal decision-making.

---

## License

Apache-2.0 License. See [LICENSE](https://opensource.org/licenses/Apache-2.0).

---

## Acknowledgements

* **LiquidAI** for [LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B)
* **NVIDIA** for [OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)
* **Unsloth** for efficient fine-tuning with gradient offloading
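
---

## Extracting the Final Answer

The decoded generation includes the chat-template special tokens, the reasoning block, and the final answer. A minimal post-processing sketch is shown below; it assumes the model follows the system prompt in the Usage section (reasoning inside `<think>` tags, final answer in `\boxed{}`), so adjust the markers if your template differs:

```python
import re

def extract_answer(decoded: str) -> str:
    """Strip the reasoning block and pull out the final boxed answer.

    Assumes reasoning appears inside <think>...</think> and the final
    answer inside \\boxed{...}, per the system prompt above.
    """
    # Drop everything inside the think tags.
    no_think = re.sub(r"<think>.*?</think>", "", decoded, flags=re.DOTALL)
    # Grab the last \boxed{...} occurrence, if any.
    boxed = re.findall(r"\\boxed\{([^}]*)\}", no_think)
    return boxed[-1].strip() if boxed else no_think.strip()

sample = "<think>1. Confounders dominate.</think> The answer is \\boxed{C}."
print(extract_answer(sample))  # C
```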