# ecocoder-cot-v1: Ecological Chain-of-Thought Dataset

**10 CoT traces** for fine-tuning Nemotron on ecological reasoning + code generation.

## Format

Each trace has 3 sections:

```
[CONTEXT]
{paper abstract + method description}
[REASONING]
{step-by-step ecological reasoning}
[CODE]
{Python/R implementation}
```

## Splits

| Split | Traces | Size |
|-------|--------|------|
| train | 8 | ~40 KB |
| test | 2 | ~10 KB |

## Papers Covered

| # | Paper | Method | Code |
|---|-------|--------|------|
| 1 | GLOSSA (2505.05862) | BART Bayesian SDM | R |
| 2 | MaskSDM (2503.13057) | DL + Shapley values | PyTorch |
| 3 | GeoThinneR (2505.07867) | kd-tree thinning | R |
| 4 | HeteroGNN (2503.11900) | Graph Neural Net | PyTorch Geometric |
| 5 | CISO (2508.06704) | Conditional SDM | PyTorch |
| 6 | BioAnalyst (2507.09080) | Foundation Model | PyTorch |
| 7 | MultiScale (2411.04016) | Multi-scale SDM | PyTorch |
| 8 | LD-SDM (2312.08334) | LLM + Taxonomy | PyTorch + HF |
| 9 | PointProcess (2311.06755) | Poisson Process | R/INLA |
| 10 | EntropyBias (2508.02272) | Shannon Entropy | Python + R |

## Intended Use

Fine-tune `nemotron-3-nano-30b-a3b` (32.5B) with Unsloth 4-bit QLoRA on A100 80GB.

### Training config

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nvidia/Nemotron-3-Nano-30B-A3B-ablated",
    max_seq_length=4096,
    load_in_4bit=True,
)
```

## Generation Pipeline

```
Papers (arXiv) → DeepSeek v4 Pro CoT → JSONL → HuggingFace Dataset → Unsloth QLoRA → ecocoder-nemotron
```

## Next: v2 (100 traces)

Scale to 100 papers across 6 SDM categories: Bayesian methods, deep learning, spatial methods, taxonomic integration, data integration, bias correction.

---

Built with DeepSeek v4 Pro · ecoseek-litdump · alrobles
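
The three-section trace layout described under Format, and the JSONL step in the generation pipeline, could be handled along the lines below. This is a minimal sketch, not the pipeline's actual code. It assumes each `[CONTEXT]`, `[REASONING]`, and `[CODE]` marker sits alone on its own line. The function names `parse_trace` and `to_jsonl_line`, and the lowercase field names, are hypothetical.

```python
import json
import re

# Section markers come from the Format section of this README.
# Assumption: each marker occupies a line of its own.
SECTION_PATTERN = re.compile(r"^\[(CONTEXT|REASONING|CODE)\]\s*$", re.MULTILINE)
REQUIRED_SECTIONS = {"context", "reasoning", "code"}


def parse_trace(text: str) -> dict:
    """Split one raw trace into its context, reasoning, and code sections."""
    # With one capture group, re.split returns
    # [preamble, tag1, body1, tag2, body2, ...].
    parts = SECTION_PATTERN.split(text)
    sections = {
        tag.lower(): body.strip()
        for tag, body in zip(parts[1::2], parts[2::2])
    }
    missing = REQUIRED_SECTIONS - sections.keys()
    if missing:
        raise ValueError(f"trace is missing sections: {sorted(missing)}")
    return sections


def to_jsonl_line(trace: dict) -> str:
    """Serialize one parsed trace as a single JSONL record (hypothetical schema)."""
    return json.dumps(trace, ensure_ascii=False)


# Placeholder trace for illustration only; not taken from the dataset.
example = """[CONTEXT]
Abstract and method description go here.
[REASONING]
Step 1: state the ecological assumption.
[CODE]
print("hello")
"""

record = parse_trace(example)
line = to_jsonl_line(record)
```

A trace whose code section itself contains a line consisting only of one of the markers would be split incorrectly, so a real pipeline might prefer storing the three sections as separate JSON fields from the start.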