
ecocoder-cot-v1 — Ecological Chain-of-Thought Dataset

Ten chain-of-thought (CoT) traces for fine-tuning Nemotron on ecological reasoning and code generation.

Format

Each trace has 3 sections:

[CONTEXT] {paper abstract + method description}
[REASONING] {step-by-step ecological reasoning}
[CODE] {Python/R implementation}
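As a rough illustration of the three-section layout, a trace can be split on its markers. This is a minimal sketch, not the dataset's own tooling: the marker names come from the format above, while `parse_trace` and the example text are hypothetical.

```python
import re

# Marker names follow the format above; the parsing approach is an assumption.
SECTION_RE = re.compile(r"\[(CONTEXT|REASONING|CODE)\]")
REQUIRED = {"CONTEXT", "REASONING", "CODE"}

def parse_trace(text: str) -> dict[str, str]:
    """Split one trace into its CONTEXT, REASONING, and CODE sections."""
    parts = SECTION_RE.split(text)
    # With one capture group, re.split returns
    # [prefix, name1, body1, name2, body2, ...].
    sections = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    missing = REQUIRED - sections.keys()
    if missing:
        raise ValueError(f"trace is missing sections: {sorted(missing)}")
    return sections

# Hypothetical trace text, for illustration only.
example = (
    "[CONTEXT] Abstract and method description.\n"
    "[REASONING] Step 1: thin occurrences. Step 2: fit the model.\n"
    "[CODE] print('fit model')"
)
trace = parse_trace(example)
```

This sketch would mis-split a trace whose code body contains one of the marker strings literally, so a real parser may need a stricter rule such as matching markers only at the start of a line.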

Splits

| Split | Traces | Size |
|-------|--------|--------|
| train | 8 | ~40 KB |
| test | 2 | ~10 KB |
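How the 8/2 split was produced is not stated here. One reproducible way to get a split of that shape is a seeded shuffle, sketched below; `split_traces`, the seed, and the trace identifiers are all hypothetical.

```python
import random

def split_traces(traces, n_test=2, seed=0):
    """Shuffle a copy of the traces and hold out n_test of them for testing."""
    shuffled = list(traces)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers standing in for the 10 traces.
trace_ids = [f"trace-{i}" for i in range(1, 11)]
train, test = split_traces(trace_ids)
```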

Papers Covered

| # | Paper | Method | Code |
|---|-------|--------|------|
| 1 | GLOSSA (2505.05862) | BART Bayesian SDM | R |
| 2 | MaskSDM (2503.13057) | DL + Shapley values | PyTorch |
| 3 | GeoThinneR (2505.07867) | kd-tree thinning | R |
| 4 | HeteroGNN (2503.11900) | Graph Neural Net | PyTorch Geometric |
| 5 | CISO (2508.06704) | Conditional SDM | PyTorch |
| 6 | BioAnalyst (2507.09080) | Foundation Model | PyTorch |
| 7 | MultiScale (2411.04016) | Multi-scale SDM | PyTorch |
| 8 | LD-SDM (2312.08334) | LLM + Taxonomy | PyTorch + HF |
| 9 | PointProcess (2311.06755) | Poisson Process | R/INLA |
| 10 | EntropyBias (2508.02272) | Shannon Entropy | Python + R |

Intended Use

Fine-tune nemotron-3-nano-30b-a3b (32.5B parameters) with Unsloth 4-bit QLoRA on an A100 80GB GPU.

Training config

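from unsloth import FastLanguageModel

# Load the base model and tokenizer with 4-bit quantized weights for QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nvidia/Nemotron-3-Nano-30B-A3B-ablated",
    max_seq_length=4096,  # maximum tokens per training sequence
    load_in_4bit=True,  # quantize weights to 4 bits to reduce GPU memory
)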
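Before training, each trace has to be rendered as a single string that fits within `max_seq_length`. The sketch below shows one way to do that. It assumes the three markers are kept verbatim in the training text and that traces are held as dictionaries keyed by section name; `to_training_text` and `fits_context` are hypothetical helpers, not part of Unsloth.

```python
from typing import Callable

MAX_SEQ_LENGTH = 4096  # matches max_seq_length in the config above

def to_training_text(trace: dict[str, str], eos_token: str = "") -> str:
    """Render one trace as a single training string, keeping the three markers."""
    return (
        f"[CONTEXT] {trace['CONTEXT']}\n"
        f"[REASONING] {trace['REASONING']}\n"
        f"[CODE] {trace['CODE']}{eos_token}"
    )

def fits_context(
    text: str,
    count_tokens: Callable[[str], int],
    limit: int = MAX_SEQ_LENGTH,
) -> bool:
    """Return True if the text is within the sequence-length limit."""
    return count_tokens(text) <= limit

# Hypothetical trace; whitespace splitting stands in for a real tokenizer.
trace = {
    "CONTEXT": "Abstract and method description.",
    "REASONING": "Step 1: thin occurrences. Step 2: fit the model.",
    "CODE": "print('fit model')",
}
text = to_training_text(trace, eos_token="</s>")
ok = fits_context(text, lambda s: len(s.split()))
```

With the tokenizer loaded above, the counter could be `lambda s: len(tokenizer(s)["input_ids"])`, and `eos_token` would normally be `tokenizer.eos_token`.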

Generation Pipeline

Papers (arXiv) → DeepSeek v4 Pro CoT → JSONL → HuggingFace Dataset → Unsloth QLoRA → ecocoder-nemotron
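The JSONL step of the pipeline can be sketched as a simple round trip: one JSON object per line. The field names (`paper`, `context`, `reasoning`, `code`) and the record contents are hypothetical, since the actual schema is not documented here.

```python
import io
import json

def write_jsonl(records, fh):
    """Write one JSON object per line."""
    for record in records:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(fh):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]

# Hypothetical record; field names and values are placeholders.
records = [
    {
        "paper": "2505.05862",
        "context": "placeholder abstract and method description",
        "reasoning": "placeholder step-by-step reasoning",
        "code": "print('placeholder')\n",
    }
]

buffer = io.StringIO()  # stands in for a .jsonl file on disk
write_jsonl(records, buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)
```

Newlines inside the `code` field are escaped by `json.dumps`, so each record stays on one physical line. A file written this way can typically be loaded with `datasets.load_dataset("json", data_files=...)`.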

Next: v2 (100 traces)

Scale to 100 papers across 6 SDM categories: Bayesian methods, deep learning, spatial methods, taxonomic integration, data integration, bias correction.


Built with DeepSeek v4 Pro · ecoseek-litdump · alrobles