ecocoder-cot-v1 — Ecological Chain-of-Thought Dataset

Ten chain-of-thought (CoT) traces for fine-tuning Nemotron on ecological reasoning and code generation.

Format

Each trace has 3 sections:

[CONTEXT] {paper abstract + method description}
[REASONING] {step-by-step ecological reasoning}
[CODE] {Python/R implementation}
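To illustrate the three-section layout above, here is a minimal sketch of splitting one trace into its parts. The function name `parse_trace` and the lowercase output keys are hypothetical, not part of the dataset tooling. The sketch assumes each tag appears exactly once, in the order shown, at the start of a line, and that no line inside the code section begins with one of the tags.

```python
import re

SECTION_TAGS = ("CONTEXT", "REASONING", "CODE")

# Match a section tag only at the start of a line, plus trailing spaces/tabs.
_TAG_PATTERN = re.compile(r"^\[(CONTEXT|REASONING|CODE)\][ \t]*", re.MULTILINE)


def parse_trace(text):
    """Split one trace into a dict with 'context', 'reasoning', 'code' keys.

    Hypothetical helper: assumes the tags occur once each, in order.
    """
    matches = list(_TAG_PATTERN.finditer(text))
    found = [m.group(1) for m in matches]
    if found != list(SECTION_TAGS):
        raise ValueError(f"expected tags {SECTION_TAGS}, found {found}")

    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its tag to the start of the next tag.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).lower()] = text[match.end():end].strip()
    return sections


example = (
    "[CONTEXT] abstract here\n"
    "[REASONING] step 1\nstep 2\n"
    "[CODE] print('x')\n"
)
parsed = parse_trace(example)
```

Raising on a missing or misordered tag, rather than returning partial sections, makes malformed traces visible before they reach training.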

Splits

Split	Traces	Size
train	8	~40 KB
test	2	~10 KB
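The card does not say how traces were assigned to each split. As one possible approach, a seeded shuffle reproduces the 8/2 proportions in the table. The helper `split_traces`, the seed value, and the use of integer IDs as stand-ins for traces are all assumptions for illustration.

```python
import random


def split_traces(traces, n_test=2, seed=0):
    """Return (train, test) lists using a reproducible shuffle.

    Hypothetical helper; the dataset's actual assignment is not documented.
    """
    if not 0 < n_test < len(traces):
        raise ValueError("n_test must be between 1 and len(traces) - 1")
    shuffled = list(traces)                 # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # local RNG, no global state touched
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in IDs 1..10 for the ten traces.
train_ids, test_ids = split_traces(list(range(1, 11)))
```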

Papers Covered

#	Paper	Method	Code
1	GLOSSA (2505.05862)	BART Bayesian SDM	R
2	MaskSDM (2503.13057)	DL + Shapley values	PyTorch
3	GeoThinneR (2505.07867)	kd-tree thinning	R
4	HeteroGNN (2503.11900)	Graph Neural Net	PyTorch Geometric
5	CISO (2508.06704)	Conditional SDM	PyTorch
6	BioAnalyst (2507.09080)	Foundation Model	PyTorch
7	MultiScale (2411.04016)	Multi-scale SDM	PyTorch
8	LD-SDM (2312.08334)	LLM + Taxonomy	PyTorch + HF
9	PointProcess (2311.06755)	Poisson Process	R/INLA
10	EntropyBias (2508.02272)	Shannon Entropy	Python + R

Intended Use

Fine-tune nemotron-3-nano-30b-a3b (32.5B) with Unsloth 4-bit QLoRA on an A100 80GB GPU.

Training config

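from unsloth import FastLanguageModel

# Load the base model and tokenizer with 4-bit quantized weights for QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nvidia/Nemotron-3-Nano-30B-A3B-ablated",
    max_seq_length=4096,  # maximum sequence length in tokens
    load_in_4bit=True,    # quantize weights to 4-bit on load
)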

Generation Pipeline

Papers (arXiv) → DeepSeek v4 Pro CoT → JSONL → HuggingFace Dataset → Unsloth QLoRA → ecocoder-nemotron
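The JSONL stage of the pipeline above can be sketched with the standard library alone. The record field names (`paper`, `context`, `reasoning`, `code`) and the file name are assumptions for illustration; the card does not specify the actual schema.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical record layout mirroring the three trace sections.
sample_records = [
    {
        "paper": "2505.05862",
        "context": "paper abstract + method description",
        "reasoning": "step-by-step ecological reasoning",
        "code": "# R implementation goes here",
    }
]
```

A file written this way can typically be loaded with `datasets.load_dataset("json", data_files=...)` from the Hugging Face `datasets` library, which is likely how the JSONL to HuggingFace Dataset step would work, though the card does not confirm it.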

Next: v2 (100 traces)

Scale to 100 papers across 6 SDM categories: Bayesian methods, deep learning, spatial methods, taxonomic integration, data integration, bias correction.

Built with DeepSeek v4 Pro · ecoseek-litdump · alrobles
