# ecocoder-cot-v1: Ecological Chain-of-Thought Dataset

**10 CoT traces** for fine-tuning Nemotron on ecological reasoning + code generation.

## Format

Each trace has 3 sections:

```
[CONTEXT] {paper abstract + method description}
[REASONING] {step-by-step ecological reasoning}
[CODE] {Python/R implementation}
```
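
The three-section layout above can be split back into its parts with a few lines of Python. This is a minimal sketch, assuming each trace is stored as plain text with the `[CONTEXT]`, `[REASONING]` and `[CODE]` tags at the start of a line and in that order; the function name `parse_trace` and the example text are hypothetical and not part of the dataset tooling.

```python
import re

# Section tags taken from the format block above.
SECTION_TAGS = ("CONTEXT", "REASONING", "CODE")


def parse_trace(text: str) -> dict[str, str]:
    """Split one trace into its CONTEXT, REASONING and CODE sections."""
    # A tag only counts when it opens a line, so "[CODE]" quoted
    # in the middle of a sentence is not treated as a section start.
    pattern = re.compile(r"^\[(CONTEXT|REASONING|CODE)\]", re.MULTILINE)
    matches = list(pattern.finditer(text))
    found = [m.group(1) for m in matches]
    if found != list(SECTION_TAGS):
        raise ValueError(f"expected sections {SECTION_TAGS}, found {found}")

    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next tag, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


# Hypothetical trace text, only to show the expected shape.
example = """[CONTEXT] Abstract and method description go here.
[REASONING] Step 1: pick predictors.
Step 2: fit the model.
[CODE] print("fit model")
"""
parsed = parse_trace(example)
```

Raising on a missing or reordered tag, rather than returning a partial result, makes malformed traces visible before they reach training.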

## Splits

| Split | Traces | Size |
|-------|--------|------|
| train | 8 | ~40 KB |
| test | 2 | ~10 KB |
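
An 8/2 partition like the one in the table can be reproduced deterministically. The sketch below is only an illustration: this card does not say how traces were assigned to each split, so the seeded shuffle, the function name `split_traces` and the integer trace IDs are all assumptions.

```python
import random


def split_traces(trace_ids, n_test=2, seed=0):
    """Shuffle IDs with a fixed seed and hold out n_test of them."""
    ids = sorted(trace_ids)      # sort first so the input order does not matter
    rng = random.Random(seed)    # local RNG, leaves the global random state alone
    rng.shuffle(ids)
    return {"train": ids[n_test:], "test": ids[:n_test]}


# Hypothetical IDs 1..10, one per trace.
splits = split_traces(range(1, 11))
```

Fixing the seed keeps the held-out traces stable across runs, which matters with a test split this small.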

## Papers Covered

| # | Paper (arXiv ID) | Method | Code |
|---|------------------|--------|------|
| 1 | GLOSSA (2505.05862) | BART Bayesian SDM | R |
| 2 | MaskSDM (2503.13057) | DL + Shapley values | PyTorch |
| 3 | GeoThinneR (2505.07867) | kd-tree thinning | R |
| 4 | HeteroGNN (2503.11900) | Graph Neural Net | PyTorch Geometric |
| 5 | CISO (2508.06704) | Conditional SDM | PyTorch |
| 6 | BioAnalyst (2507.09080) | Foundation Model | PyTorch |
| 7 | MultiScale (2411.04016) | Multi-scale SDM | PyTorch |
| 8 | LD-SDM (2312.08334) | LLM + Taxonomy | PyTorch + HF |
| 9 | PointProcess (2311.06755) | Poisson Process | R/INLA |
| 10 | EntropyBias (2508.02272) | Shannon Entropy | Python + R |

## Intended Use

Fine-tune `nemotron-3-nano-30b-a3b` (32.5B) with Unsloth 4-bit QLoRA on an A100 80GB GPU.

### Training config

```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights for QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nvidia/Nemotron-3-Nano-30B-A3B-ablated",
    max_seq_length=4096,  # maximum tokens per training sequence
    load_in_4bit=True,    # 4-bit weights (QLoRA)
)
```
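
A back-of-the-envelope calculation shows why 4-bit loading matters on an 80 GB card. The sketch below counts weight storage only, using the 32.5B parameter figure quoted above. It deliberately ignores quantization constants, LoRA adapter weights, optimizer state and activations, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 32.5e9   # parameter count quoted in this card
GPU_GB = 80         # A100 80GB

fp16_gb = weight_memory_gb(N_PARAMS, 16)   # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: {fp16_gb:.2f} GB of {GPU_GB} GB")
print(f" 4-bit weights: {int4_gb:.2f} GB of {GPU_GB} GB")
```

Under these assumptions, 16-bit weights alone would take most of the card, while 4-bit weights leave the bulk of it free for adapters, optimizer state and activations at `max_seq_length=4096`.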

## Generation Pipeline

```
Papers (arXiv) → DeepSeek v4 Pro CoT → JSONL → HuggingFace Dataset → Unsloth QLoRA → ecocoder-nemotron
```
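
The JSONL stage of the pipeline above amounts to one JSON object per line. Here is a minimal round-trip sketch using only the standard library. The field names (`paper_id`, `context`, `reasoning`, `code`) and the file name `train.jsonl` are assumptions for illustration, and the field values are placeholders, not real trace content.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps non-ASCII text (species names, symbols) readable.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical record layout; values are placeholders.
records = [
    {
        "paper_id": "2505.05862",
        "context": "placeholder abstract and method description",
        "reasoning": "placeholder step-by-step reasoning",
        "code": "placeholder implementation",
    },
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    write_jsonl(records, path)
    loaded = read_jsonl(path)
```

A file written this way should be readable by the Hugging Face `datasets` JSON loader (`load_dataset("json", data_files=...)`), which is one way the next pipeline stage could consume it.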

## Next: v2 (100 traces)

Scale to 100 papers across 6 SDM categories: Bayesian methods, deep learning, spatial methods, taxonomic integration, data integration, bias correction.

---

Built with DeepSeek v4 Pro · ecoseek-litdump · alrobles