Upload folder using huggingface_hub

Files changed (8)

README.md +65 -0
dataset_dict.json +1 -0
test/data-00000-of-00001.arrow +3 -0
test/dataset_info.json +20 -0
test/state.json +13 -0
train/data-00000-of-00001.arrow +3 -0
train/dataset_info.json +20 -0
train/state.json +13 -0

README.md ADDED

	@@ -0,0 +1,65 @@

+# ecocoder-cot-v1 — Ecological Chain-of-Thought Dataset
+**10 CoT traces** for fine-tuning Nemotron on ecological reasoning + code generation.
+## Format
+Each trace has 3 sections:
+```
+[CONTEXT] {paper abstract + method description}
+[REASONING] {step-by-step ecological reasoning}
+[CODE] {Python/R implementation}
+```
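A trace's `text` field can be split back into its three sections with a small parser. This is a sketch that assumes each tag appears exactly once and in the order shown above; the name `split_trace` is hypothetical and not part of the dataset.

```python
import re

# Matches the three tagged sections in order; each body runs until the next tag
# (or the end of the text). Assumes every tag appears exactly once, in this order.
TRACE_RE = re.compile(
    r"\[CONTEXT\](?P<context>.*?)"
    r"\[REASONING\](?P<reasoning>.*?)"
    r"\[CODE\](?P<code>.*)",
    re.DOTALL,
)


def split_trace(text: str) -> dict[str, str]:
    """Split a CoT trace into its context, reasoning and code sections."""
    match = TRACE_RE.search(text)
    if match is None:
        raise ValueError("text lacks [CONTEXT], [REASONING] and [CODE] in order")
    # Strip the whitespace/newlines that surround each section body.
    return {name: body.strip() for name, body in match.groupdict().items()}


example = (
    "[CONTEXT] Abstract: thinning occurrence records.\n"
    "[REASONING] Nearby records inflate apparent density.\n"
    "[CODE] library(GeoThinneR)\n"
)
print(split_trace(example)["reasoning"])  # → Nearby records inflate apparent density.
```

The non-greedy groups stop at the next tag, while the final `code` group is greedy so that everything after `[CODE]` is kept.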
+## Splits
+| Split | Traces | Size |
+|-------|--------|------|
+| train | 8 | ~40 KB |
+| test  | 2 | ~10 KB |
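The README does not say how the 10 traces were assigned to the two splits. One reproducible way to get an 8/2 split is a seeded shuffle; the helper `train_test_split_ids` and the seed below are hypothetical, not the procedure used to build this dataset.

```python
import random


def train_test_split_ids(ids: list[str], test_size: int, seed: int = 0) -> tuple[list[str], list[str]]:
    """Deterministically split ids into (train, test) using a seeded shuffle."""
    if not 0 <= test_size <= len(ids):
        raise ValueError("test_size must be between 0 and len(ids)")
    shuffled = list(ids)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed gives same order
    # Explicit cut point so that test_size=0 yields an empty test set.
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]


# arXiv IDs of the 10 papers listed in the README.
paper_ids = [
    "2505.05862", "2503.13057", "2505.07867", "2503.11900", "2508.06704",
    "2507.09080", "2411.04016", "2312.08334", "2311.06755", "2508.02272",
]
train_ids, test_ids = train_test_split_ids(paper_ids, test_size=2, seed=42)
print(len(train_ids), len(test_ids))  # → 8 2
```

Splitting by paper ID keeps every trace about a given paper in one split; with one trace per paper, as here, that is the same as splitting by row.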
+## Papers Covered
+| # | Paper | Method | Code |
+|---|-------|--------|------|
+| 1 | GLOSSA (2505.05862) | BART Bayesian SDM | R |
+| 2 | MaskSDM (2503.13057) | DL + Shapley values | PyTorch |
+| 3 | GeoThinneR (2505.07867) | kd-tree thinning | R |
+| 4 | HeteroGNN (2503.11900) | Graph Neural Net | PyTorch Geometric |
+| 5 | CISO (2508.06704) | Conditional SDM | PyTorch |
+| 6 | BioAnalyst (2507.09080) | Foundation Model | PyTorch |
+| 7 | MultiScale (2411.04016) | Multi-scale SDM | PyTorch |
+| 8 | LD-SDM (2312.08334) | LLM + Taxonomy | PyTorch + HF |
+| 9 | PointProcess (2311.06755) | Poisson Process | R/INLA |
+| 10 | EntropyBias (2508.02272) | Shannon Entropy | Python + R |
+## Intended Use
+Fine-tune `nemotron-3-nano-30b-a3b` (32.5B parameters) with Unsloth 4-bit QLoRA on an A100 80GB GPU.
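A back-of-the-envelope check of why 4-bit loading fits comfortably on an 80 GB card: weights stored at 4 bits take half a byte per parameter. This counts weights only; activations, LoRA adapters, optimizer state and quantization metadata (such as per-block scales) come on top, so treat the result as a lower bound. The helper name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


n_params = 32.5e9  # parameter count quoted in the README
print(f"bf16 : {weight_memory_gb(n_params, 16):.2f} GB")  # → bf16 : 65.00 GB
print(f"4-bit: {weight_memory_gb(n_params, 4):.2f} GB")   # → 4-bit: 16.25 GB
```

At bf16 the weights alone would take about 65 of the 80 GB; at 4 bits they take about a quarter of that, leaving most of the card for activations and adapters.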
+### Training config
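+```python
+from unsloth import FastLanguageModel
+
+# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
+# Model ID as given in this README; confirm it resolves on the Hub before training.
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="nvidia/Nemotron-3-Nano-30B-A3B-ablated",
+    max_seq_length=4096,  # maximum context length, in tokens
+    load_in_4bit=True,    # quantize weights to 4 bits at load time
+)
+```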
+## Generation Pipeline
+```
+Papers (arXiv) → DeepSeek v4 Pro CoT → JSONL → Hugging Face Dataset → Unsloth QLoRA → ecocoder-nemotron
+```
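The JSONL stage can be checked before conversion to a dataset. This sketch assumes one JSON object per line carrying the three string fields declared in `dataset_info.json` (`paper_arxiv_id`, `paper_title`, `text`); `read_traces_jsonl` is a hypothetical helper, and the sample record uses a placeholder title. The resulting list of dicts could then be passed to, for example, `datasets.Dataset.from_list`.

```python
import io
import json
from typing import Iterable

REQUIRED_FIELDS = ("paper_arxiv_id", "paper_title", "text")


def read_traces_jsonl(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse JSONL lines into records, checking the three string fields."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                raise ValueError(f"line {lineno}: field {field!r} missing or not a string")
        # Keep only the declared columns so rows match the dataset schema.
        records.append({field: record[field] for field in REQUIRED_FIELDS})
    return records


sample = io.StringIO(
    '{"paper_arxiv_id": "2505.07867", "paper_title": "placeholder title", '
    '"text": "[CONTEXT] ... [REASONING] ... [CODE] ..."}\n'
)
rows = read_traces_jsonl(sample)
print(rows[0]["paper_arxiv_id"])  # → 2505.07867
```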
+## Next: v2 (100 traces)
+Scale to 100 papers across 6 species distribution modeling (SDM) categories: Bayesian methods, deep learning, spatial methods, taxonomic integration, data integration, and bias correction.
+
+---
+Built with DeepSeek v4 Pro · ecoseek-litdump · alrobles

dataset_dict.json ADDED

	@@ -0,0 +1 @@

+{"splits": ["train", "test"]}

test/data-00000-of-00001.arrow ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ae7ae563fc9e4b747a5e39eea560567f17aad8d00e89f21ca90bd85351e04a4
+size 11424
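The `.arrow` shards are stored through Git LFS, so the diff shows a pointer rather than the data: the spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. A downloaded shard can be checked against its pointer with a sketch like the one below; the helper names are hypothetical, and the pointer text is copied from the test split.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file (one "key value" pair per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: dict[str, str]) -> bool:
    """Check that file contents have the size and SHA-256 recorded in the pointer."""
    algo, _, expected_hash = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; only hash when the size already matches.
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == expected_hash


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:0ae7ae563fc9e4b747a5e39eea560567f17aad8d00e89f21ca90bd85351e04a4
size 11424
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"])  # → 11424
# With the real shard on disk:
# with open("test/data-00000-of-00001.arrow", "rb") as f:
#     print(matches_pointer(f.read(), pointer))
```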

test/dataset_info.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "citation": "",
+  "description": "",
+  "features": {
+    "paper_arxiv_id": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "paper_title": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "text": {
+      "dtype": "string",
+      "_type": "Value"
+    }
+  },
+  "homepage": "",
+  "license": ""
+}

test/state.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "_data_files": [
+    {
+      "filename": "data-00000-of-00001.arrow"
+    }
+  ],
+  "_fingerprint": "14fd85a6723f7dc9",
+  "_format_columns": null,
+  "_format_kwargs": {},
+  "_format_type": null,
+  "_output_all_columns": false,
+  "_split": null
+}
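Together, `dataset_dict.json` and each split's `state.json` describe the folder layout written by the `datasets` library's `save_to_disk`: the former lists the split directories, the latter lists the Arrow shards inside each one. `datasets.load_from_disk` reads this layout directly; the stdlib sketch below only resolves the shard paths to show how the files fit together (`list_arrow_shards` is hypothetical).

```python
import json
import tempfile
from pathlib import Path


def list_arrow_shards(root: Path) -> dict[str, list[Path]]:
    """Map each split named in dataset_dict.json to the Arrow files its state.json lists."""
    root = Path(root)
    splits = json.loads((root / "dataset_dict.json").read_text())["splits"]
    shards = {}
    for split in splits:
        state = json.loads((root / split / "state.json").read_text())
        shards[split] = [root / split / entry["filename"] for entry in state["_data_files"]]
    return shards


# Rebuild this commit's JSON metadata in a temporary folder (Arrow data omitted).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dataset_dict.json").write_text(json.dumps({"splits": ["train", "test"]}))
    for split in ("train", "test"):
        (root / split).mkdir()
        state = {"_data_files": [{"filename": "data-00000-of-00001.arrow"}]}
        (root / split / "state.json").write_text(json.dumps(state))
    result = {s: [f.name for f in files] for s, files in list_arrow_shards(root).items()}

print(result)
```

On a real local copy of this repository (for example one fetched with `huggingface_hub.snapshot_download(..., repo_type="dataset")`), calling `datasets.load_from_disk` on the folder should return a `DatasetDict` with `train` and `test` splits.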

train/data-00000-of-00001.arrow ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30444aa114dc4cc1def8e9299d6d376ab2a4d65c16e152e058ad5581c59d12c4
+size 38656

train/dataset_info.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "citation": "",
+  "description": "",
+  "features": {
+    "paper_arxiv_id": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "paper_title": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "text": {
+      "dtype": "string",
+      "_type": "Value"
+    }
+  },
+  "homepage": "",
+  "license": ""
+}

train/state.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "_data_files": [
+    {
+      "filename": "data-00000-of-00001.arrow"
+    }
+  ],
+  "_fingerprint": "879caba3c8a1488d",
+  "_format_columns": null,
+  "_format_kwargs": {},
+  "_format_type": null,
+  "_output_all_columns": false,
+  "_split": null
+}