Text Generation
PEFT
Safetensors
English
code
gis
geospatial
geopandas
shapely
rasterio
osmnx
folium
lora
trl
sft
conversational
Instructions to use RhodWeo/GIS-Coder-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use RhodWeo/GIS-Coder-7B with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct") model = PeftModel.from_pretrained(base_model, "RhodWeo/GIS-Coder-7B") - Notebooks
- Google Colab
- Kaggle
metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- code
- gis
- geospatial
- geopandas
- shapely
- rasterio
- osmnx
- folium
- peft
- lora
- trl
- sft
language:
- en
pipeline_tag: text-generation
library_name: peft
GIS-Coder β A Code Model for Geographic Information Systems
A LoRA-adapted code model specialized for GIS and geospatial Python programming. Includes a ready-to-run training package for scaling up to 7B on your own GPU cluster.
π¦ This Repo Contains
| File | Description |
|---|---|
adapter_model.safetensors |
Trained LoRA adapter (0.5B base, proof of concept) |
train_7b.py |
Production 7B QLoRA training script with CLI args |
evaluate.py |
Evaluation suite (12 GIS benchmarks with scoring) |
requirements.txt |
All dependencies |
TRAINING_README.md |
Detailed training guide β hardware, hyperparameters, ablations |
π Train the 7B Model on Your GPUs
# 1. Clone this repo
git clone https://huggingface.co/RhodWeo/GIS-Coder-7B
cd GIS-Coder-7B
# 2. Install deps
pip install -r requirements.txt
# 3. Login
huggingface-cli login
# 4. Train! (A100 80GB recommended)
python train_7b.py
# For A10G/RTX 4090 (24GB):
python train_7b.py --batch_size 1 --grad_accum 16 --max_length 2048
# For H100:
python train_7b.py --batch_size 4 --grad_accum 4 --max_length 8192
# 5. Evaluate
python evaluate.py --adapter_id ./gis-coder-7b-output/final --compare_base
See TRAINING_README.md for the full guide with hardware-specific settings, ablation ideas, and expected results.
πΊοΈ GIS Libraries Covered (13)
| Priority | Libraries | Coverage |
|---|---|---|
| Tier 1 (0% baseline) | OSMnx, MovingPandas, Rasterio, GDAL, PyProj | Heavy β these are where models fail |
| Tier 2 | GeoPandas, Shapely, H3 | Core GIS operations |
| Tier 3 | Folium, xarray, PyQGIS, Fiona, PySAL | Real-world workflows |
π Proof-of-Concept Results (0.5B)
Trained on CPU with the smaller base model to validate the approach:
| Metric | Start β End |
|---|---|
| Loss | 1.52 β 0.88 (β42%) |
| Token Accuracy | 69.3% β 79.3% (+10pp) |
| Eval Quality | 85% (code + library + CoT + function) |
π¬ Training Recipe
Based on published research:
| Principle | Source | Applied |
|---|---|---|
| QLoRA SFT beats 72B models | CFD paper | r=32, all-linear, lr=2e-4 |
| Qwen2.5-Coder best backbone | MapCoder-Lite | Base model selection |
| Models score 0% on GIS | GIS Benchmark | Heavy OSMnx/MovingPandas coverage |
| CoT boosts +20.9% pass@1 | CFD paper ablation | All examples include CoT |
| Target all linear layers | LoRA Without Regret | target_modules="all-linear" |
π Dataset
RhodWeo/gis-code-instructions β 70 expert-curated examples with Chain-of-Thought annotations.
License
Apache 2.0