---
license: apache-2.0
base_model: OpenGVLab/InternVL2-8B
tags:
- internvl2
- cad
- cadquery
- code-generation
- image-to-text
- fine-tuned
datasets:
- CADCODER/GenCAD-Code
language:
- ja
pipeline_tag: image-text-to-text
---
# InternVL2-8B GenCAD-Code Fine-tuned (v2)
An InternVL2-8B model fine-tuned to generate CadQuery Python code from 3D CAD model images.
## Model Details
- **Base Model**: [OpenGVLab/InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)
- **Fine-tuning Data**: [CADCODER/GenCAD-Code](https://huggingface.co/datasets/CADCODER/GenCAD-Code), a 10,000-sample subset of the ~147K-sample dataset
- **Training**: Full fine-tuning (no LoRA), 3 epochs, best checkpoint at epoch 2
- **Best Validation Loss**: 0.1487
- **Hardware**: 4x NVIDIA RTX 6000 Ada (48GB each)
- **Training Framework**: aiDaptive (Phison)
## Performance
Evaluated on 50 samples drawn by stratified sampling from `eval.json` (not seen during training):
| Metric | Base Model | Fine-tuned (v2) | Relative Change |
|--------|:----------:|:---------------:|:---------------:|
| Average Loss | 1.166 | **0.192** | -83.5% |
| Python Syntax Valid | 34/50 | **47/50** | +38% |
| CadQuery Execution | 1/50 | **46/50** | +4500% |
| Solid Generation | 0/50 | **46/50** | n/a (base = 0) |
| STL Export | 0/50 | **46/50** | n/a (base = 0) |
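The last four rows form a pipeline: generated code must parse, run, produce a solid, and export to STL. The evaluation script is not included in this card, so the following is only a minimal sketch of such a staged check. It assumes (hypothetically) that the generated script imports `cadquery` itself and binds its final `cq.Workplane` to a variable named `result`; adjust `result_name` to the convention the GenCAD-Code samples actually use. Because it calls `exec` on model output, run it in a sandbox.

```python
import ast
import os
import tempfile


def check_generated_code(code: str, result_name: str = "result") -> dict:
    """Staged check: syntax -> execution -> solid -> STL export.

    Hypothetical sketch: assumes the generated script binds its final
    CadQuery Workplane to `result_name`. Each stage runs only if the
    previous one succeeded. Never run untrusted code outside a sandbox.
    """
    status = {"syntax": False, "execution": False, "solid": False, "stl": False}

    # Stage 1: is it valid Python?
    try:
        ast.parse(code)
    except SyntaxError:
        return status
    status["syntax"] = True

    # Stage 2: does it run without raising? (fresh namespace per sample)
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception:
        return status
    status["execution"] = True

    # Stages 3-4 need CadQuery; skip them if it is not installed.
    try:
        import cadquery as cq
    except ImportError:
        return status

    # Stage 3: did the script produce at least one valid solid?
    obj = namespace.get(result_name)
    if not isinstance(obj, cq.Workplane):
        return status
    solids = obj.solids().vals()
    if not solids or not all(s.isValid() for s in solids):
        return status
    status["solid"] = True

    # Stage 4: can the result be exported to a non-empty STL file?
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.stl")
        try:
            cq.exporters.export(obj, path)
            status["stl"] = os.path.getsize(path) > 0
        except Exception:
            pass
    return status
```

Summing each flag over the evaluation samples yields counts in the same form as the table; since this is not the authors' script, its counts need not match theirs.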
## Training Configuration (v2)
| Parameter | Value |
|-----------|-------|
| Learning Rate | 5e-6 (Cosine Annealing) |
| Effective Batch Size | 32 (1 per GPU × 4 GPUs × 8 gradient-accumulation steps) |
| Epochs | 3 |
| Max Sequence Length | 4096 tokens |
| Weight Decay | 0.05 |
| Precision | bf16 |
| Early Stopping | patience=2, min_delta=0.005 |
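Three rows of this table describe training-loop logic. aiDaptive's internals are not documented here, so the following is a framework-agnostic sketch under stated assumptions: cosine annealing from 5e-6 down to an assumed floor of 0 with no warmup (the card specifies neither), and early stopping that treats `min_delta` as an absolute improvement in validation loss. The effective batch size is just the product of the three factors listed above.

```python
import math


def effective_batch_size(per_gpu: int, num_gpus: int, grad_accum: int) -> int:
    # Number of samples contributing to each optimizer step.
    return per_gpu * num_gpus * grad_accum


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-6, min_lr: float = 0.0) -> float:
    # Cosine annealing from base_lr (step 0) to min_lr (final step).
    # Assumption: no warmup and min_lr = 0.
    progress = min(step, total_steps) / max(1, total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Stop after `patience` consecutive epochs without an improvement
    larger than `min_delta` (assumed absolute) in validation loss."""

    def __init__(self, patience: int = 2, min_delta: float = 0.005):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # checkpoint to keep
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With 3 epochs and `patience=2`, one non-improving epoch after the best one does not trigger a stop; training simply ends and the best-epoch checkpoint is kept.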
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModel
from PIL import Image
model_path = "Nextorage/InternVL2-8B-GenCAD-Code-v2"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
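model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
# System prompt used during training; keep it verbatim (Japanese). English gloss:
# "You are a CAD code generation assistant. Given an image of a 3D CAD model,
# generate CadQuery Python code that reproduces the model.
# No explanation is needed. Output only the code."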
system_prompt = (
"あなたはCADコード生成アシスタントです。"
"3D CADモデルの画像が与えられた場合、そのモデルを再現する "
"CadQuery Pythonコードを生成してください。"
"説明は不要です。コードのみを出力してください。"
)
```
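The snippet above loads the model and defines the system prompt but stops before inference. The following is a hedged sketch of the remaining steps, based on the public InternVL2 remote-code chat interface (`model.system_message` and `model.chat(tokenizer, pixel_values, question, generation_config)`). It simplifies InternVL2's dynamic tiling to a single 448×448 tile with ImageNet normalization, which may differ from the preprocessing used in training. The user-turn text used during fine-tuning is not documented in this card, so `question` is left to the caller. `extract_code` and `generate_cad_code` are hypothetical helpers, and the `max_new_tokens` default is an arbitrary choice.

```python
import re

# Matches an optional Markdown fence such as ```python ... ```
FENCE_RE = re.compile(r"```[\w-]*\s*\n(.*?)```", re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the whole response if unfenced."""
    match = FENCE_RE.search(response)
    return (match.group(1) if match else response).strip()


def generate_cad_code(model, tokenizer, image_path: str, system_prompt: str,
                      question: str, max_new_tokens: int = 2048) -> str:
    # Heavy dependencies are imported lazily so extract_code can be used alone.
    import torch
    import torchvision.transforms as T
    from PIL import Image
    from torchvision.transforms.functional import InterpolationMode

    # Simplified preprocessing (assumption): one 448x448 tile, ImageNet mean/std.
    transform = T.Compose([
        T.Resize((448, 448), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    image = Image.open(image_path).convert("RGB")
    pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

    # InternVL2's remote code copies this attribute into its conversation template.
    model.system_message = system_prompt
    # Greedy decoding; arbitrary cap (training used a 4096-token total context).
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    response = model.chat(tokenizer, pixel_values, question, generation_config)
    return extract_code(response)
```

For example, `generate_cad_code(model, tokenizer, "part.png", system_prompt, question)` reuses the objects from the snippet above; `"part.png"` is a placeholder path.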
## Limitations
- Maximum sequence length is 4096 tokens, so code for very complex models may be truncated
- Trained only on the GenCAD-Code dataset, so generated code follows that dataset's CadQuery patterns
- A Japanese system prompt was used during training (see Usage)
## Citation
If you use this model, please cite the GenCAD-Code dataset:
```
@misc{gencad-code,
title={GenCAD-Code Dataset},
author={CADCODER},
url={https://huggingface.co/datasets/CADCODER/GenCAD-Code}
}
```