---
license: apache-2.0
base_model: OpenGVLab/InternVL2-8B
tags:
- internvl2
- cad
- cadquery
- code-generation
- image-to-text
- fine-tuned
datasets:
- CADCODER/GenCAD-Code
language:
- ja
pipeline_tag: image-text-to-text
---

# InternVL2-8B GenCAD-Code Fine-tuned (v2)
|
|
An InternVL2-8B model fine-tuned to generate CadQuery Python code from images of 3D CAD models.
|
|
## Model Details
|
|
- **Base Model**: [OpenGVLab/InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)
- **Fine-tuning Data**: 10,000 samples drawn from the 147K-sample [CADCODER/GenCAD-Code](https://huggingface.co/datasets/CADCODER/GenCAD-Code) dataset
- **Training**: Full fine-tuning (no LoRA), 3 epochs; best checkpoint at epoch 2
- **Best Validation Loss**: 0.1487 (epoch 2)
- **Hardware**: 4× NVIDIA RTX 6000 Ada (48 GB each)
- **Training Framework**: aiDaptive (Phison)
|
|
## Performance
|
|
Evaluated on 50 samples drawn by stratified sampling from `eval.json` (not seen during training):
|
|
| Metric | Base Model | Fine-tuned (v2) | Relative Change |
|--------|:----------:|:---------------:|:---------------:|
| Average Loss (lower is better) | 1.166 | **0.192** | -83.5% |
| Python Syntax Valid | 34/50 | **47/50** | +38% |
| CadQuery Execution | 1/50 | **46/50** | +4500% |
| Solid Generation | 0/50 | **46/50** | n/a (base is 0) |
| STL Export | 0/50 | **46/50** | n/a (base is 0) |
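
The four pass/fail metrics read as successive stages: a script must parse, then run, then produce a solid, then export. The sketch below shows one way to apply such checks to a single generated script. It is not the authors' evaluation code. It assumes CadQuery 2.x is installed and that the final model is the last `cq.Workplane` bound at the script's top level. `check_generated_code` is a hypothetical helper name. The script calls `exec()` on model output, so run it only in a sandbox.

```python
import ast
import os
import tempfile


def check_generated_code(code: str) -> dict:
    """Run staged checks on one generated script, stopping at the first failure.

    WARNING: exec() runs model output as arbitrary Python; use a sandbox.
    """
    results = {"syntax_valid": False, "executes": False,
               "has_solid": False, "stl_exported": False}

    # Stage 1: does the text parse as Python? (truncated outputs usually fail here)
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return results
    results["syntax_valid"] = True

    import cadquery as cq  # imported lazily; requires `pip install cadquery`

    # Stage 2: does the script run without raising?
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception:
        return results
    results["executes"] = True

    # Stage 3 (assumption): the model is the last cq.Workplane defined at top level.
    workplanes = [v for v in namespace.values() if isinstance(v, cq.Workplane)]
    if not workplanes or workplanes[-1].solids().size() == 0:
        return results
    result = workplanes[-1]
    results["has_solid"] = True

    # Stage 4: can the result be written to a non-empty STL file?
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.stl")
        try:
            cq.exporters.export(result, path)
            results["stl_exported"] = os.path.getsize(path) > 0
        except Exception:
            pass
    return results
```

A script that fails at one stage is reported as failing all later stages, so each count can be no larger than the count for the stage before it.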
|
|
## Training Configuration (v2)
|
|
| Parameter | Value |
|-----------|-------|
| Learning Rate | 5e-6 (cosine annealing) |
| Effective Batch Size | 32 (1 per GPU × 4 GPUs × 8 gradient-accumulation steps) |
| Epochs | 3 |
| Max Sequence Length | 4096 tokens |
| Weight Decay | 0.05 |
| Precision | bf16 |
| Early Stopping | patience=2, min_delta=0.005 |
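
The schedule and early-stopping settings can be read as follows. This is a framework-agnostic sketch, not aiDaptive's implementation. It assumes per-optimizer-step cosine annealing to zero with no warmup. It also assumes Keras-style early stopping, in which an evaluation counts as an improvement only if validation loss drops by more than `min_delta`. `cosine_lr` and `EarlyStopping` are hypothetical names.

```python
import math

# Values from the table above.
BASE_LR = 5e-6
PER_GPU_BATCH, NUM_GPUS, GRAD_ACCUM = 1, 4, 8
EFFECTIVE_BATCH = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM  # 1 x 4 x 8 = 32


def cosine_lr(step: int, total_steps: int,
              base_lr: float = BASE_LR, min_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr to min_lr over total_steps (assumed: no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Stop after `patience` evaluations without an improvement larger than `min_delta`."""

    def __init__(self, patience: int = 2, min_delta: float = 0.005):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # Significant improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

With these assumptions the learning rate starts at 5e-6, falls to half that at the midpoint of training, and reaches zero at the final step.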
|
|
## Usage
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModel
from PIL import Image

model_path = "Nextorage/InternVL2-8B-GenCAD-Code-v2"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()

# System prompt used during training (keep it in Japanese). English translation:
# "You are a CAD code generation assistant. Given an image of a 3D CAD model,
#  generate CadQuery Python code that reproduces the model.
#  No explanation is needed. Output only the code."
system_prompt = (
    "あなたはCADコード生成アシスタントです。"
    "3D CADモデルの画像が与えられた場合、そのモデルを再現する "
    "CadQuery Pythonコードを生成してください。"
    "説明は不要です。コードのみを出力してください。"
)
```
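
The loading snippet stops before inference. The sketch below runs one image end to end under several assumptions:

- The fine-tuned checkpoint keeps InternVL2's remote-code `chat()` method.
- It takes its system prompt from the `model.system_message` attribute.
- It feeds a single 448×448 tile with ImageNet normalization instead of the base model card's dynamic tiling, because the preprocessing used during fine-tuning is not documented.

The user-turn text and the generation settings are also assumptions. `generate_cadquery` and `extract_code` are hypothetical helpers.

```python
import re

MODEL_PATH = "Nextorage/InternVL2-8B-GenCAD-Code-v2"
IMAGE_SIZE = 448  # InternVL2 tile resolution
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Same Japanese system prompt as in the loading snippet (used during training).
SYSTEM_PROMPT = (
    "あなたはCADコード生成アシスタントです。"
    "3D CADモデルの画像が与えられた場合、そのモデルを再現する "
    "CadQuery Pythonコードを生成してください。"
    "説明は不要です。コードのみを出力してください。"
)


def extract_code(response: str) -> str:
    """Return the body of the first ``` fence in the response, else the stripped text."""
    match = re.search(r"```(?:python)?[ \t]*\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()


def generate_cadquery(image_path: str, max_new_tokens: int = 2048) -> str:
    """Generate CadQuery code for one image (requires a CUDA GPU)."""
    # Heavy dependencies are imported here so extract_code() works without them.
    import torch
    import torchvision.transforms as T
    from PIL import Image
    from torchvision.transforms.functional import InterpolationMode
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_PATH, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval().cuda()
    # Assumption: the remote InternVL2 code builds its chat template from this attribute.
    model.system_message = SYSTEM_PROMPT

    # Single 448x448 tile with ImageNet normalization (no dynamic tiling).
    transform = T.Compose([
        T.Resize((IMAGE_SIZE, IMAGE_SIZE), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])
    image = Image.open(image_path).convert("RGB")
    pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()  # (1, 3, 448, 448)

    # Hypothetical user turn: the exact training user message is not documented.
    question = "<image>\nGenerate CadQuery code for this model."
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)  # assumed settings
    response = model.chat(tokenizer, pixel_values, question, generation_config)
    return extract_code(response)
```

Call it as `code = generate_cadquery("part.png")`. The function reloads the model on each call to keep the sketch short. For many images, load the model once and reuse it.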
|
|
## Limitations
|
|
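- Maximum sequence length is 4096 tokens, so code for very complex models may be truncated.
- Trained only on GenCAD-Code, so outputs follow that dataset's CadQuery coding patterns.
- Trained with the Japanese system prompt shown in Usage; using the same prompt at inference matches the training setup.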
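
The sketch below estimates how many tokens remain for generated code if the whole sequence is kept within the 4096-token training length. This shows how the limit interacts with image input. The sketch makes these assumptions:

- InternVL2-8B spends 256 visual tokens per 448×448 tile.
- The `<img>` and `</img>` markers count as one token each.
- The text-prompt token count is measured separately with the tokenizer.

`code_token_budget` is a hypothetical helper, and the 100-token prompt in the example is illustrative only.

```python
def code_token_budget(prompt_tokens: int, num_tiles: int = 1,
                      max_seq_len: int = 4096, tokens_per_tile: int = 256) -> int:
    """Tokens left for generated code after the text prompt and image tokens.

    Assumptions: each 448x448 tile costs `tokens_per_tile` visual tokens and the
    image block adds two marker tokens (<img> and </img>).
    """
    image_tokens = num_tiles * tokens_per_tile + 2
    return max(0, max_seq_len - prompt_tokens - image_tokens)


# Illustrative prompt size only: 100 text tokens.
print(code_token_budget(100))                # single tile -> 3738
print(code_token_budget(100, num_tiles=13))  # 12 dynamic tiles + thumbnail -> 666
```

With heavy dynamic tiling, the image alone can consume most of the window. That leaves little room for the code of a complex part.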
|
|
## Citation
|
|
If you use this model, please cite the GenCAD-Code dataset:
|
|
```
@misc{gencad-code,
  title={GenCAD-Code Dataset},
  author={CADCODER},
  url={https://huggingface.co/datasets/CADCODER/GenCAD-Code}
}
```
|
|