# SWE-Pruner ONNX (code-pruner)

ONNX-converted version of [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) for efficient CPU inference.
## Source

- **Original Model**: [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) (safetensors)
- **Training Code**: [Ayanami1314/swe-pruner](https://github.com/Ayanami1314/swe-pruner)
## Architecture

- **Backbone**: Qwen/Qwen3-Reranker-0.6B (28 layers, hidden=1024)
- **Multi-layer Fusion**: early (layer 7) + middle (layer 14) + final (layer 28) hidden states → fused_hidden=3072
- **Fusion**: 1-layer MultiheadAttention (8 heads) + LayerNorm
- **Compression Head**: CRF-style (LayerNorm → Linear(3072, 256) → GELU → Linear(256, 2))
- **Output**: `token_scores` → per-token sigmoid scores in [0, 1]; higher means keep
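To make the head's shape arithmetic concrete, here is a minimal NumPy sketch of the compression head pipeline described above (LayerNorm → Linear(3072, 256) → GELU → Linear(256, 2)). The weights are random placeholders, not the real model's parameters, and the function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
FUSED, HIDDEN, OUT = 3072, 256, 2  # dims from the architecture list above

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Placeholder weights (real ones live inside model.onnx)
W1 = rng.normal(0, 0.02, (FUSED, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.02, (HIDDEN, OUT))
b2 = np.zeros(OUT)

def compression_head(fused):  # fused: [seq_len, 3072] fused hidden states
    h = layer_norm(fused)
    h = gelu(h @ W1 + b1)
    return h @ W2 + b2        # [seq_len, 2] per-token keep/drop emissions

emissions = compression_head(rng.normal(size=(10, FUSED)))
print(emissions.shape)  # (10, 2)
```

The 2-dimensional output matches the CRF-style framing: one emission score per token for each of the two states (drop/keep).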
## Files

| File | Description |
|------|-------------|
| `model.onnx` | Quantized ONNX model (uint8, ~607 MB) |
| `vocab.json` | BPE vocabulary (Qwen3 tokenizer) |
| `merges.txt` | BPE merge rules |
| `metadata.json` | Model metadata (token IDs, dimensions) |
| `crf_params.npz` | CRF transition parameters (optional, for Viterbi decoding) |
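The `crf_params.npz` entry implies the per-token scores can optionally be decoded with Viterbi over a 2-state (drop/keep) transition matrix. A minimal 2-state Viterbi sketch in NumPy; the hand-written emission and transition values are illustrative, and the 0=drop/1=keep convention is an assumption:

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: [seq_len, 2] log-scores; transitions: [2, 2] log-probs."""
    seq_len, n_states = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_states), dtype=np.int64)
    for t in range(1, seq_len):
        # cand[i, j]: score of reaching state j at step t from state i
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Backtrack the best path from the highest-scoring final state
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]  # assumed convention: 0 = drop, 1 = keep

emissions = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.8, 0.2]]))
transitions = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
print(viterbi(emissions, transitions))  # [0, 1, 1, 0]
```

Compared with independent per-token thresholding, the transition matrix smooths the keep/drop sequence, discouraging isolated single-token flips.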
## Usage

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")

input_ids = np.array([[...]], dtype=np.int64)       # [1, seq_len]
attention_mask = np.array([[...]], dtype=np.int64)  # [1, seq_len]

scores = sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
# scores: [1, seq_len] float32 in [0, 1]; higher = keep
```
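A typical follow-up step is to threshold the scores and keep only the surviving tokens. The threshold of 0.5 and the example values below are illustrative, not prescribed by the model:

```python
import numpy as np

# Stand-in values in place of real model outputs and token IDs
scores = np.array([[0.92, 0.15, 0.78, 0.40, 0.88]], dtype=np.float32)
input_ids = np.array([[101, 202, 303, 404, 505]], dtype=np.int64)

keep_mask = scores[0] >= 0.5          # boolean per-token keep decision
pruned_ids = input_ids[0][keep_mask]  # surviving token IDs, order preserved
print(pruned_ids.tolist())  # [101, 303, 505]
```

The pruned IDs can then be detokenized with the Qwen3 tokenizer to recover the compressed text.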
## Conversion Details

- Exported with PyTorch 2.8 + transformers 4.57
- Opset version: 14
- Dynamic axes: batch and seq_len
- Quantization: dynamic uint8
- Causal mask patched for ONNX trace compatibility