Upload folder using huggingface_hub

- README.md +48 -0
- crf_params.npz +3 -0
- merges.txt +0 -0
- metadata.json +10 -0
- model.onnx +3 -0
- vocab.json +0 -0
README.md
ADDED
# SWE-Pruner ONNX (code-pruner)

ONNX-converted version of [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) for efficient CPU inference.

## Source

- **Original Model**: [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) (safetensors)
- **Training Code**: [Ayanami1314/swe-pruner](https://github.com/Ayanami1314/swe-pruner)

## Architecture

- **Backbone**: Qwen/Qwen3-Reranker-0.6B (28 layers, hidden=1024)
- **Multi-layer Fusion**: Early (layer 7) + Middle (layer 14) + Final (layer 28) → fused_hidden=3072
- **Fusion**: 1-layer MultiheadAttention (8 heads) + LayerNorm
- **Compression Head**: CRF-style (LayerNorm → Linear(3072, 256) → GELU → Linear(256, 2))
- **Output**: `token_scores`, per-token sigmoid scores (0-1, higher = keep)

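The compression head above can be sketched in plain NumPy to make the data flow concrete. This is a minimal sketch: the weights, the LayerNorm epsilon, and the tanh GELU approximation are illustrative stand-ins, not the model's actual parameters.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def compression_head(fused, ln_gamma, ln_beta, w1, b1, w2, b2):
    """LayerNorm -> Linear(3072, 256) -> GELU -> Linear(256, 2).

    fused: [seq_len, 3072] fused hidden states (layers 7 + 14 + 28).
    """
    mu = fused.mean(axis=-1, keepdims=True)
    var = fused.var(axis=-1, keepdims=True)
    h = (fused - mu) / np.sqrt(var + 1e-5) * ln_gamma + ln_beta  # LayerNorm
    h = gelu(h @ w1 + b1)                                        # Linear(3072, 256) + GELU
    return h @ w2 + b2                                           # Linear(256, 2): per-token logits

# Random weights, just to check that shapes flow through as described
rng = np.random.default_rng(0)
fused = rng.standard_normal((16, 3072))
logits = compression_head(
    fused,
    np.ones(3072), np.zeros(3072),
    rng.standard_normal((3072, 256)) * 0.02, np.zeros(256),
    rng.standard_normal((256, 2)) * 0.02, np.zeros(2),
)
# logits: [16, 2], one (drop, keep) logit pair per token
```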
## Files

| File | Description |
|------|-------------|
| `model.onnx` | Quantized ONNX model (uint8, ~607 MB) |
| `vocab.json` | BPE vocabulary (Qwen3 tokenizer) |
| `merges.txt` | BPE merge rules |
| `metadata.json` | Model metadata (token IDs, dimensions) |
| `crf_params.npz` | CRF transition parameters (optional, for Viterbi decoding) |

## Usage

```python
import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("model.onnx")
input_ids = np.array([[...]], dtype=np.int64)       # [1, seq_len]
attention_mask = np.array([[...]], dtype=np.int64)  # [1, seq_len]

scores = sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
# scores: [1, seq_len] float32, 0-1 range, higher = keep
```

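Given the per-token scores, the simplest way to prune is a fixed threshold. A minimal sketch; the 0.5 cutoff and the toy ids/scores are assumptions for illustration, tune the threshold for your compression target:

```python
import numpy as np

def prune_tokens(token_ids, scores, threshold=0.5):
    """Keep only the tokens whose keep-score clears the threshold.

    token_ids: [seq_len] token ids fed to the model
    scores:    [seq_len] sigmoid scores from the model (higher = keep)
    """
    keep = np.asarray(scores) >= threshold
    return np.asarray(token_ids)[keep]

# Toy example with made-up ids and scores
ids = np.array([101, 202, 303, 404], dtype=np.int64)
sc = np.array([0.9, 0.2, 0.7, 0.1], dtype=np.float32)
kept = prune_tokens(ids, sc)  # keeps ids 101 and 303
```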
## Conversion Details

- Exported with PyTorch 2.8 + transformers 4.57
- Opset version: 14
- Dynamic axes: batch and seq_len
- Quantization: dynamic uint8
- Causal mask patched for ONNX trace compatibility
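Instead of a hard threshold, the transition parameters in `crf_params.npz` can smooth keep/drop decisions with Viterbi decoding over a 2-state chain. The sketch below is standard 2-state Viterbi; the transition values are illustrative (in practice they would be loaded from the npz, whose array names are not documented here):

```python
import numpy as np

def viterbi_2state(emissions, transitions):
    """Most likely drop/keep sequence under a 2-state linear-chain CRF.

    emissions:   [seq_len, 2] per-token log-scores for (drop, keep)
    transitions: [2, 2] log transition scores, transitions[i, j] = score of i -> j
    """
    seq_len = emissions.shape[0]
    score = emissions[0].copy()                 # best score ending in each state
    back = np.zeros((seq_len, 2), dtype=np.int64)
    for t in range(1, seq_len):
        cand = score[:, None] + transitions     # [from_state, to_state]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = np.zeros(seq_len, dtype=np.int64)    # backtrace from the best final state
    path[-1] = score.argmax()
    for t in range(seq_len - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path                                 # 0 = drop, 1 = keep

# Illustrative transitions that discourage flip-flopping between states;
# real values would come from np.load("crf_params.npz")
trans = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
em = np.log(np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]))
path = viterbi_2state(em, trans)  # -> [0, 0, 1, 1]
```

Relative to thresholding, the transition matrix penalizes isolated keep/drop flips, so pruning decisions come out in contiguous runs.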
crf_params.npz
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:25d8a7c8c5b25418750e99d59497b81eb758fc9a6ca54af631d9f9b384bfb0bc
size 836
merges.txt
ADDED

The diff for this file is too large to render.
metadata.json
ADDED

{
  "model_type": "swepruner",
  "backbone": "Qwen/Qwen3-Reranker-0.6B",
  "hidden_size": 1024,
  "fused_hidden_size": 3072,
  "compression_head_type": "crf",
  "token_yes_id": 9693,
  "token_no_id": 2152,
  "output": "token_scores (sigmoid, 0-1, higher=keep)"
}
model.onnx
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:1f81b0977a8e96350271637b825a7a99b5be74d82b528977f9a034b11752734f
size 636820889
vocab.json
ADDED

The diff for this file is too large to render.