Commit ee4090c (verified), committed by orca-zhang
1 Parent(s): 3e27108

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +48 -0
  2. crf_params.npz +3 -0
  3. merges.txt +0 -0
  4. metadata.json +10 -0
  5. model.onnx +3 -0
  6. vocab.json +0 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ # SWE-Pruner ONNX (code-pruner)
+
+ ONNX-converted version of [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) for efficient CPU inference.
+
+ ## Source
+
+ - **Original Model**: [ayanami-kitasan/code-pruner](https://huggingface.co/ayanami-kitasan/code-pruner) (safetensors)
+ - **Training Code**: [Ayanami1314/swe-pruner](https://github.com/Ayanami1314/swe-pruner)
+
+ ## Architecture
+
+ - **Backbone**: Qwen/Qwen3-Reranker-0.6B (28 layers, hidden=1024)
+ - **Multi-layer Fusion**: Early (layer 7) + Middle (layer 14) + Final (layer 28) → fused_hidden=3072
+ - **Fusion**: 1-layer MultiheadAttention (8 heads) + LayerNorm
+ - **Compression Head**: CRF-style (LayerNorm → Linear(3072,256) → GELU → Linear(256,2))
+ - **Output**: `token_scores` — sigmoid scores per token (0-1, higher = keep)
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.onnx` | Quantized ONNX model (uint8, ~607MB) |
+ | `vocab.json` | BPE vocabulary (Qwen3 tokenizer) |
+ | `merges.txt` | BPE merge rules |
+ | `metadata.json` | Model metadata (token IDs, dimensions) |
+ | `crf_params.npz` | CRF transition parameters (optional, for Viterbi decoding) |
+
+ ## Usage
+
+ ```python
+ import onnxruntime as ort
+ import numpy as np
+
+ sess = ort.InferenceSession("model.onnx")
+ input_ids = np.array([[...]], dtype=np.int64) # [1, seq_len]
+ attention_mask = np.array([[...]], dtype=np.int64) # [1, seq_len]
+
+ scores = sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
+ # scores: [1, seq_len] float32, 0-1 range, higher = keep
+ ```
+
+ ## Conversion Details
+
+ - Exported with PyTorch 2.8 + transformers 4.57
+ - Opset version: 14
+ - Dynamic axes: batch and seq_len
+ - Quantized: dynamic uint8 quantization
+ - Causal mask patched for ONNX trace compatibility
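Note on the README's Usage section: it stops at the per-token `token_scores` array and does not show how scores become a pruned snippet. The sketch below is one possible post-processing step, not the author's method. It assumes a mean-per-line aggregation, a threshold of 0.5, and a caller-supplied token-to-line mapping; the names `prune_lines` and `token_line_ids` are hypothetical.

```python
from statistics import mean


def prune_lines(lines, token_scores, token_line_ids, threshold=0.5):
    """Keep the lines whose mean token score reaches the threshold.

    lines: list of source lines.
    token_scores: per-token keep scores in [0, 1] (higher = keep).
    token_line_ids: for each token, the index of the line it belongs to.
    """
    if len(token_scores) != len(token_line_ids):
        raise ValueError("token_scores and token_line_ids must have equal length")

    # Group token scores by the line they fall on.
    per_line = {}
    for score, line_id in zip(token_scores, token_line_ids):
        per_line.setdefault(line_id, []).append(score)

    kept = []
    for index, line in enumerate(lines):
        scores = per_line.get(index)
        # Assumption of this sketch: lines without tokens (e.g. blank lines) are kept.
        if scores is None or mean(scores) >= threshold:
            kept.append(line)
    return kept
```

With `scores` from the README snippet, `token_scores` would be `scores[0].tolist()`; building `token_line_ids` depends on the tokenizer's offset information, which this commit does not describe.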
crf_params.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25d8a7c8c5b25418750e99d59497b81eb758fc9a6ca54af631d9f9b384bfb0bc
+ size 836
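Note on `crf_params.npz`: the README describes it as optional CRF transition parameters for Viterbi decoding, but the array names and layout inside the file are not shown in this commit. The following is a minimal sketch of two-label Viterbi decoding under stated assumptions: labels are 0 (drop) and 1 (keep), `transitions[i][j]` is an additive score for moving from label `i` to label `j`, and emission scores are taken as log-probabilities derived from the sigmoid output. All three are assumptions of this sketch, not facts about the file.

```python
import math


def viterbi_binary(keep_probs, transitions, eps=1e-9):
    """Return the highest-scoring 0/1 (drop/keep) label sequence.

    keep_probs: per-token keep probabilities in [0, 1].
    transitions: 2x2 nested list; transitions[i][j] scores label i -> label j.
    """
    if not keep_probs:
        return []

    # Emission log-scores per token for labels 0 (drop) and 1 (keep).
    emissions = [
        (math.log(max(1.0 - p, eps)), math.log(max(p, eps))) for p in keep_probs
    ]

    best = list(emissions[0])  # best path score ending in each label
    backpointers = []
    for emit in emissions[1:]:
        new_best, pointers = [], []
        for j in (0, 1):
            candidates = [best[i] + transitions[i][j] for i in (0, 1)]
            i_star = 0 if candidates[0] >= candidates[1] else 1
            new_best.append(candidates[i_star] + emit[j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)

    # Backtrack from the best final label.
    label = 0 if best[0] >= best[1] else 1
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path
```

With all-zero transitions this reduces to thresholding each score at 0.5; transitions that reward staying in the same label smooth out isolated low-score tokens.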
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
metadata.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_type": "swepruner",
+   "backbone": "Qwen/Qwen3-Reranker-0.6B",
+   "hidden_size": 1024,
+   "fused_hidden_size": 3072,
+   "compression_head_type": "crf",
+   "token_yes_id": 9693,
+   "token_no_id": 2152,
+   "output": "token_scores (sigmoid, 0-1, higher=keep)"
+ }
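Note on `metadata.json`: the dimensions agree with the README, where three backbone layers of width 1024 are fused into 3072. A small consistency check along those lines might look like the sketch below. The helper `check_metadata` and its rules are hypothetical; the commit does not define any validation logic.

```python
import json

# Content of metadata.json as added in this commit.
METADATA_TEXT = """
{
  "model_type": "swepruner",
  "backbone": "Qwen/Qwen3-Reranker-0.6B",
  "hidden_size": 1024,
  "fused_hidden_size": 3072,
  "compression_head_type": "crf",
  "token_yes_id": 9693,
  "token_no_id": 2152,
  "output": "token_scores (sigmoid, 0-1, higher=keep)"
}
"""


def check_metadata(meta, num_fused_layers=3):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []

    # The README fuses early, middle, and final layers by concatenation.
    expected = num_fused_layers * meta["hidden_size"]
    if meta["fused_hidden_size"] != expected:
        problems.append(
            f"fused_hidden_size is {meta['fused_hidden_size']}, expected {expected}"
        )

    # Token IDs should be distinct non-negative integers.
    for key in ("token_yes_id", "token_no_id"):
        value = meta[key]
        if not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer")
    if meta["token_yes_id"] == meta["token_no_id"]:
        problems.append("token_yes_id and token_no_id must differ")

    return problems


metadata = json.loads(METADATA_TEXT)
```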
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f81b0977a8e96350271637b825a7a99b5be74d82b528977f9a034b11752734f
+ size 636820889
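Note on the Git LFS pointer: the three lines above record the hash algorithm, digest, and byte size of the real `model.onnx`. A downloaded copy can be checked against them with the standard library alone. The sketch below is illustrative; `parse_lfs_pointer` and `verify_file` are hypothetical helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib

# Pointer text for model.onnx as added in this commit.
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1f81b0977a8e96350271637b825a7a99b5be74d82b528977f9a034b11752734f
size 636820889
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Stream in chunks so a large model file is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(MODEL_POINTER)
```

The recorded size of 636820889 bytes is about 607 MiB, which matches the "~607MB" figure in the README's file table.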
vocab.json ADDED
The diff for this file is too large to render. See raw diff