# judgment_partition_infer
Standalone inference bundle for partitioning a Chinese judgment document (or a truncated excerpt) into 7 zones (Z1..Z7) by predicting 6 boundaries.
## Install
```bash
pip install -r requirements.txt
```
To download this bundle from the Hugging Face Hub:
```bash
pip install huggingface_hub
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="<USER_OR_ORG>/<REPO_NAME>",
repo_type="model",
local_dir="judgment_partition_infer_bundle",
local_dir_use_symlinks=False,
)
print("Downloaded -> judgment_partition_infer_bundle")
PY
```
## Input format (JSONL)
One JSON object per line. Required field: `text` (or `full_text`).
Example:
```json
{"sample_id":"demo_1","text":"...全文..."}
```
Optional fields `case_no` and `case_name` are passed through to outputs.
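As a minimal sketch of producing such a file, the snippet below writes records as JSONL. The helper name `write_jsonl` is hypothetical and not part of this bundle; the only requirement taken from this README is that each record carries `text` or `full_text`.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper, not part of the bundle)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for rec in records:
            # The README requires `text`, with `full_text` accepted as an alternative.
            if "text" not in rec and "full_text" not in rec:
                raise ValueError("each record needs a 'text' or 'full_text' field")
            # ensure_ascii=False keeps Chinese characters readable in the file.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    records = [
        {"sample_id": "demo_1", "text": "...全文...", "case_no": "...", "case_name": "..."},
    ]
    write_jsonl(records, "input.jsonl")
```

The resulting `input.jsonl` can then be passed to `infer_cli.py --input`.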
## Run inference (CLI)
From this folder:
```bash
python infer_cli.py --input examples/input.jsonl
```
Outputs are written under `output/<YYYYMMDD_HHMMSS>/` by default:
- `predictions.jsonl`
- `run_meta.json`
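Because run directories are named by timestamp, the newest one sorts last. A small sketch for locating it and reading the predictions follows; `latest_run_dir` and `read_jsonl` are hypothetical helpers, and the fields inside each prediction row are not assumed here.

```python
import json
import re
from pathlib import Path

RUN_DIR = re.compile(r"\d{8}_\d{6}")  # YYYYMMDD_HHMMSS


def latest_run_dir(output_root="output"):
    """Return the newest timestamped run directory, or None if there is none."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    # Names in this format sort lexicographically in chronological order.
    runs = sorted(p for p in root.iterdir() if p.is_dir() and RUN_DIR.fullmatch(p.name))
    return runs[-1] if runs else None


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    run = latest_run_dir()
    if run is not None:
        for row in read_jsonl(run / "predictions.jsonl"):
            print(row)
```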
You can also write to an explicit path:
```bash
python infer_cli.py \
--input examples/input.jsonl \
--output examples/output.example.jsonl \
--device cpu
```
## Anchor behavior
Default: `--anchor auto`
- If anchors are detected, enforce:
  - `boundary[0]` = Z1 anchor (the last "号" within the first 100 characters)
  - `boundary[3]` = Z4 anchor ("判决如下" or "如下判决")
- If anchors are missing or invalid, keep the model's boundaries and set `anchor_status` accordingly.
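The anchor search itself can be sketched as below. This is an illustration only, not the bundle's implementation: the function names are hypothetical, picking the earliest Z4 match is an assumption, and how an anchor position maps onto the stored boundary value (for example, before or after the anchor string) is not specified in this README.

```python
def find_z1_anchor(text, window=100):
    """Index of the last "号" within the first `window` characters, or None."""
    idx = text[:window].rfind("号")
    return idx if idx != -1 else None


def find_z4_anchor(text):
    """Index of the earliest "判决如下" or "如下判决" (assumed choice), or None."""
    hits = [i for i in (text.find(p) for p in ("判决如下", "如下判决")) if i != -1]
    return min(hits) if hits else None


if __name__ == "__main__":
    sample = "...全文..."
    # Both calls return None here because the placeholder text contains no anchors.
    print(find_z1_anchor(sample), find_z4_anchor(sample))
```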
## Python API
```python
from judgment_partition_infer import Predictor
pred = Predictor() # loads ./assets/best_model.pt + ./assets/vocab.json (or env override)
out = pred.predict_text("...全文/片段...")
print(out["boundaries"])
```
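To turn 6 boundaries into the 7 zones, one option is to cut the text at each boundary. The sketch below assumes that `out["boundaries"]` holds 6 ascending character offsets into the input text; that is an assumption, not something this README confirms (the values could be sentence indices instead), so check the actual output before relying on it. `split_zones` is a hypothetical helper.

```python
def split_zones(text, boundaries):
    """Cut `text` at 6 ascending offsets and return the zones Z1..Z7 as a dict.

    Assumes the boundaries are character offsets (an unverified assumption).
    """
    if len(boundaries) != 6:
        raise ValueError("expected exactly 6 boundaries")
    cuts = [0, *boundaries, len(text)]
    if any(a > b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("boundaries must be ascending and within the text")
    return {f"Z{i + 1}": text[a:b] for i, (a, b) in enumerate(zip(cuts, cuts[1:]))}


if __name__ == "__main__":
    zones = split_zones("abcdefghij", [1, 2, 4, 6, 7, 9])
    for name, chunk in zones.items():
        print(name, repr(chunk))
```

Concatenating the zones in order reproduces the original text, which is a quick sanity check on any boundary list.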
Note: for Hub compatibility, the model may be stored as
`assets/best_model.pt.b64.part-*` (text shards).
`Predictor()` will automatically reassemble/decode and load the model.
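`Predictor()` handles this for you, so the following is only a rough sketch of what reassembly could look like. It assumes the shards are consecutive pieces of a single base64 stream and that their file names sort in the original order; the real shard layout may differ, and `reassemble_b64_shards` is a hypothetical name.

```python
import base64
from pathlib import Path


def reassemble_b64_shards(assets_dir, stem="best_model.pt"):
    """Join `<stem>.b64.part-*` text shards in name order and decode to bytes."""
    parts = sorted(Path(assets_dir).glob(f"{stem}.b64.part-*"))
    if not parts:
        raise FileNotFoundError(f"no shards matching {stem}.b64.part-* in {assets_dir}")
    # Drop whitespace so line breaks inside or between shards do not affect decoding.
    encoded = "".join("".join(p.read_text(encoding="ascii").split()) for p in parts)
    return base64.b64decode(encoded)
```

The returned bytes could then be written to `best_model.pt` and passed to `Predictor` via `model_path`.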
If you `pip install` only the code (without assets), pass explicit paths:
```python
from judgment_partition_infer import Predictor
pred = Predictor(
model_path="path/to/best_model.pt",
vocab_path="path/to/vocab.json",
device="cpu",
)
```
## Publish to Hugging Face Hub (maintainers)
1) Install publishing dependency:
```bash
pip install -r requirements-publish.txt
```
2) Log in (recommended) or set a token environment variable:
- `huggingface-cli login`
- or `export HF_TOKEN=...` / `export HUGGINGFACE_HUB_TOKEN=...`
3) Create + upload (model repo):
```bash
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public
```
If you already created the repo on the website:
```bash
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create
```
### If HTTPS to huggingface.co is blocked
You can push via SSH (host `hf.co`) instead:
```bash
chmod +x push_to_hf_ssh.sh
./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME>
```
(The script prints an SSH public key; add it at https://huggingface.co/settings/keys, then rerun.)
# judgment_partition_infer
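This is a standalone inference toolkit that automatically splits the full text of a Chinese judgment document (or a truncated excerpt) into 7 fixed structural zones (Z1 to Z7) by predicting 6 boundary positions.
## Install dependencies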
```bash
pip install -r requirements.txt
```
To download this inference bundle (including the weights and vocabulary) from the Hugging Face Hub:
```bash
pip install huggingface_hub
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="<USER_OR_ORG>/<REPO_NAME>",
repo_type="model",
local_dir="judgment_partition_infer_bundle",
local_dir_use_symlinks=False,
)
print("Downloaded -> judgment_partition_infer_bundle")
PY
```
## Input format (JSONL)
The input file must be in JSONL format (one standalone JSON object per line).
**Required field**: `text` (the `full_text` field is also accepted).
**Example:**
```json
{"sample_id":"demo_1","text":"...全文..."}
```
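> **Note**: Optional metadata fields such as `case_no` (case number) and `case_name` (case name) are not modified during processing and are passed through unchanged to the output.
## Run inference (command line)
Run the following command from the root directory of this bundle: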
```bash
python infer_cli.py --input examples/input.jsonl
```
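By default, results are saved under an automatically generated timestamped directory, `output/<YYYYMMDD_HHMMSS>/`, which contains two files:
* `predictions.jsonl`: the final predictions, including boundary coordinates and the text of each zone.
* `run_meta.json`: run metadata and statistics for this inference job.
You can also write directly to a fixed file path (convenient for integrating with other systems or for building examples):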
```bash
python infer_cli.py \
--input examples/input.jsonl \
--output examples/output.example.jsonl \
--device cpu
```
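## Anchor rules
The automatic anchor strategy is enabled by default: `--anchor auto`
* **When anchors are detected, the following constraints are enforced:**
  * `boundary[0]` (the 1st boundary) is forced to align with the **Z1 anchor** (the last "号" within the first 100 characters of the text).
  * `boundary[3]` (the 4th boundary) is forced to align with the **Z4 anchor** (a match of "判决如下" or "如下判决").
* **When anchors are missing or invalid:**
  * The model's original predicted sentence boundaries are kept, and the `anchor_status` field in the output is updated accordingly (marking the anchor as missing).
## Python API (embedding in your code)
To call the model directly from your own Python code, use the following interface: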
```python
from judgment_partition_infer import Predictor
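# Initialize the predictor (loads ./assets/best_model.pt and ./assets/vocab.json by default, or an environment-variable override)
pred = Predictor()
# Run inference on a full judgment text or an excerpt
out = pred.predict_text("...全文/片段...")
# Print the 6 predicted boundary positions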
print(out["boundaries"])
```
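Note: to accommodate the Hub's file-size and binary-file limits, the model may be stored as
`assets/best_model.pt.b64.part-*` text shards;
`Predictor()` reassembles, decodes, and loads them automatically, so no manual handling is needed.
If you installed only the code (without downloading the assets locally), pass the paths explicitly: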
```python
from judgment_partition_infer import Predictor
pred = Predictor(
model_path="path/to/best_model.pt",
vocab_path="path/to/vocab.json",
device="cpu",
)
```
## Publish to Hugging Face Hub (for maintainers)
1) Install the publishing dependencies:
```bash
pip install -r requirements-publish.txt
```
2) Log in (recommended) or provide a token through an environment variable:
- `huggingface-cli login`
- or `export HF_TOKEN=...` / `export HUGGINGFACE_HUB_TOKEN=...`
3) Create and upload (model repo):
```bash
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public
```
If you have already created the repository on the website:
```bash
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create
```
### If your network cannot reach huggingface.co
You can push over SSH (host `hf.co`) instead:
```bash
chmod +x push_to_hf_ssh.sh
./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME>
```
(The script prints an SSH public key; copy it to https://huggingface.co/settings/keys, then run the script again.)