| # judgment_partition_infer |
|
|
| Standalone inference bundle for partitioning a Chinese judgment document (or a truncated excerpt) into 7 zones (Z1..Z7) by predicting 6 boundaries. |
|
|
| ## Install |
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| If you want to download this bundle from Hugging Face Hub: |
| ```bash |
| pip install huggingface_hub |
| python - <<'PY' |
| from huggingface_hub import snapshot_download |
| snapshot_download( |
| repo_id="<USER_OR_ORG>/<REPO_NAME>", |
| repo_type="model", |
| local_dir="judgment_partition_infer_bundle", |
| local_dir_use_symlinks=False, |
| ) |
| print("Downloaded -> judgment_partition_infer_bundle") |
| PY |
| ``` |
|
|
| ## Input format (JSONL) |
|
|
| One JSON object per line. Required field: `text` (or `full_text`). |
|
|
| Example: |
| ```json |
| {"sample_id":"demo_1","text":"...全文..."} |
| ``` |
|
|
| Optional fields `case_no` and `case_name` are passed through to outputs. |
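For preparing an input file, a minimal sketch along these lines may help. The helper name `write_input_jsonl` is hypothetical and not part of this bundle; it only enforces the rule stated above, that each record carries `text` or `full_text`.

```python
import json
from pathlib import Path


def write_input_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper, not part of the bundle)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for i, rec in enumerate(records):
            # The README requires `text`, with `full_text` accepted as an alternative.
            if not (rec.get("text") or rec.get("full_text")):
                raise ValueError(f"record {i} has neither 'text' nor 'full_text'")
            # ensure_ascii=False keeps Chinese characters readable in the file.
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path
```

Optional fields such as `case_no` and `case_name` can simply be included in each record dictionary.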
|
|
| ## Run inference (CLI) |
|
|
| From this folder: |
| ```bash |
| python infer_cli.py --input examples/input.jsonl |
| ``` |
|
|
| Outputs are written under `output/<YYYYMMDD_HHMMSS>/` by default: |
| - `predictions.jsonl` |
| - `run_meta.json` |
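Because each run gets its own timestamped directory, downstream scripts often need to locate the most recent one. The sketch below is one way to do that; `latest_run_dir` is a hypothetical helper, and it assumes directory names follow the `YYYYMMDD_HHMMSS` pattern shown above exactly.

```python
import re
from pathlib import Path

# Matches the default run directory name, e.g. 20240315_080000.
_RUN_DIR = re.compile(r"^\d{8}_\d{6}$")


def latest_run_dir(output_root="output"):
    """Return the newest timestamped run directory, or None if there is none."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir() and _RUN_DIR.match(p.name)]
    # Zero-padded YYYYMMDD_HHMMSS names sort chronologically as plain strings.
    return max(runs, key=lambda p: p.name, default=None)
```

The returned path can then be joined with `predictions.jsonl` or `run_meta.json`.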
|
|
| You can also write to an explicit path: |
| ```bash |
| python infer_cli.py \ |
| --input examples/input.jsonl \ |
| --output examples/output.example.jsonl \ |
| --device cpu |
| ``` |
|
|
| ## Anchor behavior |
|
|
| Default: `--anchor auto` |
| - If anchors are detected, these boundaries are forced: |
|   - `boundary[0]` = Z1 anchor (the last "号" within the first 100 characters) |
|   - `boundary[3]` = Z4 anchor ("判决如下" or "如下判决") |
| - If anchors are missing or invalid, the model's boundaries are kept and `anchor_status` is set accordingly. |
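The anchor search itself can be illustrated with plain string operations. This is a sketch under assumptions, not the bundle's actual code: the function names are hypothetical, the Z1 anchor is taken as the last "号" in the first 100 characters, and the Z4 anchor as the earliest occurrence of either phrase. How the bundle converts these character positions into boundary values is not specified here.

```python
def find_z1_anchor(text, window=100):
    """Character index of the last "号" within the first `window` characters, or -1."""
    return text[:window].rfind("号")


def find_z4_anchor(text):
    """Character index of the earliest "判决如下" or "如下判决", or -1 (assumed rule)."""
    hits = [i for i in (text.find("判决如下"), text.find("如下判决")) if i >= 0]
    return min(hits) if hits else -1


def detect_anchors(text):
    """Return both anchor positions; None marks an anchor that was not found."""
    z1, z4 = find_z1_anchor(text), find_z4_anchor(text)
    return {"z1": z1 if z1 >= 0 else None, "z4": z4 if z4 >= 0 else None}
```

A truncated excerpt that starts mid-document will typically lack the Z1 anchor, which is the case the "missing/invalid" branch above covers.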
|
|
| ## Python API |
|
|
| ```python |
| from judgment_partition_infer import Predictor |
| |
| pred = Predictor() # loads ./assets/best_model.pt + ./assets/vocab.json (or env override) |
| out = pred.predict_text("...全文/片段...") |
| print(out["boundaries"]) |
| ``` |
| Note: for Hub compatibility, the model may be stored as |
| `assets/best_model.pt.b64.part-*` (text shards). |
| `Predictor()` will automatically reassemble/decode and load the model. |
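To inspect the weights outside `Predictor()`, the shards can in principle be rejoined by hand. The sketch below is an assumption about the shard layout, not the bundle's actual loader: it supposes the shards are consecutive slices of a single base64 stream and that their suffixes sort in the correct order as strings. `reassemble_b64_shards` is a hypothetical name.

```python
import base64
from pathlib import Path


def reassemble_b64_shards(assets_dir, stem="best_model.pt"):
    """Join `<stem>.b64.part-*` text shards and decode them into `<stem>`."""
    assets = Path(assets_dir)
    # Assumption: shard suffixes are zero-padded so string order equals shard order.
    shards = sorted(assets.glob(f"{stem}.b64.part-*"))
    if not shards:
        raise FileNotFoundError(f"no shards matching {stem}.b64.part-* in {assets}")
    # Strip trailing newlines so the pieces form one continuous base64 string.
    encoded = "".join(p.read_text(encoding="ascii").strip() for p in shards)
    out = assets / stem
    out.write_bytes(base64.b64decode(encoded))
    return out
```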
|
|
| If you `pip install` only the code (without assets), pass explicit paths: |
| ```python |
| from judgment_partition_infer import Predictor |
| pred = Predictor( |
| model_path="path/to/best_model.pt", |
| vocab_path="path/to/vocab.json", |
| device="cpu", |
| ) |
| ``` |
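Once the 6 boundaries are available, cutting the document into Z1..Z7 is a matter of slicing. The following sketch assumes each boundary is a cut position, meaning the index where the next zone starts, over whatever unit the model uses (characters or sentences). That interpretation is an assumption and should be checked against the actual output. `split_into_zones` is a hypothetical helper.

```python
def split_into_zones(units, boundaries):
    """Cut `units` (a string or a list of sentences) into len(boundaries) + 1 zones."""
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be non-decreasing")
    # Add the start and end of the document as implicit outer cut points.
    cuts = [0, *boundaries, len(units)]
    return {f"Z{i + 1}": units[a:b] for i, (a, b) in enumerate(zip(cuts, cuts[1:]))}
```

With 6 boundaries the result has exactly 7 keys, `Z1` through `Z7`; a zone may be empty when two adjacent boundaries coincide.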
|
|
| ## Publish to Hugging Face Hub (maintainers) |
|
|
| 1) Install publishing dependency: |
| ```bash |
| pip install -r requirements-publish.txt |
| ``` |
|
|
| 2) Log in (recommended) or set a token through an environment variable: |
| - `huggingface-cli login` |
| - or `export HF_TOKEN=...` / `export HUGGINGFACE_HUB_TOKEN=...` |
|
|
| 3) Create + upload (model repo): |
| ```bash |
| python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public |
| ``` |
| If you already created the repo on the website: |
| ```bash |
| python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create |
| ``` |
|
|
| ### If HTTPS to huggingface.co is blocked |
|
|
| You can push via SSH (host `hf.co`) instead: |
| ```bash |
| chmod +x push_to_hf_ssh.sh |
| ./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME> |
| ``` |
| (The script prints an SSH public key; add it at https://huggingface.co/settings/keys, then rerun.) |
|
|
| # judgment_partition_infer |
|
|
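| A standalone inference toolkit that splits the full text of a Chinese judgment document (or a truncated excerpt) into 7 fixed structural zones (Z1 to Z7) by predicting 6 boundary positions. |
|
|
| ## Install dependencies |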
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| To download this inference bundle (including weights and vocabulary) from Hugging Face Hub: |
|
|
| ```bash |
| pip install huggingface_hub |
| python - <<'PY' |
| from huggingface_hub import snapshot_download |
| snapshot_download( |
| repo_id="<USER_OR_ORG>/<REPO_NAME>", |
| repo_type="model", |
| local_dir="judgment_partition_infer_bundle", |
| local_dir_use_symlinks=False, |
| ) |
| print("Downloaded -> judgment_partition_infer_bundle") |
| PY |
| ``` |
|
|
| ## Input format (JSONL) |
|
|
| The input file must be in JSONL format (one standalone JSON object per line). |
| **Required field**: `text` (the `full_text` field is also accepted). |
|
|
| **Example record:** |
|
|
| ```json |
| {"sample_id":"demo_1","text":"...全文..."} |
| ``` |
|
|
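| > **Note**: optional metadata fields such as `case_no` (case number) and `case_name` (case name) are not modified during processing and are passed through to the output unchanged. |
|
|
| ## Run inference (command line) |
|
|
| Run the following command from the root directory of this bundle: |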
|
|
| ```bash |
| python infer_cli.py --input examples/input.jsonl |
| ``` |
|
|
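| By default, results are saved to an automatically generated timestamped directory `output/<YYYYMMDD_HHMMSS>/`, which contains two files: |
|
|
| * `predictions.jsonl`: the final predictions, including boundary coordinates and the text of each zone. |
| * `run_meta.json`: run metadata and statistics for this inference job. |
|
|
| You can also write directly to a fixed file path (handy for integrating with other systems or producing examples): |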
|
|
| ```bash |
| python infer_cli.py \ |
| --input examples/input.jsonl \ |
| --output examples/output.example.jsonl \ |
| --device cpu |
| ``` |
|
|
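| ## Anchor rules (Anchor Behavior) |
|
|
| The automatic anchor strategy is enabled by default: `--anchor auto` |
|
|
| * **When anchors are detected, the following constraints are enforced:** |
|   * `boundary[0]` (the 1st boundary) is forced onto the **Z1 anchor** (the last "号" within the first 100 characters of the text). |
|   * `boundary[3]` (the 4th boundary) is forced onto the **Z4 anchor** (a match of "判决如下" or "如下判决"). |
|
|
| * **When anchors are missing or invalid:** |
|   * The model's original predicted sentence boundaries are kept, and the `anchor_status` field in the output is updated accordingly (marking the anchor as missing). |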
|
|
|
|
|
|
| ## Python API (embedding in your own code) |
|
|
| To call the model directly from your own Python code, use the following interface: |
|
|
| ```python |
| from judgment_partition_infer import Predictor |
| |
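| # Initialize the predictor (loads ./assets/best_model.pt and ./assets/vocab.json by default, or an environment-variable override) |
| pred = Predictor() |
|
| # Run inference on the full document text or an excerpt |
| out = pred.predict_text("...全文/片段...") |
|
| # Print the 6 predicted boundary positions |
| print(out["boundaries"]) |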
| ``` |
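| Note: to work within the Hub's file-size and binary-file limits, the model may be stored as text shards named |
| `assets/best_model.pt.b64.part-*`; |
| `Predictor()` concatenates, decodes, and loads them automatically, so no manual handling is needed. |
|
|
| If you installed only the code (without downloading the assets locally), pass the paths explicitly: |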
|
|
| ```python |
| from judgment_partition_infer import Predictor |
| pred = Predictor( |
| model_path="path/to/best_model.pt", |
| vocab_path="path/to/vocab.json", |
| device="cpu", |
| ) |
| ``` |
|
|
| ## Publish to Hugging Face Hub (maintainers) |
|
|
| 1) Install the publishing dependencies: |
| ```bash |
| pip install -r requirements-publish.txt |
| ``` |
|
|
| 2) Log in (recommended) or provide a token through an environment variable: |
| - `huggingface-cli login` |
| - or `export HF_TOKEN=...` / `export HUGGINGFACE_HUB_TOKEN=...` |
|
|
| 3) Create and upload (model repo): |
| ```bash |
| python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public |
| ``` |
| If you have already created the repo on the website: |
| ```bash |
| python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create |
| ``` |
|
|
| ### If your network cannot reach huggingface.co |
|
|
| You can push over SSH (host `hf.co`) instead: |
| ```bash |
| chmod +x push_to_hf_ssh.sh |
| ./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME> |
| ``` |
| (The script prints an SSH public key; copy it to https://huggingface.co/settings/keys, then run the script again.) |
|
|