episodic-ingestion-compiler β€” Rust integration bundle

Rust-native packaging of the ingestion-model-spec-v1 encoder (ModernBERT-large + 5 heads) for consumers using candle-transformers. Zero Python runtime needed.

Full training checkpoint (best.pt + calibrator.joblib) lives at Avifenesh/episodic-ingestion-compiler-modernbert-large-span-5000.

Headline metrics (13,786-row held-out test split)

Metric                        Value
schema_validity               1.000
abstention_accuracy           0.976
claim_present F1              0.962
false_emission_rate           0.017
span_token_IoU                0.969
predicate_accuracy_macro      0.966
subject_type_accuracy_macro   0.997

Contents

model.safetensors     β€” ModernBERT backbone + 5 heads, bf16, 790 MB
config.json           β€” arch + trained args + head schema
tokenizer.json        β€” HF tokenizers-crate compatible
spec.json             β€” spec-v1 contract (predicates, subject_types,
                        object_value shapes, literal fields)
calibrator.json       β€” isotonic regression as piecewise-linear points
tensor_index.json     β€” per-tensor shape/dtype (debug aid)

Backbone keys are remapped to backbone.model.* so candle_transformers::models::modernbert::ModernBert::load(vb.pp("backbone"), ...) resolves without any glue code.

Loading in Rust

[dependencies]
candle-core = "0.9"
candle-nn = "0.9"
candle-transformers = "0.9"
tokenizers = { version = "0.22", default-features = false, features = ["onig"] }
safetensors = "0.4"

Working crate + example at https://github.com/avifenesh/episodic-ingestion-compiler/tree/main/rust/ingestion-model

use candle_core::Device;
use ingestion_model::{Bundle, IngestionEncoder, Role};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bundle = Bundle::load("ingestion_model_v1")?;
    // Prefer CUDA when available, otherwise fall back to CPU.
    let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
    let encoder = IngestionEncoder::load(&bundle, device)?;

    let raw = encoder.predict(
        Role::Assistant,
        "I edited crates/foo/src/lib.rs and ran cargo test which passed.",
    )?;
    // raw = claim_prob, predicate, class_hint, subject_type,
    //       confidence_raw, span_char (in original blurb), span_tok
    let calibrated = bundle
        .calibrator
        .as_ref()
        .map(|c| c.apply(raw.confidence_raw))
        .unwrap_or(raw.confidence_raw);
    println!("calibrated confidence: {calibrated}");
    Ok(())
}

On CPU (single-core gemm), warm-start is ~100 ms/req with bf16 weights automatically upcast to fp32 (candle's CPU matmul doesn't support bf16). Batched / CUDA paths exposed via predict_batch and --features cuda.

Spec-v1 contract

Closed vocabularies are in spec.json:

  • 21 predicates β€” Class S (declarative) + Class E (evidence) + reserved (reverted_file, deleted_file, created_file, committed, deployed, incident_observed)
  • 13 subject_types β€” objective, command, file, pr, incident, policy, person, repo, team, service, document, ticket, thread
  • 2 deferred predicates that the model MUST NEVER emit β€” has_current_input, has_phase
  • object_value shapes per predicate family, for the wrapper
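A minimal sketch of how a caller might enforce the closed vocabulary before accepting a prediction. The predicate names below are an illustrative subset; the real sets come from spec.json, and the two deferred predicates are the ones named above.

```rust
use std::collections::HashSet;

/// A predicate is emittable only if it is in the closed vocabulary
/// and is not one of the deferred predicates.
fn is_emittable(predicate: &str, allowed: &HashSet<&str>, deferred: &HashSet<&str>) -> bool {
    allowed.contains(predicate) && !deferred.contains(predicate)
}

fn main() {
    // Illustrative subsets; load the full lists from spec.json in practice.
    let allowed: HashSet<&str> = HashSet::from(["created_file", "committed", "deployed"]);
    let deferred: HashSet<&str> = HashSet::from(["has_current_input", "has_phase"]);

    assert!(is_emittable("committed", &allowed, &deferred));
    assert!(!is_emittable("has_phase", &allowed, &deferred)); // deferred: never emit
    assert!(!is_emittable("made_coffee", &allowed, &deferred)); // not in vocabulary
}
```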

The model emits only the information-bearing subset. The caller's deterministic wrapper fills in the literal fields (status="candidate", source_authority="model_draft", extraction_method="ingestion_model_v1") and the per-predicate object_value dicts, extracts subject_id from the span, and performs spec validation (invalid claims are silently dropped).
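A hedged sketch of the caller-side wrapper described above. The struct shape and the subject_id derivation are assumptions for illustration; only the three literal field values are taken from the text.

```rust
/// Illustrative claim record; the real wrapper also carries the
/// per-predicate object_value dict and runs spec validation.
#[derive(Debug)]
struct Claim {
    predicate: String,
    subject_id: String,
    status: &'static str,
    source_authority: &'static str,
    extraction_method: &'static str,
}

fn wrap(predicate: &str, span_text: &str) -> Claim {
    Claim {
        predicate: predicate.to_string(),
        // Placeholder subject_id: the raw span text. The real wrapper
        // derives an id per subject_type (file path, PR number, etc.).
        subject_id: span_text.to_string(),
        status: "candidate",
        source_authority: "model_draft",
        extraction_method: "ingestion_model_v1",
    }
}

fn main() {
    let claim = wrap("created_file", "crates/foo/src/lib.rs");
    println!("{claim:?}");
}
```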

Python reference implementation: scripts/ingestion_wrapper.py. Rust crate ships only the encoder for now; the wrapper lives in the caller's crate so it can evolve independently of the model.

Calibration

calibrator.json is a piecewise-linear isotonic map. Raw confidence ECE 0.340 β†’ post-calibration ECE 0.000 on the 2,879-row cal split. Apply it by linear interpolation between knots: for a raw confidence c in [x[i], x[i+1]], output y[i] + (c - x[i]) / (x[i+1] - x[i]) * (y[i+1] - y[i]); clip to the endpoint values outside the knot range.
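The interpolation above can be sketched as follows. The knot arrays are illustrative stand-ins; the real ones come from calibrator.json.

```rust
/// Piecewise-linear isotonic calibration: interpolate between knots
/// (x[i], y[i]), clipping to the endpoint values out of range.
fn apply_calibrator(x: &[f32], y: &[f32], c: f32) -> f32 {
    assert_eq!(x.len(), y.len());
    if c <= x[0] {
        return y[0];
    }
    if c >= x[x.len() - 1] {
        return y[y.len() - 1];
    }
    // Find the bracketing knot pair and interpolate linearly.
    let i = x.windows(2).position(|w| c >= w[0] && c <= w[1]).unwrap();
    let t = (c - x[i]) / (x[i + 1] - x[i]);
    y[i] + t * (y[i + 1] - y[i])
}

fn main() {
    // Toy knots, not the shipped calibrator.
    let x = [0.0, 0.5, 1.0];
    let y = [0.0, 0.8, 1.0];
    println!("{}", apply_calibrator(&x, &y, 0.25)); // prints 0.4
    println!("{}", apply_calibrator(&x, &y, 1.2)); // clipped to 1
}
```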

License / lineage

ModernBERT-large is Apache-2.0. The 5 adapter heads and calibrator are derivatives; training code is at https://github.com/avifenesh/episodic-ingestion-compiler.
