ONCA 1.5

Open pancreatic cancer language model for four research-oriented workflows: trial screening, clinical reasoning, pathology extraction, and variant evidence interpretation.

ONCA 1.5 is the current continued-SFT release in the ONCA model line. It builds on Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled and follows the project direction established in ONCA 1.0: focus on pancreatic cancer workflows, open-data training assets, and practical structured prompting for clinical research use.

For ONCA 1.5, training aimed to improve all four existing task families at once, with particular attention to parser-safe trial screening, preserved pathology extraction performance, concise clinical reasoning, and stronger variant evidence handling.

At a Glance

Field | Value
--- | ---
Release | BF16 reference release
Base model | Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled
Architecture | Qwen3.5-class causal LM (Qwen3_5ForCausalLM)
Context window | 262,144 tokens
Training recipe | Continued SFT
Domain focus | Pancreatic cancer and oncology-adjacent research workflows

What This Release Is Good At

  • Criterion-aware pancreatic cancer trial screening with explicit eligibility framing.
  • Concise oncology reasoning and question answering for research workflows.
  • Structured pathology abstraction with field-oriented prompting.
  • Variant evidence interpretation with oncology context and uncertainty signaling.

What It Is Specialized For

This model is specialized for pancreatic cancer and oncology-adjacent research workflows rather than broad general-purpose medical chat. It works best when the task is tightly scoped and the target output format is explicit, especially for:

  • pancreatic cancer trial eligibility review
  • pathology report abstraction into structured fields
  • concise oncology reasoning for case discussion
  • variant evidence interpretation with uncertainty signaling

Example: pancreatic cancer trial-screening workflow (uses the tokenizer and model loaded in Quick Start below)

prompt = """
Task: Pancreatic cancer trial screening.

Patient summary:
- 63-year-old with metastatic pancreatic ductal adenocarcinoma
- ECOG 1
- Prior gemcitabine plus nab-paclitaxel
- Bilirubin 0.9 mg/dL
- No active infection

Trial criteria:
- Histologically confirmed metastatic pancreatic adenocarcinoma
- ECOG 0-1
- Progression after 1 prior systemic regimen
- Adequate marrow and hepatic function
- Exclude uncontrolled infection

Return:
1. Eligibility label
2. Criterion-by-criterion reasoning
3. Missing information
"""

messages = [
    {"role": "system", "content": "You are Onca, a pancreatic cancer clinical research assistant."},
    {"role": "user", "content": prompt},
]
# Render the conversation with the bundled chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; a temperature value has no effect when sampling is off
)
# Decode only the newly generated tokens, skipping the prompt.
answer = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(answer, skip_special_tokens=True))

Release Note

This is the main reference checkpoint in the ONCA 1.5 family and the best starting point if you want the least altered release.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Joesh1/onca-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

Use the included tokenizer and chat template for best results:

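# Placeholder: substitute the text of the pathology report to abstract.
report_text = "<pathology report text>"

messages = [
    {"role": "system", "content": "You are Onca, a pancreatic cancer clinical research assistant."},
    {
        "role": "user",
        "content": "Extract tumor grade, margin status, pT, and pN from this pathology report as JSON.\n\n" + report_text,
    },
]
# Render the conversation with the bundled chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; a temperature value has no effect when sampling is off
)
# Decode only the newly generated tokens, skipping the prompt.
answer = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(answer, skip_special_tokens=True))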
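
A reply to an extraction prompt may wrap the requested JSON in extra text, so downstream code usually needs a tolerant parser. The following is a minimal sketch, assuming the reply contains at least one JSON object. The helper names (extract_json_object, missing_keys), the key names, and the sample reply are hypothetical illustrations, not part of the ONCA release or actual model output.

```python
import json

# Hypothetical field names; the real keys depend on what the prompt requests.
EXPECTED_KEYS = ("tumor_grade", "margin_status", "pT", "pN")


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object found in a model reply.

    Tries each '{' as a starting point and lets JSONDecoder.raw_decode
    consume one complete value, so surrounding prose is ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not a valid object at this brace; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in reply")


def missing_keys(record: dict, expected=EXPECTED_KEYS) -> list:
    """List expected keys that the reply omitted, for manual review."""
    return [key for key in expected if key not in record]


# Illustrative reply text, not actual ONCA output.
reply = 'Here is the result:\n{"tumor_grade": "G2", "margin_status": "R0", "pT": "pT2"}'
record = extract_json_object(reply)
print(record, missing_keys(record))
```

Reporting omitted keys instead of filling defaults keeps missing information visible to the reviewer, which matches the card's advice to surface uncertainty explicitly.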

Prompting Tips

  • Use the included chat template and ask for a specific output structure.
  • For extraction tasks, request exact JSON keys or field names.
  • For screening tasks, include both the patient summary and the trial criteria.
  • Ask the model to state uncertainty and missing information explicitly.
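
The tips above can be folded into a small prompt builder so that every screening request carries the patient summary, the trial criteria, and an explicit return structure. This is a sketch only: build_screening_prompt is a hypothetical helper, and the layout simply mirrors the trial-screening example in this card.

```python
def build_screening_prompt(patient_facts, trial_criteria, return_fields):
    """Assemble a trial-screening prompt with an explicit output structure.

    Hypothetical helper: raises if either the patient summary or the trial
    criteria are empty, since a screening prompt needs both.
    """
    if not patient_facts or not trial_criteria:
        raise ValueError("screening prompts need both patient facts and trial criteria")
    lines = ["Task: Pancreatic cancer trial screening.", "", "Patient summary:"]
    lines += [f"- {fact}" for fact in patient_facts]
    lines += ["", "Trial criteria:"]
    lines += [f"- {criterion}" for criterion in trial_criteria]
    lines += ["", "Return:"]
    lines += [f"{i}. {field}" for i, field in enumerate(return_fields, start=1)]
    return "\n".join(lines)


prompt = build_screening_prompt(
    patient_facts=["63-year-old with metastatic pancreatic ductal adenocarcinoma", "ECOG 1"],
    trial_criteria=["ECOG 0-1", "Exclude uncontrolled infection"],
    return_fields=["Eligibility label", "Criterion-by-criterion reasoning", "Missing information"],
)
print(prompt)
```

The resulting string can be passed as the user message content when building the chat messages.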

Training Scope

The active ONCA 1.5 continued-SFT stack uses openly available data only. The data mixture keeps all four task families in play, with weighting geared toward variant evidence repair and trial-screening stability while guarding against pathology regression.

Task family | Active rows | Train | Val | Test
--- | --- | --- | --- | ---
Trial Screening | 12,137 | 10,921 | 608 | 608
Clinical Reasoning | 3,496 | 3,146 | 174 | 176
Pathology Extraction | 7,642 | 6,583 | 414 | 405
Variant Evidence | 2,432 | 2,191 | 116 | 125
Total | 25,707 | 22,841 | 1,312 | 1,314

Initial task weights in the active prepare stack are 27% trial screening, 18% clinical reasoning, 27% pathology extraction, and 28% variant evidence.
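
To see what these weights imply, one can compare each task's target weight with its natural share of the train split. The sketch below uses the row counts and weights reported in this card. It assumes the weights act as target sampling shares, which the card does not state explicitly, so the ratios are illustrative rather than a description of the actual training pipeline.

```python
# Train-split row counts and initial task weights, copied from this card.
train_rows = {
    "trial_screening": 10_921,
    "clinical_reasoning": 3_146,
    "pathology_extraction": 6_583,
    "variant_evidence": 2_191,
}
task_weights = {
    "trial_screening": 0.27,
    "clinical_reasoning": 0.18,
    "pathology_extraction": 0.27,
    "variant_evidence": 0.28,
}

total_rows = sum(train_rows.values())

# Assumption: weights are target sampling shares, so target share divided by
# natural share shows how strongly a task is up- or down-sampled.
sampling_ratio = {
    task: task_weights[task] / (train_rows[task] / total_rows)
    for task in train_rows
}

for task, ratio in sampling_ratio.items():
    natural = train_rows[task] / total_rows
    print(f"{task:22s} natural={natural:.1%} target={task_weights[task]:.0%} ratio={ratio:.2f}")
```

Under that assumption, variant evidence (the smallest family) is drawn at close to three times its natural share, while trial screening is drawn at a little over half of its natural share.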

Benchmarks

Benchmark tables and comparative evaluation plots will be added later.

Repository Contents

  • model-*.safetensors: sharded weights for this release.
  • model.safetensors.index.json: shard map for the checkpoint files.
  • config.json: architecture config and, for quantized variants, quantization metadata.
  • generation_config.json: default generation settings.
  • tokenizer.json and tokenizer_config.json: tokenizer assets.
  • chat_template.jinja: chat formatting template for inference.
  • assets/onca-logo-horizontal.svg: ONCA family logo used at the top of the model card.
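
As a quick sanity check after downloading, the shard map can be inspected to see how tensors are distributed across the weight files. The sketch below relies on the standard sharded-safetensors index layout, in which a "weight_map" object maps each tensor name to its shard file. The helper names and the tiny example index are made up for illustration and do not reflect the real ONCA tensor names or shard count.

```python
import json
from collections import Counter
from pathlib import Path


def load_index(path="model.safetensors.index.json"):
    """Read the shard map from a local copy of the repository."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tensors_per_shard(index):
    """Count how many tensors the index assigns to each shard file."""
    # "weight_map" maps each tensor name to the shard file that stores it.
    return Counter(index["weight_map"].values())


# Tiny illustrative index; tensor and shard names here are invented.
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "layer_a.weight": "model-00001-of-00002.safetensors",
        "layer_b.weight": "model-00001-of-00002.safetensors",
        "layer_c.weight": "model-00002-of-00002.safetensors",
    },
}
print(tensors_per_shard(example_index))
```

With a real download, tensors_per_shard(load_index()) lists the shard files the index expects, which makes a missing or partially downloaded shard easier to spot.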

Related Releases

  • onca-1.5: BF16 reference release (this page).
  • onca-1.5-8bit: 8-bit merged release.
  • onca-1.5-4bit: 4-bit merged release.

Limitations and Safety

  • This is a research model and not a clinical decision system.
  • Outputs should be reviewed by qualified experts before any real-world use.
  • The model is specialized for pancreatic cancer and oncology-adjacent workflows rather than broad general medicine.
  • Variant evidence training includes broader oncology signal, but the intended framing of the model remains pancreatic cancer research support.
  • Quantized releases are convenience variants and may behave slightly differently from the BF16 reference checkpoint.
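
Because quantized variants may diverge from the BF16 reference, it can help to measure agreement on your own prompts before switching checkpoints. The sketch below compares structured outputs field by field. field_agreement is a hypothetical helper, and the records are invented examples rather than measured ONCA results.

```python
def field_agreement(reference_records, candidate_records, fields):
    """Fraction of (record, field) pairs where both outputs give the same value.

    Records are dicts parsed from model replies for the same inputs in the
    same order. A field missing from both records counts as agreement.
    """
    if len(reference_records) != len(candidate_records):
        raise ValueError("both lists must cover the same inputs in the same order")
    total = len(reference_records) * len(fields)
    if total == 0:
        raise ValueError("nothing to compare")
    matches = sum(
        ref.get(field) == cand.get(field)
        for ref, cand in zip(reference_records, candidate_records)
        for field in fields
    )
    return matches / total


# Invented example records standing in for parsed BF16 and quantized outputs.
bf16_records = [{"pT": "pT2", "pN": "pN1"}, {"pT": "pT3", "pN": "pN0"}]
quantized_records = [{"pT": "pT2", "pN": "pN1"}, {"pT": "pT3", "pN": "pN1"}]
agreement = field_agreement(bf16_records, quantized_records, fields=("pT", "pN"))
print(f"field-level agreement: {agreement:.2f}")
```

Agreement with the reference checkpoint is not the same as correctness, so disagreements are best treated as items for expert review rather than as errors in either variant.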

Citation

A formal ONCA 1.5 citation block will be added with the accompanying manuscript. Until then, please cite the model repository and version used in your work.

Acknowledgments

ONCA 1.5 continues the ONCA project lineage from ONCA 1.0 and builds on the Qwen/Qwopus ecosystem plus the open-data contributors whose datasets made this release possible.
