Instructions for using Joesh1/onca-1.5-9B with libraries and local apps.

Libraries

How to use Joesh1/onca-1.5-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Joesh1/onca-1.5-9B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Joesh1/onca-1.5-9B")
model = AutoModelForCausalLM.from_pretrained("Joesh1/onca-1.5-9B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use Joesh1/onca-1.5-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Joesh1/onca-1.5-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joesh1/onca-1.5-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
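The same OpenAI-compatible request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library. It assumes the vLLM server is running locally on port 8000, as in the curl call (pass `base_url="http://localhost:30000/v1"` for the SGLang server). It also assumes the response follows the standard chat-completions shape (`choices[0].message.content`). `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM default port used in the curl example
MODEL_ID = "Joesh1/onca-1.5-9B"


def build_chat_request(prompt: str, model: str = MODEL_ID, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output for research workflows
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """POST the payload to /chat/completions and return the first choice's text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `print(chat("What is the capital of France?"))` should print the model's reply.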

Use Docker

docker model run hf.co/Joesh1/onca-1.5-9B

SGLang

How to use Joesh1/onca-1.5-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Joesh1/onca-1.5-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joesh1/onca-1.5-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Joesh1/onca-1.5-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joesh1/onca-1.5-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


ONCA 1.5

Open pancreatic cancer language model for four research-oriented workflows: trial screening, clinical reasoning, pathology extraction, and variant evidence interpretation.

ONCA 1.5 is the current continued-SFT release in the ONCA model line. It builds on Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled and follows the project direction established in ONCA 1.0: focus on pancreatic cancer workflows, open-data training assets, and practical structured prompting for clinical research use.

For ONCA 1.5, the training emphasis was to improve all four existing task families at once, with particular attention to parser-safe trial screening, preserved pathology extraction performance, concise clinical reasoning, and stronger variant evidence handling.

At a Glance

| Field | Value |
| --- | --- |
| Release | BF16 reference release |
| Base model | `Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled` |
| Architecture | Qwen3.5-class causal LM (`Qwen3_5ForCausalLM`) |
| Context window | 262,144 tokens |
| Training recipe | Continued SFT |
| Domain focus | Pancreatic cancer and oncology-adjacent research workflows |
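The prompt and the completion share the 262,144-token window, so a long pathology report or trial protocol leaves less room for output. Here is a minimal budget check. It assumes you already have the prompt's token count, for example `inputs["input_ids"].shape[-1]` from the Transformers snippets. `clamp_max_new_tokens` is a hypothetical helper. A serving engine may also be configured with a smaller limit than the model's native window.

```python
CONTEXT_WINDOW = 262_144  # native context length listed in the table above


def clamp_max_new_tokens(prompt_tokens: int, requested: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many new tokens fit after the prompt without exceeding the window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt uses {prompt_tokens} tokens; window is {context_window}")
    return min(requested, remaining)
```

For example, `clamp_max_new_tokens(262_000, 256)` returns 144, because only 144 positions remain after the prompt.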

What This Release Is Good At

Criterion-aware pancreatic cancer trial screening with explicit eligibility framing.
Concise oncology reasoning and question answering for research workflows.
Structured pathology abstraction with field-oriented prompting.
Variant evidence interpretation with oncology context and uncertainty signaling.

What It Is Specialized For

This model is specialized for pancreatic cancer and oncology-adjacent research workflows rather than broad general-purpose medical chat. It works best when the task is tightly scoped and the target output format is explicit, especially for:

pancreatic cancer trial eligibility review
pathology report abstraction into structured fields
concise oncology reasoning for case discussion
variant evidence interpretation with uncertainty signaling

Example: pancreatic cancer trial-screening workflow

prompt = """
Task: Pancreatic cancer trial screening.

Patient summary:
- 63-year-old with metastatic pancreatic ductal adenocarcinoma
- ECOG 1
- Prior gemcitabine plus nab-paclitaxel
- Bilirubin 0.9 mg/dL
- No active infection

Trial criteria:
- Histologically confirmed metastatic pancreatic adenocarcinoma
- ECOG 0-1
- Progression after 1 prior systemic regimen
- Adequate marrow and hepatic function
- Exclude uncontrolled infection

Return:
1. Eligibility label
2. Criterion-by-criterion reasoning
3. Missing information
"""

messages = [
    {"role": "system", "content": "You are Onca, a pancreatic cancer clinical research assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
)
answer = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(answer, skip_special_tokens=True))
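Downstream tooling usually needs the eligibility label rather than the full text. It therefore helps to pin the label to a closed vocabulary and parse it defensively. This sketch assumes you add a line such as "Choose one label: eligible, likely eligible, not eligible, insufficient information" to the prompt. That label set and `parse_eligibility_label` are hypothetical and are not part of any documented ONCA output format.

```python
import re
from typing import Optional

# Hypothetical closed label set; ask the model to choose exactly one of these.
LABELS = ("eligible", "likely eligible", "not eligible", "insufficient information")

# Try longer labels first so "not eligible" is never read as plain "eligible".
_LABEL_RE = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in sorted(LABELS, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)


def parse_eligibility_label(answer: str) -> Optional[str]:
    """Return the first known label in the model's answer (lowercased), or None."""
    match = _LABEL_RE.search(answer)
    return match.group(1).lower() if match else None
```

With the example above, `parse_eligibility_label(tokenizer.decode(answer, skip_special_tokens=True))` returns a label string, or `None` when the answer needs manual review.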

Release Note

This is the main reference checkpoint in the ONCA 1.5 family and the best starting point if you want the unquantized BF16 weights rather than one of the 8-bit or 4-bit variants.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Joesh1/onca-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

Use the included tokenizer and chat template for best results:

messages = [
    {"role": "system", "content": "You are Onca, a pancreatic cancer clinical research assistant."},
    {"role": "user", "content": "Extract tumor grade, margin status, pT, and pN from this pathology report as JSON."},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
)
answer = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(answer, skip_special_tokens=True))
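JSON requested in a prompt is not guaranteed to be valid or complete, so validate it before storing it. This is a minimal sketch. It assumes the prompt names the keys exactly; the snake_case names below are hypothetical, since the Quick Start prompt uses free-text field names. It also assumes the answer contains one JSON object, possibly wrapped in prose or a code fence.

```python
import json
from typing import Any

# Hypothetical key names: request exactly these in the prompt, e.g.
# "Return JSON with keys tumor_grade, margin_status, pT, pN."
REQUIRED_KEYS = ("tumor_grade", "margin_status", "pT", "pN")


def parse_pathology_json(answer: str) -> dict[str, Any]:
    """Extract the first JSON object from the model's answer and check its keys."""
    start = answer.find("{")
    if start == -1:
        raise ValueError("no JSON object found in answer")
    # raw_decode parses one JSON value and ignores trailing text such as a closing fence.
    obj, _ = json.JSONDecoder().raw_decode(answer[start:])
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj
```

Raising on missing keys, rather than filling defaults, keeps a silently incomplete extraction from slipping into a research dataset.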

Prompting Tips

Use the included chat template and ask for a specific output structure.
For extraction tasks, request exact JSON keys or field names.
For screening tasks, include both the patient summary and the trial criteria.
Ask the model to state uncertainty and missing information explicitly.
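These tips can be applied mechanically so that every screening request has the same shape. This small sketch rebuilds the trial-screening prompt from lists of patient facts and trial criteria. `build_screening_prompt` is a hypothetical helper. Its wording is copied from the trial-screening example earlier in this card, not from any documented training template.

```python
def build_screening_prompt(patient_summary: list[str], criteria: list[str]) -> str:
    """Assemble a trial-screening prompt in the layout of the example above."""
    if not patient_summary or not criteria:
        raise ValueError("include both the patient summary and the trial criteria")
    lines = ["Task: Pancreatic cancer trial screening.", "", "Patient summary:"]
    lines += [f"- {item}" for item in patient_summary]
    lines += ["", "Trial criteria:"]
    lines += [f"- {item}" for item in criteria]
    lines += [
        "",
        "Return:",
        "1. Eligibility label",
        "2. Criterion-by-criterion reasoning",
        "3. Missing information",
    ]
    return "\n".join(lines)
```

Refusing to build a prompt when either list is empty enforces the tip about always including both the patient summary and the criteria.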

Training Scope

The active ONCA 1.5 continued-SFT stack uses openly available data only. The data mixture keeps all four task families in play, with weighting geared toward variant evidence repair and trial-screening stability while guarding against pathology regression.

| Task family | Active rows | Train | Val | Test |
| --- | ---: | ---: | ---: | ---: |
| Trial Screening | 12,137 | 10,921 | 608 | 608 |
| Clinical Reasoning | 3,496 | 3,146 | 174 | 176 |
| Pathology Extraction | 7,642 | 6,583 | 414 | 405 |
| Variant Evidence | 2,432 | 2,191 | 116 | 125 |
| Total | 25,707 | 22,841 | 1,312 | 1,314 |

Initial task weights in the active data-preparation stack are 27% trial screening, 18% clinical reasoning, 27% pathology extraction, and 28% variant evidence.
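Comparing the weights with each family's raw share of the training split shows where the mixture leans. This is plain arithmetic on the Train column and the weights listed above. It makes no assumption about how the training stack actually applies the weights, for example by resampling rows or reweighting the loss.

```python
# Train-split row counts and initial task weights from the Training Scope section.
train_rows = {
    "trial_screening": 10_921,
    "clinical_reasoning": 3_146,
    "pathology_extraction": 6_583,
    "variant_evidence": 2_191,
}
weights = {
    "trial_screening": 0.27,
    "clinical_reasoning": 0.18,
    "pathology_extraction": 0.27,
    "variant_evidence": 0.28,
}

total = sum(train_rows.values())
assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights form a distribution

for task, rows in train_rows.items():
    raw_share = rows / total
    # A ratio above 1 means the task is emphasized relative to its raw share of rows.
    print(f"{task:22s} raw {raw_share:6.1%}  weight {weights[task]:6.1%}  x{weights[task] / raw_share:.2f}")
```

On these numbers, variant evidence rises from about 9.6% of train rows to a 28% weight (roughly 2.9x). Trial screening drops from about 47.8% to 27%. This is consistent with the stated emphasis on variant evidence repair.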

Benchmarks

Benchmark tables and comparative evaluation plots will be added later.

Repository Contents

model-*.safetensors: sharded weights for this release.
model.safetensors.index.json: shard map for the checkpoint files.
config.json: architecture config and, for quantized variants, quantization metadata.
generation_config.json: default generation settings.
tokenizer.json and tokenizer_config.json: tokenizer assets.
chat_template.jinja: chat formatting template for inference.
assets/onca-logo-horizontal.svg: ONCA family logo used at the top of the model card.
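To see which shard holds a given tensor, read `model.safetensors.index.json` directly. This sketch assumes the usual sharded-safetensors layout: a `weight_map` object that maps tensor names to shard filenames, plus an optional `metadata` entry. The helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_index(path) -> dict:
    """Read a sharded-safetensors index file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def shard_for(index: dict, tensor_name: str) -> str:
    """Return the shard filename that stores `tensor_name`."""
    return index["weight_map"][tensor_name]


def tensors_per_shard(index: dict) -> Counter:
    """Count how many tensors each shard file holds."""
    return Counter(index["weight_map"].values())
```

To use it, download the index, for example with `huggingface_hub.hf_hub_download("Joesh1/onca-1.5-9B", "model.safetensors.index.json")`, and then call `tensors_per_shard(load_index(path))` on the returned path.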

Related Releases

onca-1.5: BF16 reference release (this page).
onca-1.5-8bit: 8-bit merged release.
onca-1.5-4bit: 4-bit merged release.

Limitations and Safety

This is a research model and not a clinical decision system.
Outputs should be reviewed by qualified experts before any real-world use.
The model is specialized for pancreatic cancer and oncology-adjacent workflows rather than broad general medicine.
Variant evidence training includes broader oncology signal, but the intended framing of the model remains pancreatic cancer research support.
Quantized releases are convenience variants and may behave slightly differently from the BF16 reference checkpoint.

Citation

A formal ONCA 1.5 citation block will be added with the accompanying manuscript. Until then, please cite the model repository and version used in your work.

Acknowledgments

ONCA 1.5 continues the ONCA project lineage from ONCA 1.0 and builds on the Qwen/Qwopus ecosystem. Thanks also to the open-data contributors whose datasets made this release possible.