Llama-ChemLink-Parser-8B-MTYS
ChemLink is a LoRA fine-tune of tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 for extracting chemical measurement values (MW, IC50, EC50, Yield) from scientific literature, with compound-name linkage for PubChem grounding and Graph RAG integration.
Key Capability
Under a measurement-only prompt (no explicit compound_name instruction), ChemLink is the only evaluated model that outputs compound_name alongside each extracted value. Baseline models never output a compound name under the same prompt (0% of samples).
{
  "document_understanding": {},
  "chemical_entities": [
    {
      "compound_name": "linezolid",
      "measurements": [{"type": "Molecular Weight", "value": 337.35, "unit": "g/mol"}]
    }
  ]
}
This behavior is baked into weights by fine-tuning and enables downstream Graph RAG pipelines where a measurement value must be linked to its chemical entity node without manual post-processing.
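To illustrate how this output might feed a graph pipeline, here is a minimal sketch that flattens the JSON above into one record per (compound, measurement) pair. The function name `to_graph_edges` and the record layout are assumptions made for illustration; they are not part of ChemLink or of any particular Graph RAG library.

```python
import json


def to_graph_edges(payload):
    """Flatten a ChemLink-style payload into one record per measurement."""
    edges = []
    for entity in payload.get("chemical_entities", []):
        name = entity.get("compound_name")
        for m in entity.get("measurements", []):
            edges.append({
                "compound_name": name,   # would become the chemical entity node
                "type": m.get("type"),   # e.g. "Molecular Weight"
                "value": m.get("value"),
                "unit": m.get("unit"),
            })
    return edges


# The example output shown above, parsed from its JSON text.
example = json.loads("""
{
  "document_understanding": {},
  "chemical_entities": [
    {
      "compound_name": "linezolid",
      "measurements": [{"type": "Molecular Weight", "value": 337.35, "unit": "g/mol"}]
    }
  ]
}
""")

edges = to_graph_edges(example)
print(edges)
```

Each record carries the compound name next to its value, so a downstream step can attach the measurement to the matching entity node without re-reading the source text.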
Model Overview
| Item | Detail |
|---|---|
| Developer | MitzMitz |
| Base model | tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 |
| Training tool | unsloth + TRL (SFTTrainer) |
| Quantization | 4-bit NF4 (QLoRA, load_in_4bit=True) |
| LoRA config | r=16, alpha=32, dropout=0, bias=none |
| Max seq length | 2048 |
| Supported languages | Japanese, English |
| License | Llama 3.1 Community License |
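For readers who want a comparable setup without unsloth, the Quantization and LoRA config rows map roughly onto the following `transformers` and `peft` configuration objects. This is a sketch under assumptions: the card does not list target modules, compute dtype, or double quantization, so those stay at library defaults, and `task_type="CAUSAL_LM"` is inferred from the base model rather than stated in the card.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed in the Quantization row.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

# LoRA hyperparameters, as listed in the LoRA config row.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)
```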
Usage
Inference (Colab / GPU)
import json
import re

import torch
from unsloth import FastLanguageModel
from google.colab import userdata  # Colab-only secret store

HF_TOKEN = userdata.get("HF_TOKEN")

SYSTEM_PROMPT = (
    "You are a chemical data extraction assistant. "
    "Extract measurements from the given text and return a JSON array. "
    "Each element must have: type (IC50/EC50/MW/Yield), value (number), unit (string). "
    "If no target measurement is found, return []. "
    "Output only the JSON array, no explanation."
)

# Load the model in 4-bit and switch unsloth to its inference mode.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MitzMitz/Llama-ChemLink-Parser-8B-MTYS",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
    token=HF_TOKEN,
)
FastLanguageModel.for_inference(model)


def extract(text):
    """Run the extraction prompt on text; return parsed JSON, or the raw string if parsing fails."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]
    # return_dict=True returns input_ids and attention_mask together,
    # independent of the library's default return type.
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True,
        add_generation_prompt=True, return_tensors="pt", return_dict=True,
    ).to("cuda")
    prompt_len = inputs["input_ids"].shape[1]
    with torch.no_grad():
        output = model.generate(
            **inputs, max_new_tokens=256,
            # do_sample=False selects greedy decoding, so temperature has no effect here.
            temperature=0.1, do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    raw = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
    # Strip a Markdown code fence if the model wrapped its answer in one.
    raw = re.sub(r"^```(?:json)?\s*", "", raw, flags=re.MULTILINE)
    raw = re.sub(r"\s*```\s*$", "", raw, flags=re.MULTILINE).strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


print(extract("The compound linezolid has a molecular weight of 337.35 g/mol."))
Local CPU inference (Ollama)
ollama create llama-chemlink-parser-8b-mtys -f Modelfile
ollama run llama-chemlink-parser-8b-mtys
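Once the model has been created, it can also be queried programmatically. The sketch below builds a request for Ollama's local `/api/chat` endpoint using the sampling options this card reports for its Ollama evaluations (temperature=0.0, num_predict=128). The helper names `build_chat_request` and `chat` are hypothetical, the endpoint URL assumes Ollama's default port, and the system prompt in the demo is shortened for brevity.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumes Ollama's default local port
MODEL_NAME = "llama-chemlink-parser-8b-mtys"    # the name passed to `ollama create`


def build_chat_request(text, system_prompt):
    """Build the JSON body for a single non-streaming chat call."""
    return {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
        "stream": False,
        # Options reported for the Ollama evaluations in this card.
        "options": {"temperature": 0.0, "num_predict": 128},
    }


def chat(text, system_prompt):
    """Send the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(text, system_prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Building the payload needs no server; call chat(...) only when Ollama is running.
payload = build_chat_request(
    "The compound linezolid has a molecular weight of 337.35 g/mol.",
    "You are a chemical data extraction assistant.",
)
print(json.dumps(payload, indent=2))
```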
Training Data
| File | Total | MW | IC50 | EC50 | Yield | Negative ([]) | Source |
|---|---|---|---|---|---|---|---|
| phase6_train_mix | 3,763 | 2,283 | 717 | 0 | 44 | 719 | PubChem / ChEMBL / ORD |
| additional_ec50_yield | 2,534 | 0 | 0 | 1,000 | 1,000 | 534 | ChEMBL / ORD |
| additional_yield_table | 621 | 0 | 0 | 0 | 500 | 121 | ORD |
| additional_mw_unit_fix | 120 | 84 | 16 | 0 | 0 | 20 | PubChem |
| additional_phase5 | 740 | 17 | 115 | 22 | 425 | 161 | ChEMBL / ORD / PubChem |
| Total | 7,778 | 2,384 | 848 | 1,022 | 1,969 | 1,555 | |
Data licenses: ORD (CC-BY-SA 4.0), ChEMBL (CC-BY-SA 3.0), PubChem (Public Domain).
Evaluation
Dataset
| Indicator | n | Source | Contamination |
|---|---|---|---|
| MW | 744 | PubChem (PMID-verified) | 0 |
| Yield | 750 | ORD (PMID-verified) | 0 |
| IC50 | 740 | ChEMBL → PubMed Abstract | 0 |
| EC50 | 729 | ChEMBL → PubMed Abstract | 0 |
| Total | 2,963 | | 0 confirmed |
Condition A: No explicit compound_name instruction
Prompt requests type / value / unit only. No compound_name requested.
MW (n = 744)
| Model | Environment | MW correct | compound_name output |
|---|---|---|---|
| ChemLink NF4 (this model) | Colab GPU / unsloth NF4 | 736 / 744 = 98.9% | 736 / 744 = 98.9% |
| ChemLink q5_k_m (GGUF) | Local CPU / Ollama | 663 / 744 = 89.1% | 663 / 744 = 89.1% |
| GPT-4.1-mini | OpenAI API | 744 / 744 = 100.0% | 0 / 744 = 0% |
| Swallow-base | Colab GPU / unsloth NF4 | 692 / 744 = 93.0% | 0 / 744 = 0% |
| Mistral-7B | Local CPU / Ollama | 706 / 744 = 94.9% | 0 / 744 = 0% |
| Gemma-7B | Local CPU / Ollama | 283 / 744 = 38.0% | 0 / 744 = 0% |
Yield (n = 750)
| Model | Environment | Yield correct | compound_name output |
|---|---|---|---|
| ChemLink NF4 (this model) | Colab GPU / unsloth NF4 | 730 / 750 = 97.3% | 728 / 750 = 97.1% |
| ChemLink q5_k_m (GGUF) | Local CPU / Ollama | 610 / 750 = 81.3% | 610 / 750 = 81.3% |
| GPT-4.1-mini | OpenAI API | 750 / 750 = 100.0% | 0 / 750 = 0% |
| Swallow-base | Colab GPU / unsloth NF4 | 748 / 750 = 99.7% | 0 / 750 = 0% |
| Mistral-7B | Local CPU / Ollama | 517 / 750 = 68.9% | 0 / 750 = 0% |
| Gemma-7B | Local CPU / Ollama | 442 / 750 = 58.9% | 0 / 750 = 0% |
Inference note: Colab models used temperature=0.1 / max_new_tokens=256 / apply_chat_template. Local Ollama models used temperature=0.0 / num_predict=128 / manual Modelfile TEMPLATE. GPT-4.1-mini was evaluated via the OpenAI Chat Completions API in a separate run. These environment differences should be considered when comparing across rows.
ChemLink NF4 and q5_k_m are the only models that output compound_name under this prompt. This behavior is baked into the weights by fine-tuning and does not require any additional instruction.
Condition B: Explicit compound_name instruction (Colab, same prompt for all models)
All 5 models received the same prompt explicitly requesting compound_name. They were evaluated on the same 2,963-sample dataset on Colab GPU.
MW (n = 744)
| Model | MW correct | compound_name output | PubChem hit | MW DB-verified |
|---|---|---|---|---|
| ChemLink NF4 | 740 / 744 = 99.5% | 741 / 744 = 99.6% | 0 | 0 |
| ChemLink q5_k_m | 744 / 744 = 100.0% | 744 / 744 = 100.0% | 4 | 4 (100% cond.) |
| GPT-4.1-mini | 741 / 744 = 99.6% | 741 / 744 = 99.6% | 12 | 10 (83.3% cond.) |
| Mistral-7B | 743 / 744 = 99.9% | 443 / 744 = 59.5% | 36 | 35 (97.2% cond.) |
| Swallow-base | 734 / 744 = 98.7% | 733 / 744 = 98.5% | 0 | 0 |
Yield (n = 750)
| Model | Yield correct | compound_name output | PubChem hit |
|---|---|---|---|
| ChemLink NF4 | 685 / 750 = 91.3% | 685 / 750 = 91.3% | 0 |
| ChemLink q5_k_m | 672 / 750 = 89.6% | 672 / 750 = 89.6% | 49 |
| GPT-4.1-mini | 750 / 750 = 100.0% | 727 / 750 = 96.9% | 18 |
| Mistral-7B | 669 / 750 = 89.2% | 363 / 750 = 48.4% | 18 |
| Swallow-base | 749 / 750 = 99.9% | 706 / 750 = 94.1% | 0 |
PubChem hit rates are low across all models because real PubMed abstracts frequently use generic compound codes ("compound 3", "44") rather than IUPAC names. This is a property of the input text, not of model capability.
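The card does not describe how the PubChem search was implemented. As one possible reading, a "hit" could mean that a name lookup against PubChem's PUG REST service returns a record. The sketch below assumes that interpretation; the helper names are hypothetical, and the sample response is a hand-written stand-in for the service's JSON shape rather than real retrieved data.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name"


def pubchem_mw_url(compound_name):
    """Build the PUG REST URL that asks for a compound's molecular weight by name."""
    quoted = urllib.parse.quote(compound_name, safe="")
    return f"{PUG_REST}/{quoted}/property/MolecularWeight/JSON"


def parse_mw(response_json):
    """Return the first molecular weight in a PUG REST property response, or None."""
    props = response_json.get("PropertyTable", {}).get("Properties", [])
    if not props:
        return None
    return float(props[0]["MolecularWeight"])


def lookup_mw(compound_name):
    """Query PubChem; return the molecular weight, or None when the lookup fails."""
    try:
        with urllib.request.urlopen(pubchem_mw_url(compound_name), timeout=10) as resp:
            return parse_mw(json.load(resp))
    except urllib.error.HTTPError:
        return None  # e.g. HTTP 404 when PubChem does not know the name


# No network needed for the demo: build a URL and parse a stand-in response.
url = pubchem_mw_url("compound 3")
sample = {"PropertyTable": {"Properties": [{"MolecularWeight": "337.35"}]}}
print(url)
print(parse_mw(sample))
```

A generic code such as "compound 3" produces a well-formed query, but PubChem has no reason to resolve it to the intended molecule, which is consistent with the low hit counts reported above.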
PubChem MW grounding: Synthetic texts (V13.1 strict protocol)
Evaluated on synthetic texts where each sentence explicitly contains an IUPAC compound name and its MW value (sourced from PubChem). ChemLink outputs compound_name without instruction; the extracted name is then searched in PubChem and matched against the source MW.
| Model | compound_name output | PubChem candidate | MW match | Full success rate | Conditional match |
|---|---|---|---|---|---|
| ChemLink NF4 | 736 / 744 | 380 | 375 | 375 / 744 = 50.4% | 375 / 380 = 98.7% |
| ChemLink q5_k_m | 663 / 744 | 392 | 389 | 389 / 744 = 52.3% | 389 / 392 = 99.2% |
| All baselines | 0 / 744 | 0 | 0 | 0% | n/a |
Full success: MW correctly extracted AND compound_name output AND PubChem candidate found AND PubChem MolecularWeight matches extracted MW (±1%). Source: chemlink_v13_1_strict_db_normalization.xlsx, V13.1 strict fixed protocol. Baselines had 0% compound_name output in no-instruction condition; PubChem grounding not applicable.
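The full-success criterion and the two rate columns can be sketched as follows. This is an illustration, not the evaluation script: the function names are hypothetical, and measuring the 1% tolerance relative to the PubChem value is an assumption, since the card does not say which value the tolerance is relative to.

```python
def mw_matches(extracted_mw, pubchem_mw, rel_tol=0.01):
    """True when the extracted MW lies within 1% of the PubChem value (assumed reference)."""
    return abs(extracted_mw - pubchem_mw) <= rel_tol * abs(pubchem_mw)


def full_success(mw_correct, compound_name, pubchem_mw, extracted_mw):
    """All four conditions from the definition above must hold."""
    return (
        mw_correct
        and bool(compound_name)      # a compound_name was output
        and pubchem_mw is not None   # a PubChem candidate was found
        and mw_matches(extracted_mw, pubchem_mw)
    )


def rates(n_total, n_candidates, n_matches):
    """Percentages for the 'Full success rate' and 'Conditional match' columns."""
    return {
        "full_success": round(100 * n_matches / n_total, 1),
        "conditional_match": round(100 * n_matches / n_candidates, 1),
    }


# Counts taken from the table above.
nf4 = rates(744, 380, 375)
q5 = rates(744, 392, 389)
print(nf4)  # {'full_success': 50.4, 'conditional_match': 98.7}
print(q5)   # {'full_success': 52.3, 'conditional_match': 99.2}
```

The conditional match divides by PubChem candidates rather than by all samples, which is why it stays near 99% while the full success rate is close to 50%.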
Limitations
IC50 / EC50: Extraction scores were below 2% across all models and conditions. This reflects a limitation of the unified-output evaluation protocol rather than model capability, so these indicators are not suitable for cross-model comparison.
compound_name in real abstracts: Real PubMed abstracts often use generic codes ("compound 3", "2b") rather than IUPAC names. ChemLink outputs whatever name appears in the source text. PubChem resolution depends on how the original literature names the compound.
Quantization gap: ChemLink NF4 (Colab) and q5_k_m (local GGUF) differ in quantization and inference backend. The q5_k_m variant shows ~10 pp lower MW extraction rate than NF4 in the no-instruction evaluation.
Inference environment: Colab GPU evaluations used temperature=0.1 / max_new_tokens=256. Local Ollama evaluations used temperature=0.0 / num_predict=128. Cross-environment comparisons should account for these differences.
Framework Versions (Training)
| Library | Version |
|---|---|
| unsloth | 2026.5.2 |
| PEFT | 0.19.1 |
| Transformers | 5.5.0 |
| PyTorch | 2.10.0 |
| TRL | 0.24.0 |
| Datasets | 4.3.0 |