Job ACOS Extractor v4 - Deployment

A fine-tuned Qwen3.5-2B model for extracting structured ACOS (Action-Context-Outcome-Skill) data from job descriptions.

Quick Start

1. Clone the repo (with Git LFS for model weights)

git lfs install
git clone https://huggingface.co/team-loxo/jd-acos-extractor-v4
cd jd-acos-extractor-v4

The model weights (model/model.safetensors, ~3.8 GB) and model/tokenizer.json are stored in Git LFS and are downloaded automatically by the clone.

2. Create a Python virtual env (Python 3.10+ recommended)

python3 -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate
pip install --upgrade pip

3. Install dependencies

Important: Use the PyTorch CUDA 12.8 wheel index. The default PyPI torch is built for CUDA 13 and won't run on common NVIDIA drivers (CUDA 12.x).

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128

Requirements:

  • NVIDIA GPU with CUDA driver ≥ 12.4 (e.g., RTX 30/40/50-series, A100, H100)
  • ~10 GB free disk for model + Python deps
  • ~6 GB GPU VRAM for inference (BF16)
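
The non-GPU requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function name check_environment and its thresholds are illustrative and not part of this repo, and it does not check the GPU or driver.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)      # "Python 3.10+ recommended" (step 2)
MIN_FREE_DISK_GB = 10     # "~10 GB free disk" (requirements list)


def check_environment(path=".", min_free_gb=MIN_FREE_DISK_GB):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {found} found, 3.10+ recommended")
    # Free space on the filesystem that contains `path`, in decimal GB.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"{free_gb:.1f} GB free, about {min_free_gb} GB needed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it from the cloned repo directory so the disk check applies to the filesystem that will hold the weights.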

4. Run examples

# Single example (uses tests/jd1.txt, prints extraction)
python run.py run

# Test suite: 3 examples + baseline match check
python run.py test

Expected output for python run.py test:

Running 3 test examples...
...
SUMMARY
  Examples run:     3
  Total time:       <a few seconds>s
  Avg time/example: <a few seconds>s
  Baseline matches: 3/3

Troubleshooting

| Problem | Fix |
| --- | --- |
| model.safetensors is a small text file (~150 bytes) | LFS not pulled. Run git lfs install && git lfs pull |
| RuntimeError: NVIDIA driver ... too old | Reinstall torch with --extra-index-url https://download.pytorch.org/whl/cu128 |
| Out-of-memory on GPU | Use python run.py run (single example) instead of batched workloads, or set ACOS_DEVICE=cpu |
| ModuleNotFoundError: transformers | Forgot to activate venv: source .venv/bin/activate |

Output Schema

The model outputs a JSON object with exactly 3 fields:

{
  "core_responsibilities": ["Design ML pipelines", "Collaborate with data team"],
  "hard_requirements": ["Python", "ML frameworks", "distributed systems"],
  "bonus_skills": ["PyTorch", "TensorFlow", "Kubernetes"]
}

| Field | Type | Description |
| --- | --- | --- |
| core_responsibilities | list[str] | Primary duties and day-to-day responsibilities |
| hard_requirements | list[str] | Core skills and technologies required (skill names only, no experience levels) |
| bonus_skills | list[str] | Preferred or "nice-to-have" qualifications |

Note: The model extracts skill names only, not experience requirements. For example:

  • Input: "5+ years Python experience" → Output: "Python"
  • Input: "Experience with ML frameworks (e.g., PyTorch)" → Output: "ML frameworks" (with PyTorch in bonus_skills)
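
Downstream code may want to confirm that an extraction matches the schema above before using it. The following is a minimal sketch of such a check; validate_extraction is a hypothetical helper, not part of this repo, and it accepts either a JSON string or an already-parsed dict because the return type of extract is not stated here.

```python
import json

EXPECTED_FIELDS = ("core_responsibilities", "hard_requirements", "bonus_skills")


def validate_extraction(raw):
    """Parse `raw` (JSON string or dict) and check it against the schema."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")
    # Exactly the three documented fields: no extras, none missing.
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError(f"expected fields {EXPECTED_FIELDS}, got {sorted(data)}")
    for field in EXPECTED_FIELDS:
        items = data[field]
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{field} must be a list of strings")
    return data


example = {
    "core_responsibilities": ["Design ML pipelines"],
    "hard_requirements": ["Python", "ML frameworks"],
    "bonus_skills": ["PyTorch"],
}
validated = validate_extraction(example)
```

Raising on any mismatch keeps malformed records out of later processing; a production pipeline might prefer to log and skip instead.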

Project Structure

deploy/
├── model.py          # Model interface and loader (BF16)
├── config.py         # Paths and configuration
├── run.py            # Orchestration (run/test commands)
├── requirements.txt  # Dependencies
├── model/            # Model weights (downloaded from HF)
│   ├── model.safetensors
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── config.json
│   └── generation_config.json
└── tests/            # Test examples and baseline
    ├── jd1.txt
    ├── jd2.txt
    ├── jd3.txt
    └── baseline.json

API Usage

from model import load_model

# Load model (singleton, BF16)
extractor = load_model()

# Extract from job description
jd_text = """
Senior Software Engineer - Machine Learning

Requirements:
- 5+ years Python experience
- Experience with ML frameworks (e.g., PyTorch, TensorFlow)
"""

result = extractor.extract(jd_text)
print(result)
# {
#   "core_responsibilities": [...],
#   "hard_requirements": ["Python", "ML frameworks"],
#   "bonus_skills": ["PyTorch", "TensorFlow"]
# }

# Batch extraction (recommended for production)
jd_texts = [jd_text]  # list of job description strings; append more as needed
results = extractor.extract_batch(jd_texts, batch_size=128)

Production Alignment

This deployment uses the same inference-optimized configuration that produced the production metrics below (91.8% entity-level F1, 5.89 samples/sec at batch size 128).

| Component | Value | Notes |
| --- | --- | --- |
| System prompt | 382 chars / 78 tokens | Inference-optimized (shorter than training prompt for speed) |
| User message format | "Extract structured data...{jd_text}" | Matches eval/benchmark configuration |
| MAX_LENGTH | 1,500 tokens | Matches benchmark setup |
| Chat template | qwen (chat_template.jinja) | Matches training |
| Tokenizer | Qwen3.5-2B | Matches training |
| Precision | BF16 | Matches training |

The model is robust to prompt variations: the shorter inference prompt achieves 91.8% F1 while running significantly faster than the original 1,471-char training prompt would.

Performance

Entity-Level Metrics (each sample = one entity, n=2,051)

| Metric | Value |
| --- | --- |
| Precision | 91.8% |
| Recall | 91.8% |
| F1 | 91.8% |

How this is computed:

  1. Compute item-level F1 between model prediction and gold label for each sample.
  2. Non-hard failures (F1 ≥ 0.5): 1,613 samples → TP.
  3. Hard failures (F1 < 0.5, n=438) were sent to GPT-5.5 with the full JD for adjudication:
    • A (157): Model is genuinely wrong → FP/FN
    • B (121): Gold is wrong, model correct → TP
    • BOTH_OK (21): Both valid → TP
    • NEITHER (111): Both have problems → excluded (ambiguous)
    • Judge ERROR (28): excluded
  4. TP = 1,613 + 121 + 21 = 1,755, FP = FN = 157
  5. Precision = Recall = F1 = 1,755 / (1,755 + 157) = 91.8%

P and R converge at entity-level because each sample produces one extraction event: a wrong extraction simultaneously counts as both FP and FN for that sample.
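
The arithmetic in steps 4 and 5 can be reproduced directly from the counts listed above. This short sketch uses only those counts; the variable names are illustrative.

```python
# Counts copied from the adjudication steps above.
non_hard = 1613    # samples with item-level F1 >= 0.5, counted as TP
verdict_a = 157    # model genuinely wrong
verdict_b = 121    # gold wrong, model correct
both_ok = 21       # both valid

tp = non_hard + verdict_b + both_ok
fp = fn = verdict_a    # one wrong sample counts as both an FP and an FN

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, f"{precision:.1%} {recall:.1%} {f1:.1%}")
# 1755 91.8% 91.8% 91.8%
```

Because FP and FN are equal by construction, precision and recall share the same denominator and F1 collapses to the same value.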

Hard Failure Breakdown (n=438, GPT-5.5 adjudicated)

| Verdict | Count | % of failures | Meaning |
| --- | --- | --- | --- |
| A | 157 | 35.8% | Real model errors |
| B | 121 | 27.6% | Gold label has spurious items, model OK |
| BOTH_OK | 21 | 4.8% | Both acceptable |
| NEITHER | 111 | 25.3% | Both have problems |
| Judge ERROR | 28 | 6.4% | Adjudication failed |

Item-Level Metrics (raw, no GPT correction)

| Field | Precision | Recall | F1 |
| --- | --- | --- | --- |
| core_responsibilities | 78.6% | 72.6% | 75.5% |
| hard_requirements | 65.4% | 46.8% | 54.6% |
| bonus_skills | 71.8% | 39.6% | 51.1% |
| Overall | 73.2% | 58.3% | 64.9% |

Item-level recall is dragged down by gold label issues: in 27.6% of hard failures (verdict B), the gold label contained items that are not in the JD. True item recall is therefore higher.
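
To make the item-level numbers concrete, here is a minimal sketch of per-sample item-level precision, recall, and F1. It assumes case-insensitive exact matching between predicted and gold items; the matching rule actually used for the table is not stated here and may differ, and item_level_prf is a hypothetical name.

```python
def item_level_prf(predicted, gold):
    """Precision, recall, F1 over items, using normalized exact matching."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# The gold list has one item the prediction lacks, so recall drops
# even though every predicted item is correct.
p, r, f1 = item_level_prf(
    ["Python", "ML frameworks"],
    ["Python", "ML frameworks", "distributed systems"],
)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
# P=1.00 R=0.67 F1=0.80
```

If the extra gold item is not actually in the JD, the recall penalty in this example comes from the label and not from the model, which is the effect described above.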

Speed Benchmarks (BF16)

Single RTX 5090:

| Batch Size | Samples/sec | Latency P50 |
| --- | --- | --- |
| 16 | 0.94 | 1046 ms |
| 32 | 1.51 | 657 ms |
| 64 | 2.65 | 381 ms |
| 128 (optimal) | 5.89 | 168 ms |

Multi-GPU Production (8 × RTX 5090): 19.18 samples/sec (~1.66M samples/day)
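
The samples/day figure follows from the throughput by simple arithmetic. The sketch below reproduces it and adds a capacity estimate; the helper names are illustrative, and it assumes throughput stays constant over the whole run.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(samples_per_sec):
    """Daily volume at a sustained throughput."""
    return samples_per_sec * SECONDS_PER_DAY


def hours_to_process(n_samples, samples_per_sec):
    """Wall-clock hours needed for n_samples at a sustained throughput."""
    return n_samples / samples_per_sec / 3600


print(f"{samples_per_day(19.18):,.0f}")   # 8-GPU figure: 1,657,152
print(f"{samples_per_day(5.89):,.0f}")    # single GPU, batch 128: 508,896
print(f"{hours_to_process(1_000_000, 19.18):.1f}")  # hours for 1M JDs: 14.5
```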

Other Metrics

| Spec | Value |
| --- | --- |
| Model Size | 3.76 GB |
| Precision | BF16 |
| JSON Parse Rate | 100% |
| Max New Tokens | 400 (retry: 600) |
| Max Input Length | 1,500 tokens |
| Max JD Characters | 2,500 |
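
The character limit and the token retry budget in the table above suggest a truncate-then-retry flow. The following is a sketch of one way that could look, not the repo's actual implementation: generate is a hypothetical callable standing in for the model call, and treating a JSON parse failure as the retry trigger is an assumption.

```python
import json

MAX_JD_CHARS = 2500           # "Max JD Characters" in the table above
TOKEN_BUDGETS = (400, 600)    # "Max New Tokens 400 (retry: 600)"


def extract_with_retry(generate, jd_text):
    """Call generate(text, max_new_tokens) until its output parses as JSON.

    `generate` is a hypothetical stand-in for the model call.
    """
    text = jd_text[:MAX_JD_CHARS]    # assumption: over-long JDs are truncated
    last_error = None
    for budget in TOKEN_BUDGETS:
        raw = generate(text, budget)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err         # possibly cut off; try the larger budget
    raise ValueError("no valid JSON after retry") from last_error


# Usage with a fake generator whose first answer is cut off mid-object.
def fake_generate(text, max_new_tokens):
    return '{"bonus_skills": []}' if max_new_tokens == 600 else '{"bonus_skills": ['


result = extract_with_retry(fake_generate, "x" * 3000)
```

Passing the model call in as a parameter keeps the retry logic testable without loading the weights.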

For the full breakdown including extraction & judge prompts, see v4_production_report.html.

Deployment Notes

  • Runtime: Loads from local model/ directory only (no remote HF fetch)
  • Precision: BF16 (optimal for RTX 5090 and modern GPUs)
  • Validation: Detects Git LFS pointers and requires real weights before running

Preflight Checks

Before running:

  1. Verify model/model.safetensors exists and is not a Git LFS pointer
  2. Install dependencies: pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
  3. Confirm test files exist in tests/

# Check model file is real (not LFS pointer)
head -c 64 model/model.safetensors | xxd
# Should NOT show "version https://git-lfs"
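
The same check can be done from Python, which is convenient on systems without xxd. This is a minimal sketch; is_lfs_pointer is a hypothetical helper and is not necessarily how the repo's own validation is implemented. It relies only on the fact that a Git LFS pointer file begins with the text "version https://git-lfs".

```python
from pathlib import Path

LFS_PREFIX = b"version https://git-lfs"


def is_lfs_pointer(path):
    """True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_PREFIX)) == LFS_PREFIX


if __name__ == "__main__":
    weights = Path("model/model.safetensors")
    if not weights.exists():
        print("missing:", weights)
    elif is_lfs_pointer(weights):
        print("LFS pointer found; run: git lfs install && git lfs pull")
    else:
        print("weights look real:", weights.stat().st_size, "bytes")
```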

License

Internal use only. Contact team-loxo for licensing inquiries.
