---
library_name: scikit-learn
tags:
- jobs
- classification
- tf-idf
pipeline_tag: text-classification
---
# Jobs job-category classifier (sklearn)
This repo holds the trained artifact consumed by **jobs-shared**
(`jobs_shared.ai.categorizers.pipeline`) for the findjobs taxonomy.
- **Weights:** `category.joblib`, a `joblib`-serialized dict: the `model` key holds the
scikit-learn `Pipeline` (`TfidfVectorizer` + `LogisticRegression`), next to the keys `fields`
and `input_joiner`. Compressed with `joblib.dump(..., compress=3)`.
- **Upstream:** Produced by `scripts/train/train_category.py`.
- **HF repo:** `gateswang00/job_classifier`
### Load locally
```python
import joblib
from huggingface_hub import hf_hub_download

# Download the artifact from the Hub (cached locally after the first call).
path = hf_hub_download(repo_id="gateswang00/job_classifier", filename="category.joblib")
artifact = joblib.load(path)  # dict holding the pipeline and its input config

clf = artifact["model"]  # sklearn Pipeline
fields = artifact.get("fields", ["title", "llm_skills", "description"])
print(fields)
```
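Once the artifact is loaded, a prediction needs the job record turned into the single string the pipeline expects. The sketch below is one way to do that, not the `jobs_shared` implementation: it assumes `input_joiner` is a plain separator string (defaulting to a newline if the key is absent), that list-valued fields such as `llm_skills` are flattened with spaces, and that empty fields are skipped. The helper names `build_input_text` and `predict_category` are hypothetical.

```python
from typing import Any, Mapping, Sequence


def build_input_text(job: Mapping[str, Any], fields: Sequence[str], joiner: str = "\n") -> str:
    """Join the configured fields of one job record into a single string."""
    parts = []
    for name in fields:
        value = job.get(name)
        if value is None:
            continue
        # Assumption: list-valued fields (e.g. skills) are flattened with spaces.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        text = str(value).strip()
        if text:  # skip empty fields so the joiner is not doubled
            parts.append(text)
    return joiner.join(parts)


def predict_category(artifact: Mapping[str, Any], job: Mapping[str, Any]) -> Any:
    """Predict one category label for a job record using the loaded artifact."""
    clf = artifact["model"]
    fields = artifact.get("fields", ["title", "llm_skills", "description"])
    # Assumption: `input_joiner` is a separator string; "\n" is a guessed default.
    joiner = artifact.get("input_joiner", "\n")
    text = build_input_text(job, fields, joiner)
    # sklearn pipelines take an iterable of documents and return one label each.
    return clf.predict([text])[0]
```

With the `artifact` from the loading snippet, `predict_category(artifact, {"title": "Backend Engineer", "llm_skills": ["python", "sql"], "description": "..."})` would return one label. If the real joining logic differs, prefer the code in `jobs_shared.ai.categorizers.pipeline`, since the vectorizer was fitted on text built that way.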
### Training metadata snapshot
```
categorizer_filter: ['qwen2.5:7b', 'qwen3-jobs-classifier']
categorizer_mix: {'qwen2.5:7b': 8086}
category_source_filter: "category_source IS DISTINCT FROM 'rules'"
category_source_mix: {'(null)': 8167}
llm_skills_coverage: 0.7056641108088053
min_per_class: 50
n_classes: 14
n_rows: 8086
random_state: 42
source: 'jobs.job_categorized JOIN jobs.jobs_found LEFT JOIN LATERAL jobs.job_extracted'
test_size: 0.2
trained_at: '2026-05-23T15:57:36.400167+00:00'
```
If needed, update this README's license and frontmatter through the Hugging Face model card UI.