---
library_name: scikit-learn
tags:
- jobs
- classification
- tf-idf
pipeline_tag: text-classification
---

# Jobs job-category classifier (sklearn)

This repo holds the trained artifact consumed by **jobs-shared** (`jobs_shared.ai.categorizers.pipeline`) for the findjobs taxonomy.

- **Weights:** `category.joblib`, a `joblib`-serialized dict. Its `model` key holds a scikit-learn `Pipeline` (`TfidfVectorizer` + `LogisticRegression`); the keys `fields` and `input_joiner` are stored alongside it. Compressed with `joblib.dump(..., compress=3)`.
- **Upstream:** Produced by `scripts/train/train_category.py`.
- **HF repo:** `gateswang00/job_classifier`

### Load locally

```python
import joblib
from huggingface_hub import hf_hub_download

# Download the artifact from the Hub (cached locally after the first call).
path = hf_hub_download(repo_id="gateswang00/job_classifier", filename="category.joblib")

# The artifact is a dict, not a bare estimator.
artifact = joblib.load(path)
clf = artifact["model"]  # sklearn Pipeline
fields = artifact.get("fields", ["title", "llm_skills", "description"])
print(fields)
```

### Training metadata snapshot

```
categorizer_filter: ['qwen2.5:7b', 'qwen3-jobs-classifier']
categorizer_mix: {'qwen2.5:7b': 8086}
category_source_filter: "category_source IS DISTINCT FROM 'rules'"
category_source_mix: {'(null)': 8167}
llm_skills_coverage: 0.7056641108088053
min_per_class: 50
n_classes: 14
n_rows: 8086
random_state: 42
source: 'jobs.job_categorized JOIN jobs.jobs_found LEFT JOIN LATERAL jobs.job_extracted'
test_size: 0.2
trained_at: '2026-05-23T15:57:36.400167+00:00'
```

Replace this README’s license/frontmatter via the Hugging Face model card UI if needed.
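
The artifact's `fields` and `input_joiner` keys suggest that each job record is flattened into one string before it reaches the `Pipeline`. The following is a minimal sketch of that step, not the actual jobs-shared code: the helper names `build_input_text` and `predict_category`, the `"\n"` default joiner, the skipping of empty fields, and the comma-joining of list values are all assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence


def build_input_text(
    record: Mapping[str, Any],
    fields: Sequence[str],
    joiner: str = "\n",  # assumption: the real value comes from artifact["input_joiner"]
) -> str:
    """Join the configured fields of one job record into a single string.

    Assumptions (not confirmed by the repo): missing, None, or blank fields
    are skipped, and list-valued fields such as llm_skills are comma-joined.
    """
    parts = []
    for name in fields:
        value = record.get(name)
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        value = str(value).strip()
        if value:
            parts.append(value)
    return joiner.join(parts)


def predict_category(clf: Any, record: Mapping[str, Any],
                     fields: Sequence[str], joiner: str = "\n") -> Any:
    """Predict one label; clf is any object with a scikit-learn style predict()."""
    text = build_input_text(record, fields, joiner)
    # scikit-learn text pipelines expect an iterable of documents.
    return clf.predict([text])[0]
```

With the objects from the loading snippet, a call might look like `predict_category(clf, job, fields, artifact.get("input_joiner", "\n"))`, where `job` is a dict holding the keys listed in `fields`. For production use, prefer the loader in `jobs_shared.ai.categorizers.pipeline`, which is the consumer this artifact was built for.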