Jobs job-category classifier (sklearn)

  This repo holds the trained artifact consumed by **jobs-shared**
  (`jobs_shared.ai.categorizers.pipeline`) for the findjobs taxonomy.

  - **Weights:** `category.joblib`, a `joblib`-serialized dict. Its `model` key
    holds the scikit-learn `Pipeline` (`TfidfVectorizer` + `LogisticRegression`);
    the `fields` and `input_joiner` keys are stored alongside it.
    Compressed with `joblib.dump(..., compress=3)`.
  - **Upstream:** Produced by `scripts/train/train_category.py`.
  - **HF repo:** `gateswang00/job_classifier`

  ### Load locally

  ```python
  import joblib
  from huggingface_hub import hf_hub_download

  path = hf_hub_download(repo_id="gateswang00/job_classifier", filename="category.joblib")
  artifact = joblib.load(path)
  clf = artifact["model"]  # sklearn Pipeline
  fields = artifact.get("fields", ["title", "llm_skills", "description"])
  print(fields)
  ```
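
The loaded artifact can then be used to classify a single job. The sketch below is a minimal illustration, not the logic used by `jobs_shared.ai.categorizers.pipeline`: the helper names `build_text` and `predict_category`, the `"\n"` fallback joiner, the comma-joining of list-valued fields, and the skipping of empty fields are all assumptions made for this example.

```python
def build_text(job, fields, joiner="\n"):
    """Join the selected job fields into one input string.

    Assumption: empty or missing fields are skipped, and list values
    (e.g. a list of skills) are flattened into a comma-separated string.
    """
    parts = []
    for name in fields:
        value = job.get(name)
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        if value:
            parts.append(str(value))
    return joiner.join(parts)


def predict_category(artifact, job):
    """Predict one category label for a job dict using the loaded artifact."""
    fields = artifact.get("fields", ["title", "llm_skills", "description"])
    # Hypothetical fallback: the real default joiner is not documented here.
    joiner = artifact.get("input_joiner", "\n")
    text = build_text(job, fields, joiner)
    # A scikit-learn Pipeline expects an iterable of documents.
    return artifact["model"].predict([text])[0]
```

With the `artifact` loaded above, a call might look like `predict_category(artifact, {"title": "Data Engineer", "llm_skills": ["python", "sql"], "description": "..."})`. For production use, prefer the categorizer in **jobs-shared**, which defines the authoritative input format.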

  ### Training metadata snapshot

  ```
  categorizer_filter: ['qwen2.5:7b', 'qwen3-jobs-classifier']
  categorizer_mix: {'qwen2.5:7b': 8086}
  category_source_filter: "category_source IS DISTINCT FROM 'rules'"
  category_source_mix: {'(null)': 8167}
  llm_skills_coverage: 0.7056641108088053
  min_per_class: 50
  n_classes: 14
  n_rows: 8086
  random_state: 42
  source: 'jobs.job_categorized JOIN jobs.jobs_found LEFT JOIN LATERAL jobs.job_extracted'
  test_size: 0.2
  trained_at: '2026-05-23T15:57:36.400167+00:00'
  ```
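
One plausible reading of `min_per_class` is that categories with fewer labeled rows than the threshold are dropped before training. The sketch below illustrates that reading only; the helper name `drop_rare_classes` and the `(text, label)` row shape are hypothetical, and the actual behavior is defined in `scripts/train/train_category.py`.

```python
from collections import Counter


def drop_rare_classes(rows, min_per_class=50):
    """Keep only rows whose label occurs at least `min_per_class` times.

    `rows` is assumed to be a list of (text, label) pairs.
    """
    counts = Counter(label for _, label in rows)
    return [(text, label) for text, label in rows if counts[label] >= min_per_class]
```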

  Replace this README’s license/frontmatter via the Hugging Face model card UI if needed.