gateswang00 committed on
Commit cedc354 · verified · 1 Parent(s): 6ef38a9

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +23 -20
README.md CHANGED
@@ -9,29 +9,32 @@ pipeline_tag: text-classification
  
  # Jobs job-category classifier (sklearn)
  
- This repo holds the trained artifact consumed by **jobs-shared** (`jobs_shared.ai.categorizers.pipeline`) for the findjobs taxonomy.
+ This repo holds the trained artifact consumed by **jobs-shared**
+ (`jobs_shared.ai.categorizers.pipeline`) for the findjobs taxonomy.
  
- - **Weights:** `category.joblib` — `joblib`-serialized scikit-learn `Pipeline` (`TfidfVectorizer` + `LogisticRegression`), plus artifact keys `fields` and `input_joiner`. Compressed with `joblib.dump(..., compress=3)`.
- - **Upstream:** Produced by `scripts/train/train_category.py`.
- - **HF repo:** `gateswang00/job_classifier`
+ - **Weights:** `category.joblib` — `joblib`-serialized scikit-learn `Pipeline`
+ (`TfidfVectorizer` + `LogisticRegression`), plus artifact keys `fields`
+ and `input_joiner`. Compressed with `joblib.dump(..., compress=3)`.
+ - **Upstream:** Produced by ``scripts/train/train_category.py``.
+ - **HF repo:** `gateswang00/job_classifier`
  
- ## Load locally
+ ### Load locally
  
- ```python
- import joblib
- from huggingface_hub import hf_hub_download
+ ```python
+ import joblib
+ from huggingface_hub import hf_hub_download
  
- path = hf_hub_download(repo_id="gateswang00/job_classifier", filename="category.joblib")
- artifact = joblib.load(path)
- clf = artifact["model"] # sklearn Pipeline
- fields = artifact.get("fields", ["title", "llm_skills", "description"])
- print(fields)
- ```
+ path = hf_hub_download(repo_id="gateswang00/job_classifier", filename="category.joblib")
+ artifact = joblib.load(path)
+ clf = artifact["model"] # sklearn Pipeline
+ fields = artifact.get("fields", ["title", "llm_skills", "description"])
+ print(fields)
+ ```
  
- ## Training metadata snapshot
+ ### Training metadata snapshot
  
- ```yaml
- categorizer_filter: ['qwen2.5:7b', 'qwen3-jobs-classifier']
+ ```
+ categorizer_filter: ['qwen2.5:7b', 'qwen3-jobs-classifier']
  categorizer_mix: {'qwen2.5:7b': 8086}
  category_source_filter: "category_source IS DISTINCT FROM 'rules'"
  category_source_mix: {'(null)': 8167}
@@ -40,9 +43,9 @@ min_per_class: 50
  n_classes: 14
  n_rows: 8086
  random_state: 42
- source: "jobs.job_categorized JOIN jobs.jobs_found LEFT JOIN LATERAL jobs.job_extracted"
+ source: 'jobs.job_categorized JOIN jobs.jobs_found LEFT JOIN LATERAL jobs.job_extracted'
  test_size: 0.2
  trained_at: '2026-05-23T15:57:36.400167+00:00'
- ```
+ ```
  
- Replace this README's license/frontmatter via the Hugging Face model card UI if needed.
+ Replace this READMEs license/frontmatter via the Hugging Face model card UI if needed.
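
The README's "Load locally" snippet stops after printing `fields`, so the following is a minimal sketch of how the loaded artifact might be used for one prediction. It rests on assumptions the diff does not confirm: that `fields` lists the record keys to concatenate in order, that `input_joiner` is a plain string separator, and that the pipeline takes one joined string per job. The helper names `build_input_text` and `predict_category`, and the `"\n"` fallback joiner, are hypothetical and not part of `jobs_shared`.

```python
from typing import Any, Mapping, Sequence


def build_input_text(record: Mapping[str, Any], fields: Sequence[str], joiner: str) -> str:
    """Concatenate the configured fields of one job record into a single string."""
    parts = []
    for name in fields:
        value = record.get(name)
        if value is None:
            continue  # skip fields that are missing or null in this record
        # ASSUMPTION: list-valued fields (e.g. skills) are flattened to "a, b, c".
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(str(value))
    return joiner.join(parts)


def predict_category(artifact: Mapping[str, Any], record: Mapping[str, Any]) -> str:
    """Predict one category label for a job record from the loaded artifact dict."""
    # Same default field list as the README's loading snippet.
    fields = artifact.get("fields", ["title", "llm_skills", "description"])
    # ASSUMPTION: "input_joiner" is a string separator; "\n" is a hypothetical fallback.
    joiner = artifact.get("input_joiner", "\n")
    text = build_input_text(record, fields, joiner)
    # A TfidfVectorizer-based Pipeline expects an iterable of raw strings.
    return str(artifact["model"].predict([text])[0])
```

With `artifact = joblib.load(path)` from the snippet in the README, a call such as `predict_category(artifact, {"title": "...", "llm_skills": [...], "description": "..."})` would return one label, provided the assumptions above match how `scripts/train/train_category.py` built its training inputs. If they do not, the TF-IDF features will not line up with training and predictions may degrade silently.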