akzaidan committed
Commit 22472aa · verified · 1 Parent(s): be9e7a1

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +44 -0
  2. config.json +14 -0
  3. norm_stats.json +10 -0
  4. pytorch_model.bin +3 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +23 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ language: en
+ tags:
+ - job-classification
+ - salary-prediction
+ - experience-prediction
+ - deberta
+ ---
+ # JobPredictor1
+ Fine-tuned DeBERTa-v3-base model that predicts:
+ - **Expected years of experience** required for a job
+ - **Expected salary** (USD)
+
+ ## Input Format
+ ```
+ [LOCATION] <Remote | United States (State) | Country> [TITLE]: <job title> [DESC]: <job description>
+ ```
+
+ ## Outputs
+ | Output | Type | Description |
+ |---|---|---|
+ | expected_experience_years | int | Years of experience required |
+ | expected_salary | int | Expected salary (USD) |
+
+ ## Normalization
+ Experience is z-score normalized:
+ ```python
+ real_value = pred * norm_stats["expected_experience_years"]["std"] + norm_stats["expected_experience_years"]["mean"]
+ ```
+ Salary is log1p + z-score normalized:
+ ```python
+ real_salary = np.expm1(pred * norm_stats["expected_salary"]["std"] + norm_stats["expected_salary"]["mean"])
+ ```
+
+ ## Test Set Performance
+ | Metric | Value |
+ |---|---|
+ | Experience MAE | 0.57 years |
+ | Experience Within 1yr | 83.1% |
+ | Salary MAE | $15,511 |
+ | Salary Within $20k | 84.5% |
+
+ ## Base Model
+ microsoft/deberta-v3-base
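The single-string layout in the README's Input Format section can be assembled with a small helper. This is a minimal sketch, not code from the repository: the function name `format_job_input`, the whitespace collapsing, and the spelling `United States (California)` for the `United States (State)` pattern are all assumptions made for illustration.

```python
def format_job_input(location: str, title: str, description: str) -> str:
    """Build the single input string laid out in the README's Input Format section."""

    def squash(text: str) -> str:
        # Assumption: collapsing runs of whitespace (including newlines) is
        # harmless for the model; the README does not say how text was cleaned.
        return " ".join(text.split())

    return (
        f"[LOCATION] {squash(location)} "
        f"[TITLE]: {squash(title)} "
        f"[DESC]: {squash(description)}"
    )


example = format_job_input(
    "United States (California)",  # assumed rendering of "United States (State)"
    "Data Engineer",
    "Build and maintain\nbatch pipelines.",
)
print(example)
```

Note that, per the README, `[LOCATION]` has no trailing colon while `[TITLE]:` and `[DESC]:` do; the sketch preserves that asymmetry rather than normalizing it.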
config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "base_model": "microsoft/deberta-v3-base",
+   "architecture": "DeBERTa-v3-base + 2 regression heads",
+   "outputs": {
+     "expected_experience_years": "integer (years of experience)",
+     "expected_salary": "integer (expected salary USD)"
+   },
+   "normalization": {
+     "expected_experience_years": "z-score \u2014 use norm_stats.json to denormalize",
+     "expected_salary": "log1p then z-score \u2014 reverse with: expm1(pred * std + mean)"
+   },
+   "max_length": 512,
+   "dropout": 0.2
+ }
norm_stats.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "expected_experience_years": {
+     "mean": 2.9545610445835053,
+     "std": 2.8019970286307627
+   },
+   "expected_salary": {
+     "mean": 11.046510169065424,
+     "std": 0.9250129804606931
+   }
+ }
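The statistics in norm_stats.json plug directly into the two denormalization formulas given in the README. The following is a minimal sketch using only the standard library; the function name `denormalize`, the rounding, and the clamp at zero are assumptions (the README documents both outputs as integers but does not say how they are rounded).

```python
import math

# Values copied from norm_stats.json in this commit.
NORM_STATS = {
    "expected_experience_years": {"mean": 2.9545610445835053, "std": 2.8019970286307627},
    "expected_salary": {"mean": 11.046510169065424, "std": 0.9250129804606931},
}


def denormalize(pred_experience: float, pred_salary: float, stats=NORM_STATS):
    """Map raw model outputs back to years and USD."""
    exp = stats["expected_experience_years"]
    sal = stats["expected_salary"]
    # Experience: undo the z-score.
    years = pred_experience * exp["std"] + exp["mean"]
    # Salary: undo the z-score, then undo log1p with expm1.
    salary = math.expm1(pred_salary * sal["std"] + sal["mean"])
    # Assumption: round to integers and clamp negatives to zero.
    return max(0, round(years)), max(0, round(salary))


years, salary = denormalize(0.0, 0.0)
print(years, salary)
```

A normalized prediction of 0.0 maps back to the training mean: about 3 years of experience, and a salary of roughly $62,700. Because the salary mean is taken in log space, that figure is closer to a geometric mean of the training salaries than to their arithmetic mean.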
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:930177670bfead06a4bfaf12bf756bb27c61301ce20079c8af01905cddbf3bee
+ size 736994139
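The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real pytorch_model.bin. A downloaded copy can be checked against the pointer along these lines. This is a sketch under stated assumptions; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library.

```python
import hashlib

# Pointer contents copied from this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:930177670bfead06a4bfaf12bf756bb27c61301ce20079c8af01905cddbf3bee
size 736994139
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare size and digest with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local download (the path is an assumption) returns `True` only if both the size and the digest agree, which catches truncated or corrupted transfers of the roughly 737 MB file.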
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "add_prefix_space": true,
+   "backend": "tokenizers",
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "extra_special_tokens": [
+     "[PAD]",
+     "[CLS]",
+     "[SEP]"
+   ],
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_id": 3,
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }
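The `model_max_length` value above equals `int(1e30)`, which, as far as I know, is the placeholder the transformers library writes when a tokenizer has no explicit length limit. It therefore does not cap inputs, whereas config.json in this commit lists `max_length` as 512. A minimal sketch of reconciling the two follows; the function name `effective_max_length` is hypothetical, and treating 512 as the intended truncation length is an assumption drawn from config.json.

```python
# Placeholder value seen in tokenizer_config.json; equal to int(1e30).
VERY_LARGE_INTEGER = int(1e30)


def effective_max_length(tokenizer_max: int, config_max: int) -> int:
    """Pick a usable truncation length from the two config files."""
    if tokenizer_max >= VERY_LARGE_INTEGER:
        # The tokenizer carries no real limit, so fall back to the model config.
        return config_max
    return min(tokenizer_max, config_max)


limit = effective_max_length(1000000000000000019884624838656, 512)
print(limit)
```

The resulting value can then be passed explicitly when tokenizing, for example with `truncation=True` and `max_length=limit`, so that long job descriptions are cut to a length the model was configured for.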