Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,94 @@
# WorkUA Resumes Dataset

## Dataset Summary

This dataset consists of 84,316 resume entries collected from
publicly available pages on [Work.ua](https://www.work.ua/resumes/). Each entry represents structured
information extracted from a candidate's resume, including education,
work experience, skills, languages, disability status, veteran status,
whether the candidate holds a driver's license, and additional profile metadata.

The dataset is intended for the research and development of:

- Resume parsing models
- Information extraction systems
- Vacancy-candidate matching algorithms
- NLP pipelines for Ukrainian-language documents
- Data engineering and ML training workflows

All personally identifying information has been removed or anonymized.

## Dataset Structure

The dataset is provided as a Polars DataFrame with **21 fields**.

### Schema Overview

- `id`: String
- `url`: String
- `title`: String
- `candidate_name`: String
- `age`: Int64
- `city`: String
- `desired_salary`: Int64
- `employment_type`: String
- `work_location_preference`: String
- `driver_license`: Boolean
- `creation_date`: Datetime
- `other_resumes`: List(Struct{title, url, resume_id, description})
- `veteran`: Boolean
- `disability`: String
- `work_experiences`: List(Struct{position, start_date, end_date, company, city, industry, responsibilities})
- `recommendations`: List(Struct{name, position})
- `languages`: List(Struct{language, level})
- `skills`: List(String)
- `educations`: List(Struct{institution, faculty, city, level, start_year, end_year})
- `additional_educations`: List(Struct{institution, start_year, end_year})
- `additional_info`: String

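As a rough illustration of the schema above, the 21 top-level field names can be written down as a plain Python tuple and used to sanity-check a record that has been converted to a dict. This is only a sketch: `TOP_LEVEL_FIELDS` and `check_record` are hypothetical helper names, not part of the dataset or of Polars, and the check covers field names only, not types.

```python
# Hypothetical helper, not shipped with the dataset: the 21 top-level
# field names copied from the schema overview.
TOP_LEVEL_FIELDS = (
    "id", "url", "title", "candidate_name", "age", "city",
    "desired_salary", "employment_type", "work_location_preference",
    "driver_license", "creation_date", "other_resumes", "veteran",
    "disability", "work_experiences", "recommendations", "languages",
    "skills", "educations", "additional_educations", "additional_info",
)


def check_record(record: dict) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) top-level field names for one record."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in record]
    unexpected = [name for name in record if name not in TOP_LEVEL_FIELDS]
    return missing, unexpected
```

A record that follows the schema exactly should yield two empty lists.
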
## Data Example

    {
      "id": "123456",
      "url": "https://www.work.ua/resumes/123456/",
      "title": "Будівельник",
      "candidate_name": "Іван",
      "age": 32,
      "city": "Київ",
      "desired_salary": 25000,
      "employment_type": "повна",
      "work_location_preference": "офіс",
      "driver_license": true,
      "creation_date": "2025-03-10T12:30:00",
      "veteran": false,
      "disability": null,
      "skills": ["Штукатурка", "Монтаж гіпсокартону"],
      "languages": [{"language": "Українська", "level": "вільно"}],
      "educations": [
        {
          "institution": "КНУБА",
          "faculty": "Промислове та цивільне будівництво",
          "city": "Київ",
          "level": "Вища",
          "start_year": 2012,
          "end_year": 2016
        }
      ],
      "additional_info": "Готовий до відряджень."
    }

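To show how the nested fields in a record like the one above might be read, here is a minimal sketch using only the Python standard library. It assumes a record has been serialized to JSON in the same shape as the example; the `raw` string is a shortened copy of that example, and the derived variable names are illustrative.

```python
import json

# A shortened copy of the example record, as a JSON string.
raw = """
{
  "id": "123456",
  "title": "Будівельник",
  "city": "Київ",
  "desired_salary": 25000,
  "disability": null,
  "skills": ["Штукатурка", "Монтаж гіпсокартону"],
  "languages": [{"language": "Українська", "level": "вільно"}],
  "educations": [
    {"institution": "КНУБА", "level": "Вища", "start_year": 2012, "end_year": 2016}
  ]
}
"""

record = json.loads(raw)

# JSON null is decoded as Python None.
has_disability_info = record["disability"] is not None

# List(Struct{...}) fields are decoded as lists of dicts.
language_levels = {item["language"]: item["level"] for item in record["languages"]}
study_years = [edu["end_year"] - edu["start_year"] for edu in record["educations"]]

print(record["title"], language_levels, study_years, has_disability_info)
```
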
## Intended Use

- Training resume parsers
- Semantic search research
- Text classification
- Career recommendation systems
- Applicant ranking models

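One very simple baseline for the matching and ranking uses listed above is set overlap between a vacancy's required skills and each candidate's `skills` list. The sketch below uses Jaccard similarity. It is an illustrative baseline chosen for this example, not a method used or endorsed by the dataset authors; `jaccard` and `rank_candidates` are hypothetical names, and the sample records are made up.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(required_skills: list[str], records: list[dict]) -> list[tuple[str, float]]:
    """Sort records by skill overlap with the vacancy, best match first."""
    required = {s.casefold() for s in required_skills}
    scored = []
    for record in records:
        # `skills` may be missing or null in incomplete records.
        skills = {s.casefold() for s in (record.get("skills") or [])}
        scored.append((record["id"], jaccard(required, skills)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up records for illustration only; they are not taken from the dataset.
records = [
    {"id": "a", "skills": ["Штукатурка", "Монтаж гіпсокартону"]},
    {"id": "b", "skills": ["Штукатурка"]},
    {"id": "c", "skills": None},
]
ranking = rank_candidates(["штукатурка", "монтаж гіпсокартону"], records)
```

Exact string overlap ignores synonyms and spelling variants, so a real system would likely need normalization or embeddings on top of this.
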
## Limitations

- Some fields may be missing or incomplete because the source resumes vary in structure and level of detail

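Because fields can be incomplete, it may help to measure how often each one is missing before training on the data. The following is a minimal sketch that assumes records are available as Python dicts; `is_empty` and `missing_counts` are hypothetical helpers, and treating empty strings and empty lists as missing is an assumption of this example rather than a documented convention of the dataset.

```python
from collections import Counter


def is_empty(value) -> bool:
    """Treat None, empty strings, and empty lists as missing.

    False and 0 are real values (e.g. driver_license=False) and are kept.
    """
    return value is None or value == "" or value == []


def missing_counts(records: list[dict], fields: list[str]) -> Counter:
    """Count, per field, how many records lack a usable value."""
    counts = Counter({name: 0 for name in fields})
    for record in records:
        for name in fields:
            if is_empty(record.get(name)):
                counts[name] += 1
    return counts


# Made-up records for illustration only.
sample = [
    {"id": "1", "desired_salary": 25000, "skills": []},
    {"id": "2", "desired_salary": None, "skills": ["x"], "driver_license": False},
]
report = missing_counts(sample, ["desired_salary", "skills", "driver_license"])
```
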
## Ethical Considerations

Personally identifying information has been removed or anonymized. The records do retain attributes that can be sensitive, such as age, disability status, and veteran status.