---
license: apache-2.0
language:
- en
- pt
tags:
- ner
- human
- hr
- recruit
---

# Entity Extraction NER Model for CVs and JDs (Skills & Experience)

This is a `roberta-base` model fine-tuned for **Named Entity Recognition (NER)** on Human Resources documents, specifically résumés (CVs) and job descriptions (JDs). The model was trained on a private dataset of approximately **20,000 examples** generated using a **Weak Labeling** strategy. Its primary goal is to extract skills and quantifiable years of experience from free-form text.

## Recognized Entities

The model is trained to extract five entity types (11 BIO labels):

* **`SKILL`**: Technical skills, software, or tools.
    * *Examples: "Python", "machine learning", "React", "AWS"*
* **`SOFT_SKILL`**: Interpersonal and behavioral skills.
    * *Examples: "leadership", "personal initiative"*
* **`LANG`**: Language proficiencies.
    * *Examples: "english language proficiency", "portuguese language proficiency"*
* **`CERT`**: Professional certifications.
    * *Examples: "AWS Certified Solutions Architect - Associate"*
* **`EXPERIENCE_DURATION`**: Text spans that describe a duration of time.
    * *Examples: "5+ years", "6 months", "3-5 anos", "two years of experience"*

## How to Use (Python)

You can use this model directly with the `token-classification` (or `ner`) pipeline from the `transformers` library.

```python
from transformers import pipeline

# Load the model from the Hub
model_id = "feliponi/hirly-ner-multi"

# Initialize the pipeline
# aggregation_strategy="simple" groups B- and I- tags (e.g., B-SKILL, I-SKILL -> SKILL)
extractor = pipeline(
    "ner",
    model=model_id,
    aggregation_strategy="simple"
)

# Example text
text = """
Data Scientist with 5+ years of experience in Python and machine learning. Also 6 months in Java.
Soft skills: inclusive leadership paradigm thinking performance optimization personal initiative english language proficiency portuguese language proficiency AWS Certified Solutions Architect - Associate"""

# Get entities
entities = extractor(text)

# Filter for high confidence
min_confidence = 0.7
confident_entities = [e for e in entities if e['score'] >= min_confidence]

# Print the results
for entity in confident_entities:
    print(f"[{entity['entity_group']}] {entity['word']} (Confidence: {entity['score']:.2f})")
```

**Expected Output** (the raw entity list returned by `extractor(text)`; the loop above prints a formatted summary of each entry):

```python
[{'entity_group': 'SKILL', 'score': np.float32(0.9340167), 'word': 'Data Scientist', 'start': 1, 'end': 15},
 {'entity_group': 'EXPERIENCE_DURATION', 'score': np.float32(0.9998663), 'word': ' 5+ years', 'start': 21, 'end': 29},
 {'entity_group': 'SKILL', 'score': np.float32(0.99859816), 'word': ' Python', 'start': 47, 'end': 53},
 {'entity_group': 'SKILL', 'score': np.float32(0.9998181), 'word': ' machine learning', 'start': 58, 'end': 74},
 {'entity_group': 'EXPERIENCE_DURATION', 'score': np.float32(0.9998392), 'word': ' 6 months', 'start': 81, 'end': 89},
 {'entity_group': 'SKILL', 'score': np.float32(0.9982002), 'word': ' Java', 'start': 93, 'end': 97},
 {'entity_group': 'SOFT_SKILL', 'score': np.float32(0.995745), 'word': ' leadership', 'start': 124, 'end': 134},
 {'entity_group': 'SOFT_SKILL', 'score': np.float32(0.9859735), 'word': 'performance optimization', 'start': 153, 'end': 177},
 {'entity_group': 'SOFT_SKILL', 'score': np.float32(0.98516375), 'word': 'personal initiative', 'start': 178, 'end': 197},
 {'entity_group': 'LANG', 'score': np.float32(0.96456385), 'word': 'english language proficiency', 'start': 199, 'end': 227},
 {'entity_group': 'LANG', 'score': np.float32(0.9288162), 'word': 'portuguese language proficiency', 'start': 228, 'end': 259},
 {'entity_group': 'SKILL', 'score': np.float32(0.926032), 'word': 'AWS', 'start': 261, 'end': 264},
 {'entity_group': 'SOFT_SKILL', 'score': np.float32(0.9559879), 'word': ' 
Solutions', 'start': 275, 'end': 284},
 {'entity_group': 'SKILL', 'score': np.float32(0.84499276), 'word': ' Architect', 'start': 285, 'end': 294}]
```

## Training, Performance, and Limitations

This model's performance is a direct result of its training data and weak labeling methodology.

### Performance

The model was validated on a test set of ~2,000 examples, achieving the following F1-scores:

| Entity | F1-Score |
| :--- | :--- |
| **`SKILL`** | **98.9%** |
| **`LANG`** | **99.0%** |
| **`CERT`** | **84.9%** |
| **`SOFT_SKILL`** | **98.6%** |
| **`EXPERIENCE_DURATION`** | **99.8%** |
| **Overall** | **96.3%** |

### Training Methodology

The labels were generated automatically by a **Weak Labeling** pipeline, not manually annotated.

1. **`EXPERIENCE_DURATION` (pattern-based):** This entity was labeled using a robust set of regular expressions designed to find time-based patterns (e.g., "5+ years", "six months", "3-5 anos"). Its near-perfect F1 score reflects the high precision of this regex approach.
2. **`SKILL`, `SOFT_SKILL`, `LANG`, `CERT` (vocabulary-based):** These four entities were labeled by high-speed *exact matching* against four separate vocabulary files (`skills.txt`, `softskills.txt`, `langskills.txt`, `certifications.txt`).
    * **High performance (`SKILL`, `SOFT_SKILL`, `LANG`):** The excellent F1 scores (98-99%) indicate that the vocabularies for these labels were comprehensive and matched the training texts frequently.
    * **Good performance (`CERT`):** The 84.9% F1 score is strong but leaves room for improvement, suggesting the `certifications.txt` vocabulary was less comprehensive. Performance for this label would improve directly by adding more certification names (e.g., "AWS CSAA", "PMP") to the vocabulary file and retraining.
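The weak-labeling pipeline above can be sketched roughly as follows. The actual regex set and vocabulary files are private, so the simplified `DURATION_RE` pattern and the tiny `SKILL_VOCAB` stand-in below are illustrative assumptions, not the real training artifacts:

```python
import re

# Simplified stand-in for the private duration regexes (real set also covers
# spelled-out numbers like "six months").
DURATION_RE = re.compile(
    r"\b\d+\+?\s*(?:-\s*\d+\s*)?(?:years?|months?|anos|meses)\b",
    re.IGNORECASE,
)

# Stand-in for skills.txt; the real vocabulary has ~8,700 entries.
SKILL_VOCAB = {"python", "machine learning", "java"}

def weak_label(text):
    """Return (start, end, label) spans via regexes plus exact vocabulary matches."""
    spans = []
    # Pattern-based labeling for EXPERIENCE_DURATION
    for m in DURATION_RE.finditer(text):
        spans.append((m.start(), m.end(), "EXPERIENCE_DURATION"))
    # Vocabulary-based exact matching for SKILL
    lowered = text.lower()
    for term in SKILL_VOCAB:
        i = lowered.find(term)
        while i != -1:
            spans.append((i, i + len(term), "SKILL"))
            i = lowered.find(term, i + 1)
    return sorted(spans)

print(weak_label("5+ years of Python and 6 months in Java"))
# → [(0, 8, 'EXPERIENCE_DURATION'), (12, 18, 'SKILL'),
#    (23, 31, 'EXPERIENCE_DURATION'), (35, 39, 'SKILL')]
```

Character spans like these are then converted to BIO tags over the tokenized text to produce the training labels.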
### Limitations (Important)

* **Vocabulary dependency:** The model is excellent at finding the **8,700 skills** it was trained on, but it will *not* reliably find new skills or tools that were absent from the training vocabulary. It functions more as a high-speed vocabulary extractor than as a skill-concept detector.
* **False positives:** Because the source vocabulary contained generic words, the model learned to tag them as `SKILL` with high confidence. **Users of this model should filter the output** to remove known false positives.
    * *Examples of common false positives: "communication", "leadership", "teamwork", "project", "skills"*.
* **Noise:** The model may occasionally output low-confidence punctuation or noise (e.g., `.` with a 0.33 score). It is highly recommended to **filter results by confidence score (e.g., `score > 0.7`)** for clean output.
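The filtering recommended above can be implemented as a thin post-processing layer over the pipeline output. This is a minimal sketch; the `FALSE_POSITIVES` stoplist below only contains the examples from this card and should be extended for your own data:

```python
# Stoplist of known generic false positives (examples from this card, not exhaustive).
FALSE_POSITIVES = {"communication", "leadership", "teamwork", "project", "skills"}
MIN_CONFIDENCE = 0.7

def clean_entities(entities):
    """Drop low-confidence spans and generic vocabulary artifacts."""
    kept = []
    for e in entities:
        word = e["word"].strip()
        if e["score"] < MIN_CONFIDENCE:
            continue  # drops low-confidence noise such as stray punctuation
        if e["entity_group"] in {"SKILL", "SOFT_SKILL"} and word.lower() in FALSE_POSITIVES:
            continue  # drops generic terms tagged with high confidence
        kept.append({**e, "word": word})
    return kept

# Example with pipeline-style dicts (scores shortened for readability)
raw = [
    {"entity_group": "SKILL", "score": 0.998, "word": " Python", "start": 47, "end": 53},
    {"entity_group": "SOFT_SKILL", "score": 0.995, "word": " leadership", "start": 124, "end": 134},
    {"entity_group": "SKILL", "score": 0.33, "word": ".", "start": 97, "end": 98},
]
print(clean_entities(raw))
# → keeps only the "Python" entity
```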