This README was generated by ChatGPT.
gmay29/ner_model_final
Model Description
This is a Named Entity Recognition (NER) model based on microsoft/deberta-base. It was fine-tuned on synthetic internship and job description data generated using mostly.ai.
The model extracts structured entities from internship postings and job descriptions, such as:
- SKILL: technical skills, tools, programming languages, frameworks, and soft skills
- DISCIPLINE: academic or technical fields (AI, ML, NLP, Computer Vision, Engineering, etc.)
- COURSE: courses and degrees mentioned in the job description
- ROLE: job roles and collaborators (intern, data scientist, software engineer, etc.)
Intended Use
- Parsing internship descriptions
- Parsing job postings
- Building HR/recruitment tools
- Structuring unstructured job text into a machine-readable format
Not designed for parsing candidate resumes.
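As a rough illustration of the "machine-readable format" use case, the sketch below groups NER predictions by label into a JSON-ready dictionary. It assumes the input has the shape that the `transformers` "ner" pipeline returns with `aggregation_strategy="simple"` (dicts with `entity_group`, `word`, and `score`). The `group_entities` helper and the sample predictions are hypothetical and are not part of this model's repository.

```python
import json
from collections import defaultdict


def group_entities(entities):
    """Collect pipeline-style entity dicts into {label: [unique texts]}."""
    grouped = defaultdict(list)
    seen = set()
    for ent in entities:
        label = ent["entity_group"]
        text = ent["word"].strip()
        key = (label, text.lower())
        # Skip empty strings and case-insensitive repeats of the same entity.
        if not text or key in seen:
            continue
        seen.add(key)
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical, hand-written predictions (not real model output).
sample = [
    {"entity_group": "SKILL", "word": "Python", "score": 0.97},
    {"entity_group": "ROLE", "word": "data scientist", "score": 0.91},
    {"entity_group": "SKILL", "word": "python", "score": 0.88},
    {"entity_group": "DISCIPLINE", "word": "NLP", "score": 0.99},
]

structured = group_entities(sample)
print(json.dumps(structured, indent=2))
```

In practice the list passed to `group_entities` would be the output of the NER pipeline rather than a hand-written sample.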
Training Data
- Synthetic internship and job description dataset generated with mostly.ai
- ~20,000 labeled samples (approximate; the exact dataset size has not been confirmed)
- Labels: SKILL, DISCIPLINE, COURSE, ROLE
How to Use
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
# Load from Hugging Face Hub
model_name = "gmay29/ner_model_final"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
device = 0 if torch.cuda.is_available() else -1 # pipeline expects 0 for GPU, -1 for CPU
# Create NER pipeline; aggregation_strategy="simple" merges sub-word tokens into whole entity spans
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, device=device, aggregation_strategy="simple")
# Example job description text
text = """
Responsibilities of the Intern:
Design, develop, and implement AI agents for various applications.
Train and fine-tune machine learning models using structured and unstructured datasets.
Work with deep learning frameworks such as TensorFlow, PyTorch, and scikit-learn.
Implement reinforcement learning techniques to enhance AI agent performance.
Perform data preprocessing, augmentation, and feature engineering.
Optimize model performance through hyperparameter tuning and algorithm optimization.
Integrate AI solutions into real-world applications and assist in deployment.
Collaborate with data scientists, software engineers, and business teams to understand requirements and deliver AI-driven solutions.
Conduct research and stay updated with the latest trends and advancements in AI and ML.
Document research findings, methodologies, and results effectively.
Requirements:
Strong programming skills in Python (knowledge of C++/Java is a plus).
Experience with machine learning frameworks like TensorFlow, PyTorch, or Keras.
Understanding of deep learning, natural language processing (NLP), and computer vision techniques.
Familiarity with reinforcement learning concepts and their applications.
Knowledge of data preprocessing, feature engineering, and model evaluation techniques.
Experience working with large datasets and cloud computing platforms (AWS, Google Cloud, or Azure) is a plus.
Strong problem-solving skills and the ability to work in a collaborative team environment.
Excellent communication and documentation skills.
"""
# Run inference
entities = ner_pipeline(text)
# Pretty print results
for ent in entities:
    print(f"{ent['entity_group']:<10} | {ent['word']:<25} | score={ent['score']:.3f}")
Example Output:
Device set to use cpu
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
SKILL | machine learning | score=1.000
SKILL | learning | score=1.000
SKILL | learn | score=0.758
SKILL | learning techniques | score=0.769
DISCIPLINE | engineering | score=1.000
COURSE | data scientists | score=0.954
SKILL | research | score=1.000
SKILL | ML | score=0.997
SKILL | research findings | score=1.000
SKILL | results | score=0.960
SKILL | programming | score=1.000
SKILL | Python | score=0.599
SKILL | machine learning | score=1.000
SKILL | learning | score=1.000
SKILL | learning | score=1.000
DISCIPLINE | engineering | score=1.000
SKILL | collaborative | score=0.900
SKILL | communication | score=1.000
SKILL | documentation skills | score=0.983
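The output above contains repeated entities and a few lower-confidence fragments (for example `learn` at 0.758). One possible post-processing step, sketched here, drops predictions below a score threshold and counts repeats per label and text. The `summarize` helper and the 0.8 threshold are assumptions for illustration only; the sample rows are copied from the example output.

```python
from collections import Counter


def summarize(entities, min_score=0.8):
    """Drop low-confidence entities, then count repeats per (label, text)."""
    counts = Counter()
    best = {}
    for ent in entities:
        if ent["score"] < min_score:
            continue
        key = (ent["entity_group"], ent["word"].strip().lower())
        counts[key] += 1
        best[key] = max(best.get(key, 0.0), float(ent["score"]))
    # Most frequent first; ties are broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, word, n, best[(label, word)]) for (label, word), n in ordered]


# A few rows taken from the example output above.
rows = [
    {"entity_group": "SKILL", "word": "machine learning", "score": 1.000},
    {"entity_group": "SKILL", "word": "learn", "score": 0.758},
    {"entity_group": "DISCIPLINE", "word": "engineering", "score": 1.000},
    {"entity_group": "SKILL", "word": "Python", "score": 0.599},
    {"entity_group": "SKILL", "word": "machine learning", "score": 1.000},
]

summary = summarize(rows)
for label, word, n, score in summary:
    print(f"{label:<10} | {word:<25} | count={n} | best={score:.3f}")
```

Note that a fixed threshold is a blunt tool: at 0.8 it removes the `learn` fragment but also removes `Python` (0.599), which is a valid skill, so the cutoff would need tuning on real postings.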
Limitations
- Trained on synthetic job descriptions; may not perfectly generalize to real-world postings.
- Some ambiguity across entity classes (e.g., "AI" could be both DISCIPLINE and SKILL).
- Supports English only.
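To get a feel for how often the class ambiguity shows up, one option is to flag every surface form that the model tags with more than one label. The sketch below is a minimal, hypothetical helper (`ambiguous_terms`) operating on pipeline-style dicts; the sample predictions are invented to mirror the "AI" case and are not real model output.

```python
from collections import defaultdict


def ambiguous_terms(entities):
    """Return {text: [labels]} for texts predicted with more than one label."""
    labels_by_term = defaultdict(set)
    for ent in entities:
        labels_by_term[ent["word"].strip().lower()].add(ent["entity_group"])
    return {
        term: sorted(labels)
        for term, labels in labels_by_term.items()
        if len(labels) > 1
    }


# Hypothetical predictions illustrating the "AI" ambiguity.
predictions = [
    {"entity_group": "DISCIPLINE", "word": "AI", "score": 0.93},
    {"entity_group": "SKILL", "word": "AI", "score": 0.71},
    {"entity_group": "SKILL", "word": "Python", "score": 0.98},
]

flagged = ambiguous_terms(predictions)
print(flagged)
```

Terms flagged this way could be reviewed manually or mapped to a single preferred label downstream.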
Future Work
- Extend training with real-world internship/job postings.
- Add entity types such as CERTIFICATION, TOOLS, COMPANY.
- Benchmark against public job-posting datasets for NER.
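For the benchmarking item, a common starting point is exact-match, entity-level precision, recall, and F1. The following is a minimal sketch under the assumption that gold and predicted entities are available as `(start, end, label)` character spans; the `entity_prf` helper and the spans are hypothetical, and no benchmark results exist for this model yet.

```python
def entity_prf(gold, pred):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans that match on both boundaries and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations for a single posting.
gold = [(0, 6, "SKILL"), (10, 20, "ROLE"), (25, 27, "DISCIPLINE")]
pred = [(0, 6, "SKILL"), (10, 20, "SKILL"), (25, 27, "DISCIPLINE"), (30, 35, "SKILL")]

p, r, f1 = entity_prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here a span with the right boundaries but the wrong label counts as an error, which is the strict convention; a real benchmark might also report per-label scores.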