**This README was generated by ChatGPT.**

# gmay29/ner\_model\_final

### Model Description

This is a **Named Entity Recognition (NER)** model based on [**microsoft/deberta-base**](https://huggingface.co/microsoft/deberta-base). It was **fine-tuned on synthetic internship and job description data** generated using **mostly.ai**.

The model extracts structured entities from **internship postings and job descriptions**, such as:

* **SKILL** → technical skills, tools, programming languages, frameworks, and soft skills
* **DISCIPLINE** → academic or technical fields (AI, ML, NLP, Computer Vision, Engineering, etc.)
* **COURSE** → courses and degrees mentioned in the job description
* **ROLE** → job roles and collaborators (intern, data scientist, software engineer, etc.)

---

### Intended Use

* Parsing **internship descriptions**
* Parsing **job postings**
* Building **HR/recruitment tools**
* Structuring unstructured job text into a machine-readable format

Not designed for parsing **candidate resumes**.

---

### Training Data

* **Synthetic internship and job description dataset** generated with **mostly.ai**
* \~20,000 labeled samples (approximate; the exact dataset size is unconfirmed)
* Labels: `SKILL`, `DISCIPLINE`, `COURSE`, `ROLE`

---

### How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load from Hugging Face Hub
model_name = "gmay29/ner_model_final"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

device = 0 if torch.cuda.is_available() else -1  # pipeline expects 0 for GPU, -1 for CPU

# Create the NER pipeline; aggregation_strategy="simple" merges sub-word tokens into entity spans
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    device=device,
    aggregation_strategy="simple",
)

# Example job description text
text = """
Responsibilities of the Intern:
Design, develop, and implement AI agents for various applications.
Train and fine-tune machine learning models using structured and unstructured datasets.
Work with deep learning frameworks such as TensorFlow, PyTorch, and scikit-learn.
Implement reinforcement learning techniques to enhance AI agent performance.
Perform data preprocessing, augmentation, and feature engineering.
Optimize model performance through hyperparameter tuning and algorithm optimization.
Integrate AI solutions into real-world applications and assist in deployment.
Collaborate with data scientists, software engineers, and business teams to understand requirements and deliver AI-driven solutions.
Conduct research and stay updated with the latest trends and advancements in AI and ML.
Document research findings, methodologies, and results effectively.

Requirements:
Strong programming skills in Python (knowledge of C++/Java is a plus).
Experience with machine learning frameworks like TensorFlow, PyTorch, or Keras.
Understanding of deep learning, natural language processing (NLP), and computer vision techniques.
Familiarity with reinforcement learning concepts and their applications.
Knowledge of data preprocessing, feature engineering, and model evaluation techniques.
Experience working with large datasets and cloud computing platforms (AWS, Google Cloud, or Azure) is a plus.
Strong problem-solving skills and the ability to work in a collaborative team environment.
Excellent communication and documentation skills.
"""

# Run inference
entities = ner_pipeline(text)

# Pretty print results
for ent in entities:
    print(f"{ent['entity_group']:<10} | {ent['word']:<25} | score={ent['score']:.3f}")
```

**Example Output:**

```
Device set to use cpu
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
SKILL | machine learning | score=1.000
SKILL | learning | score=1.000
SKILL | learn | score=0.758
SKILL | learning techniques | score=0.769
DISCIPLINE | engineering | score=1.000
COURSE | data scientists | score=0.954
SKILL | research | score=1.000
SKILL | ML | score=0.997
SKILL | research findings | score=1.000
SKILL | results | score=0.960
SKILL | programming | score=1.000
SKILL | Python | score=0.599
SKILL | machine learning | score=1.000
SKILL | learning | score=1.000
SKILL | learning | score=1.000
DISCIPLINE | engineering | score=1.000
SKILL | collaborative | score=0.900
SKILL | communication | score=1.000
SKILL | documentation skills | score=0.983
```

---

### Limitations

* Trained on **synthetic job descriptions** → may not generalize well to **real-world postings**.
* Entity classes can be ambiguous or mislabeled (e.g., “AI” could be both `DISCIPLINE` and `SKILL`; in the example output above, “data scientists” is tagged `COURSE` rather than `ROLE`).
* Supports **English** only.

---

### Future Work

* Extend training with **real-world internship/job postings**.
* Add entity types such as `CERTIFICATION`, `TOOLS`, `COMPANY`.
* Benchmark against public job-posting datasets for NER.
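
---

### Post-processing Sketch

The pipeline in **How to Use** returns a flat list of dicts with `entity_group`, `word`, and `score` keys, and the example output contains repeated spans and some low-confidence predictions. The following is a minimal sketch of one way to turn that list into a structured result. The helper name `group_entities` and the `min_score=0.8` threshold are assumptions made for illustration and are not part of the model or the `transformers` API; the sample rows are copied by hand from the example output so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.8):
    """Group pipeline output by label, dropping low-score and duplicate spans.

    `min_score` is a hypothetical cutoff; tune it for your own data.
    """
    grouped = defaultdict(list)
    seen = set()
    for ent in entities:
        if ent["score"] < min_score:
            continue  # skip low-confidence predictions
        word = ent["word"].strip()
        key = (ent["entity_group"], word.lower())
        if not word or key in seen:
            continue  # skip empty spans and case-insensitive duplicates
        seen.add(key)
        grouped[ent["entity_group"]].append(word)
    return dict(grouped)


# Hand-written sample in the same shape as the pipeline output
sample = [
    {"entity_group": "SKILL", "word": "machine learning", "score": 1.000},
    {"entity_group": "SKILL", "word": "Python", "score": 0.599},
    {"entity_group": "SKILL", "word": "machine learning", "score": 1.000},
    {"entity_group": "DISCIPLINE", "word": "engineering", "score": 1.000},
]

print(group_entities(sample))
# {'SKILL': ['machine learning'], 'DISCIPLINE': ['engineering']}
```

With the real pipeline, `group_entities(ner_pipeline(text))` would be called the same way. Note that a fixed threshold also discards correct but low-scoring spans such as "Python" (score 0.599 in the example output), so the cutoff is a trade-off rather than a recommendation.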