---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:814
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-distilroberta-v1
widget:
  - source_sentence: data modeling, predictive analytics, technical writing
    sentences:
      - >-
        experience in data engineeringStrong understanding of Datawarehousing
        conceptsProficient in Python for building UDFs and pre-processing
        scriptsProficient in sourcing data from APIs and cloud storage
        systemsProficient in SQL with analytical thought processExperience
        working on Airflow orchestrationMust have experience working on any of
        the cloud platforms - AWS would be preferredExperience with CI/CD tools
        in a python tech stackExperience working on Snowflake Datawarehouse
        would be nice to haveCompetent working in secured internal network
        environmentsExperience working in story and task-tracking tools for
        agile workflowsMotivated and Self-Starting: able to think critically
        about problems, decipher user preferences versus hard requirements, and
        effectively use online and onsite resources to find an appropriate
        solution with little interventionPassionate about writing clear,
        maintainable code that will be used and modified by others, and able to
        use and modify other developers’ work rather than recreate itBachelor’s
        Degree in related field
      - >-
        requirements and deliver innovative solutionsPerform data cleaning,
        preprocessing, and feature engineering to improve model
        performanceOptimize and fine-tune machine learning models for
        scalability and efficiencyEvaluate and improve existing ML algorithms,
        frameworks, and toolkitsStay up-to-date with the latest trends and
        advancements in the field of machine learning

        RequirementsBachelor's degree in Computer Science, Engineering, or a
        related fieldStrong knowledge of machine learning algorithms and data
        modeling techniquesProficiency in Python and its associated libraries
        such as TensorFlow, PyTorch, or scikit-learnExperience with big data
        technologies such as Hadoop, Spark, or Apache KafkaFamiliarity with
        cloud computing platforms such as AWS or Google CloudExcellent
        problem-solving and analytical skillsStrong communication and
        collaboration abilitiesAbility to work effectively in a fast-paced and
        dynamic environment
      - >-
        Qualifications


        3 to 5 years of experience in exploratory data analysisStatistics
        Programming, data modeling, simulation, and mathematics Hands on working
        experience with Python, SQL, R, Hadoop, SAS, SPSS, Scala, AWSModel
        lifecycle executionTechnical writingData storytelling and technical
        presentation skillsResearch SkillsInterpersonal SkillsModel
        DevelopmentCommunicationCritical ThinkingCollaborate and Build
        RelationshipsInitiative with sound judgementTechnical (Big Data
        Analysis, Coding, Project Management, Technical Writing, etc.)Problem
        Solving (Responds as problems and issues are identified)Bachelor's
        Degree in Data Science, Statistics, Mathematics, Computers Science,
        Engineering, or degrees in similar quantitative fields



        Desired Qualification(s)


        Master's Degree in Data Science, Statistics, Mathematics, Computer
        Science, or Engineering



        Hours: Monday - Friday, 8:00AM - 4:30PM


        Locations: 820 Follin Lane, Vienna, VA 22180 | 5510 Heritage Oaks Drive,
        Pensacola, FL 32526


        About Us


        You have goals, dreams, hobbies, and things you're passionate
        about—what's important to you is important to us. We're looking for
        people who not only want to do meaningful, challenging work, keep their
        skills sharp and move ahead, but who also take time for the things that
        matter to them—friends, family, and passions. And we're looking for team
        members who are passionate about our mission—making a difference in
        military members' and their families' lives. Together, we can make it
        happen. Don't take our word for it:

         Military Times 2022 Best for Vets Employers WayUp Top 100 Internship Programs Forbes® 2022 The Best Employers for New Grads Fortune Best Workplaces for Women Fortune 100 Best Companies to Work For® Computerworld® Best Places to Work in IT Ripplematch Campus Forward Award - Excellence in Early Career Hiring Fortune Best Place to Work for Financial and Insurance Services




        Disclaimers: Navy Federal reserves the right to fill this role at a
        higher/lower grade level based on business need. An assessment may be
        required to compete for this position. Job postings are subject to close
        early or extend out longer than the anticipated closing date at the
        hiring team’s discretion based on qualified applicant volume. Navy
        Federal Credit Union assesses market data to establish salary ranges
        that enable us to remain competitive. You are paid within the salary
        range, based on your experience, location and market position


        Bank Secrecy Act: Remains cognizant of and adheres to Navy Federal
        policies and procedures, and regulations pertaining to the Bank Secrecy
        Act.
  - source_sentence: >-
      Foreign Exchange analytics, cross-border payments expertise, financial
      services reporting
    sentences:
      - >-
        requirements gathering to recommend SAP solutions that drive data-driven
        decision-making and operational efficiency.


        Client Engagement And Advisory


        Build and maintain robust client relationships, serving as a trusted
        advisor on SAP Analytics capabilities and industry best
        practices.Address client challenges by aligning SAP Analytics solutions
        with their strategic goals, enhancing their analytical capabilities and
        reporting functions.


        Project Leadership And Management


        Oversee SAP Analytics implementation projects, ensuring timely delivery
        within scope and budget.Lead and inspire cross-functional teams,
        promoting collaboration and innovation to meet and exceed project
        objectives.


        Risk Management And Quality Assurance


        Proactively identify and address potential project risks, developing
        strategies to mitigate them and ensure project success.Uphold the
        highest standards of quality for all project deliverables, ensuring they
        meet Argano’s expectations and client requirements.


        Change Management And Training


        Facilitate effective change management processes associated with the
        implementation of SAP Analytics solutions, minimizing business
        disruption.Design and conduct comprehensive training sessions to empower
        clients with the knowledge and skills to leverage SAP Analytics
        solutions fully.


        Thought Leadership And Innovation


        Maintain up-to-date knowledge of the latest SAP Analytics developments,
        trends, and best practices, positioning Argano as a thought leader in
        the field.Foster a culture of continuous improvement by sharing insights
        and best practices with clients and internal teams.


        Minimum And/or Preferred Qualifications


        Education: Bachelor's or master's degree in Business Administration,
        Computer Science, Information Systems, Engineering, or a related
        field.Experience: Minimum of 5+ years in SAP consulting, with extensive
        experience in SAP Analytics Suite (which includes native SAP products,
        Google, Azure, AWS, and other cloud vendor products for SAP customers),
        SAP Analytics Cloud (SAC), SAP Datasphere/Data Warehousing Cloud, SAP
        Embedded Modeling.Certifications: SAP certifications in Analytics, SAC,
        Datasphere/DWC, or related areas are highly regarded.Skills:Profound
        expertise in SAP Analytics, SAP Analytics Suite (which includes native
        SAP products, Google, Azure, AWS, and other cloud vendor products for
        SAP customers), SAP Analytics Cloud (SAC), SAP Datasphere/Data
        Warehousing Cloud, SAP Embedded Modeling.Exceptional project management
        and leadership skills, capable of guiding teams through complex
        implementations.Excellent client engagement and communication skills,
        adept at establishing trust and acting as a strategic advisor.Strong
        capabilities in risk management, quality assurance, and change
        management.Travel required depending on the project.

        This position offers a unique chance to make a significant impact on our
        clients' success and to contribute to the growth and prestige of Argano
        as a global leader in digital consultancy. If you are a seasoned expert
        in SAP Data & Analytics with a passion for digital transformation and a
        proven track record of delivering results, we invite you to join our
        dynamic team.


        About Us


        Argano is the first of its kind: a digital consultancy totally immersed
        in high-performance operations. We steward enterprises through
        ever-evolving markets, empowering them with transformative strategies
        and technologies to exceed customer expectations, unlock commercial
        innovation, and drive optimal efficiency and growth.


        Argano is an equal-opportunity employer. All applicants will be
        considered for employment without regard to race, color, religion, sex,
        sexual orientation, gender identity, national origin, veteran status, or
        disability status.
      - >-
        experience in the industries we serve, and to partner with diverse teams
        of passionate, enterprising SVBers, dedicated to an inclusive approach
        to helping them grow and succeed at every stage of their business.


        Join us at SVB and be part of bringing our clients' world-changing ideas
        to life. At SVB, we have the opportunity to grow and collectively make
        an impact by supporting the innovative clients and communities SVB
        serves. We pride ourselves in having both a diverse client roster and an
        equally diverse and inclusive organization. And we work diligently to
        encourage all with different ways of thinking, different ways of
        working, and especially those traditionally underrepresented in
        technology and financial services, to apply.


        Responsibilities


        SVB’s Foreign Exchange business is one of the largest FX providers to
        the Innovation economy. We support the transactional and risk management
        needs of our fast-growing clients as they expand and do business
        internationally.


        Located close to one of our Hubs in SF, NYC or Raleigh and reporting to
        the Managing Director of FX Strategy, this Business Data Analyst will be
        an integral part of the Product Strategy and Business Management team,
        supporting and driving the insights that will be used to formulate,
        drive and validate our strategic and business effectiveness.


        You will take part in complex, multi-disciplinary projects to further
        enable the Product, Trading and Sales teams. You will be a fast learner
        who is comfortable in the weeds with analytics and data manipulation
        whilst developing the story for leadership.


        This role would be a great fit for a creative, curious and energetic
        individual and offers the right candidate the opportunity to grow while
        creating significant business value by continuously improving business
        intelligence/reporting, processes, procedures, and workflow.


        The ideal candidate will have 3-5 yrs experience in Financial Services
        or Fintech, preferably with FX, Trading or Cross Border Payment
        experience.


        requirements.Become familiar with the evolving FX, Fintech and Banking
        landscape to overlay industry insights.Drive continued evolution of our
        business analytics/data framework in order to inform MI and product
        evaluation.Assist with maintenance and accuracy of company data within
        SVB’s data repositories.


        Qualifications


        Basic Requirements:


        BS/BA Degree  preferably in a quantitative discipline (e.g., Economics,
        Mathematics, Statistics) or a HS Diploma or GED with equivalent work
        experience3-5 years’ experience in financial services or fintech,
        ideally within FX or Cross Border Payments


        Preferred Requirements:


        Strong attention to detail with an eye for data governance and
        compliance


        Aptitude for framing business questions in analytic terms and
        translating requirements into useful datasets and analyses with
        actionable insights.
      - >-
        experience, and job responsibilities, and does not encompass additional
        non-standard compensation (e.g., benefits, paid time off, per diem,
        etc.). Job Description:Work with Material Master product team to gather
        requirements, collect data, lead cleansing efforts and load/support data
        loads into SAP.Will need to bridge the gap between business and IT teams
        to document and set expectations of work/deliverables.Create and
        maintain trackers that show progress and hurdles to PM’s and
        stakeholders.Assist in go live of site including, collecting, cleansing
        and loading data into SAP system.Middleman between IT and business
        stakeholderAble to communicate data models.Knowledge in SAP and MDG is
        preferred.Years of experience: 2+ in data analytics spaceStrong
        communication skills are a must.Will be working on multiple high
        priority, high paced projects where attention to detail and organization
        is required.Intermediate to Senior position – great opportunity to learn
        an in-demand area of SAP MDG.Strong willingness to learn – no ceiling on
        learning and growth potential and plenty of work to go around. About
        BCforward:Founded in 1998 on the idea that industry leaders needed a
        professional service, and workforce management expert, to fuel the
        development and execution of core business and technology strategies,
        BCforward is a Black-owned firm providing unique solutions supporting
        value capture and digital product delivery needs for organizations
        around the world. Headquartered in Indianapolis, IN with an Offshore
        Development Center in Hyderabad, India, BCforward’s 6,000 consultants
        support more than 225 clients globally.BCforward champions the power of
        human potential to help companies transform, accelerate, and scale.
        Guided by our core values of People-Centric, Optimism, Excellence,
        Diversity, and Accountability, our professionals have helped our clients
        achieve their strategic goals for more than 25 years. Our strong culture
        and clear values have enabled BCforward to become a market leader and
        best in class place to work.BCforward is
  - source_sentence: data modeling, statistical analysis, data visualization tools
    sentences:
      - >-
        skills to translate the complexity of your work into tangible business
        goals 


        The Ideal Candidate is

         Customer first. You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.  Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.  Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.  A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You’re passionate about talent development for your own team and beyond.  Technical. You’re comfortable with open-source languages and are passionate about developing further. You have hands-on experience developing data science solutions using open-source tools and cloud computing platforms.  Statistically-minded. You’ve built models, validated them, and backtested them. You know how to interpret a confusion matrix or a ROC curve. You have experience with clustering, classification, sentiment analysis, time series, and deep learning.  A data guru. “Big data” doesn’t faze you. You have the skills to retrieve, combine, and analyze data from a variety of sources and structures. You know understanding the data is often the key to great data science. 

        Basic Qualifications:

         Currently has, or is in the process of obtaining a Bachelor’s Degree plus 2 years of experience in data analytics, or currently has, or is in the process of obtaining Master’s Degree, or currently has, or is in the process of obtaining PhD, with an expectation that required degree will be obtained on or before the scheduled start dat  At least 1 year of experience in open source programming languages for large scale data analysis  At least 1 year of experience with machine learning  At least 1 year of experience with relational databases 

        Preferred Qualifications:

         Master’s Degree in “STEM” field (Science, Technology, Engineering, or Mathematics) plus 3 years of experience in data analytics, or PhD in “STEM” field (Science, Technology, Engineering, or Mathematics)  At least 1 year of experience working with AWS  At least 2 years’ experience in Python, PyTorch, Scala, or R  At least 2 years’ experience with machine learning  At least 2 years’ experience with SQL  At least 2 years' experience working with natural language processing 

        Capital One will consider sponsoring a new qualified applicant for
        employment authorization for this position.


        The minimum and maximum full-time annual salaries for this role are
        listed below, by location. Please note that this salary information is
        solely for candidates hired to perform work within one of these
        locations, and refers to the amount Capital One is willing to pay at the
        time of this posting. Salaries for part-time roles will be prorated
        based upon the agreed upon number of hours to be regularly worked.


        New York City (Hybrid On-Site): $138,500 - $158,100 for Data Science
        Masters


        San Francisco, California (Hybrid On-site): $146,700 - $167,500 for Data
        Science Masters


        Candidates hired to work in other locations will be subject to the pay
        range associated with that location, and the actual annualized salary
        amount offered to any candidate at the time of hire will be reflected
        solely in the candidate’s offer letter.


        This role is also eligible to earn performance based incentive
        compensation, which may include cash bonus(es) and/or long term
        incentives (LTI). Incentives could be discretionary or non discretionary
        depending on the plan.


        Capital One offers a comprehensive, competitive, and inclusive set of
        health, financial and other benefits that support your total well-being.
        Learn more at the Capital One Careers website . Eligibility varies based
        on full or part-time status, exempt or non-exempt status, and management
        level.


        This role is expected to accept applications for a minimum of 5 business
        days.No agencies please. Capital One is 


        If you have visited our website in search of information on employment
        opportunities or to apply for a position, and you require an
        accommodation, please contact Capital One Recruiting at 1-800-304-9102
        or via email at RecruitingAccommodation@capitalone.com . All information
        you provide will be kept confidential and will be used only to the
        extent required to provide needed reasonable accommodations.


        For technical support or questions about Capital One's recruiting
        process, please send an email to Careers@capitalone.com


        Capital One does not provide, endorse nor guarantee and is not liable
        for third-party products, services, educational tools or other
        information available through this site.


        Capital One Financial is made up of several different entities. Please
        note that any position posted in Canada is for Capital One Canada, any
        position posted in the United Kingdom is for Capital One Europe and any
        position posted in the Philippines is for Capital One Philippines
        Service Corp. (COPSSC).
      - >-
        experienced team that caters to niche skills demands for customers
        across various technologies and verticals.
         Role Description
         This is a full-time on-site role for a Data Engineer at Computer Data Concepts, Inc. The Data Engineer will be responsible for day-to-day tasks related to data engineering, data modeling, ETL (Extract Transform Load), data warehousing, and data analytics. The role requires expertise in handling and manipulating large datasets, designing and maintaining databases, and implementing efficient data processing systems.
         Qualifications
         Data Engineering skillsData Modeling skillsETL (Extract Transform Load) skillsData Warehousing skillsData Analytics skillsStrong analytical and problem-solving abilitiesProficiency in programming languages such as Python or SQLExperience with cloud-based data platforms like AWS or AzureKnowledge of data visualization tools like Tableau or PowerBIExcellent communication and teamwork skillsBachelor's degree in Computer Science, Data Science, or a related fieldRelevant certifications in data engineering or related areas
      - >-
        requirements.

         Qualifications
         
        Strong analytical skills, with experience in data analysis and
        statistical techniquesProficiency in data modeling and data
        visualization toolsExcellent communication skills, with the ability to
        effectively convey insights to stakeholdersExperience in business
        analysis and requirements analysisProject management skillsDatabase
        administration knowledgeBackground in Data Analytics and
        StatisticsExperience with Big Data technologies like Hadoop
  - source_sentence: ETL development, data modelling, DBT framework
    sentences:
      - |-
        Qualifications
         Strong knowledge in Pattern Recognition and Neural NetworksProficiency in Computer Science and StatisticsExperience with Algorithms and Data StructuresHands-on experience in machine learning frameworks and librariesFamiliarity with cloud platforms and big data technologiesExcellent problem-solving and analytical skillsStrong programming skills in languages such as Python or RGood communication and collaboration skillsMaster's or PhD in Computer Science, Data Science, or a related field
      - >-
        skills as well as strong leadership qualities.


        This position is eligible for the TalentQuest employee referral program.
        If an employee referred you for this job, please apply using the
        system-generated link that was sent to you.


        Responsibilities


        Design, develop, and evaluate large and complex predictive models and
        advanced algorithms Test hypotheses/models, analyze, and interpret
        resultsDevelop actionable insights and recommendationsDevelop and code
        complex software programs, algorithms, and automated processesUse
        evaluation, judgment, and interpretation to select right course of
        actionWork on problems of diverse scope where analysis of information
        requires evaluation of identifiable factorsProduce innovative solutions
        driven by exploratory data analysis from complex and high-dimensional
        datasetsTransform data into charts, tables, or format that aids
        effective decision makingUtilize effective written and verbal
        communication to document analyses and present findings analyses to a
        diverse audience of stakeholders Develop and maintain strong working
        relationships with team members, subject matter experts, and leadersLead
        moderate to large projects and initiativesModel best practices and
        ethical AIWorks with senior management on complex issuesAssist with the
        development and enhancement practices, procedures, and instructionsServe
        as technical resource for other team membersMentor lower levels



        Qualifications


        6+ years of experience with requisite competenciesFamiliar with
        analytical frameworks used to support the pricing of lending
        productsFamiliar with analytical models/analysis used to support credit
        card underwriting and account management underwriting policiesFamiliar
        using GitHub for documentation and code collaboration purposesComplete
        knowledge and full understanding of specializationStatistics, machine
        learning , data mining, data auditing, aggregation, reconciliation, and
        visualizationProgramming, data modeling, simulation, and advanced
        mathematics SQL, R, Python, Hadoop, SAS, SPSS, Scala, AWSModel lifecycle
        executionTechnical writingData storytelling and technical presentation
        skillsResearch SkillsInterpersonal SkillsAdvanced knowledge of
        procedures, instructions and validation techniquesModel
        DevelopmentCommunicationCritical ThinkingCollaborate and Build
        RelationshipsInitiative with sound judgementTechnical (Big Data
        Analysis, Coding, Project Management, Technical Writing,
        etc.)Independent JudgmentProblem Solving (Identifies the constraints and
        risks)Bachelor's Degree in Data Science, Statistics, Mathematics,
        Computers Science, Engineering, or degrees in similar quantitative
        fields



        Desired Qualification(s)


        Master's/PhD Degree in Data Science, Statistics, Mathematics, Computers
        Science, or Engineering



        Hours: Monday - Friday, 8:00AM - 4:30PM


        Location: 820 Follin Lane, Vienna, VA 22180


        About Us


        You have goals, dreams, hobbies, and things you're passionate
        about—what's important to you is important to us. We're looking for
        people who not only want to do meaningful, challenging work, keep their
        skills sharp and move ahead, but who also take time for the things that
        matter to them—friends, family, and passions. And we're looking for team
        members who are passionate about our mission—making a difference in
        military members' and their families' lives. Together, we can make it
        happen. Don't take our word for it:

         Military Times 2022 Best for Vets Employers WayUp Top 100 Internship Programs Forbes® 2022 The Best Employers for New Grads Fortune Best Workplaces for Women Fortune 100 Best Companies to Work For® Computerworld® Best Places to Work in IT Ripplematch Campus Forward Award - Excellence in Early Career Hiring Fortune Best Place to Work for Financial and Insurance Services




        Disclaimers: Navy Federal reserves the right to fill this role at a
        higher/lower grade level based on business need. An assessment may be
        required to compete for this position. Job postings are subject to close
        early or extend out longer than the anticipated closing date at the
        hiring team’s discretion based on qualified applicant volume. Navy
        Federal Credit Union assesses market data to establish salary ranges
        that enable us to remain competitive. You are paid within the salary
        range, based on your experience, location and market position


        Bank Secrecy Act: Remains cognizant of and adheres to Navy Federal
        policies and procedures, and regulations pertaining to the Bank Secrecy
        Act.
      - >-
        requirements and data mapping documents into a technical design.Develop,
        enhance, and maintain code following best practices and
        standards.Execute unit test plans and support regression/system
        testing.Debug and troubleshoot issues found during testing or
        production.Communicate project status, issues, and blockers with the
        team.Contribute to continuous improvement by identifying and addressing
        opportunities.

        Qualifications / Skills:Minimum of 5 years of experience in ETL/ELT
        development within a Data Warehouse.Understanding of enterprise data
        warehousing best practices and standards.Familiarity with DBT
        framework.Comfortable with git fundamentals change management.Minimum of
        5 years of experience in ETL development.Minimum of 5 years of
        experience writing SQL queries.Minimum of 2 years of experience with
        Python.Minimum of 3 years of cloud experience with AWS, Azure or
        Google.Experience in P&C Insurance or Financial Services Industry
        preferred.Understanding of data warehousing best practices and
        standards.Experience in software engineering, including designing and
        developing systems.

        Education and/or Experience:Required knowledge & skills would typically
        be acquired through a bachelor’s degree in computer sciences or 5 or
        more years of related experience in ELT and/or Analytics Engineering
  - source_sentence: Data engineering, ETL workflows, cloud-based data solutions
    sentences:
      - >-
        Qualifications and Skills Education: Bachelor's degree in Computer
        Science or a related field. Experience: 5+ years in Software Engineering
        with a focus on Data Engineering. Technical Proficiency: Expertise in
        Python; familiarity with JavaScript and Java is beneficial. Proficient
        in SQL (Postgres, Presto/Trino dialects), ETL workflows, and workflow
        orchestration systems (e.g. Airflow, Prefect). Knowledge of modern data
        file formats (e.g. Parquet, Avro, ORC) and Python data tools (e.g.
        pandas, Dask, Ray). Cloud and Data Solutions: Experience in building
        cloud-based Data Warehouse/Data Lake solutions (AWS Athena, Redshift,
        Snowflake) and familiarity with AWS cloud services and
        infrastructure-as-code tools (CDK, Terraform). Communication Skills:
        Excellent communication and presentation skills, fluent in English. Work
        Authorization: Must be authorized to work in the US. 

        Work Schedule Hybrid work schedule: Minimum 3 days per week in the San
        Francisco office (M/W/Th), with the option to work remotely 2 days per
        week. 

        Salary Range: $165,000-$206,000 base depending on experience 

        Bonus: Up to 20% annual performance bonus 

        Generous benefits package: Fully paid healthcare, monthly reimbursements
        for gym, commuting, cell phone & home wifi.
      - >-
        experience with Transformers

        Need to be 8+ year's of work experience. 

        We need a Data Scientist with demonstrated expertise in training and
        evaluating transformers such as BERT and its derivatives.

        Required: Proficiency with Python, pyTorch, Linux, Docker, Kubernetes,
        Jupyter. Expertise in Deep Learning, Transformers, Natural Language
        Processing, Large Language Models

        Preferred: Experience with genomics data, molecular genetics.
        Distributed computing tools like Ray, Dask, Spark
      - >-
        Experience with LLMs and PyTorch: Extensive experience with large
        language models and proficiency in PyTorch.Expertise in Parallel
        Training and GPU Cluster Management: Strong background in parallel
        training methods and managing large-scale training jobs on GPU
        clusters.Analytical and Problem-Solving Skills: Ability to address
        complex challenges in model training and optimization.Leadership and
        Mentorship Capabilities: Proven leadership in guiding projects and
        mentoring team members.Communication and Collaboration Skills: Effective
        communication skills for conveying technical concepts and collaborating
        with cross-functional teams.Innovation and Continuous Learning: Passion
        for staying updated with the latest trends in AI and machine learning.


        What We Offer


        Market competitive and pay equity-focused compensation structure100%
        paid health insurance for employees with 90% coverage for
        dependentsAnnual lifestyle wallet for personal wellness, learning and
        development, and more!Lifetime maximum benefit for family forming and
        fertility benefitsDedicated mental health support for employees and
        eligible dependentsGenerous time away including company holidays, paid
        time off, sick time, parental leave, and more!Lively office environment
        with catered meals, fully stocked kitchens, and geo-specific commuter
        benefits


        Base pay for the successful applicant will depend on a variety of
        job-related factors, which may include education, training, experience,
        location, business needs, or market demands. The expected salary range
        for this role is based on the location where the work will be performed
        and is aligned to one of 3 compensation zones. This role is also
        eligible to participate in a Robinhood bonus plan and Robinhood’s equity
        plan. For other locations not listed, compensation can be discussed with
        your recruiter during the interview process.


        Zone 1 (Menlo Park, CA; New York, NY; Bellevue, WA; Washington, DC)


        $187,000—$220,000 USD


        Zone 2 (Denver, CO; Westlake, TX; Chicago, IL)


        $165,000—$194,000 USD


        Zone 3 (Lake Mary, FL)


        $146,000—$172,000 USD


        Click Here To Learn More About Robinhood’s Benefits.


        We’re looking for more growth-minded and collaborative people to be a
        part of our journey in democratizing finance for all. If you’re ready to
        give 100% in helping us achieve our mission—we’d love to have you apply
        even if you feel unsure about whether you meet every single requirement
        in this posting. At Robinhood, we're looking for people invigorated by
        our mission, values, and drive to change the world, not just those who
        simply check off all the boxes.


        Robinhood embraces a diversity of backgrounds and experiences and
        provides equal opportunity for all applicants and employees. We are
        dedicated to building a company that represents a variety of
        backgrounds, perspectives, and skills. We believe that the more
        inclusive we are, the better our work (and work environment) will be for
        everyone. Additionally, Robinhood provides reasonable accommodations for
        candidates on request and respects applicants' privacy rights. To review
        Robinhood's Privacy Policy please review the specific policy applicable
        to your country.
datasets:
  - pfrenee/ai_alignment
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-distilroberta-v1
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job validation
          type: ai-job-validation
        metrics:
          - type: cosine_accuracy
            value: 0.9801980257034302
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job test
          type: ai-job-test
        metrics:
          - type: cosine_accuracy
            value: 0.9708737730979919
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the ai_alignment dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: ai_alignment

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
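Because the final module is Normalize(), every embedding leaves the model with unit L2 norm, so cosine similarity reduces to a plain dot product. A minimal sketch of that equivalence, using random 768-dimensional vectors as stand-ins for real embeddings:

```python
import numpy as np

def l2_normalize(v):
    # What the Normalize() module does to each embedding
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
u = l2_normalize(rng.normal(size=768))
w = l2_normalize(rng.normal(size=768))

# For unit-norm vectors, cosine similarity and dot product coincide
cosine = float(np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w)))
dot = float(np.dot(u, w))
assert abs(cosine - dot) < 1e-12
```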

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pfrenee/distilroberta_ai_alignment")
# Run inference
queries = [
    "Data engineering, ETL workflows, cloud-based data solutions",
]
documents = [
    "Qualifications and Skills Education: Bachelor's degree in Computer Science or a related field. Experience: 5+ years in Software Engineering with a focus on Data Engineering. Technical Proficiency: Expertise in Python; familiarity with JavaScript and Java is beneficial. Proficient in SQL (Postgres, Presto/Trino dialects), ETL workflows, and workflow orchestration systems (e.g. Airflow, Prefect). Knowledge of modern data file formats (e.g. Parquet, Avro, ORC) and Python data tools (e.g. pandas, Dask, Ray). Cloud and Data Solutions: Experience in building cloud-based Data Warehouse/Data Lake solutions (AWS Athena, Redshift, Snowflake) and familiarity with AWS cloud services and infrastructure-as-code tools (CDK, Terraform). Communication Skills: Excellent communication and presentation skills, fluent in English. Work Authorization: Must be authorized to work in the US. \nWork Schedule Hybrid work schedule: Minimum 3 days per week in the San Francisco office (M/W/Th), with the option to work remotely 2 days per week. \nSalary Range: $165,000-$206,000 base depending on experience \nBonus: Up to 20% annual performance bonus \nGenerous benefits package: Fully paid healthcare, monthly reimbursements for gym, commuting, cell phone & home wifi.",
    "Experience with LLMs and PyTorch: Extensive experience with large language models and proficiency in PyTorch.Expertise in Parallel Training and GPU Cluster Management: Strong background in parallel training methods and managing large-scale training jobs on GPU clusters.Analytical and Problem-Solving Skills: Ability to address complex challenges in model training and optimization.Leadership and Mentorship Capabilities: Proven leadership in guiding projects and mentoring team members.Communication and Collaboration Skills: Effective communication skills for conveying technical concepts and collaborating with cross-functional teams.Innovation and Continuous Learning: Passion for staying updated with the latest trends in AI and machine learning.\n\nWhat We Offer\n\nMarket competitive and pay equity-focused compensation structure100% paid health insurance for employees with 90% coverage for dependentsAnnual lifestyle wallet for personal wellness, learning and development, and more!Lifetime maximum benefit for family forming and fertility benefitsDedicated mental health support for employees and eligible dependentsGenerous time away including company holidays, paid time off, sick time, parental leave, and more!Lively office environment with catered meals, fully stocked kitchens, and geo-specific commuter benefits\n\nBase pay for the successful applicant will depend on a variety of job-related factors, which may include education, training, experience, location, business needs, or market demands. The expected salary range for this role is based on the location where the work will be performed and is aligned to one of 3 compensation zones. This role is also eligible to participate in a Robinhood bonus plan and Robinhood’s equity plan. 
For other locations not listed, compensation can be discussed with your recruiter during the interview process.\n\nZone 1 (Menlo Park, CA; New York, NY; Bellevue, WA; Washington, DC)\n\n$187,000—$220,000 USD\n\nZone 2 (Denver, CO; Westlake, TX; Chicago, IL)\n\n$165,000—$194,000 USD\n\nZone 3 (Lake Mary, FL)\n\n$146,000—$172,000 USD\n\nClick Here To Learn More About Robinhood’s Benefits.\n\nWe’re looking for more growth-minded and collaborative people to be a part of our journey in democratizing finance for all. If you’re ready to give 100% in helping us achieve our mission—we’d love to have you apply even if you feel unsure about whether you meet every single requirement in this posting. At Robinhood, we're looking for people invigorated by our mission, values, and drive to change the world, not just those who simply check off all the boxes.\n\nRobinhood embraces a diversity of backgrounds and experiences and provides equal opportunity for all applicants and employees. We are dedicated to building a company that represents a variety of backgrounds, perspectives, and skills. We believe that the more inclusive we are, the better our work (and work environment) will be for everyone. Additionally, Robinhood provides reasonable accommodations for candidates on request and respects applicants' privacy rights. To review Robinhood's Privacy Policy please review the specific policy applicable to your country.",
    "experience with Transformers\nNeed to be 8+ year's of work experience. \nWe need a Data Scientist with demonstrated expertise in training and evaluating transformers such as BERT and its derivatives.\nRequired: Proficiency with Python, pyTorch, Linux, Docker, Kubernetes, Jupyter. Expertise in Deep Learning, Transformers, Natural Language Processing, Large Language Models\nPreferred: Experience with genomics data, molecular genetics. Distributed computing tools like Ray, Dask, Spark",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.4493, 0.0204, 0.0266]])
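To turn these similarity scores into a ranking of job descriptions for the query, sort document indices by descending score. A minimal sketch, with the score values copied from the example output above:

```python
# Scores for the single query against the three documents (from the
# example output above)
scores = [0.4493, 0.0204, 0.0266]

# Indices of documents, best match first
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)  # [0, 2, 1]: the data-engineering posting ranks first
```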

Evaluation

Metrics

Triplet

Metric ai-job-validation ai-job-test
cosine_accuracy 0.9802 0.9709
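The cosine_accuracy metric is the fraction of (query, positive, negative) triplets where the query embedding is closer, by cosine similarity, to its positive than to its negative. A minimal sketch with hand-made 2-D vectors standing in for embeddings:

```python
import numpy as np

def cos(a, b):
    # Row-wise cosine similarity between two batches of vectors
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def triplet_accuracy(q, pos, neg):
    # Fraction of triplets where the positive outscores the negative
    return float((cos(q, pos) > cos(q, neg)).mean())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
pos = np.array([[0.9, 0.1], [0.1, 0.9]])    # nearly parallel to each query
neg = np.array([[-1.0, 0.2], [0.3, -1.0]])  # pointing away from each query
assert triplet_accuracy(q, pos, neg) == 1.0
```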

Training Details

Training Dataset

ai_alignment

  • Dataset: ai_alignment at bb2b8ee
  • Size: 814 training samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 814 samples:
    • query: string; min: 8 tokens, mean: 14.97 tokens, max: 41 tokens
    • job_description_pos: string; min: 7 tokens, mean: 349.01 tokens, max: 512 tokens
    • job_description_neg: string; min: 7 tokens, mean: 347.16 tokens, max: 512 tokens
  • Samples:
    Sample 1
    • query: Python design patterns, Snowflake data warehousing, AWS data pipeline optimization
    • job_description_pos: Requirements:
      - Good communication; and problem-solving abilities- Ability to work as an individual contributor; collaborating with Global team- Strong experience with Data Warehousing- OLTP, OLAP, Dimension, Facts, Data Modeling- Expertise implementing Python design patterns (Creational, Structural and Behavioral Patterns)- Expertise in Python building data application including reading, transforming; writing data sets- Strong experience in using boto3, pandas, numpy, pyarrow, Requests, Fast API, Asyncio, Aiohttp, PyTest, OAuth 2.0, multithreading, multiprocessing, snowflake python connector; Snowpark- Experience in Python building data APIs (Web/REST APIs)- Experience with Snowflake including SQL, Pipes, Stream, Tasks, Time Travel, Data Sharing, Query Optimization- Experience with Scripting language in Snowflake including SQL Stored Procs, Java Script Stored Procedures; Python UDFs- Understanding of Snowflake Internals; experience in integration with Reporting; UI applications- Stron...
    • job_description_neg: QUALIFICATIONS Required Certifications DoD IAT Level III Certification (Must obtain within 180 days of hire). Education, Background, and Years of Experience 3-5 years of Data Analyst experience. ADDITIONAL SKILLS & QUALIFICATIONS Required Skills At least 3 years of hands-on experience with query languages, such as SQL and Kusto to facilitate robust reporting capabilities. Preferred Skills Understanding of Microsoft Power Platform. Power BI authoring, in combination with designing and integrating with data sources. Tier III, Senior Level Experience with Kusto Query Language (KQL). Tier III, Senior Level Experience with Structured Query Language (SQL). WORKING CONDITIONS Environmental Conditions Contractor site with 0%-10% travel possible. Possible off-hours work to support releases and outages. General office environment. Work is generally sedentary in nature but may require standing and walking for up to 10% of the time. The working environment is generally favorable. Lighting and temp...
    Sample 2
    • query: Data Science in Marketing, Customer LTV Modeling, Experimentation Frameworks
    • job_description_pos: experience. You are comfortable with a range of statistical and ML techniques with the ability to apply them to deliver measurable business impact at Turo.
      You’re someone who constantly thinks about how data can support Turo’s work across domains, actively utilizing it to work-through challenges and unlock new opportunities. You’re proficient in translating unstructured problems into tangible mathematical frameworks, and are able to bring others with you on that journey. You’re someone who enjoys working with business stakeholders to drive experimentation and foster a data-centric culture. You’re able to recognize the right tools for each problem and design solutions that scale the impact of your work. You have a passion for contributing to a best in class product and take ownership of your work from inception to implementation and beyond.
      What You Will Do
      Turo’s marketplace has enjoyed continued growth as a business, which has in part been achieved through significant Marketing inv...
    • job_description_neg: requirements.Prepares and presents results of analysis along with improvements and/or recommendations to the business at all levels of management.Coordinates with global sourcing team and peers to aggregate data align reporting.Maintain data integrity of databases and make changes as required to enhance accuracy, usefulness and access.Acts as a Subject Matter Expert (SME) for key systems/processes in subject teams and day-to-day functions.Develops scenario planning tools/models (exit/maintain/grow). Prepares forecasts and analyzes trends in general business conditions.Request for Proposal (RFP) activities – inviting suppliers to participate in RFP, loading RFP into Sourcing tool, collecting RFP responses, conducting qualitative and quantitative analyses.Assists Sourcing Leads in maintaining pipeline, reports on savings targets.
      Qualifications:Bachelors Degree is required.Minimum of 4 years of relevant procurement analyst experience.Advanced Excel skills are required.C.P.M., C.P.S.M., o...
    Sample 3
    • query: education workforce data analysis R Tableau
    • job_description_pos: experience as an SME in complex enterprise-level projects, 5+ years of experience analyzing info and statistical data to prepare reports and studies for professional use, and experience working with education and workforce data.
      If you’re interested, I'll gladly provide more details about the role and further discuss your qualifications.
      Thanks, Stephen M Hrutka, Principal Consultant, www.hruckus.com
      Executive Summary: HRUCKUS is looking to hire a Data Analyst resource to provide data analysis and management support. The Data Analyst must have at least 10 years of overall experience.
      Position Description: The role of the Data Analyst is to provide data analysis support for the Office of Education Through Employment Pathways, which is located within the Office of the Deputy Mayor for Education. This is a highly skilled position requiring familiarity with educational data and policies.
      The position will require the resources to produce data analysis, focusing on education and workforce-relate...
    • job_description_neg: Experience of Delta Lake, DWH, Data Integration, Cloud, Design and Data Modelling.• Proficient in developing programs in Python and SQL• Experience with Data warehouse Dimensional data modeling.• Working with event based/streaming technologies to ingest and process data.• Working with structured, semi structured and unstructured data.• Optimize Databricks jobs for performance and scalability to handle big data workloads. • Monitor and troubleshoot Databricks jobs, identify and resolve issues or bottlenecks. • Implement best practices for data management, security, and governance within the Databricks environment. Experience designing and developing Enterprise Data Warehouse solutions.• Proficient writing SQL queries and programming including stored procedures and reverse engineering existing process.• Perform code reviews to ensure fit to requirements, optimal execution patterns and adherence to established standards.
      Qualifications:
      • 5+ years Python coding experience.• 5+ years - SQL...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
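MultipleNegativesRankingLoss uses in-batch negatives: for row i, (query_i, positive_i) is the correct pair and every other positive in the batch serves as a negative, with scale=20.0 sharpening the cosine-similarity logits before the softmax. A minimal NumPy sketch of this idea (one-hot vectors stand in for embeddings; not the library implementation):

```python
import numpy as np

def mnr_loss(queries, positives, scale=20.0):
    # Normalize so the dot product is cosine similarity
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                    # batch x batch similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (the correct query-positive pairs)
    return float(-np.mean(np.diag(log_probs)))

q = np.eye(4, 8)                        # 4 orthogonal "query" embeddings
loss_aligned = mnr_loss(q, q)           # each query matches its own positive
loss_shifted = mnr_loss(q, np.roll(q, 1, axis=0))  # positives misaligned
assert loss_aligned < 0.01 < loss_shifted
```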
    

Evaluation Dataset

ai_alignment

  • Dataset: ai_alignment at bb2b8ee
  • Size: 101 evaluation samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 101 samples:
    • query: string; min: 10 tokens, mean: 14.79 tokens, max: 23 tokens
    • job_description_pos: string; min: 61 tokens, mean: 366.96 tokens, max: 512 tokens
    • job_description_neg: string; min: 27 tokens, mean: 372.63 tokens, max: 512 tokens
  • Samples:
    Sample 1
    • query: Statistical programming SAS, clinical development, AAV gene therapy
    • job_description_pos: QUALIFICATIONS:
      Education: 12 years of related experience with a Bachelor’s degree; or 8 years and a Master’s degree; or a PhD with 5 years experience; or equivalent experience
      Experience: Work experience in biotech/pharmaceutical industry or medical research for a minimum of 8 years (or 4 years for a PhD with relevant training)Experience in clinical developmentExperience in ophthalmology and/or biologic/gene therapy a plus
      Skills: Strong SAS programming skills required with proficiency in SAS/BASE, SAS Macros, SAS/Stat and ODS (proficiency in SAS/SQL, SAS/GRAPH or SAS/ACCESS is a plus)Proficiency in R programming a plusProficiency in Microsoft Office Apps, such as WORD, EXCEL, and PowerPoint (familiar with the “Chart” features in EXCEL/PowerPoint a plus)Good understanding of standards specific to clinical trials such as CDISC, SDTM, and ADaM, MedDRA, WHODRUGExperience with all clinical phases (I, II, III, and IV) is desirableExperience with BLA/IND submissions is strongly desir...
    • job_description_neg: requirements may change at any time.
      Qualifications
      Qualification:
      • BS degree in Computer Science, Computer Engineering or other relevant majors.
      • Excellent programming, debugging, and optimization skills in general purpose programming languages
      • Ability to think critically and to formulate solutions to problems in a clear and concise way.
      Preferred Qualifications:
      • Experience with one or more general purpose programming languages including but not limited to: Go, C/C++, Python.
      • Good understanding in one of the following domains: ad fraud detection, risk control, quality control, adversarial engineering, and online advertising systems.
      • Good knowledge in one of the following areas: machine learning, deep learning, backend, large-scale systems, data science, full-stack.
      TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workpla...
    Sample 2
    • query: ETL pipeline design, bulk data solutions, classified environments
    • job_description_pos: Skills & Experience:Must hold a TS/SCI Full Scope Polygraph clearance, and have experience working in classified environments.Professional experience with Python and a JVM language (e.g., Scala) 4+ years of experience designing and maintaining ETL pipelines Experience using Apache SparkExperience with SQL (e.g., Postgres) and NoSQL (e.g., Cassandra, ElasticSearch, etc.)databases Experience working on a cloud platform like GCP, AWS, or Azure Experience working collaboratively with git
      Desired Skills & Experience:Understanding of Docker/Kubernetes Understanding of or interest in knowledge graphsExperienced in supporting and working with internal teams and customers in a dynamic environment Passionate about open source development and innovative technology
      Benefits: Limitless growth and learning opportunitiesA collaborative and positive culture - your team will be as smart and driven as youA strong commitment to diversity, equity & inclusionExceedingly generous vacation leave, parental l...
    • job_description_neg: experience with all aspects of the software development lifecycle, from design to deployment. Demonstrate understanding of the full life data lifecycle and the role that high-quality data plays across applications, machine learning, business analytics, and reporting. Lead and take ownership of assigned technical projects in a fast-paced environment.
      What you need to succeed (minimum qualifications)3-5+ years of experienceFamiliar with best practices for data ingestion and data designDevelop initial queries for profiling data, validating analysis, testing assumptions, driving data quality assessment specifications, and define a path to deploymentIdentify necessary business rules for extracting data along with functional or technical risks related to data sources (e.g. data latency, frequency, etc.)Knowledge of working with queries/applications, including performance tuning, utilizing indexes, and materialized views to improve query performanceContinuously improve quality, efficiency, a...
    Sample 3
    • query: Provider data analysis, healthcare compliance, business process improvement
    • job_description_pos: requirements of health plan as it pertains to contracting, benefits, prior authorizations, fee schedules, and other business requirements.
      •Analyze and interpret data to determine appropriate configuration changes.• Accurately interprets specific state and/or federal benefits, contracts as well as additional business requirements and converting these terms to configuration parameters.• Oversees coding, updating, and maintaining benefit plans, provider contracts, fee schedules and various system tables through the user interface.• Applies previous experience and knowledge to research and resolve claim/encounter issues, pended claims and update system(s) as necessary.• Works with fluctuating volumes of work and can prioritize work to meet deadlines and needs of user community.• Provides analytical, problem-solving foundation including definition and documentation, specifications.• Recognizes, identifies and documents changes to existing business processes and identifies new opportunities...
    • job_description_neg: experience.Required Skills: ADF pipelines, SQL, Kusto, Power BI, Cosmos (Scope Scripts). Power Bi, ADX (Kusto), ADF, ADO, Python/C#.Good to have – Azure anomaly Alerting, App Insights, Azure Functions, Azure FabricQualifications for the role 5+ years experience building and optimizing ‘big data’ data pipelines, architectures and data sets. Specific experience working with COSMOS and Scope is required for this role. Experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases is a plus. Experience with investigating and on-boarding new data sources in a big-data environment, including forming relationships with data engineers cross-functionally to permission, mine and reformat new data sets. Strong analytic skills related to working with unstructured data sets. A successful history of manipulating, processing and extracting value from large disconnected datasets.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 1e-05
  • num_train_epochs: 6
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
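A hedged sketch of how these non-default values could be supplied via SentenceTransformerTrainingArguments (the output_dir is a placeholder; API as in recent Sentence Transformers releases):

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

# Non-default hyperparameters from the list above; everything else
# keeps its library default.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=6,
    warmup_ratio=0.1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```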

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss ai-job-validation_cosine_accuracy ai-job-test_cosine_accuracy
-1 -1 - - 0.8614 -
1.9608 100 0.848 0.3421 0.9802 -
3.9216 200 0.3142 0.3138 0.9802 -
5.8824 300 0.1828 0.3009 0.9802 -
-1 -1 - - 0.9802 0.9709

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.4
  • PyTorch: 2.8.0
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}