Add new SentenceTransformer model.

Files changed (11)

1_Pooling/config.json +10 -0
README.md +792 -0
config.json +31 -0
config_sentence_transformers.json +10 -0
model.safetensors +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +64 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": true,
+  "pooling_mode_mean_tokens": false,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
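Reviewer note: the config above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first ([CLS]) token rather than an average over tokens, and the README's architecture listing follows pooling with a `Normalize()` step. The NumPy sketch below illustrates that idea under stated assumptions: the function name `cls_pool_and_normalize` and the random input are hypothetical, and this is not the library's own implementation.

```python
import numpy as np


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Illustrative CLS pooling followed by L2 normalization.

    token_embeddings has shape (batch, seq_len, dim); the result has
    shape (batch, dim) with unit-length rows.
    """
    # CLS pooling: keep only the first token's vector for each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2-normalize each row; the clip guards against division by zero.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)


# Hypothetical stand-in for transformer output: 2 sequences, 5 tokens, 384 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
pooled = cls_pool_and_normalize(tokens)
print(pooled.shape)
```

Because the rows are unit length, the dot product of two pooled vectors equals their cosine similarity.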

README.md ADDED

	@@ -0,0 +1,792 @@

+---
+base_model: marroyo777/bge-99GPT-v1
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy
+- dot_accuracy
+- manhattan_accuracy
+- euclidean_accuracy
+- max_accuracy
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:1416
+- loss:MultipleNegativesRankingLoss
+widget:
+- source_sentence: 'Who wrote the blog Sprint 7: Iterating upon iterations?'
+  sentences:
+  - 'Title: From Insights to Impact: 99P Labs Collaborates with BDAA to Foster Data
+    Visualization Talent
+    Published: March, 2023
+    Author(s): Ryan Lingo
+    Claps: 0
+    Comments: 0
+    Word Count: 1537
+    URL: https://medium.com/99p-labs/from-insights-to-impact-99p-labs-collaborates-with-bdaa-to-foster-data-visualization-talent-26e22a76d1df
+    In the spring of 2023, 99P Labs sponsored a data visualization challenge in collaboration
+    with BDAA, the Big Data and Analytics Association at Ohio State University. The
+    challenge lasted two weeks and began with a kickoff event where the 99P Labs team
+    went to the weekly Tuesday BDAA meeting and laid out the motivation and starting
+    guardrails for the challenge. The challenge allowed 99P Labs to connect with the
+    next generation of data professionals and support their growth and development.
+    The data visualization challenge was open to all BDAA members and lasted two weeks.
+    The teams used a wide range of tools and software to create their visualizations
+    and dashboards, including Streamlit, Plotly Dash, and Tableau. The winning entries
+    were highlighted, and the challenge was a valuable experience for 99P Labs. The
+    challenge was not just an opportunity for the students to learn, but it was also
+    an opportunity for 99P Labs to connect with the next generation of data professionals
+    and help build their developer community. The collaboration with BDAA was strengthened,
+    and they look forward to continuing this collaboration in the future.'
+  - 'Title: Sprint 7: Iterating upon iterations
+    Published: October, 2023
+    Author(s): 2023 99P Labs x CMU MHCI Capstone Team
+    Claps: 24
+    Comments: 0
+    Word Count: 985
+    URL: https://medium.com/99p-labs/sprint-7-iterating-upon-iterations-34cc621a5aeb
+    The 99P Labs x CMU MHCI Capstone Team is part of the Master of Human-Computer
+    Interaction (MHCI) program at Carnegie Mellon University. The team started off
+    with a blank canvas and ran design sessions with 3 Gen Z participants to shape
+    their mobile mentor to fit their learning needs. They found that the activities
+    people wanted to perform in cars fell under a few main hierarchies and came up
+    with a set of 3 scenarios to test out the different roles that Gen Zers expect
+    from the mobile mentor. The team then moved from more generative to evaluative
+    testing and decided to focus on the tutor scenario, making use of the unique moving
+    environment of a vehicle. They also made use of their clients'' expertise in vehicle
+    HCI design to conduct testing sessions. The team is looking forward to shaping
+    the future of learning on-the-go in their last few iterations.'
+  - 'Title: 99P Labs 2022 Data I/O Recap
+    Published: November, 2022
+    Author(s): Ryan Lingo
+    Claps: 259
+    Comments: 0
+    Word Count: 1021
+    URL: https://medium.com/99p-labs/99p-labs-2022-data-i-o-recap-7c710fbe28e6
+    The blog post discusses the 99P Labs 2022 Data I/O Recap, which took place at
+    The Ohio State University. The event included 50 students participating in 12
+    teams, with 99P Labs sponsoring and offering a challenge for the participants.
+    Despite varying skill levels, the atmosphere remained friendly and inclusive.
+    The event allowed for more personal interaction and submissions from all teams,
+    resulting in impressive insights and visuals. The winning teams were determined
+    by a team of 99P Labs and OSU faculty. Overall, the author expresses their enjoyment
+    and the inspiring energy of the event. For more information, readers are encouraged
+    to visit the 99P Labs blog post.'
+- source_sentence: What is the safety system for cars using robots mentioned in the
+    blog?
+  sentences:
+  - 'Title: The Future of Smart Mobility — Prof. Chris Atkinson
+    Published: April, 2021
+    Author(s): 99P Labs
+    Claps: 1
+    Comments: 0
+    Word Count: 52
+    URL: https://medium.com/99p-labs/the-future-of-smart-mobility-prof-chris-atkinson-8dfbc1fc1280
+    The blog article discusses the webinar on The Future of Smart Mobility by Prof.
+    Atkinson at The Ohio State University. 99P Labs expresses excitement about the
+    topic and their collaboration with partners such as OSU to work towards realizing
+    this future.'
+  - 'Title: Innovative Projects at MakeOHI/O
+    Published: March, 2023
+    Author(s): Ryan Lingo
+    Claps: 58
+    Comments: 0
+    Word Count: 603
+    URL: https://medium.com/99p-labs/innovative-projects-at-makeohi-o-6f8a4c5a3d02
+    The blog article discusses the Innovative Projects at MakeOHI/O, a makeathon event
+    sponsored by 99P Labs. The event aimed to encourage creativity and innovation
+    among undergraduate and graduate students. The article highlights three successful
+    projects from the event, including a platform for visually impaired individuals,
+    a safety system for cars using robots, and an automated rearview mirror and sun
+    visor adjustment system. The article also expresses gratitude to the event organizers
+    and participants and invites readers to connect with 99P Labs for collaboration.'
+  - 'Title: An Overview of Machine Learning — Part 2: All About Regression
+    Published: January, 2023
+    Author(s): Luka Brkljacic
+    Claps: 2
+    Comments: 0
+    Word Count: 4550
+    URL: https://medium.com/99p-labs/an-overview-of-machine-learning-part-2-all-about-regression-2f991281932e
+    The blog article provides an in-depth overview of regression in machine learning.
+    It covers linear regression, calculating R, limitations of R, multiple regression,
+    adjusted R, and logistic regression. The article also includes practical Python
+    examples for linear regression and multiple regression. The author also mentions
+    that the next post will cover decision trees.'
+- source_sentence: What is the Intel Realsense D435i Depth Camera used for?
+  sentences:
+  - 'Title: How LLMs can Drive the Intersection Between Social and Mobility
+    Published: December, 2023
+    Author(s): Roopal Joshi, Nishant Chintalapati, Sanghmitra Wankhade, Ashima Saxena,
+    and Ken Pulverman
+    Claps: 1
+    Comments: 0
+    Word Count: 2211
+    URL: https://medium.com/99p-labs/how-llms-can-drive-the-intersection-between-social-and-mobility-1cca9f34e410
+    The article discusses the authors'' journey in tackling a challenge for 99P Labs,
+    exploring the relevance of LLMs for the company to engage their users in the future
+    of mobility. The authors detail their process of ideation, convergent and divergent
+    thinking, and the development of a product or service that leverages the capabilities
+    of ChatGPT or Gen AI to initiate or influence physical actions. The article concludes
+    with recommendations and insights gained from the project.'
+  - 'Title: Harnessing Sensors and Software
+    Published: August, 2023
+    Author(s): Edward Lui
+    Claps: 0
+    Comments: 0
+    Word Count: 1133
+    URL: https://medium.com/99p-labs/harnessing-sensors-and-software
+    The blog article discusses the author''s two-month internship at 99P, focusing
+    on sensors and their integration with the Robot Operating System (ROS). The author
+    worked on the SOMEthings project, exploring technologies such as the Intel Realsense
+    D435i Depth Camera, HC-SR04 Ultrasonic Sensor, and DW1000 UWB Module. The challenges
+    faced and accomplishments achieved during the internship are highlighted, providing
+    valuable insights and hands-on experience. The article concludes with an invitation
+    for collaboration and engagement with 99P Labs.'
+  - 'Title: Sprint 6: Designing a Mobile Mentor
+    Published: October, 2023
+    Author(s): Alana Levene
+    Claps: 1
+    Comments: 0
+    Word Count: 1015
+    URL: https://medium.com/99p-labs/sprint-6-designing-a-mobile-mentor
+    The 99P Labs x CMU MHCI Capstone Team has transitioned from research to design,
+    focusing on creating a Mobile Mentor for Gen Z to facilitate on-the-go learning.
+    The team has identified key insights from their research and has begun the prototyping
+    process using a low-fidelity cardboard model. They are actively involving participants
+    in the design process and are considering various influencing factors on their
+    product. The team plans to transition to a design sprint timeline and is excited
+    to continue developing this innovative product.'
+- source_sentence: What use cases are provided for the Sustainable Mobility Analytics
+    dashboard?
+  sentences:
+  - 'Title: MakeOHI/O 2024
+    Published: March, 2024
+    Author(s): Ryan Lingo
+    Claps: 4
+    Comments: 0
+    Word Count: 2582
+    URL: https://medium.com/99p-labs/makeohi-o-2024-cb594eceb99f
+    The blog post discusses the author''s experience at the MakeOHI/O hackathon at
+    Ohio State University. The author served as a mentor and judge and shares the
+    challenge set for the students, the outstanding projects, and the winners. The
+    blog highlights the innovative solutions presented by the winning teams and the
+    overall success of the event. The author also encourages readers to stay engaged
+    with the community and explore partnership opportunities.'
+  - 'Title: CMU Heinz Capstone Project — Building Sustainable Mobility Analytics Tool
+    Published: June, 2022
+    Author(s): 99P Labs
+    Claps: 58
+    Comments: 0
+    Word Count: 2780
+    URL: https://medium.com/99p-labs/cmu-heinz-capstone-project-building-sustainable-mobility-analytics-tool-cbfe6e2591ee
+    The blog article discusses the sustainability of transportation networks and the
+    development of a Sustainable Mobility Analytics dashboard by a group of interdisciplinary
+    research scientists and engineers finishing their graduate studies at Heinz College.
+    The dashboard aims to help partners at 99P Labs understand the complexity of transportation
+    networks and evaluate their sustainability. The article details the three phases
+    of the project, the methodologies for calculating various metrics featured on
+    the dashboard, and provides use cases for the dashboard. Additionally, it discusses
+    the potential for future work and thanks those who supported the project.'
+  - 'Title: Navigating Telematics Data
+    Published: December, 2023
+    Author(s): Amber Liu, Hanna Lee, Parunjodhi Munisamy, Yaretsy Castro, and Tulip
+    Daaboul
+    Claps: 60
+    Comments: 0
+    Word Count: 1637
+    URL: https://medium.com/99p-labs/navigating-telematics-data-1b59e09489c7
+    The blog article discusses the importance of telematics data and its relevance
+    in the transportation landscape. It outlines the challenges faced in working with
+    telematics data, the tools and resources used, and the process of navigating and
+    visualizing the data. The article also delves into the specific analysis of telematics
+    data and census data in Ohio, highlighting the impact of Covid on transportation
+    and income levels. The authors express a desire to further explore pre-covid and
+    post-covid trends and extend the investigation to other states in the United States.'
+- source_sentence: How does gamification enhance the learning experience in data science
+    according to the blog?
+  sentences:
+  - 'Title: Unlocking Potential: The Power of Gamification in Employee Data Science
+    Learning
+    Published: April, 2024
+    Author(s): Fern Zhang
+    Claps: 5
+    Comments: 0
+    Word Count: 1661
+    URL: https://medium.com/99p-labs/unlocking-potential-the-power-of-gamification-in-employee-data-science-learning-5f88e97c74aa
+    The blog article discusses the use of gamification in employee data science learning.
+    It highlights the challenges in data science training and the team''s initiative
+    to revolutionize it using gamification strategies. The team adopted a multifaceted
+    approach to understand the diverse backgrounds and prior knowledge of their target
+    learners to design effective instruction. The article also discusses the gamification
+    strategies for manager and practitioner training, as well as the user testing
+    feedback and future plans for employee training in data science. Overall, the
+    article emphasizes the importance of data science training and the use of gamification
+    to make it an engaging and impactful learning experience.'
+  - 'Title: CMU Capstone Project — Visualization Framework Of Telematics Data
+    Published: April, 2024
+    Author(s): Yiheng Zhang, Yixue Yin, Rui Huang
+    Claps: 1
+    Comments: 0
+    Word Count: 2520
+    URL: https://medium.com/99p-labs/cmu-capstone-project-visualization-framework-of-telematics-data-abb74fcbb975
+    The blog article discusses the development of an application to display telematic
+    trajectory data in various formats on a web browser. The project involved brainstorming,
+    user interviews, experimentation, and necessary pivots to define the trajectory
+    of the development process. The team also focused on enhancing the foundational
+    dashboard, building up a plugin system, fixing problems, and building new features.
+    The final sprint involved finalizing and enhancing the user interface of the visualization
+    framework. The article also outlines future works for the project.'
+  - 'Title: Summer Sprint 3: Planes, Trains, and Autonomous Vehicles
+    Published: September, 2022
+    Author(s): MHCI x 99P Labs Capstone Team
+    Claps: 4
+    Comments: 0
+    Word Count: 1728
+    URL: https://medium.com/99p-labs/summer-sprint-3-planes-trains-and-autonomous-vehicles-5e8b40dbb67e
+    The MHCI x 99P Labs Capstone Team worked remotely from NYC and San Diego, attending
+    two major UX conferences and learning skills to apply to their project. They solidified
+    their understanding of quantitative research methods and learned how to address
+    points of friction in a user''s journey. The team discovered benefits and challenges
+    of remote work, and tested low-fi prototypes of built-in screens in an autonomous
+    people-mover. They conducted a brainstorm of all the capabilities they imagine
+    their AV''s ecosystem would have and identified the highest priority capabilities.
+    The team also developed a wireflow based on their map of capabilities and prototyped
+    it in Figma to test with participants. They plan to A/B test different content
+    for the in-app options and continue to explore specificity levels when it comes
+    to giving passengers information. They are excited to bring all of their learnings
+    to life in their final design, which will inform the future of shared AV transportation.'
+model-index:
+- name: SentenceTransformer based on marroyo777/bge-99GPT-v1
+  results:
+  - task:
+      type: triplet
+      name: Triplet
+    dataset:
+      name: 99GPT Finetuning Embedding test 01
+      type: 99GPT-Finetuning-Embedding-test-01
+    metrics:
+    - type: cosine_accuracy
+      value: 0.9887005649717514
+      name: Cosine Accuracy
+    - type: dot_accuracy
+      value: 0.011299435028248588
+      name: Dot Accuracy
+    - type: manhattan_accuracy
+      value: 0.9887005649717514
+      name: Manhattan Accuracy
+    - type: euclidean_accuracy
+      value: 0.9887005649717514
+      name: Euclidean Accuracy
+    - type: max_accuracy
+      value: 0.9887005649717514
+      name: Max Accuracy
+    - type: cosine_accuracy
+      value: 0.9915254237288136
+      name: Cosine Accuracy
+    - type: dot_accuracy
+      value: 0.00847457627118644
+      name: Dot Accuracy
+    - type: manhattan_accuracy
+      value: 0.9915254237288136
+      name: Manhattan Accuracy
+    - type: euclidean_accuracy
+      value: 0.9915254237288136
+      name: Euclidean Accuracy
+    - type: max_accuracy
+      value: 0.9915254237288136
+      name: Max Accuracy
+---
+# SentenceTransformer based on marroyo777/bge-99GPT-v1
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [marroyo777/bge-99GPT-v1](https://huggingface.co/marroyo777/bge-99GPT-v1). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [marroyo777/bge-99GPT-v1](https://huggingface.co/marroyo777/bge-99GPT-v1) <!-- at revision 4ca01046331fa1aed7ce35326b38186f8baa5149 -->
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("marroyo777/bge-99GPT-v1")
+# Run inference
+sentences = [
+    'How does gamification enhance the learning experience in data science according to the blog?',
+    "Title: Unlocking Potential: The Power of Gamification in Employee Data Science Learning\nPublished: April, 2024\nAuthor(s): Fern Zhang\nClaps: 5\nComments: 0\nWord Count: 1661\nURL: https://medium.com/99p-labs/unlocking-potential-the-power-of-gamification-in-employee-data-science-learning-5f88e97c74aa\n\nThe blog article discusses the use of gamification in employee data science learning. It highlights the challenges in data science training and the team's initiative to revolutionize it using gamification strategies. The team adopted a multifaceted approach to understand the diverse backgrounds and prior knowledge of their target learners to design effective instruction. The article also discusses the gamification strategies for manager and practitioner training, as well as the user testing feedback and future plans for employee training in data science. Overall, the article emphasizes the importance of data science training and the use of gamification to make it an engaging and impactful learning experience.",
+    'Title: CMU Capstone Project\u200a—\u200aVisualization Framework Of Telematics Data\nPublished: April, 2024\nAuthor(s): Yiheng Zhang, Yixue Yin, Rui Huang\nClaps: 1\nComments: 0\nWord Count: 2520\nURL: https://medium.com/99p-labs/cmu-capstone-project-visualization-framework-of-telematics-data-abb74fcbb975\n\nThe blog article discusses the development of an application to display telematic trajectory data in various formats on a web browser. The project involved brainstorming, user interviews, experimentation, and necessary pivots to define the trajectory of the development process. The team also focused on enhancing the foundational dashboard, building up a plugin system, fixing problems, and building new features. The final sprint involved finalizing and enhancing the user interface of the visualization framework. The article also outlines future works for the project.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 384)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Triplet
+* Dataset: `99GPT-Finetuning-Embedding-test-01`
+* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+| Metric             | Value      |
+|:-------------------|:-----------|
+| cosine_accuracy    | 0.9887     |
+| dot_accuracy       | 0.0113     |
+| manhattan_accuracy | 0.9887     |
+| euclidean_accuracy | 0.9887     |
+| **max_accuracy**   | **0.9887** |
+#### Triplet
+* Dataset: `99GPT-Finetuning-Embedding-test-01`
+* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+| Metric             | Value      |
+|:-------------------|:-----------|
+| cosine_accuracy    | 0.9915     |
+| dot_accuracy       | 0.0085     |
+| manhattan_accuracy | 0.9915     |
+| euclidean_accuracy | 0.9915     |
+| **max_accuracy**   | **0.9915** |
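Reviewer note: as a rough guide to reading the tables above, triplet accuracy can be understood as the fraction of (anchor, positive, negative) triplets in which the anchor lies closer to the positive than to the negative. The sketch below shows that calculation for cosine distance only. The helper names and toy vectors are hypothetical, and it is not the `TripletEvaluator` source.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance (1 - cosine similarity) between a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


def triplet_accuracy(anchors, positives, negatives) -> float:
    """Share of triplets where the positive is closer to the anchor."""
    pos_dist = cosine_distance(anchors, positives)
    neg_dist = cosine_distance(anchors, negatives)
    return float(np.mean(pos_dist < neg_dist))


# Toy 2-D embeddings: the first two triplets are ranked correctly,
# the third is deliberately wrong (its "positive" points away from the anchor).
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.9]])
accuracy = triplet_accuracy(anchors, positives, negatives)
print(accuracy)
```

The other rows in the tables apply the same comparison with a different distance or similarity measure.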
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 1,416 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                            | positive                                                                              | negative                                                                             |
+  |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                                | string                                                                               |
+  | details | <ul><li>min: 8 tokens</li><li>mean: 17.71 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 190.68 tokens</li><li>max: 331 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 190.0 tokens</li><li>max: 331 tokens</li></ul> |
+* Samples:
+  | anchor                                                                                 | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | negative                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+  |:---------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>What guidance does the article provide for creating a co-design protocol?</code> | <code>Title: Interactive Co-Design Sessions for Customer Research — Part 2: Co-Design Protocol<br>Published: November, 2020<br>Author(s): Langley Vogt<br>Claps: 0<br>Comments: 0<br>Word Count: 497<br>URL: https://medium.com/99p-labs/interactive-co-design-sessions-for-customer-research-part-2-co-design-protocol-2c60291e88c9<br><br>The article discusses the process of creating an interactive co-design protocol for customer research. It emphasizes the importance of creating a thorough protocol and interactive board simultaneously, and provides guidance on creating a preliminary protocol and laying out the rest of the protocol in a table format. The article also mentions that Part 3 will share co-design learnings and takeaways.</code>                                                                                                                                                                 | <code>Title: What is Software-defined Mobility?<br>Published: March, 2023<br>Author(s): Rajeev Chhajer and Ryan Lingo<br>Claps: 56<br>Comments: 0<br>Word Count: 742<br>URL: https://medium.com/99p-labs/what-is-software-defined-mobility/<br><br>The article discusses the concept of Software-defined Mobility and its impact on the automotive industry. It emphasizes the importance of incorporating intelligence into the mobility ecosystem through software to create a more integrated, sustainable, and emotional mobility experience. The authors believe that participation and cooperation are key to success in this new mobility paradigm, and they aim to leverage cutting-edge technologies and innovative approaches to address the challenges facing the automotive industry.</code>                                                                                                                                   |
+  | <code>What was the goal of the MHCI 99P Labs Capstone Team's project?</code>           | <code>Title: Interactions, Car Data, and Play Dynamics…Oh My!—2021 MHCI Capstone Part 8<br>Published: January, 2022<br>Author(s): MHCI 99P Labs Capstone Team<br>Claps: 0<br>Comments: 0<br>Word Count: 1061<br>URL: https://medium.com/99p-labs/interactions-car-data-and-play-dynamics-oh-my-2021-mhci-capstone-part-8-b3ac8dd1ceef<br><br>The MHCI 99P Labs Capstone Team shares their experiences and learnings from Sprint 2 of their project. They explored various interactions in the car, including shared motion and collaboration, button-based games, and co-creation with data input from the car. The team aimed to foster connections between families through play and successfully learned how these new interactions could achieve this goal. The marble game was the most successful, while the other two prototypes had mixed success. The team plans to take their learnings forward in the next sprint.</code> | <code>Title: Introducing the 99P Labs Blog Chatbot<br>Published: February, 2024<br>Author(s): Martin Arroyo<br>Claps: 4<br>Comments: 1<br>Word Count: 3208<br>URL: https://medium.com/99p-labs/99gpt-building-a-chatbot-fdde8b689df4<br><br>The 99P Labs blog has introduced a chatbot called 99GPT, designed to answer questions about blog content. The chatbot aims to provide a more engaging and interactive way for readers to explore insights from the blog archive. The article discusses the technical considerations, challenges, and lessons learned in building 99GPT, including the ingestion phase, model selection, and developing a querying strategy. The blog also highlights the importance of frameworks like Langchain and LlamaIndex in bridging the gap between raw data and AI-driven interactive applications. The article concludes with the deployment of the chatbot on the Streamlit community cloud.</code> |
+  | <code>What are the ideal data quality outputs mentioned in the article?</code>         | <code>Title: Weighing the Value of Data Quality Checks<br>Published: July, 2022<br>Author(s): Ryan Lingo<br>Claps: 36<br>Comments: 0<br>Word Count: 2572<br>URL: https://medium.com/99p-labs/weighing-the-value-of-data-quality-checks-4a5d0da1f3ff<br><br>The article discusses the exploration of implementing data quality checks into a data platform, the goals, limits, and expectations, and the small experiments conducted to validate thinking. It also covers the flexibility and customization of data quality, potential actions to take when finding inadequate data quality, ideal data quality output, metrics to report, and where in the pipeline data quality checks best fit. The article also explores general deployment options and closing thoughts on the exploration of data quality ideas and architecture.</code>                                                                                        | <code>Title: Sprint 2: Robot You Can Drive My Car<br>Published: May, 2022<br>Author(s): MHCI x 99P Labs Capstone Team<br>Claps: 0<br>Comments: 0<br>Word Count: 648<br>URL: https://medium.com/99p-labs/sprint-2-robot-you-can-drive-my-car-e4d988826555<br><br>The blog article discusses the progress of the MHCI x 99P Labs Capstone Team in their project, focusing on the preliminary research and brainstorming they have conducted. The team has updated their research plan and is preparing to conduct informal interviews and observations in various related fields. They also plan to explore pretotyping in their next sprint to understand what form of attendants is most helpful to human passengers.</code>                                                                                                                                                                                                               |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
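Reviewer note: the loss parameters above (`scale` 20.0, `cos_sim`) can be read as a softmax cross-entropy over scaled cosine similarities. The following is a minimal pure-Python sketch of that idea, assuming the candidate pool for each anchor is every positive in the batch followed by every negative, and that the correct candidate for anchor `i` is positive `i`. The function names `cos_sim` and `mnr_loss` and the toy vectors are illustrative, not taken from this repository.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i out of all candidates."""
    candidates = positives + negatives  # in-batch positives first, then negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-dimensional embeddings (hypothetical values, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = mnr_loss(anchors, positives, negatives)
bad = mnr_loss(anchors, negatives, positives)  # positives and negatives swapped
print(good < bad)
```

With the default `no_duplicates` batch sampler listed further down, a batch should not contain the same text twice, which keeps an anchor's own positive from also showing up as one of its in-batch negatives.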
+### Evaluation Dataset
+#### Unnamed Dataset
+* Size: 354 evaluation samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+* Approximate statistics based on the first 354 samples:
+  |         | anchor                                                                            | positive                                                                              | negative                                                                              |
+  |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                                | string                                                                                |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 17.68 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 187.96 tokens</li><li>max: 331 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 189.88 tokens</li><li>max: 331 tokens</li></ul> |
+* Samples:
+  | anchor                                                                        | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | negative                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+  |:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>What challenges did the 99P capstone team face in their project?</code> | <code>Title: Decoding Travel Times: Exploring Telematics Data Dynamics<br>Published: May, 2024<br>Author(s): Qamar Mohamoud<br>Claps: 3<br>Comments: 1<br>Word Count: 1880<br>URL: https://medium.com/99p-labs/decoding-travel-times-exploring-telematics-data-dynamics<br><br>The blog article discusses the challenges faced by the 99P capstone team of the MTDA program at The Ohio State University in building a model to compare real-life trip times to ideal times projected by the Google Distance Matrix. The team explored telematics data dynamics and the impact of geography, time of day, and local weather on trip times. The article also highlights the team's approach to feature creation, weather analysis, zone identification, data filtering, and modeling. Despite their efforts, the predictive models tested did not exceed 60% accuracy, leading to several key conclusions. The team advises caution in replicating their analysis and suggests addressing data bias, exploring alternative data sources, and considering route information for more accurate analyses in the future.</code> | <code>Title: Sprint 5: Optimizing HRI Research with Smart Guide — A Co-Design Journey<br>Published: May, 2024<br>Author(s): Honda Research Institute MHCI @ CMU<br>Claps: 2<br>Comments: 0<br>Word Count: 970<br>URL: https://medium.com/99p-labs/sprint-5-optimizing-hri-research-with-smart-guide-a-co-design-journey-fa5d64a56a3d<br><br>The blog article discusses the Smart Guide as an AI research companion for HRI researchers, aimed at enhancing the efficiency of human-AI teaming (HAIT) research. The article details the goals and testing process for the Smart Guide, as well as the insights gained from co-creation sessions with CMU researchers. The article also outlines the prototype and the key takeaways from the research process.</code>                                       |
+  | <code>What challenges did the author face during the internship?</code>       | <code>Title: Harnessing Sensors and Software<br>Published: August, 2023<br>Author(s): Edward Lui<br>Claps: 0<br>Comments: 0<br>Word Count: 1133<br>URL: https://medium.com/99p-labs/harnessing-sensors-and-software<br><br>The blog article discusses the author's two-month internship at 99P, focusing on sensors and their integration with the Robot Operating System (ROS). The author worked on the SOMEthings project, exploring technologies such as the Intel Realsense D435i Depth Camera, HC-SR04 Ultrasonic Sensor, and DW1000 UWB Module. The challenges faced and accomplishments achieved during the internship are highlighted, providing valuable insights and hands-on experience. The article concludes with an invitation for collaboration and engagement with 99P Labs.</code>                                                                                                                                                                                                                                                                                                                       | <code>Title: Sprint 6: Designing a Mobile Mentor<br>Published: October, 2023<br>Author(s): Alana Levene<br>Claps: 1<br>Comments: 0<br>Word Count: 1015<br>URL: https://medium.com/99p-labs/sprint-6-designing-a-mobile-mentor<br><br>The 99P Labs x CMU MHCI Capstone Team has transitioned from research to design, focusing on creating a Mobile Mentor for Gen Z to facilitate on-the-go learning. The team has identified key insights from their research and has begun the prototyping process using a low-fidelity cardboard model. They are actively involving participants in the design process and are considering various influencing factors on their product. The team plans to transition to a design sprint timeline and is excited to continue developing this innovative product.</code> |
+  | <code>What are the goals of the SOMEThings project?</code>                    | <code>Title: Introducing the SOMEThings Project<br>Published: July, 2023<br>Author(s): Ryan Lingo<br>Claps: 15<br>Comments: 0<br>Word Count: 2794<br>URL: https://medium.com/99p-labs/introducing-the-somethings-project-f5eb8b0cf572<br><br>The blog introduces the SOMEThings project, which is an initiative to build a miniature smart city for testing and experimenting with real-world challenges in the mobility ecosystem and IoT. The project aims to revolutionize the mobility sector, enhance efficiency and accessibility of mobility through IoT integration, and foster a culture of continuous learning and improvement. The blog also discusses the development of the SOMEThings Lab, the car, and the track for the project. The project is expected to have a substantial impact on the future of mobility and society at large.</code>                                                                                                                                                                                                                                                               | <code>Title: An Overview of Machine Learning — Part 2: All About Regression<br>Published: January, 2023<br>Author(s): Luka Brkljacic<br>Claps: 2<br>Comments: 0<br>Word Count: 4550<br>URL: https://medium.com/99p-labs/an-overview-of-machine-learning-part-2-all-about-regression-2f991281932e<br><br>The blog article provides an in-depth overview of regression in machine learning. It covers linear regression, calculating R, limitations of R, multiple regression, adjusted R, and logistic regression. The article also includes practical Python examples for linear regression and multiple regression. The author also mentions that the next post will cover decision trees.</code>                                                                                                         |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.1
+- `fp16`: True
+- `batch_sampler`: no_duplicates
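Reviewer note: the non-default hyperparameters above imply a very short run. A rough sketch of the arithmetic follows, assuming a single device, no gradient accumulation, and the 1416 training samples given by the `dataset_size:1416` tag in the front matter. The helper `lr_at` is a hypothetical name; it mirrors a linear warmup followed by linear decay, using the `learning_rate` of 5e-05 and `lr_scheduler_type` of `linear` from the full list.

```python
import math

train_samples = 1416   # from the `dataset_size:1416` tag
batch_size = 16        # per_device_train_batch_size
epochs = 1             # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-05        # learning_rate

# The last partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(total_steps, warmup_steps)
```

Under these assumptions the run has 89 optimizer steps, which agrees with the single row (step 89, epoch 1.0) in the Training Logs table, and about 9 of them are warmup.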
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `eval_use_gather_object`: False
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+</details>
+### Training Logs
+| Epoch | Step | 99GPT-Finetuning-Embedding-test-01_max_accuracy |
+|:-----:|:----:|:-----------------------------------------------:|
+| 1.0   | 89   | 0.9915                                          |
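Reviewer note: the `max_accuracy` column is presumably the best of the cosine, dot, Manhattan, and Euclidean accuracies listed in the front matter, each counting how often the anchor lies closer to its positive than to its negative. The README does not spell this out, so treat the sketch below as an assumption. It also checks how many of the 354 evaluation triplets a value of 0.9915 would correspond to. `triplet_accuracy` and the toy distances are illustrative.

```python
def triplet_accuracy(dist_pos, dist_neg):
    """Fraction of triplets whose anchor-positive distance is below the anchor-negative one."""
    correct = sum(1 for dp, dn in zip(dist_pos, dist_neg) if dp < dn)
    return correct / len(dist_pos)


# Toy distances (hypothetical): three of the four triplets are ranked correctly.
toy_accuracy = triplet_accuracy([0.1, 0.2, 0.9, 0.3], [0.8, 0.7, 0.4, 0.6])

# Which correct-triplet counts out of 354 round to the logged 0.9915?
eval_size = 354
matching_counts = [k for k in range(eval_size + 1) if round(k / eval_size, 4) == 0.9915]
print(toy_accuracy, matching_counts)
```

If that reading is right, the logged value corresponds to 351 of 354 triplets ranked correctly, so the score rests on three misses in a small evaluation set.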
+### Framework Versions
+- Python: 3.10.12
+- Sentence Transformers: 3.1.1
+- Transformers: 4.44.2
+- PyTorch: 2.4.1+cu121
+- Accelerate: 0.34.2
+- Datasets: 3.0.1
+- Tokenizers: 0.19.1
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```

config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "_name_or_path": "marroyo777/bge-99GPT-v1",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.44.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.44.2",
+    "pytorch": "2.4.1+cu121"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35160adc86cb4e7e6f8ec496af9093df1f6bece8ee9bc633b82e224d1b0e4c56
+size 133462128
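
Reviewer note: the LFS pointer size can be sanity-checked against `config.json`. The sketch below counts parameters for a standard BERT encoder layout (embeddings, 12 layers, and a pooler dense layer) and multiplies by 4 bytes for `float32`. The layout and the presence of the pooler are assumptions about what the checkpoint stores; the dimensions come from `config.json`.

```python
hidden = 384          # hidden_size
layers = 12           # num_hidden_layers
intermediate = 1536   # intermediate_size
vocab = 30522         # vocab_size
max_pos = 512         # max_position_embeddings
type_vocab = 2        # type_vocab_size


def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # weight and bias

embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
attention = 4 * linear(hidden, hidden) + layer_norm  # query, key, value, output
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
pooler = linear(hidden, hidden)

total_params = embeddings + layers * (attention + feed_forward) + pooler
estimated_bytes = total_params * 4   # torch_dtype is float32
file_size = 133462128                # size from the LFS pointer
print(total_params, file_size - estimated_bytes)
```

The estimate lands about 22 KB below the reported file size, a gap small enough to be the safetensors header and metadata, although that attribution is a guess.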

modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
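
Reviewer note: `modules.json` chains three stages: Transformer, Pooling, Normalize. Since `1_Pooling/config.json` enables only `pooling_mode_cls_token`, the last two stages amount to taking the first token's vector and scaling it to unit length. A minimal sketch of those two stages on toy data follows; the function names and the 2-dimensional vectors are illustrative (the real model emits 384-dimensional vectors).

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector."""
    return token_embeddings[0]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def sentence_embedding(token_embeddings):
    """Pooling followed by normalization, in the order given by modules.json."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy per-token outputs for a 3-token input (hypothetical values).
tokens = [[3.0, 4.0], [10.0, 0.0], [0.0, 7.0]]
embedding = sentence_embedding(tokens)
print(embedding)
```

Because the output has unit length, the dot product of two embeddings equals their cosine similarity, which is convenient given that `similarity_fn_name` is left as `null` in `config_sentence_transformers.json`.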

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 512,
+  "do_lower_case": true
+}

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "max_length": 512,
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
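
Reviewer note: the tokenizer settings above (special token IDs 0, 101, and 102, right-side padding and truncation, `model_max_length` 512) determine how a tokenized sentence is framed before it reaches the encoder. The sketch below illustrates that framing on already-tokenized IDs. `build_inputs` is a hypothetical helper, not the `BertTokenizer` API, and the word-piece IDs in the example are arbitrary placeholders.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102   # IDs from added_tokens_decoder
MODEL_MAX_LENGTH = 512                 # model_max_length


def build_inputs(token_ids, pad_to=None, max_length=MODEL_MAX_LENGTH):
    """Wrap token IDs in [CLS] ... [SEP], truncating and padding on the right."""
    body = token_ids[: max_length - 2]          # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding          # padding_side is "right"
        attention_mask += [0] * padding          # padded positions are masked out
    return input_ids, attention_mask


short_ids, short_mask = build_inputs([2054, 2003], pad_to=6)
long_ids, long_mask = build_inputs(list(range(1000, 2000)))
print(short_ids, short_mask, len(long_ids))
```

Since `[CLS]` always sits at position 0, right-side truncation and padding never move the token that the CLS pooling configuration reads.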

vocab.txt ADDED

The diff for this file is too large to render.