SentenceTransformer based on marroyo777/bge-99GPT-v1

This is a sentence-transformers model finetuned from marroyo777/bge-99GPT-v1. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: marroyo777/bge-99GPT-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
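The three modules form a pipeline: the Transformer encodes lowercased text (truncated to 512 tokens), the Pooling module takes the [CLS] token embedding (pooling_mode_cls_token=True), and Normalize L2-normalizes the result. A rough sketch of the equivalent computation with plain transformers, assuming the checkpoint ID from this card:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("marroyo777/bge-99GPT-v1")
model = AutoModel.from_pretrained("marroyo777/bge-99GPT-v1")

texts = ["semantic search", "vector databases"]
# (0) Transformer: tokenize (lowercased, truncated to 512 tokens) and encode
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # [batch, seq, 384]
# (1) Pooling: pooling_mode_cls_token=True -> keep only the [CLS] embedding
cls_embeddings = token_embeddings[:, 0]                  # [batch, 384]
# (2) Normalize: L2-normalize so dot product equals cosine similarity
embeddings = F.normalize(cls_embeddings, p=2, dim=1)
```

Because of the final Normalize step, dot products between embeddings are already cosine similarities.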

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("marroyo777/bge-99GPT-v1")
# Run inference
sentences = [
    'How does gamification enhance the learning experience in data science according to the blog?',
    "Title: Unlocking Potential: The Power of Gamification in Employee Data Science Learning\nPublished: April, 2024\nAuthor(s): Fern Zhang\nClaps: 5\nComments: 0\nWord Count: 1661\nURL: https://medium.com/99p-labs/unlocking-potential-the-power-of-gamification-in-employee-data-science-learning-5f88e97c74aa\n\nThe blog article discusses the use of gamification in employee data science learning. It highlights the challenges in data science training and the team's initiative to revolutionize it using gamification strategies. The team adopted a multifaceted approach to understand the diverse backgrounds and prior knowledge of their target learners to design effective instruction. The article also discusses the gamification strategies for manager and practitioner training, as well as the user testing feedback and future plans for employee training in data science. Overall, the article emphasizes the importance of data science training and the use of gamification to make it an engaging and impactful learning experience.",
    'Title: CMU Capstone Project\u200a—\u200aVisualization Framework Of Telematics Data\nPublished: April, 2024\nAuthor(s): Yiheng Zhang, Yixue Yin, Rui Huang\nClaps: 1\nComments: 0\nWord Count: 2520\nURL: https://medium.com/99p-labs/cmu-capstone-project-visualization-framework-of-telematics-data-abb74fcbb975\n\nThe blog article discusses the development of an application to display telematic trajectory data in various formats on a web browser. The project involved brainstorming, user interviews, experimentation, and necessary pivots to define the trajectory of the development process. The team also focused on enhancing the foundational dashboard, building up a plugin system, fixing problems, and building new features. The final sprint involved finalizing and enhancing the user interface of the visualization framework. The article also outlines future works for the project.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric               Value
cosine_accuracy      0.9887
dot_accuracy         0.0113
manhattan_accuracy   0.9887
euclidean_accuracy   0.9887
max_accuracy         0.9887

Triplet

Metric               Value
cosine_accuracy      0.9915
dot_accuracy         0.0085
manhattan_accuracy   0.9915
euclidean_accuracy   0.9915
max_accuracy         0.9915

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,416 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
             anchor           positive          negative
    type     string           string            string
    min      8 tokens         125 tokens        125 tokens
    mean     17.71 tokens     190.68 tokens     190.0 tokens
    max      36 tokens        331 tokens        331 tokens
  • Samples:
    Sample 1
    anchor: What guidance does the article provide for creating a co-design protocol?
    positive:
      Title: Interactive Co-Design Sessions for Customer Research — Part 2: Co-Design Protocol
      Published: November, 2020
      Author(s): Langley Vogt
      Claps: 0
      Comments: 0
      Word Count: 497
      URL: https://medium.com/99p-labs/interactive-co-design-sessions-for-customer-research-part-2-co-design-protocol-2c60291e88c9

      The article discusses the process of creating an interactive co-design protocol for customer research. It emphasizes the importance of creating a thorough protocol and interactive board simultaneously, and provides guidance on creating a preliminary protocol and laying out the rest of the protocol in a table format. The article also mentions that Part 3 will share co-design learnings and takeaways.
    negative:
      Title: What is Software-defined Mobility?
      Published: March, 2023
      Author(s): Rajeev Chhajer and Ryan Lingo
      Claps: 56
      Comments: 0
      Word Count: 742
      URL: https://medium.com/99p-labs/what-is-software-defined-mobility/

      The article discusses the concept of Software-defined Mobility and its impact on the automotive industry. It emphasizes the importance of incorporating intelligence into the mobility ecosystem through software to create a more integrated, sustainable, and emotional mobility experience. The authors believe that participation and cooperation are key to success in this new mobility paradigm, and they aim to leverage cutting-edge technologies and innovative approaches to address the challenges facing the automotive industry.

    Sample 2
    anchor: What was the goal of the MHCI 99P Labs Capstone Team's project?
    positive:
      Title: Interactions, Car Data, and Play Dynamics…Oh My!—2021 MHCI Capstone Part 8
      Published: January, 2022
      Author(s): MHCI 99P Labs Capstone Team
      Claps: 0
      Comments: 0
      Word Count: 1061
      URL: https://medium.com/99p-labs/interactions-car-data-and-play-dynamics-oh-my-2021-mhci-capstone-part-8-b3ac8dd1ceef

      The MHCI 99P Labs Capstone Team shares their experiences and learnings from Sprint 2 of their project. They explored various interactions in the car, including shared motion and collaboration, button-based games, and co-creation with data input from the car. The team aimed to foster connections between families through play and successfully learned how these new interactions could achieve this goal. The marble game was the most successful, while the other two prototypes had mixed success. The team plans to take their learnings forward in the next sprint.
    negative:
      Title: Introducing the 99P Labs Blog Chatbot
      Published: February, 2024
      Author(s): Martin Arroyo
      Claps: 4
      Comments: 1
      Word Count: 3208
      URL: https://medium.com/99p-labs/99gpt-building-a-chatbot-fdde8b689df4

      The 99P Labs blog has introduced a chatbot called 99GPT, designed to answer questions about blog content. The chatbot aims to provide a more engaging and interactive way for readers to explore insights from the blog archive. The article discusses the technical considerations, challenges, and lessons learned in building 99GPT, including the ingestion phase, model selection, and developing a querying strategy. The blog also highlights the importance of frameworks like Langchain and LlamaIndex in bridging the gap between raw data and AI-driven interactive applications. The article concludes with the deployment of the chatbot on the Streamlit community cloud.

    Sample 3
    anchor: What are the ideal data quality outputs mentioned in the article?
    positive:
      Title: Weighing the Value of Data Quality Checks
      Published: July, 2022
      Author(s): Ryan Lingo
      Claps: 36
      Comments: 0
      Word Count: 2572
      URL: https://medium.com/99p-labs/weighing-the-value-of-data-quality-checks-4a5d0da1f3ff

      The article discusses the exploration of implementing data quality checks into a data platform, the goals, limits, and expectations, and the small experiments conducted to validate thinking. It also covers the flexibility and customization of data quality, potential actions to take when finding inadequate data quality, ideal data quality output, metrics to report, and where in the pipeline data quality checks best fit. The article also explores general deployment options and closing thoughts on the exploration of data quality ideas and architecture.
    negative:
      Title: Sprint 2: Robot You Can Drive My Car
      Published: May, 2022
      Author(s): MHCI x 99P Labs Capstone Team
      Claps: 0
      Comments: 0
      Word Count: 648
      URL: https://medium.com/99p-labs/sprint-2-robot-you-can-drive-my-car-e4d988826555

      The blog article discusses the progress of the MHCI x 99P Labs Capstone Team in their project, focusing on the preliminary research and brainstorming they have conducted. The team has updated their research plan and is preparing to conduct informal interviews and observations in various related fields. They also plan to explore pretotyping in their next sprint to understand what form of attendants is most helpful to human passengers.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
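MultipleNegativesRankingLoss treats every other anchor's positive in the batch as a negative: scaled cosine similarities become logits for a cross-entropy over the batch. A minimal sketch of the computation with the scale of 20.0 above (explicit negatives, when present, are appended as extra logit columns):

```python
import torch
import torch.nn.functional as F

def mnrl(anchor_emb: torch.Tensor, positive_emb: torch.Tensor,
         scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity between every anchor and every in-batch positive
    a = F.normalize(anchor_emb, dim=1)
    p = F.normalize(positive_emb, dim=1)
    logits = a @ p.T * scale           # [batch, batch]
    # Positive i is the correct "class" for anchor i; all others are negatives
    labels = torch.arange(len(a))
    return F.cross_entropy(logits, labels)

loss = mnrl(torch.randn(4, 384), torch.randn(4, 384))
```

Larger batches therefore mean more in-batch negatives per anchor, which is why this loss is commonly paired with the no_duplicates batch sampler used below.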
    

Evaluation Dataset

Unnamed Dataset

  • Size: 354 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 354 samples:
             anchor           positive          negative
    type     string           string            string
    min      7 tokens         125 tokens        125 tokens
    mean     17.68 tokens     187.96 tokens     189.88 tokens
    max      32 tokens        331 tokens        331 tokens
  • Samples:
    Sample 1
    anchor: What challenges did the 99P capstone team face in their project?
    positive:
      Title: Decoding Travel Times: Exploring Telematics Data Dynamics
      Published: May, 2024
      Author(s): Qamar Mohamoud
      Claps: 3
      Comments: 1
      Word Count: 1880
      URL: https://medium.com/99p-labs/decoding-travel-times-exploring-telematics-data-dynamics

      The blog article discusses the challenges faced by the 99P capstone team of the MTDA program at The Ohio State University in building a model to compare real-life trip times to ideal times projected by the Google Distance Matrix. The team explored telematics data dynamics and the impact of geography, time of day, and local weather on trip times. The article also highlights the team's approach to feature creation, weather analysis, zone identification, data filtering, and modeling. Despite their efforts, the predictive models tested did not exceed 60% accuracy, leading to several key conclusions. The team advises caution in replicating their analysis and suggests addressing data bias, exploring alternative data sources, and considering route information for more accurate analyses in the future.
    negative:
      Title: Sprint 5: Optimizing HRI Research with Smart Guide — A Co-Design Journey
      Published: May, 2024
      Author(s): Honda Research Institute MHCI @ CMU
      Claps: 2
      Comments: 0
      Word Count: 970
      URL: https://medium.com/99p-labs/sprint-5-optimizing-hri-research-with-smart-guide-a-co-design-journey-fa5d64a56a3d

      The blog article discusses the Smart Guide as an AI research companion for HRI researchers, aimed at enhancing the efficiency of human-AI teaming (HAIT) research. The article details the goals and testing process for the Smart Guide, as well as the insights gained from co-creation sessions with CMU researchers. The article also outlines the prototype and the key takeaways from the research process.

    Sample 2
    anchor: What challenges did the author face during the internship?
    positive:
      Title: Harnessing Sensors and Software
      Published: August, 2023
      Author(s): Edward Lui
      Claps: 0
      Comments: 0
      Word Count: 1133
      URL: https://medium.com/99p-labs/harnessing-sensors-and-software

      The blog article discusses the author's two-month internship at 99P, focusing on sensors and their integration with the Robot Operating System (ROS). The author worked on the SOMEthings project, exploring technologies such as the Intel Realsense D435i Depth Camera, HC-SR04 Ultrasonic Sensor, and DW1000 UWB Module. The challenges faced and accomplishments achieved during the internship are highlighted, providing valuable insights and hands-on experience. The article concludes with an invitation for collaboration and engagement with 99P Labs.
    negative:
      Title: Sprint 6: Designing a Mobile Mentor
      Published: October, 2023
      Author(s): Alana Levene
      Claps: 1
      Comments: 0
      Word Count: 1015
      URL: https://medium.com/99p-labs/sprint-6-designing-a-mobile-mentor

      The 99P Labs x CMU MHCI Capstone Team has transitioned from research to design, focusing on creating a Mobile Mentor for Gen Z to facilitate on-the-go learning. The team has identified key insights from their research and has begun the prototyping process using a low-fidelity cardboard model. They are actively involving participants in the design process and are considering various influencing factors on their product. The team plans to transition to a design sprint timeline and is excited to continue developing this innovative product.

    Sample 3
    anchor: What are the goals of the SOMEThings project?
    positive:
      Title: Introducing the SOMEThings Project
      Published: July, 2023
      Author(s): Ryan Lingo
      Claps: 15
      Comments: 0
      Word Count: 2794
      URL: https://medium.com/99p-labs/introducing-the-somethings-project-f5eb8b0cf572

      The blog introduces the SOMEThings project, which is an initiative to build a miniature smart city for testing and experimenting with real-world challenges in the mobility ecosystem and IoT. The project aims to revolutionize the mobility sector, enhance efficiency and accessibility of mobility through IoT integration, and foster a culture of continuous learning and improvement. The blog also discusses the development of the SOMEThings Lab, the car, and the track for the project. The project is expected to have a substantial impact on the future of mobility and society at large.
    negative:
      Title: An Overview of Machine Learning — Part 2: All About Regression
      Published: January, 2023
      Author(s): Luka Brkljacic
      Claps: 2
      Comments: 0
      Word Count: 4550
      URL: https://medium.com/99p-labs/an-overview-of-machine-learning-part-2-all-about-regression-2f991281932e

      The blog article provides an in-depth overview of regression in machine learning. It covers linear regression, calculating R, limitations of R, multiple regression, adjusted R, and logistic regression. The article also includes practical Python examples for linear regression and multiple regression. The author also mentions that the next post will cover decision trees.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step 99GPT-Finetuning-Embedding-test-01_max_accuracy
1.0 89 0.9915

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 33.4M params (Safetensors, F32)
