metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:1496
- loss:LoggableMNRL
widget:
- source_sentence: >-
According to the passage, why did the intellectual stagnation following
Aristotle's work persist for so long?
sentences:
- >-
world in 587, the Chinese system was very enlightened. Europeans didn't
introduce formal civil service exams till the nineteenth century, and
even then they seem to have been influenced by the Chinese example.
Before credentials, government positions were obtained mainly by family
influence, if not outright bribery. It was a great step forward to judge
people by their performance on a test. But by no means a perfect
solution. When you judge people that way, you tend to get cram
schools—which they did in Ming China and nineteenth century England just
as much as in present day South Korea. What cram schools are, in effect,
is leaks in a seal. The use of credentials was an attempt to seal off
the direct transmission of power between generations, and cram schools
represent that power finding holes in the seal. Cram schools turn wealth
in one generation into credentials in the next. It's hard to beat this
phenomenon, because the schools adjust to suit whatever the tests
measure. When the tests are narrow and predictable, you get cram schools
on the classic model, like those that prepared candidates for Sandhurst
(the British West Point) or the classes American students take now to
improve their SAT scores. But as the te
- >-
tter, but you had no choice in the matter, if you needed money on the
scale only VCs could supply. Now that VCs have competitors, that's going
to put a market price on the help they offer. The interesting thing is,
no one knows yet what it will be. Do startups that want to get really
big need the sort of advice and connections only the top VCs can supply?
Or would super-angel money do just as well? The VCs will say you need
them, and the super-angels will say you don't. But the truth is, no one
knows yet, not even the VCs and super-angels themselves. All the
super-angels know is that their new model seems promising enough to be
worth trying, and all the VCs know is that it seems promising enough to
worry about. RoundsWhatever the outcome, the conflict between VCs and
super-angels is good news for founders. And not just for the obvious
reason that more competition for deals means better terms. The whole
shape of deals is changing. One of the biggest differences between
angels and VCs is the amount of your company they want. VCs want a lot.
In a series A round they want a third of your company, if they can get
it. They don't care much how much they pay for it, but they want a lot
because the number of series A invest
- ' the wrong direction as well. [8] Perhaps worst of all, he protected them from both the criticism of outsiders and the promptings of their own inner compass by establishing the principle that the most noble sort of theoretical knowledge had to be useless. The Metaphysics is mostly a failed experiment. A few ideas from it turned out to be worth keeping; the bulk of it has had no effect at all. The Metaphysics is among the least read of all famous books. It''s not hard to understand the way Newton''s Principia is, but the way a garbled message is. Arguably it''s an interesting failed experiment. But unfortunately that was not the conclusion Aristotle''s successors derived from works like the Metaphysics. [9] Soon after, the western world fell on intellectual hard times. Instead of version 1s to be superseded, the works of Plato and Aristotle became revered texts to be mastered and discussed. And so things remained for a shockingly long time. It was not till around 1600 (in Europe, where the center of gravity had shifted by then) that one found people confident enough to treat Aristotle''s work as a catalog of mistakes. And even then they rarely said so outright. If it seems surprising that the gap was so long, consider ho'
- source_sentence: >-
What is the main reason why Google's headquarters has a unique feel
compared to a typical large company's headquarters?
sentences:
- ' his need. More or less. Higher ranking members of the military got more (as higher ranking members of socialist societies always do), but what they got was fixed according to their rank. And the flattening effect wasn''t limited to those under arms, because the US economy was conscripted too. Between 1942 and 1945 all wages were set by the National War Labor Board. Like the military, they defaulted to flatness. And this national standardization of wages was so pervasive that its effects could still be seen years after the war ended. [1]Business owners weren''t supposed to be making money either. FDR said "not a single war millionaire" would be permitted. To ensure that, any increase in a company''s profits over prewar levels was taxed at 85% And when what was left after corporate taxes reached individuals, it was taxed again at a marginal rate of 93% [2]Socially too the war tended to decrease variation. Over 16 million men and women from all sorts of different backgrounds were brought together in a way of life that was literally uniform. Service rates for men born in the early 1920s approached 80% And working toward a common goal, often under stress, brought them still closer together. Though strictly speaking World '
- >-
iew: Red Rock.7. GoogleGoogle spread out from its first building in
Mountain View to a lot of the surrounding ones. But because the
buildings were built at different times by different people, the place
doesn't have the sterile, walled-off feel that a typical large company's
headquarters have. It definitely has a flavor of its own though. You
sense there is something afoot. The general atmos is vaguely utopian;
there are lots of Priuses, and people who look like they drive them. You
can't get into Google unless you know someone there. It's very much
worth seeing inside if you can, though. Ditto for Facebook, at the end
of California Ave in Palo Alto, though there is nothing to see
outside.8. Skyline DriveSkyline Drive runs along the crest of the Santa
Cruz mountains. On one side is the Valley, and on the other is the
sea—which because it's cold and foggy and has few harbors, plays
surprisingly little role in the lives of people in the Valley,
considering how close it is. Along some parts of Skyline the dominant
trees are huge redwoods, and in others they're live oaks. Redwoods mean
those are the parts where the fog off the coast comes in at night;
redwoods condense rain out of fog. The MROSD manages a collection of
- >-
Written by Paul Graham
The Bus Ticket Theory of Genius
November 2019
Everyone knows that to do great work you need both natural ability and
determination. But there's a third ingredient that's not as well
understood: an obsessive interest in a particular topic. To explain this
point I need to burn my reputation with some group of people, and I'm
going to choose bus ticket collectors. There are people who collect old
bus tickets. Like many collectors, they have an obsessive interest in
the minutiae of what they collect. They can keep track of distinctions
between different types of bus tickets that would be hard for the rest
of us to remember. Because we don't care enough. What's the point of
spending so much time thinking about old bus tickets?Which leads us to
the second feature of this kind of obsession: there is no point. A bus
ticket collector's love is disinterested. They're not doing it to
impress us or to make themselves rich, but for its own sake. When you
look at the lives of people who've done great work, you see a consistent
pattern. They often begin with a bus ticket collector's obsessive
interest in something that would have seemed pointless to most of their
contemporaries. One of the most striking
- source_sentence: >-
According to the passage, why is innocence important for children, and
what consequence does early jadedness have on a person's development?
sentences:
- >-
ful organizations is partly the history of techniques for preserving
that excitement. [4]The team that made the original Macintosh were a
great example of this phenomenon. People like Burrell Smith and Andy
Hertzfeld and Bill Atkinson and Susan Kare were not just following
orders. They were not tennis balls hit by Steve Jobs, but rockets let
loose by Steve Jobs. There was a lot of collaboration between them, but
they all seem to have individually felt the excitement of working on a
project of one's own. In Andy Hertzfeld's book on the Macintosh, he
describes how they'd come back into the office after dinner and work
late into the night. People who've never experienced the thrill of
working on a project they're excited about can't distinguish this kind
of working long hours from the kind that happens in sweatshops and
boiler rooms, but they're at opposite ends of the spectrum. That's why
it's a mistake to insist dogmatically on "work/life balance." Indeed,
the mere expression "work/life" embodies a mistake: it assumes work and
life are distinct. For those to whom the word "work" automatically
implies the dutiful plodding kind, they are. But for the skaters, the
relationship between work and life would be better repr
- >-
tect helpless creatures, considering human offspring are so helpless for
so long. Without the helplessness that makes kids cute, they'd be very
annoying. They'd merely seem like incompetent adults. But there's more
to it than that. The reason our hypothetical jaded 10 year old bothers
me so much is not just that he'd be annoying, but that he'd have cut off
his prospects for growth so early. To be jaded you have to think you
know how the world works, and any theory a 10 year old had about that
would probably be a pretty narrow one. Innocence is also
open-mindedness. We want kids to be innocent so they can continue to
learn. Paradoxical as it sounds, there are some kinds of knowledge that
get in the way of other kinds of knowledge. If you're going to learn
that the world is a brutal place full of people trying to take advantage
of one another, you're better off learning it last. Otherwise you won't
bother learning much more. Very smart adults often seem unusually
innocent, and I don't think this is a coincidence. I think they've
deliberately avoided learning about certain things. Certainly I do. I
used to think I wanted to know everything. Now I know I don't.
DeathAfter sex, death is the topic adults lie most conspic
- >-
do all eight things wrong. In fact, if you look at the way software gets
written in most organizations, it's almost as if they were deliberately
trying to do things wrong. In a sense, they are. One of the defining
qualities of organizations since there have been such a thing is to
treat individuals as interchangeable parts. This works well for more
parallelizable tasks, like fighting wars. For most of history a
well-drilled army of professional soldiers could be counted on to beat
an army of individual warriors, no matter how valorous. But having ideas
is not very parallelizable. And that's what programs are: ideas. It's
not merely true that organizations dislike the idea of depending on
individual genius, it's a tautology. It's part of the definition of an
organization not to. Of our current concept of an organization, at
least. Maybe we could define a new kind of organization that combined
the efforts of individuals without requiring them to be interchangeable.
Arguably a market is such a form of organization, though it may be more
accurate to describe a market as a degenerate case—as what you get by
default when organization isn't possible. Probably the best we'll do is
some kind of hack, like making the program
- source_sentence: >-
According to the passage, why are salesmen and top managers exceptions
when it comes to being rewarded for increased productivity within large
companies?
sentences:
- >-
olleague from 100 years ago, they'd just get into an ideological
argument. Yes, of course, you'll learn something by taking a psychology
class. The point is, you'll learn more by taking a class in another
department. The worthwhile departments, in my opinion, are math, the
hard sciences, engineering, history (especially economic and social
history, and the history of science), architecture, and the classics. A
survey course in art history may be worthwhile. Modern literature is
important, but the way to learn about it is just to read. I don't know
enough about music to say. You can skip the social sciences, philosophy,
and the various departments created recently in response to political
pressures. Many of these fields talk about important problems,
certainly. But the way they talk about them is useless. For example,
philosophy talks, among other things, about our obligations to one
another; but you can learn more about this from a wise grandmother or E.
B. White than from an academic philosopher. I speak here from
experience. I should probably have been offended when people laughed at
Clinton for saying "It depends on what the meaning of the word 'is' is."
I took about five classes in college on what the meaning o
- >-
at are a safe bet to be acquired for $20 million. There needs to be a
chance, however small, of the company becoming really big. Angels are
different in this respect. They're happy to invest in a company where
the most likely outcome is a $20 million acquisition if they can do it
at a low enough valuation. But of course they like companies that could
go public too. So having an ambitious long-term plan pleases everyone.
If you take VC money, you have to mean it, because the structure of VC
deals prevents early acquisitions. If you take VC money, they won't let
you sell early.7. VCs want to invest large amounts. The fact that
they're running investment funds makes VCs want to invest large amounts.
A typical VC fund is now hundreds of millions of dollars. If $400
million has to be invested by 10 partners, they have to invest $40
million each. VCs usually sit on the boards of companies they fund. If
the average deal size was $1 million, each partner would have to sit on
40 boards, which would not be fun. So they prefer bigger deals, where
they can put a lot of money to work at once. VCs don't regard you as a
bargain if you don't need a lot of money. That may even make you less
attractive, because it means their invest
- >-
imes as much wealth as an average employee. A programmer, for example,
instead of chugging along maintaining and updating an existing piece of
software, could write a whole new piece of software, and with it create
a new source of revenue. Companies are not set up to reward people who
want to do this. You can't go to your boss and say, I'd like to start
working ten times as hard, so will you please pay me ten times as much?
For one thing, the official fiction is that you are already working as
hard as you can. But a more serious problem is that the company has no
way of measuring the value of your work. Salesmen are an exception. It's
easy to measure how much revenue they generate, and they're usually paid
a percentage of it. If a salesman wants to work harder, he can just
start doing it, and he will automatically get paid proportionally more.
There is one other job besides sales where big companies can hire
first-rate people: in the top management jobs. And for the same reason:
their performance can be measured. The top managers are held responsible
for the performance of the entire company. Because an ordinary
employee's performance can't usually be measured, he is not expected to
do more than put in a solid effo
- source_sentence: >-
How can a startup founder's ambitions be influenced by YC (a startup
accelerator) and what is the potential trap founders often fall into when
they're trying to seem big?
sentences:
- >-
Written by Paul Graham
The Hardest Lessons for Startups to Learn
April 2006
In something that's out there, problems are alarming. There is a lot
more urgency once you release. And I think that's precisely why people
put it off. They know they'll have to work a lot harder once they do.
[2] 2. Keep Pumping Out Features. Of course, "release early" has a
second component, without which it would be bad advice. If you're going
to start with something that doesn't do much, you better improve it
fast. What I find myself repeating is "pump out features." And this rule
isn't just for the initial stages. This is something all startups should
do for as long as they want to be considered startups. I don't mean, of
course, that you should make your application ever more complex. By
"feature" I mean one unit of hacking-- one quantum of making users'
lives better. As with exercise, improvements beget improvements. If you
run every day, you'll probably feel like running tomorrow. But if you
skip running for a couple weeks, it will be an effort to drag yourself
out. So it is with hacking: the more ideas you implement, the more ideas
you'll have. You should make your system better at least in some small
way every day or two. This
- >-
e that they pay attention; it's when they notice you're still there.
It's just as well that it usually takes a while to gain momentum. Most
technologies evolve a good deal even after they're first launched —
programming languages especially. Nothing could be better, for a new
techology, than a few years of being used only by a small number of
early adopters. Early adopters are sophisticated and demanding, and
quickly flush out whatever flaws remain in your technology. When you
only have a few users you can be in close contact with all of them. And
early adopters are forgiving when you improve your system, even if this
causes some breakage. There are two ways new technology gets introduced:
the organic growth method, and the big bang method. The organic growth
method is exemplified by the classic seat-of-the-pants underfunded
garage startup. A couple guys, working in obscurity, develop some new
technology. They launch it with no marketing and initially have only a
few (fanatically devoted) users. They continue to improve the
technology, and meanwhile their user base grows by word of mouth. Before
they know it, they're big. The other approach, the big bang method, is
exemplified by the VC-backed, heavily marketed sta
- >-
It tipped from being this boulder we had to push to being a train car
that in fact had its own momentum."[4] One of the more subtle ways in
which YC can help founders is by calibrating their ambitions, because we
know exactly how a lot of successful startups looked when they were just
getting started.[5] If you're building something for which you can't
easily get a small set of users to observe — e. g. enterprise software —
and in a domain where you have no connections, you'll have to rely on
cold calls and introductions. But should you even be working on such an
idea?[6] Garry Tan pointed out an interesting trap founders fall into in
the beginning. They want so much to seem big that they imitate even the
flaws of big companies, like indifference to individual users. This
seems to them more "professional." Actually it's better to embrace the
fact that you're small and use whatever advantages that brings.[7] Your
user model almost couldn't be perfectly accurate, because users' needs
often change in response to what you build for them. Build them a
microcomputer, and suddenly they need to run spreadsheets on it, because
the arrival of your new microcomputer causes someone to invent the
spreadsheet.[8] If you have to
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer
This is a sentence-transformers model trained on 1,496 question-passage pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
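Cosine similarity is the dot product of two vectors divided by the product of their lengths. Scores therefore fall between -1 and 1 and depend only on the angle between the embeddings, not on their magnitude. The pure-Python sketch below illustrates the formula on toy vectors. `cosine_similarity` is a hypothetical helper, not the library's implementation; in practice `model.similarity` computes these scores for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for the model's 768-dimensional embeddings.
# Scaling a vector does not change its score: these two are parallel.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal vectors score 0.
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```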
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
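The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings produced by the BERT encoder, with padding positions excluded via the attention mask. The sketch below shows that computation in plain Python on toy numbers. `mean_pool` is a hypothetical helper; the library performs the same averaging on batched PyTorch tensors.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask entry is 1 (hypothetical helper)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Sum each dimension over the real (unmasked) tokens, then divide by their count.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three real tokens plus one padding token; real token vectors have 768 dimensions.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

The padding vector [100.0, 100.0] has no effect on the result, which is why the mask matters when sentences of different lengths are batched together.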
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
"How can a startup founder's ambitions be influenced by YC (a startup accelerator) and what is the potential trap founders often fall into when they're trying to seem big?",
'It tipped from being this boulder we had to push to being a train car that in fact had its own momentum."[4] One of the more subtle ways in which YC can help founders is by calibrating their ambitions, because we know exactly how a lot of successful startups looked when they were just getting started.[5] If you\'re building something for which you can\'t easily get a small set of users to observe — e. g. enterprise software — and in a domain where you have no connections, you\'ll have to rely on cold calls and introductions. But should you even be working on such an idea?[6] Garry Tan pointed out an interesting trap founders fall into in the beginning. They want so much to seem big that they imitate even the flaws of big companies, like indifference to individual users. This seems to them more "professional." Actually it\'s better to embrace the fact that you\'re small and use whatever advantages that brings.[7] Your user model almost couldn\'t be perfectly accurate, because users\' needs often change in response to what you build for them. Build them a microcomputer, and suddenly they need to run spreadsheets on it, because the arrival of your new microcomputer causes someone to invent the spreadsheet.[8] If you have to ',
"e that they pay attention; it's when they notice you're still there. It's just as well that it usually takes a while to gain momentum. Most technologies evolve a good deal even after they're first launched — programming languages especially. Nothing could be better, for a new techology, than a few years of being used only by a small number of early adopters. Early adopters are sophisticated and demanding, and quickly flush out whatever flaws remain in your technology. When you only have a few users you can be in close contact with all of them. And early adopters are forgiving when you improve your system, even if this causes some breakage. There are two ways new technology gets introduced: the organic growth method, and the big bang method. The organic growth method is exemplified by the classic seat-of-the-pants underfunded garage startup. A couple guys, working in obscurity, develop some new technology. They launch it with no marketing and initially have only a few (fanatically devoted) users. They continue to improve the technology, and meanwhile their user base grows by word of mouth. Before they know it, they're big. The other approach, the big bang method, is exemplified by the VC-backed, heavily marketed sta",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5865, 0.4398],
# [0.5865, 1.0000, 0.3588],
# [0.4398, 0.3588, 1.0000]])
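In the printed matrix, row 0 compares the question with itself and with the two passages. For semantic search, you would typically rank candidate passages by their score against the query. The sketch below does this in plain Python using the row-0 scores shown above. `rank_candidates` is a hypothetical helper; for real embeddings, the library also provides `sentence_transformers.util.semantic_search`.

```python
def rank_candidates(scores: list[float], candidate_ids: list[int]) -> list[tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from most to least similar."""
    pairs = [(i, scores[i]) for i in candidate_ids]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Row 0 of the similarity matrix printed above: the question vs. every input.
query_row = [1.0000, 0.5865, 0.4398]
# Index 0 is the question itself, so only indices 1 and 2 are candidates.
ranking = rank_candidates(query_row, candidate_ids=[1, 2])
print(ranking)  # [(1, 0.5865), (2, 0.4398)]
```

The passage that actually answers the question (index 1) ranks first, which is the behavior the training objective rewards.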
Training Details
Training Dataset
Unnamed Dataset
- Size: 1,496 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
  - sentence_0 (string): min 16 tokens, mean 30.39 tokens, max 52 tokens
  - sentence_1 (string): min 77 tokens, mean 274.6 tokens, max 359 tokens
- Samples:
  - sentence_0: According to the passage, what is more important than the size of a beachhead, and what characteristic must the people within it possess for it to be considered viable?
    sentence_1: urgently need, you have a beachhead. [11]The question then is whether that beachhead is big enough. Or more importantly, who's in it: if the beachhead consists of people doing something lots more people will be doing in the future, then it's probably big enough no matter how small it is. For example, if you're building something differentiated from competitors by the fact that it works on phones, but it only works on the newest phones, that's probably a big enough beachhead. Err on the side of doing things where you'll face competitors. Inexperienced founders usually give competitors more credit than they deserve. Whether you succeed depends far more on you than on your competitors. So better a good idea with competitors than a bad one without. You don't need to worry about entering a "crowded market" so long as you have a thesis about what everyone else in it is overlooking. In fact that's a very promising starting point. Google was that type of idea. Your thesis has to be more precis...
  - sentence_0: According to the passage, what specific group of workers is uniquely affected by the "cost of checks," and why?
    sentence_1: So it was left to the Europeans to explore and eventually to dominate the rest of the world, including China. In more recent times, Sarbanes-Oxley has practically destroyed the US IPO market. That wasn't the intention of the legislators who wrote it. They just wanted to add a few more checks on public companies. But they forgot to consider the cost. They forgot that companies about to go public are usually rather stretched, and that the weight of a few extra checks that might be easy for General Electric to bear are enough to prevent younger companies from being public at all. Once you start to think about the cost of checks, you can start to ask other interesting questions. Is the cost increasing or decreasing? Is it higher in some areas than others? Where does it increase discontinuously? If large organizations started to ask questions like that, they'd learn some frightening things. I think the cost of checks may actually be increasing. The reason is that software plays an increasin...
  - sentence_0: According to the passage, what is the most important thing an applicant can do during a Y Combinator interview, and why is this considered more valuable than meeting a higher standard of "convincingness"?
    sentence_1: ou're in unless there's some other disqualifying flaw. That is a hard standard to meet, though. Airbnb didn't meet it. They had the first part. They had made something they themselves wanted. But it wasn't spreading. So don't feel bad if you don't hit this gold standard of convincingness. If Airbnb didn't hit it, it must be too high. In practice, the YC partners will be satisfied if they feel that you have a deep understanding of your users' needs. And the Airbnbs did have that. They were able to tell us all about what motivated hosts and guests. They knew from first-hand experience, because they'd been the first hosts. We couldn't ask them a question they didn't know the answer to. We ourselves were not very excited about the idea as users, but we knew this didn't prove anything, because there were lots of successful startups we hadn't been excited about as users. We were able to say to ourselves "They seem to know what they're talking about. Maybe they're onto something. It's not gro...
- Loss: main.LoggableMNRL with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
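LoggableMNRL appears to be a custom class defined in the training code. Its parameters (scale, similarity_fct, gather_across_devices) and the citation listed for it under Citation match Sentence Transformers' MultipleNegativesRankingLoss, so it is presumably a thin logging wrapper around that loss; this is an inference, not something the card states. In the standard formulation, each (sentence_0, sentence_1) pair in a batch is a positive, and every other sentence_1 in the same batch serves as a negative, giving up to 15 in-batch negatives at batch size 16. The loss scales the anchor-to-positive cosine similarities by scale (20.0 here) and applies cross-entropy with the matching pair as the target. The pure-Python sketch below implements that one-directional formulation on toy 2-dimensional vectors. The helper names are hypothetical, and this is not the library's code.

```python
import math


def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors: list[list[float]], positives: list[list[float]], scale: float = 20.0) -> float:
    """Mean in-batch-negatives cross-entropy: anchor i should match positive i (hypothetical helper)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of anchor i to every positive in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp; loss is -log softmax of the diagonal logit.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]  # each positive points near its own anchor
swapped = [[0.1, 1.0], [1.0, 0.1]]  # positives paired with the wrong anchors
print(mnrl_loss(anchors, aligned) < mnrl_loss(anchors, swapped))  # True
```

The scale factor sharpens the softmax. Without it, cosine scores between -1 and 1 would yield nearly uniform probabilities, and the loss would barely distinguish correct pairs from in-batch negatives.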
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 5
- fp16: True
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
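These values determine how many optimizer steps the run takes and how the learning rate changes across them. This assumes a single device (the card does not state the device count), gradient_accumulation_steps: 1, and dataloader_drop_last: False. Under those assumptions, each epoch has ceil(1496 / 16) = 94 batches, and 5 epochs give 470 optimizer steps. With lr_scheduler_type: linear and zero warmup, the learning rate should fall linearly from 5e-05 to 0 over those steps. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the schedule follows the usual linear warmup-then-decay formula rather than calling the Transformers scheduler directly.

```python
import math


def total_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps on one device, keeping the last partial batch and no gradient accumulation."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


def linear_lr(step: int, base_lr: float, num_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at num_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (num_steps - step) / max(1, num_steps - warmup_steps))


steps = total_steps(num_samples=1496, batch_size=16, epochs=5)
print(steps)  # 470
# Start, midpoint, and end of training: the rate halves at the midpoint.
for step in (0, 235, 470):
    print(step, linear_lr(step, base_lr=5e-05, num_steps=steps))
```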
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.2
- Transformers: 4.57.3
- PyTorch: 2.9.0+cu126
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
LoggableMNRL
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}