pg-mnr-bert / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:1496
  - loss:LoggableMNRL
widget:
  - source_sentence: >-
      According to the passage, why did the intellectual stagnation following
      Aristotle's work persist for so long?
    sentences:
      - >-
        world in 587, the Chinese system was very enlightened. Europeans didn't
        introduce formal civil service exams till the nineteenth century, and
        even then they seem to have been influenced by the Chinese example.
        Before credentials, government positions were obtained mainly by family
        influence, if not outright bribery. It was a great step forward to judge
        people by their performance on a test. But by no means a perfect
        solution. When you judge people that way, you tend to get cram
        schools—which they did in Ming China and nineteenth century England just
        as much as in present day South Korea. What cram schools are, in effect,
        is leaks in a seal. The use of credentials was an attempt to seal off
        the direct transmission of power between generations, and cram schools
        represent that power finding holes in the seal. Cram schools turn wealth
        in one generation into credentials in the next. It's hard to beat this
        phenomenon, because the schools adjust to suit whatever the tests
        measure. When the tests are narrow and predictable, you get cram schools
        on the classic model, like those that prepared candidates for Sandhurst
        (the British West Point) or the classes American students take now to
        improve their SAT scores. But as the te
      - >-
        tter, but you had no choice in the matter, if you needed money on the
        scale only VCs could supply. Now that VCs have competitors, that's going
        to put a market price on the help they offer. The interesting thing is,
        no one knows yet what it will be. Do startups that want to get really
        big need the sort of advice and connections only the top VCs can supply?
        Or would super-angel money do just as well? The VCs will say you need
        them, and the super-angels will say you don't. But the truth is, no one
        knows yet, not even the VCs and super-angels themselves. All the
        super-angels know is that their new model seems promising enough to be
        worth trying, and all the VCs know is that it seems promising enough to
        worry about. RoundsWhatever the outcome, the conflict between VCs and
        super-angels is good news for founders. And not just for the obvious
        reason that more competition for deals means better terms. The whole
        shape of deals is changing. One of the biggest differences between
        angels and VCs is the amount of your company they want. VCs want a lot.
        In a series A round they want a third of your company, if they can get
        it. They don't care much how much they pay for it, but they want a lot
        because the number of series A invest
      - ' the wrong direction as well. [8] Perhaps worst of all, he protected them from both the criticism of outsiders and the promptings of their own inner compass by establishing the principle that the most noble sort of theoretical knowledge had to be useless. The Metaphysics is mostly a failed experiment. A few ideas from it turned out to be worth keeping; the bulk of it has had no effect at all. The Metaphysics is among the least read of all famous books. It''s not hard to understand the way Newton''s Principia is, but the way a garbled message is. Arguably it''s an interesting failed experiment. But unfortunately that was not the conclusion Aristotle''s successors derived from works like the Metaphysics. [9] Soon after, the western world fell on intellectual hard times. Instead of version 1s to be superseded, the works of Plato and Aristotle became revered texts to be mastered and discussed. And so things remained for a shockingly long time. It was not till around 1600 (in Europe, where the center of gravity had shifted by then) that one found people confident enough to treat Aristotle''s work as a catalog of mistakes. And even then they rarely said so outright. If it seems surprising that the gap was so long, consider ho'
  - source_sentence: >-
      What is the main reason why Google's headquarters has a unique feel
      compared to a typical large company's headquarters?
    sentences:
      - ' his need. More or less. Higher ranking members of the military got more (as higher ranking members of socialist societies always do), but what they got was fixed according to their rank. And the flattening effect wasn''t limited to those under arms, because the US economy was conscripted too. Between 1942 and 1945 all wages were set by the National War Labor Board. Like the military, they defaulted to flatness. And this national standardization of wages was so pervasive that its effects could still be seen years after the war ended. [1]Business owners weren''t supposed to be making money either. FDR said "not a single war millionaire" would be permitted. To ensure that, any increase in a company''s profits over prewar levels was taxed at 85% And when what was left after corporate taxes reached individuals, it was taxed again at a marginal rate of 93% [2]Socially too the war tended to decrease variation. Over 16 million men and women from all sorts of different backgrounds were brought together in a way of life that was literally uniform. Service rates for men born in the early 1920s approached 80% And working toward a common goal, often under stress, brought them still closer together. Though strictly speaking World '
      - >-
        iew: Red Rock.7. GoogleGoogle spread out from its first building in
        Mountain View to a lot of the surrounding ones. But because the
        buildings were built at different times by different people, the place
        doesn't have the sterile, walled-off feel that a typical large company's
        headquarters have. It definitely has a flavor of its own though. You
        sense there is something afoot. The general atmos is vaguely utopian;
        there are lots of Priuses, and people who look like they drive them. You
        can't get into Google unless you know someone there. It's very much
        worth seeing inside if you can, though. Ditto for Facebook, at the end
        of California Ave in Palo Alto, though there is nothing to see
        outside.8. Skyline DriveSkyline Drive runs along the crest of the Santa
        Cruz mountains. On one side is the Valley, and on the other is the
        sea—which because it's cold and foggy and has few harbors, plays
        surprisingly little role in the lives of people in the Valley,
        considering how close it is. Along some parts of Skyline the dominant
        trees are huge redwoods, and in others they're live oaks. Redwoods mean
        those are the parts where the fog off the coast comes in at night;
        redwoods condense rain out of fog. The MROSD manages a collection of
      - >-
        Written by Paul Graham


        The Bus Ticket Theory of Genius


        November 2019


        Everyone knows that to do great work you need both natural ability and
        determination. But there's a third ingredient that's not as well
        understood: an obsessive interest in a particular topic. To explain this
        point I need to burn my reputation with some group of people, and I'm
        going to choose bus ticket collectors. There are people who collect old
        bus tickets. Like many collectors, they have an obsessive interest in
        the minutiae of what they collect. They can keep track of distinctions
        between different types of bus tickets that would be hard for the rest
        of us to remember. Because we don't care enough. What's the point of
        spending so much time thinking about old bus tickets?Which leads us to
        the second feature of this kind of obsession: there is no point. A bus
        ticket collector's love is disinterested. They're not doing it to
        impress us or to make themselves rich, but for its own sake. When you
        look at the lives of people who've done great work, you see a consistent
        pattern. They often begin with a bus ticket collector's obsessive
        interest in something that would have seemed pointless to most of their
        contemporaries. One of the most striking 
  - source_sentence: >-
      According to the passage, why is innocence important for children, and
      what consequence does early jadedness have on a person's development?
    sentences:
      - >-
        ful organizations is partly the history of techniques for preserving
        that excitement. [4]The team that made the original Macintosh were a
        great example of this phenomenon. People like Burrell Smith and Andy
        Hertzfeld and Bill Atkinson and Susan Kare were not just following
        orders. They were not tennis balls hit by Steve Jobs, but rockets let
        loose by Steve Jobs. There was a lot of collaboration between them, but
        they all seem to have individually felt the excitement of working on a
        project of one's own. In Andy Hertzfeld's book on the Macintosh, he
        describes how they'd come back into the office after dinner and work
        late into the night. People who've never experienced the thrill of
        working on a project they're excited about can't distinguish this kind
        of working long hours from the kind that happens in sweatshops and
        boiler rooms, but they're at opposite ends of the spectrum. That's why
        it's a mistake to insist dogmatically on "work/life balance." Indeed,
        the mere expression "work/life" embodies a mistake: it assumes work and
        life are distinct. For those to whom the word "work" automatically
        implies the dutiful plodding kind, they are. But for the skaters, the
        relationship between work and life would be better repr
      - >-
        tect helpless creatures, considering human offspring are so helpless for
        so long. Without the helplessness that makes kids cute, they'd be very
        annoying. They'd merely seem like incompetent adults. But there's more
        to it than that. The reason our hypothetical jaded 10 year old bothers
        me so much is not just that he'd be annoying, but that he'd have cut off
        his prospects for growth so early. To be jaded you have to think you
        know how the world works, and any theory a 10 year old had about that
        would probably be a pretty narrow one. Innocence is also
        open-mindedness. We want kids to be innocent so they can continue to
        learn. Paradoxical as it sounds, there are some kinds of knowledge that
        get in the way of other kinds of knowledge. If you're going to learn
        that the world is a brutal place full of people trying to take advantage
        of one another, you're better off learning it last. Otherwise you won't
        bother learning much more. Very smart adults often seem unusually
        innocent, and I don't think this is a coincidence. I think they've
        deliberately avoided learning about certain things. Certainly I do. I
        used to think I wanted to know everything. Now I know I don't.
        DeathAfter sex, death is the topic adults lie most conspic
      - >-
        do all eight things wrong. In fact, if you look at the way software gets
        written in most organizations, it's almost as if they were deliberately
        trying to do things wrong. In a sense, they are. One of the defining
        qualities of organizations since there have been such a thing is to
        treat individuals as interchangeable parts. This works well for more
        parallelizable tasks, like fighting wars. For most of history a
        well-drilled army of professional soldiers could be counted on to beat
        an army of individual warriors, no matter how valorous. But having ideas
        is not very parallelizable. And that's what programs are: ideas. It's
        not merely true that organizations dislike the idea of depending on
        individual genius, it's a tautology. It's part of the definition of an
        organization not to. Of our current concept of an organization, at
        least. Maybe we could define a new kind of organization that combined
        the efforts of individuals without requiring them to be interchangeable.
        Arguably a market is such a form of organization, though it may be more
        accurate to describe a market as a degenerate case—as what you get by
        default when organization isn't possible. Probably the best we'll do is
        some kind of hack, like making the program
  - source_sentence: >-
      According to the passage, why are salesmen and top managers exceptions
      when it comes to being rewarded for increased productivity within large
      companies?
    sentences:
      - >-
        olleague from 100 years ago, they'd just get into an ideological
        argument. Yes, of course, you'll learn something by taking a psychology
        class. The point is, you'll learn more by taking a class in another
        department. The worthwhile departments, in my opinion, are math, the
        hard sciences, engineering, history (especially economic and social
        history, and the history of science), architecture, and the classics. A
        survey course in art history may be worthwhile. Modern literature is
        important, but the way to learn about it is just to read. I don't know
        enough about music to say. You can skip the social sciences, philosophy,
        and the various departments created recently in response to political
        pressures. Many of these fields talk about important problems,
        certainly. But the way they talk about them is useless. For example,
        philosophy talks, among other things, about our obligations to one
        another; but you can learn more about this from a wise grandmother or E.
        B. White than from an academic philosopher. I speak here from
        experience. I should probably have been offended when people laughed at
        Clinton for saying "It depends on what the meaning of the word 'is' is."
        I took about five classes in college on what the meaning o
      - >-
        at are a safe bet to be acquired for $20 million. There needs to be a
        chance, however small, of the company becoming really big. Angels are
        different in this respect. They're happy to invest in a company where
        the most likely outcome is a $20 million acquisition if they can do it
        at a low enough valuation. But of course they like companies that could
        go public too. So having an ambitious long-term plan pleases everyone.
        If you take VC money, you have to mean it, because the structure of VC
        deals prevents early acquisitions. If you take VC money, they won't let
        you sell early.7. VCs want to invest large amounts. The fact that
        they're running investment funds makes VCs want to invest large amounts.
        A typical VC fund is now hundreds of millions of dollars. If $400
        million has to be invested by 10 partners, they have to invest $40
        million each. VCs usually sit on the boards of companies they fund. If
        the average deal size was $1 million, each partner would have to sit on
        40 boards, which would not be fun. So they prefer bigger deals, where
        they can put a lot of money to work at once. VCs don't regard you as a
        bargain if you don't need a lot of money. That may even make you less
        attractive, because it means their invest
      - >-
        imes as much wealth as an average employee. A programmer, for example,
        instead of chugging along maintaining and updating an existing piece of
        software, could write a whole new piece of software, and with it create
        a new source of revenue. Companies are not set up to reward people who
        want to do this. You can't go to your boss and say, I'd like to start
        working ten times as hard, so will you please pay me ten times as much?
        For one thing, the official fiction is that you are already working as
        hard as you can. But a more serious problem is that the company has no
        way of measuring the value of your work. Salesmen are an exception. It's
        easy to measure how much revenue they generate, and they're usually paid
        a percentage of it. If a salesman wants to work harder, he can just
        start doing it, and he will automatically get paid proportionally more.
        There is one other job besides sales where big companies can hire
        first-rate people: in the top management jobs. And for the same reason:
        their performance can be measured. The top managers are held responsible
        for the performance of the entire company. Because an ordinary
        employee's performance can't usually be measured, he is not expected to
        do more than put in a solid effo
  - source_sentence: >-
      How can a startup founder's ambitions be influenced by YC (a startup
      accelerator) and what is the potential trap founders often fall into when
      they're trying to seem big?
    sentences:
      - >-
        Written by Paul Graham


        The Hardest Lessons for Startups to Learn


        April 2006


        In something that's out there, problems are alarming. There is a lot
        more urgency once you release. And I think that's precisely why people
        put it off. They know they'll have to work a lot harder once they do.
        [2] 2. Keep Pumping Out Features. Of course, "release early" has a
        second component, without which it would be bad advice. If you're going
        to start with something that doesn't do much, you better improve it
        fast. What I find myself repeating is "pump out features." And this rule
        isn't just for the initial stages. This is something all startups should
        do for as long as they want to be considered startups. I don't mean, of
        course, that you should make your application ever more complex. By
        "feature" I mean one unit of hacking-- one quantum of making users'
        lives better. As with exercise, improvements beget improvements. If you
        run every day, you'll probably feel like running tomorrow. But if you
        skip running for a couple weeks, it will be an effort to drag yourself
        out. So it is with hacking: the more ideas you implement, the more ideas
        you'll have. You should make your system better at least in some small
        way every day or two. This 
      - >-
        e that they pay attention; it's when they notice you're still there.
        It's just as well that it usually takes a while to gain momentum. Most
        technologies evolve a good deal even after they're first launched —
        programming languages especially. Nothing could be better, for a new
        techology, than a few years of being used only by a small number of
        early adopters. Early adopters are sophisticated and demanding, and
        quickly flush out whatever flaws remain in your technology. When you
        only have a few users you can be in close contact with all of them. And
        early adopters are forgiving when you improve your system, even if this
        causes some breakage. There are two ways new technology gets introduced:
        the organic growth method, and the big bang method. The organic growth
        method is exemplified by the classic seat-of-the-pants underfunded
        garage startup. A couple guys, working in obscurity, develop some new
        technology. They launch it with no marketing and initially have only a
        few (fanatically devoted) users. They continue to improve the
        technology, and meanwhile their user base grows by word of mouth. Before
        they know it, they're big. The other approach, the big bang method, is
        exemplified by the VC-backed, heavily marketed sta
      - >-
        It tipped from being this boulder we had to push to being a train car
        that in fact had its own momentum."[4] One of the more subtle ways in
        which YC can help founders is by calibrating their ambitions, because we
        know exactly how a lot of successful startups looked when they were just
        getting started.[5] If you're building something for which you can't
        easily get a small set of users to observe — e. g. enterprise software —
        and in a domain where you have no connections, you'll have to rely on
        cold calls and introductions. But should you even be working on such an
        idea?[6] Garry Tan pointed out an interesting trap founders fall into in
        the beginning. They want so much to seem big that they imitate even the
        flaws of big companies, like indifference to individual users. This
        seems to them more "professional." Actually it's better to embrace the
        fact that you're small and use whatever advantages that brings.[7] Your
        user model almost couldn't be perfectly accurate, because users' needs
        often change in response to what you build for them. Build them a
        microcomputer, and suddenly they need to run spreadsheets on it, because
        the arrival of your new microcomputer causes someone to invent the
        spreadsheet.[8] If you have to 
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer

This is a sentence-transformers model trained on 1,496 sentence pairs with the LoggableMNRL loss. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
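As a rough illustration of the similarity function listed above, the following pure-Python sketch computes cosine similarity between two vectors. The helper name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made for this example; the model itself produces 768-dimensional embeddings and the library computes the same quantity on tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for the model's 768-dimensional embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

Because the score depends only on direction, scaling either vector leaves it unchanged.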

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
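The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token embeddings. A minimal sketch of that idea, assuming padding positions are marked with 0 in an attention mask; the function name `mean_pool` and the toy data are hypothetical, and the real module works on batched tensors rather than Python lists.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three real tokens plus one padding token, 2 dimensions instead of 768.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```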

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder below with this model's repository ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "How can a startup founder's ambitions be influenced by YC (a startup accelerator) and what is the potential trap founders often fall into when they're trying to seem big?",
    'It tipped from being this boulder we had to push to being a train car that in fact had its own momentum."[4] One of the more subtle ways in which YC can help founders is by calibrating their ambitions, because we know exactly how a lot of successful startups looked when they were just getting started.[5] If you\'re building something for which you can\'t easily get a small set of users to observe — e. g. enterprise software — and in a domain where you have no connections, you\'ll have to rely on cold calls and introductions. But should you even be working on such an idea?[6] Garry Tan pointed out an interesting trap founders fall into in the beginning. They want so much to seem big that they imitate even the flaws of big companies, like indifference to individual users. This seems to them more "professional." Actually it\'s better to embrace the fact that you\'re small and use whatever advantages that brings.[7] Your user model almost couldn\'t be perfectly accurate, because users\' needs often change in response to what you build for them. Build them a microcomputer, and suddenly they need to run spreadsheets on it, because the arrival of your new microcomputer causes someone to invent the spreadsheet.[8] If you have to ',
    "e that they pay attention; it's when they notice you're still there. It's just as well that it usually takes a while to gain momentum. Most technologies evolve a good deal even after they're first launched — programming languages especially. Nothing could be better, for a new techology, than a few years of being used only by a small number of early adopters. Early adopters are sophisticated and demanding, and quickly flush out whatever flaws remain in your technology. When you only have a few users you can be in close contact with all of them. And early adopters are forgiving when you improve your system, even if this causes some breakage. There are two ways new technology gets introduced: the organic growth method, and the big bang method. The organic growth method is exemplified by the classic seat-of-the-pants underfunded garage startup. A couple guys, working in obscurity, develop some new technology. They launch it with no marketing and initially have only a few (fanatically devoted) users. They continue to improve the technology, and meanwhile their user base grows by word of mouth. Before they know it, they're big. The other approach, the big bang method, is exemplified by the VC-backed, heavily marketed sta",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5865, 0.4398],
#         [0.5865, 1.0000, 0.3588],
#         [0.4398, 0.3588, 1.0000]])
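For semantic search, the first row of the similarity matrix is what matters: it compares the query with each passage. A small sketch of ranking passages by that row, using the two scores printed above; the helper `rank_passages` and the short passage labels are hypothetical and exist only for this illustration.

```python
def rank_passages(scores, labels):
    """Return (label, score) pairs sorted from most to least similar."""
    if len(scores) != len(labels):
        raise ValueError("need one label per score")
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Query-to-passage scores copied from row 0 of the matrix printed above
# (column 0 is the query compared with itself, so it is skipped).
query_scores = [0.5865, 0.4398]
labels = ["YC / Garry Tan passage", "early adopters passage"]  # hypothetical short labels

for label, score in rank_passages(query_scores, labels):
    print(f"{score:.4f}  {label}")
```

The passage that actually answers the question scores higher than the unrelated one, which is the behavior the training objective aims for.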

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,496 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 |
    |:--------|:-----------|:-----------|
    | type    | string     | string     |
    | details | min: 16 tokens, mean: 30.39 tokens, max: 52 tokens | min: 77 tokens, mean: 274.6 tokens, max: 359 tokens |
  • Samples:
    | sentence_0 | sentence_1 |
    |:-----------|:-----------|
    | According to the passage, what is more important than the size of a beachhead, and what characteristic must the people within it possess for it to be considered viable? | urgently need, you have a beachhead. [11]The question then is whether that beachhead is big enough. Or more importantly, who's in it: if the beachhead consists of people doing something lots more people will be doing in the future, then it's probably big enough no matter how small it is. For example, if you're building something differentiated from competitors by the fact that it works on phones, but it only works on the newest phones, that's probably a big enough beachhead. Err on the side of doing things where you'll face competitors. Inexperienced founders usually give competitors more credit than they deserve. Whether you succeed depends far more on you than on your competitors. So better a good idea with competitors than a bad one without. You don't need to worry about entering a "crowded market" so long as you have a thesis about what everyone else in it is overlooking. In fact that's a very promising starting point. Google was that type of idea. Your thesis has to be more precis... |
    | According to the passage, what specific group of workers is uniquely affected by the "cost of checks," and why? | So it was left to the Europeans to explore and eventually to dominate the rest of the world, including China. In more recent times, Sarbanes-Oxley has practically destroyed the US IPO market. That wasn't the intention of the legislators who wrote it. They just wanted to add a few more checks on public companies. But they forgot to consider the cost. They forgot that companies about to go public are usually rather stretched, and that the weight of a few extra checks that might be easy for General Electric to bear are enough to prevent younger companies from being public at all. Once you start to think about the cost of checks, you can start to ask other interesting questions. Is the cost increasing or decreasing? Is it higher in some areas than others? Where does it increase discontinuously? If large organizations started to ask questions like that, they'd learn some frightening things. I think the cost of checks may actually be increasing. The reason is that software plays an increasin... |
    | According to the passage, what is the most important thing an applicant can do during a Y Combinator interview, and why is this considered more valuable than meeting a higher standard of "convincingness"? | ou're in unless there's some other disqualifying flaw. That is a hard standard to meet, though. Airbnb didn't meet it. They had the first part. They had made something they themselves wanted. But it wasn't spreading. So don't feel bad if you don't hit this gold standard of convincingness. If Airbnb didn't hit it, it must be too high. In practice, the YC partners will be satisfied if they feel that you have a deep understanding of your users' needs. And the Airbnbs did have that. They were able to tell us all about what motivated hosts and guests. They knew from first-hand experience, because they'd been the first hosts. We couldn't ask them a question they didn't know the answer to. We ourselves were not very excited about the idea as users, but we knew this didn't prove anything, because there were lots of successful startups we hadn't been excited about as users. We were able to say to ourselves "They seem to know what they're talking about. Maybe they're onto something. It's not gro... |
  • Loss: main.LoggableMNRL with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
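The internals of `LoggableMNRL` are not shown in this card. Assuming it behaves like a standard multiple negatives ranking loss (the objective in the Henderson et al. paper cited below) with the listed `scale` of 20.0 and cosine similarity, the computation can be sketched in pure Python as follows. The names `cos_sim` and `mnr_loss` and the toy 2-dimensional batch are illustrative assumptions, not the training code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    if len(anchors) != len(positives):
        raise ValueError("need one positive per anchor")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # log-sum-exp with the max subtracted for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)


# Toy batch of two pairs with 2-dimensional embeddings.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

With this objective, a batch size of 16 means each question is contrasted against 15 in-batch passages that serve as negatives.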
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
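These settings imply a fairly short run. A back-of-the-envelope sketch of the optimizer step count, assuming training on a single device (the card does not state the device count), no gradient accumulation, and that the last partial batch of each epoch is kept; the function name is hypothetical.

```python
import math


def total_optimizer_steps(num_samples, batch_size, epochs, num_devices=1):
    """Optimizer steps when the last partial batch of each epoch is kept
    and gradients are not accumulated across batches."""
    batches_per_epoch = math.ceil(num_samples / (batch_size * num_devices))
    return batches_per_epoch * epochs


# 1,496 samples, batch size 16, 5 epochs; a single device is an assumption.
print(total_optimizer_steps(1496, 16, 5))  # 470
```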

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
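The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` listed above describes a learning rate that decays linearly to zero over the run. A sketch of that schedule follows; the function `linear_lr` is a hypothetical helper, and the total of 470 steps is an assumption derived from 1,496 samples, batch size 16, and 5 epochs on a single device.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at optimizer step `step` under linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=470 is an assumption: ceil(1496 / 16) * 5 on a single device.
for step in (0, 235, 470):
    print(step, linear_lr(step, 470))
```

Under these assumptions the rate starts at the full 5e-05, is half that at the midpoint, and reaches zero on the final step.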

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.3
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
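To compare a local environment against the versions listed above, something like the following standard-library sketch may help. The dictionary keys are the usual PyPI distribution names, which is an assumption about how the packages were installed, and the comparison is an exact string match, so a different PyTorch build suffix (for example a CPU-only build) would be reported as a mismatch.

```python
from importlib import metadata

# Versions copied from the list above; keys are the PyPI distribution names.
EXPECTED = {
    "sentence-transformers": "5.1.2",
    "transformers": "4.57.3",
    "torch": "2.9.0+cu126",
    "accelerate": "1.12.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def version_mismatches(expected):
    """Map each package to (expected, installed) where they differ; None means not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found}")
```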

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

LoggableMNRL

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}