Tskunz committed (verified)
Commit b843246 · Parent(s): acde4fb

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,604 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - dense
7
+ - generated_from_trainer
8
+ - dataset_size:1496
9
+ - loss:LoggableMNRL
10
+ widget:
11
+ - source_sentence: According to the passage, why did the intellectual stagnation following
12
+ Aristotle's work persist for so long?
13
+ sentences:
14
+ - world in 587, the Chinese system was very enlightened. Europeans didn't introduce
15
+ formal civil service exams till the nineteenth century, and even then they seem
16
+ to have been influenced by the Chinese example. Before credentials, government
17
+ positions were obtained mainly by family influence, if not outright bribery. It
18
+ was a great step forward to judge people by their performance on a test. But by
19
+ no means a perfect solution. When you judge people that way, you tend to get cram
20
+ schools—which they did in Ming China and nineteenth century England just as much
21
+ as in present day South Korea. What cram schools are, in effect, is leaks in a
22
+ seal. The use of credentials was an attempt to seal off the direct transmission
23
+ of power between generations, and cram schools represent that power finding holes
24
+ in the seal. Cram schools turn wealth in one generation into credentials in the
25
+ next. It's hard to beat this phenomenon, because the schools adjust to suit whatever
26
+ the tests measure. When the tests are narrow and predictable, you get cram schools
27
+ on the classic model, like those that prepared candidates for Sandhurst (the British
28
+ West Point) or the classes American students take now to improve their SAT scores.
29
+ But as the te
30
+ - tter, but you had no choice in the matter, if you needed money on the scale only
31
+ VCs could supply. Now that VCs have competitors, that's going to put a market
32
+ price on the help they offer. The interesting thing is, no one knows yet what
33
+ it will be. Do startups that want to get really big need the sort of advice and
34
+ connections only the top VCs can supply? Or would super-angel money do just as
35
+ well? The VCs will say you need them, and the super-angels will say you don't.
36
+ But the truth is, no one knows yet, not even the VCs and super-angels themselves.
37
+ All the super-angels know is that their new model seems promising enough to be
38
+ worth trying, and all the VCs know is that it seems promising enough to worry
39
+ about. RoundsWhatever the outcome, the conflict between VCs and super-angels is
40
+ good news for founders. And not just for the obvious reason that more competition
41
+ for deals means better terms. The whole shape of deals is changing. One of the
42
+ biggest differences between angels and VCs is the amount of your company they
43
+ want. VCs want a lot. In a series A round they want a third of your company, if
44
+ they can get it. They don't care much how much they pay for it, but they want
45
+ a lot because the number of series A invest
46
+ - ' the wrong direction as well. [8] Perhaps worst of all, he protected them from
47
+ both the criticism of outsiders and the promptings of their own inner compass
48
+ by establishing the principle that the most noble sort of theoretical knowledge
49
+ had to be useless. The Metaphysics is mostly a failed experiment. A few ideas
50
+ from it turned out to be worth keeping; the bulk of it has had no effect at all.
51
+ The Metaphysics is among the least read of all famous books. It''s not hard to
52
+ understand the way Newton''s Principia is, but the way a garbled message is. Arguably
53
+ it''s an interesting failed experiment. But unfortunately that was not the conclusion
54
+ Aristotle''s successors derived from works like the Metaphysics. [9] Soon after,
55
+ the western world fell on intellectual hard times. Instead of version 1s to be
56
+ superseded, the works of Plato and Aristotle became revered texts to be mastered
57
+ and discussed. And so things remained for a shockingly long time. It was not till
58
+ around 1600 (in Europe, where the center of gravity had shifted by then) that
59
+ one found people confident enough to treat Aristotle''s work as a catalog of mistakes.
60
+ And even then they rarely said so outright. If it seems surprising that the gap
61
+ was so long, consider ho'
62
+ - source_sentence: What is the main reason why Google's headquarters has a unique
63
+ feel compared to a typical large company's headquarters?
64
+ sentences:
65
+ - ' his need. More or less. Higher ranking members of the military got more (as
66
+ higher ranking members of socialist societies always do), but what they got was
67
+ fixed according to their rank. And the flattening effect wasn''t limited to those
68
+ under arms, because the US economy was conscripted too. Between 1942 and 1945
69
+ all wages were set by the National War Labor Board. Like the military, they defaulted
70
+ to flatness. And this national standardization of wages was so pervasive that
71
+ its effects could still be seen years after the war ended. [1]Business owners
72
+ weren''t supposed to be making money either. FDR said "not a single war millionaire"
73
+ would be permitted. To ensure that, any increase in a company''s profits over
74
+ prewar levels was taxed at 85% And when what was left after corporate taxes reached
75
+ individuals, it was taxed again at a marginal rate of 93% [2]Socially too the
76
+ war tended to decrease variation. Over 16 million men and women from all sorts
77
+ of different backgrounds were brought together in a way of life that was literally
78
+ uniform. Service rates for men born in the early 1920s approached 80% And working
79
+ toward a common goal, often under stress, brought them still closer together.
80
+ Though strictly speaking World '
81
+ - 'iew: Red Rock.7. GoogleGoogle spread out from its first building in Mountain
82
+ View to a lot of the surrounding ones. But because the buildings were built at
83
+ different times by different people, the place doesn''t have the sterile, walled-off
84
+ feel that a typical large company''s headquarters have. It definitely has a flavor
85
+ of its own though. You sense there is something afoot. The general atmos is vaguely
86
+ utopian; there are lots of Priuses, and people who look like they drive them.
87
+ You can''t get into Google unless you know someone there. It''s very much worth
88
+ seeing inside if you can, though. Ditto for Facebook, at the end of California
89
+ Ave in Palo Alto, though there is nothing to see outside.8. Skyline DriveSkyline
90
+ Drive runs along the crest of the Santa Cruz mountains. On one side is the Valley,
91
+ and on the other is the sea—which because it''s cold and foggy and has few harbors,
92
+ plays surprisingly little role in the lives of people in the Valley, considering
93
+ how close it is. Along some parts of Skyline the dominant trees are huge redwoods,
94
+ and in others they''re live oaks. Redwoods mean those are the parts where the
95
+ fog off the coast comes in at night; redwoods condense rain out of fog. The MROSD
96
+ manages a collection of'
97
+ - 'Written by Paul Graham
98
+
99
+
100
+ The Bus Ticket Theory of Genius
101
+
102
+
103
+ November 2019
104
+
105
+
106
+ Everyone knows that to do great work you need both natural ability and determination.
107
+ But there''s a third ingredient that''s not as well understood: an obsessive interest
108
+ in a particular topic. To explain this point I need to burn my reputation with
109
+ some group of people, and I''m going to choose bus ticket collectors. There are
110
+ people who collect old bus tickets. Like many collectors, they have an obsessive
111
+ interest in the minutiae of what they collect. They can keep track of distinctions
112
+ between different types of bus tickets that would be hard for the rest of us to
113
+ remember. Because we don''t care enough. What''s the point of spending so much
114
+ time thinking about old bus tickets?Which leads us to the second feature of this
115
+ kind of obsession: there is no point. A bus ticket collector''s love is disinterested.
116
+ They''re not doing it to impress us or to make themselves rich, but for its own
117
+ sake. When you look at the lives of people who''ve done great work, you see a
118
+ consistent pattern. They often begin with a bus ticket collector''s obsessive
119
+ interest in something that would have seemed pointless to most of their contemporaries.
120
+ One of the most striking '
121
+ - source_sentence: According to the passage, why is innocence important for children,
122
+ and what consequence does early jadedness have on a person's development?
123
+ sentences:
124
+ - 'ful organizations is partly the history of techniques for preserving that excitement.
125
+ [4]The team that made the original Macintosh were a great example of this phenomenon.
126
+ People like Burrell Smith and Andy Hertzfeld and Bill Atkinson and Susan Kare
127
+ were not just following orders. They were not tennis balls hit by Steve Jobs,
128
+ but rockets let loose by Steve Jobs. There was a lot of collaboration between
129
+ them, but they all seem to have individually felt the excitement of working on
130
+ a project of one''s own. In Andy Hertzfeld''s book on the Macintosh, he describes
131
+ how they''d come back into the office after dinner and work late into the night.
132
+ People who''ve never experienced the thrill of working on a project they''re excited
133
+ about can''t distinguish this kind of working long hours from the kind that happens
134
+ in sweatshops and boiler rooms, but they''re at opposite ends of the spectrum.
135
+ That''s why it''s a mistake to insist dogmatically on "work/life balance." Indeed,
136
+ the mere expression "work/life" embodies a mistake: it assumes work and life are
137
+ distinct. For those to whom the word "work" automatically implies the dutiful
138
+ plodding kind, they are. But for the skaters, the relationship between work and
139
+ life would be better repr'
140
+ - tect helpless creatures, considering human offspring are so helpless for so long.
141
+ Without the helplessness that makes kids cute, they'd be very annoying. They'd
142
+ merely seem like incompetent adults. But there's more to it than that. The reason
143
+ our hypothetical jaded 10 year old bothers me so much is not just that he'd be
144
+ annoying, but that he'd have cut off his prospects for growth so early. To be
145
+ jaded you have to think you know how the world works, and any theory a 10 year
146
+ old had about that would probably be a pretty narrow one. Innocence is also open-mindedness.
147
+ We want kids to be innocent so they can continue to learn. Paradoxical as it sounds,
148
+ there are some kinds of knowledge that get in the way of other kinds of knowledge.
149
+ If you're going to learn that the world is a brutal place full of people trying
150
+ to take advantage of one another, you're better off learning it last. Otherwise
151
+ you won't bother learning much more. Very smart adults often seem unusually innocent,
152
+ and I don't think this is a coincidence. I think they've deliberately avoided
153
+ learning about certain things. Certainly I do. I used to think I wanted to know
154
+ everything. Now I know I don't. DeathAfter sex, death is the topic adults lie
155
+ most conspic
156
+ - 'do all eight things wrong. In fact, if you look at the way software gets written
157
+ in most organizations, it''s almost as if they were deliberately trying to do
158
+ things wrong. In a sense, they are. One of the defining qualities of organizations
159
+ since there have been such a thing is to treat individuals as interchangeable
160
+ parts. This works well for more parallelizable tasks, like fighting wars. For
161
+ most of history a well-drilled army of professional soldiers could be counted
162
+ on to beat an army of individual warriors, no matter how valorous. But having
163
+ ideas is not very parallelizable. And that''s what programs are: ideas. It''s
164
+ not merely true that organizations dislike the idea of depending on individual
165
+ genius, it''s a tautology. It''s part of the definition of an organization not
166
+ to. Of our current concept of an organization, at least. Maybe we could define
167
+ a new kind of organization that combined the efforts of individuals without requiring
168
+ them to be interchangeable. Arguably a market is such a form of organization,
169
+ though it may be more accurate to describe a market as a degenerate case—as what
170
+ you get by default when organization isn''t possible. Probably the best we''ll
171
+ do is some kind of hack, like making the program'
172
+ - source_sentence: According to the passage, why are salesmen and top managers exceptions
173
+ when it comes to being rewarded for increased productivity within large companies?
174
+ sentences:
175
+ - olleague from 100 years ago, they'd just get into an ideological argument. Yes,
176
+ of course, you'll learn something by taking a psychology class. The point is,
177
+ you'll learn more by taking a class in another department. The worthwhile departments,
178
+ in my opinion, are math, the hard sciences, engineering, history (especially economic
179
+ and social history, and the history of science), architecture, and the classics.
180
+ A survey course in art history may be worthwhile. Modern literature is important,
181
+ but the way to learn about it is just to read. I don't know enough about music
182
+ to say. You can skip the social sciences, philosophy, and the various departments
183
+ created recently in response to political pressures. Many of these fields talk
184
+ about important problems, certainly. But the way they talk about them is useless.
185
+ For example, philosophy talks, among other things, about our obligations to one
186
+ another; but you can learn more about this from a wise grandmother or E. B. White
187
+ than from an academic philosopher. I speak here from experience. I should probably
188
+ have been offended when people laughed at Clinton for saying "It depends on what
189
+ the meaning of the word 'is' is." I took about five classes in college on what
190
+ the meaning o
191
+ - at are a safe bet to be acquired for $20 million. There needs to be a chance,
192
+ however small, of the company becoming really big. Angels are different in this
193
+ respect. They're happy to invest in a company where the most likely outcome is
194
+ a $20 million acquisition if they can do it at a low enough valuation. But of
195
+ course they like companies that could go public too. So having an ambitious long-term
196
+ plan pleases everyone. If you take VC money, you have to mean it, because the
197
+ structure of VC deals prevents early acquisitions. If you take VC money, they
198
+ won't let you sell early.7. VCs want to invest large amounts. The fact that they're
199
+ running investment funds makes VCs want to invest large amounts. A typical VC
200
+ fund is now hundreds of millions of dollars. If $400 million has to be invested
201
+ by 10 partners, they have to invest $40 million each. VCs usually sit on the boards
202
+ of companies they fund. If the average deal size was $1 million, each partner
203
+ would have to sit on 40 boards, which would not be fun. So they prefer bigger
204
+ deals, where they can put a lot of money to work at once. VCs don't regard you
205
+ as a bargain if you don't need a lot of money. That may even make you less attractive,
206
+ because it means their invest
207
+ - 'imes as much wealth as an average employee. A programmer, for example, instead
208
+ of chugging along maintaining and updating an existing piece of software, could
209
+ write a whole new piece of software, and with it create a new source of revenue.
210
+ Companies are not set up to reward people who want to do this. You can''t go to
211
+ your boss and say, I''d like to start working ten times as hard, so will you please
212
+ pay me ten times as much? For one thing, the official fiction is that you are
213
+ already working as hard as you can. But a more serious problem is that the company
214
+ has no way of measuring the value of your work. Salesmen are an exception. It''s
215
+ easy to measure how much revenue they generate, and they''re usually paid a percentage
216
+ of it. If a salesman wants to work harder, he can just start doing it, and he
217
+ will automatically get paid proportionally more. There is one other job besides
218
+ sales where big companies can hire first-rate people: in the top management jobs.
219
+ And for the same reason: their performance can be measured. The top managers are
220
+ held responsible for the performance of the entire company. Because an ordinary
221
+ employee''s performance can''t usually be measured, he is not expected to do more
222
+ than put in a solid effo'
223
+ - source_sentence: How can a startup founder's ambitions be influenced by YC (a startup
224
+ accelerator) and what is the potential trap founders often fall into when they're
225
+ trying to seem big?
226
+ sentences:
227
+ - 'Written by Paul Graham
228
+
229
+
230
+ The Hardest Lessons for Startups to Learn
231
+
232
+
233
+ April 2006
234
+
235
+
236
+ In something that''s out there, problems are alarming. There is a lot more urgency
237
+ once you release. And I think that''s precisely why people put it off. They know
238
+ they''ll have to work a lot harder once they do. [2] 2. Keep Pumping Out Features.
239
+ Of course, "release early" has a second component, without which it would be bad
240
+ advice. If you''re going to start with something that doesn''t do much, you better
241
+ improve it fast. What I find myself repeating is "pump out features." And this
242
+ rule isn''t just for the initial stages. This is something all startups should
243
+ do for as long as they want to be considered startups. I don''t mean, of course,
244
+ that you should make your application ever more complex. By "feature" I mean one
245
+ unit of hacking-- one quantum of making users'' lives better. As with exercise,
246
+ improvements beget improvements. If you run every day, you''ll probably feel like
247
+ running tomorrow. But if you skip running for a couple weeks, it will be an effort
248
+ to drag yourself out. So it is with hacking: the more ideas you implement, the
249
+ more ideas you''ll have. You should make your system better at least in some small
250
+ way every day or two. This '
251
+ - 'e that they pay attention; it''s when they notice you''re still there. It''s
252
+ just as well that it usually takes a while to gain momentum. Most technologies
253
+ evolve a good deal even after they''re first launched — programming languages
254
+ especially. Nothing could be better, for a new techology, than a few years of
255
+ being used only by a small number of early adopters. Early adopters are sophisticated
256
+ and demanding, and quickly flush out whatever flaws remain in your technology.
257
+ When you only have a few users you can be in close contact with all of them. And
258
+ early adopters are forgiving when you improve your system, even if this causes
259
+ some breakage. There are two ways new technology gets introduced: the organic
260
+ growth method, and the big bang method. The organic growth method is exemplified
261
+ by the classic seat-of-the-pants underfunded garage startup. A couple guys, working
262
+ in obscurity, develop some new technology. They launch it with no marketing and
263
+ initially have only a few (fanatically devoted) users. They continue to improve
264
+ the technology, and meanwhile their user base grows by word of mouth. Before they
265
+ know it, they''re big. The other approach, the big bang method, is exemplified
266
+ by the VC-backed, heavily marketed sta'
267
+ - 'It tipped from being this boulder we had to push to being a train car that in
268
+ fact had its own momentum."[4] One of the more subtle ways in which YC can help
269
+ founders is by calibrating their ambitions, because we know exactly how a lot
270
+ of successful startups looked when they were just getting started.[5] If you''re
271
+ building something for which you can''t easily get a small set of users to observe
272
+ — e. g. enterprise software — and in a domain where you have no connections, you''ll
273
+ have to rely on cold calls and introductions. But should you even be working on
274
+ such an idea?[6] Garry Tan pointed out an interesting trap founders fall into
275
+ in the beginning. They want so much to seem big that they imitate even the flaws
276
+ of big companies, like indifference to individual users. This seems to them more
277
+ "professional." Actually it''s better to embrace the fact that you''re small and
278
+ use whatever advantages that brings.[7] Your user model almost couldn''t be perfectly
279
+ accurate, because users'' needs often change in response to what you build for
280
+ them. Build them a microcomputer, and suddenly they need to run spreadsheets on
281
+ it, because the arrival of your new microcomputer causes someone to invent the
282
+ spreadsheet.[8] If you have to '
283
+ pipeline_tag: sentence-similarity
284
+ library_name: sentence-transformers
285
+ ---
286
+
287
+ # SentenceTransformer
288
+
289
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on 1,496 question-passage pairs. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
290
+
291
+ ## Model Details
292
+
293
+ ### Model Description
294
+ - **Model Type:** Sentence Transformer
295
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
296
+ - **Maximum Sequence Length:** 512 tokens
297
+ - **Output Dimensionality:** 768 dimensions
298
+ - **Similarity Function:** Cosine Similarity
299
+ <!-- - **Training Dataset:** Unknown -->
300
+ <!-- - **Language:** Unknown -->
301
+ <!-- - **License:** Unknown -->
302
+
303
+ ### Model Sources
304
+
305
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
306
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
307
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
308
+
309
+ ### Full Model Architecture
310
+
311
+ ```
312
+ SentenceTransformer(
313
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
314
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
315
+ )
316
+ ```
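+ 
+ The Pooling module above averages the token embeddings produced by the Transformer module (`pooling_mode_mean_tokens: true`), ignoring padding positions. A rough sketch of what that pooling step computes, using plain PyTorch and the `transformers` library; the `"model_dir"` path below is a placeholder for a local copy of this repository:
+ 
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+ 
+ tokenizer = AutoTokenizer.from_pretrained("model_dir")  # placeholder path
+ model = AutoModel.from_pretrained("model_dir")          # placeholder path
+ 
+ encoded = tokenizer(["An example sentence"], padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = model(**encoded).last_hidden_state   # (batch, seq_len, 768)
+ 
+ # Mean pooling: average only over real (non-padding) tokens
+ mask = encoded["attention_mask"].unsqueeze(-1).float()
+ sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
+ print(sentence_embeddings.shape)  # torch.Size([1, 768])
+ ```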
317
+
318
+ ## Usage
319
+
320
+ ### Direct Usage (Sentence Transformers)
321
+
322
+ First install the Sentence Transformers library:
323
+
324
+ ```bash
325
+ pip install -U sentence-transformers
326
+ ```
327
+
328
+ Then you can load this model and run inference.
329
+ ```python
330
+ from sentence_transformers import SentenceTransformer
331
+
332
+ # Download from the 🤗 Hub
333
+ model = SentenceTransformer("sentence_transformers_model_id")
334
+ # Run inference
335
+ sentences = [
336
+ "How can a startup founder's ambitions be influenced by YC (a startup accelerator) and what is the potential trap founders often fall into when they're trying to seem big?",
337
+ 'It tipped from being this boulder we had to push to being a train car that in fact had its own momentum."[4] One of the more subtle ways in which YC can help founders is by calibrating their ambitions, because we know exactly how a lot of successful startups looked when they were just getting started.[5] If you\'re building something for which you can\'t easily get a small set of users to observe — e. g. enterprise software — and in a domain where you have no connections, you\'ll have to rely on cold calls and introductions. But should you even be working on such an idea?[6] Garry Tan pointed out an interesting trap founders fall into in the beginning. They want so much to seem big that they imitate even the flaws of big companies, like indifference to individual users. This seems to them more "professional." Actually it\'s better to embrace the fact that you\'re small and use whatever advantages that brings.[7] Your user model almost couldn\'t be perfectly accurate, because users\' needs often change in response to what you build for them. Build them a microcomputer, and suddenly they need to run spreadsheets on it, because the arrival of your new microcomputer causes someone to invent the spreadsheet.[8] If you have to ',
338
+ "e that they pay attention; it's when they notice you're still there. It's just as well that it usually takes a while to gain momentum. Most technologies evolve a good deal even after they're first launched — programming languages especially. Nothing could be better, for a new techology, than a few years of being used only by a small number of early adopters. Early adopters are sophisticated and demanding, and quickly flush out whatever flaws remain in your technology. When you only have a few users you can be in close contact with all of them. And early adopters are forgiving when you improve your system, even if this causes some breakage. There are two ways new technology gets introduced: the organic growth method, and the big bang method. The organic growth method is exemplified by the classic seat-of-the-pants underfunded garage startup. A couple guys, working in obscurity, develop some new technology. They launch it with no marketing and initially have only a few (fanatically devoted) users. They continue to improve the technology, and meanwhile their user base grows by word of mouth. Before they know it, they're big. The other approach, the big bang method, is exemplified by the VC-backed, heavily marketed sta",
339
+ ]
340
+ embeddings = model.encode(sentences)
341
+ print(embeddings.shape)
342
+ # [3, 768]
343
+
344
+ # Get the similarity scores for the embeddings
345
+ similarities = model.similarity(embeddings, embeddings)
346
+ print(similarities)
347
+ # tensor([[1.0000, 0.5865, 0.4398],
348
+ # [0.5865, 1.0000, 0.3588],
349
+ # [0.4398, 0.3588, 1.0000]])
350
+ ```
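+ 
+ Because the embeddings live in a single dense vector space, the same model can be used for retrieval-style semantic search. A small sketch; the corpus and query below are made-up examples, and the model id is the same placeholder used above:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ 
+ model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
+ 
+ corpus = [
+     "Early adopters are sophisticated and demanding, and quickly flush out flaws.",
+     "VCs prefer bigger deals so each partner can put a lot of money to work at once.",
+ ]
+ query = "Why do venture capitalists prefer large deals?"
+ 
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ query_embedding = model.encode(query, convert_to_tensor=True)
+ 
+ # Rank corpus passages by cosine similarity to the query
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
+ for hit in hits:
+     print(round(hit["score"], 3), corpus[hit["corpus_id"]])
+ ```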
351
+
352
+ <!--
353
+ ### Direct Usage (Transformers)
354
+
355
+ <details><summary>Click to see the direct usage in Transformers</summary>
356
+
357
+ </details>
358
+ -->
359
+
360
+ <!--
361
+ ### Downstream Usage (Sentence Transformers)
362
+
363
+ You can finetune this model on your own dataset.
364
+
365
+ <details><summary>Click to expand</summary>
366
+
367
+ </details>
368
+ -->
369
+
370
+ <!--
371
+ ### Out-of-Scope Use
372
+
373
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
374
+ -->
375
+
376
+ <!--
377
+ ## Bias, Risks and Limitations
378
+
379
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
380
+ -->
381
+
382
+ <!--
383
+ ### Recommendations
384
+
385
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
386
+ -->
387
+
388
+ ## Training Details
389
+
390
+ ### Training Dataset
391
+
392
+ #### Unnamed Dataset
393
+
394
+ * Size: 1,496 training samples
395
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
396
+ * Approximate statistics based on the first 1000 samples:
397
+ | | sentence_0 | sentence_1 |
398
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
399
+ | type | string | string |
400
+ | details | <ul><li>min: 16 tokens</li><li>mean: 30.39 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 77 tokens</li><li>mean: 274.6 tokens</li><li>max: 359 tokens</li></ul> |
401
+ * Samples:
402
+ | sentence_0 | sentence_1 |
403
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
404
+ | <code>According to the passage, what is more important than the size of a beachhead, and what characteristic must the people within it possess for it to be considered viable?</code> | <code>urgently need, you have a beachhead. [11]The question then is whether that beachhead is big enough. Or more importantly, who's in it: if the beachhead consists of people doing something lots more people will be doing in the future, then it's probably big enough no matter how small it is. For example, if you're building something differentiated from competitors by the fact that it works on phones, but it only works on the newest phones, that's probably a big enough beachhead. Err on the side of doing things where you'll face competitors. Inexperienced founders usually give competitors more credit than they deserve. Whether you succeed depends far more on you than on your competitors. So better a good idea with competitors than a bad one without. You don't need to worry about entering a "crowded market" so long as you have a thesis about what everyone else in it is overlooking. In fact that's a very promising starting point. Google was that type of idea. Your thesis has to be more precis...</code> |
405
+ | <code>According to the passage, what specific group of workers is uniquely affected by the "cost of checks," and why?</code> | <code>So it was left to the Europeans to explore and eventually to dominate the rest of the world, including China. In more recent times, Sarbanes-Oxley has practically destroyed the US IPO market. That wasn't the intention of the legislators who wrote it. They just wanted to add a few more checks on public companies. But they forgot to consider the cost. They forgot that companies about to go public are usually rather stretched, and that the weight of a few extra checks that might be easy for General Electric to bear are enough to prevent younger companies from being public at all. Once you start to think about the cost of checks, you can start to ask other interesting questions. Is the cost increasing or decreasing? Is it higher in some areas than others? Where does it increase discontinuously? If large organizations started to ask questions like that, they'd learn some frightening things. I think the cost of checks may actually be increasing. The reason is that software plays an increasin...</code> |
406
+ | <code>According to the passage, what is the most important thing an applicant can do during a Y Combinator interview, and why is this considered more valuable than meeting a higher standard of "convincingness"?</code> | <code>ou're in unless there's some other disqualifying flaw. That is a hard standard to meet, though. Airbnb didn't meet it. They had the first part. They had made something they themselves wanted. But it wasn't spreading. So don't feel bad if you don't hit this gold standard of convincingness. If Airbnb didn't hit it, it must be too high. In practice, the YC partners will be satisfied if they feel that you have a deep understanding of your users' needs. And the Airbnbs did have that. They were able to tell us all about what motivated hosts and guests. They knew from first-hand experience, because they'd been the first hosts. We couldn't ask them a question they didn't know the answer to. We ourselves were not very excited about the idea as users, but we knew this didn't prove anything, because there were lots of successful startups we hadn't been excited about as users. We were able to say to ourselves "They seem to know what they're talking about. Maybe they're onto something. It's not gro...</code> |
407
+ * Loss: <code>__main__.LoggableMNRL</code> with these parameters:
408
+ ```json
409
+ {
410
+ "scale": 20.0,
411
+ "similarity_fct": "cos_sim",
412
+ "gather_across_devices": false
413
+ }
414
+ ```
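+ 
+ `LoggableMNRL` is defined in the training script (`__main__`) and is not included in this upload. Judging by its name and parameters, it behaves like the standard `MultipleNegativesRankingLoss` (in-batch negatives, scaled cosine similarity, cross-entropy over the positive pair) with extra logging. A minimal sketch of such a wrapper, purely as an assumption about what the original class does:
+ 
+ ```python
+ from sentence_transformers import util
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ 
+ class LoggableMNRL(MultipleNegativesRankingLoss):
+     """Hypothetical reconstruction: identical to MultipleNegativesRankingLoss,
+     but keeps a history of loss values so they can be logged or plotted."""
+ 
+     def __init__(self, model, scale=20.0, similarity_fct=util.cos_sim):
+         super().__init__(model, scale=scale, similarity_fct=similarity_fct)
+         self.history = []  # assumed logging mechanism, not part of the source
+ 
+     def forward(self, sentence_features, labels):
+         loss = super().forward(sentence_features, labels)
+         self.history.append(loss.item())
+         return loss
+ ```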
415
+
416
+ ### Training Hyperparameters
417
+ #### Non-Default Hyperparameters
418
+
419
+ - `per_device_train_batch_size`: 16
420
+ - `per_device_eval_batch_size`: 16
421
+ - `num_train_epochs`: 5
422
+ - `fp16`: True
423
+ - `multi_dataset_batch_sampler`: round_robin
424
+
425
+ #### All Hyperparameters
426
+ <details><summary>Click to expand</summary>
427
+
428
+ - `overwrite_output_dir`: False
429
+ - `do_predict`: False
430
+ - `eval_strategy`: no
431
+ - `prediction_loss_only`: True
432
+ - `per_device_train_batch_size`: 16
433
+ - `per_device_eval_batch_size`: 16
434
+ - `per_gpu_train_batch_size`: None
435
+ - `per_gpu_eval_batch_size`: None
436
+ - `gradient_accumulation_steps`: 1
437
+ - `eval_accumulation_steps`: None
438
+ - `torch_empty_cache_steps`: None
439
+ - `learning_rate`: 5e-05
440
+ - `weight_decay`: 0.0
441
+ - `adam_beta1`: 0.9
442
+ - `adam_beta2`: 0.999
443
+ - `adam_epsilon`: 1e-08
444
+ - `max_grad_norm`: 1
445
+ - `num_train_epochs`: 5
446
+ - `max_steps`: -1
447
+ - `lr_scheduler_type`: linear
448
+ - `lr_scheduler_kwargs`: {}
449
+ - `warmup_ratio`: 0.0
450
+ - `warmup_steps`: 0
451
+ - `log_level`: passive
452
+ - `log_level_replica`: warning
453
+ - `log_on_each_node`: True
454
+ - `logging_nan_inf_filter`: True
455
+ - `save_safetensors`: True
456
+ - `save_on_each_node`: False
457
+ - `save_only_model`: False
458
+ - `restore_callback_states_from_checkpoint`: False
459
+ - `no_cuda`: False
460
+ - `use_cpu`: False
461
+ - `use_mps_device`: False
462
+ - `seed`: 42
463
+ - `data_seed`: None
464
+ - `jit_mode_eval`: False
465
+ - `bf16`: False
466
+ - `fp16`: True
467
+ - `fp16_opt_level`: O1
468
+ - `half_precision_backend`: auto
469
+ - `bf16_full_eval`: False
470
+ - `fp16_full_eval`: False
471
+ - `tf32`: None
472
+ - `local_rank`: 0
473
+ - `ddp_backend`: None
474
+ - `tpu_num_cores`: None
475
+ - `tpu_metrics_debug`: False
476
+ - `debug`: []
477
+ - `dataloader_drop_last`: False
478
+ - `dataloader_num_workers`: 0
479
+ - `dataloader_prefetch_factor`: None
480
+ - `past_index`: -1
481
+ - `disable_tqdm`: False
482
+ - `remove_unused_columns`: True
483
+ - `label_names`: None
484
+ - `load_best_model_at_end`: False
485
+ - `ignore_data_skip`: False
486
+ - `fsdp`: []
487
+ - `fsdp_min_num_params`: 0
488
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
489
+ - `fsdp_transformer_layer_cls_to_wrap`: None
490
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
491
+ - `parallelism_config`: None
492
+ - `deepspeed`: None
493
+ - `label_smoothing_factor`: 0.0
494
+ - `optim`: adamw_torch_fused
495
+ - `optim_args`: None
496
+ - `adafactor`: False
497
+ - `group_by_length`: False
498
+ - `length_column_name`: length
499
+ - `project`: huggingface
500
+ - `trackio_space_id`: trackio
501
+ - `ddp_find_unused_parameters`: None
502
+ - `ddp_bucket_cap_mb`: None
503
+ - `ddp_broadcast_buffers`: False
504
+ - `dataloader_pin_memory`: True
505
+ - `dataloader_persistent_workers`: False
506
+ - `skip_memory_metrics`: True
507
+ - `use_legacy_prediction_loop`: False
508
+ - `push_to_hub`: False
509
+ - `resume_from_checkpoint`: None
510
+ - `hub_model_id`: None
511
+ - `hub_strategy`: every_save
512
+ - `hub_private_repo`: None
513
+ - `hub_always_push`: False
514
+ - `hub_revision`: None
515
+ - `gradient_checkpointing`: False
516
+ - `gradient_checkpointing_kwargs`: None
517
+ - `include_inputs_for_metrics`: False
518
+ - `include_for_metrics`: []
519
+ - `eval_do_concat_batches`: True
520
+ - `fp16_backend`: auto
521
+ - `push_to_hub_model_id`: None
522
+ - `push_to_hub_organization`: None
523
+ - `mp_parameters`:
524
+ - `auto_find_batch_size`: False
525
+ - `full_determinism`: False
526
+ - `torchdynamo`: None
527
+ - `ray_scope`: last
528
+ - `ddp_timeout`: 1800
529
+ - `torch_compile`: False
530
+ - `torch_compile_backend`: None
531
+ - `torch_compile_mode`: None
532
+ - `include_tokens_per_second`: False
533
+ - `include_num_input_tokens_seen`: no
534
+ - `neftune_noise_alpha`: None
535
+ - `optim_target_modules`: None
536
+ - `batch_eval_metrics`: False
537
+ - `eval_on_start`: False
538
+ - `use_liger_kernel`: False
539
+ - `liger_kernel_config`: None
540
+ - `eval_use_gather_object`: False
541
+ - `average_tokens_across_devices`: True
542
+ - `prompts`: None
543
+ - `batch_sampler`: batch_sampler
544
+ - `multi_dataset_batch_sampler`: round_robin
545
+ - `router_mapping`: {}
546
+ - `learning_rate_mapping`: {}
547
+
548
+ </details>
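+ 
+ For reference, a run with the non-default hyperparameters above corresponds roughly to the following `SentenceTransformerTrainer` setup. The base model id, output directory, and the toy training pair are placeholders; the actual base checkpoint and dataset are not recorded in this card, and the original run used the custom `LoggableMNRL` loss rather than the stock `MultipleNegativesRankingLoss` shown here:
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ 
+ model = SentenceTransformer("bert-base-uncased")  # placeholder base model
+ train_dataset = Dataset.from_dict({               # placeholder for the 1,496 question-passage pairs
+     "sentence_0": ["Why do VCs prefer large deals?"],
+     "sentence_1": ["VCs want to invest large amounts because they run big funds."],
+ })
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output",                          # placeholder
+     num_train_epochs=5,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     fp16=True,
+     multi_dataset_batch_sampler="round_robin",
+ )
+ 
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     loss=MultipleNegativesRankingLoss(model),
+ )
+ trainer.train()
+ ```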
549
+
550
+ ### Framework Versions
551
+ - Python: 3.12.12
552
+ - Sentence Transformers: 5.1.2
553
+ - Transformers: 4.57.3
554
+ - PyTorch: 2.9.0+cu126
555
+ - Accelerate: 1.12.0
556
+ - Datasets: 4.0.0
557
+ - Tokenizers: 0.22.1
558
+
559
+ ## Citation
560
+
561
+ ### BibTeX
562
+
563
+ #### Sentence Transformers
564
+ ```bibtex
565
+ @inproceedings{reimers-2019-sentence-bert,
566
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
567
+ author = "Reimers, Nils and Gurevych, Iryna",
568
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
569
+ month = "11",
570
+ year = "2019",
571
+ publisher = "Association for Computational Linguistics",
572
+ url = "https://arxiv.org/abs/1908.10084",
573
+ }
574
+ ```
575
+
576
+ #### LoggableMNRL
577
+ ```bibtex
578
+ @misc{henderson2017efficient,
579
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
580
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
581
+ year={2017},
582
+ eprint={1705.00652},
583
+ archivePrefix={arXiv},
584
+ primaryClass={cs.CL}
585
+ }
586
+ ```
587
+
588
+ <!--
589
+ ## Glossary
590
+
591
+ *Clearly define terms in order to be accessible across audiences.*
592
+ -->
593
+
594
+ <!--
595
+ ## Model Card Authors
596
+
597
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
598
+ -->
599
+
600
+ <!--
601
+ ## Model Card Contact
602
+
603
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
604
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.57.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.2",
+     "transformers": "4.57.3",
+     "pytorch": "2.9.0+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ed15161e93b6d775eb8e319f918d7e8368759e48d450bd621529c8ea56b2339
+ size 437951328
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff