omarkamali committed on
Commit
26eb33e
·
verified ·
1 Parent(s): 8f235b3

Upload all models and assets for ann (latest)

README.md CHANGED
@@ -36,7 +36,7 @@ metrics:
36
  value: 4.353
37
  - name: best_isotropy
38
  type: isotropy
39
- value: 0.1710
40
  - name: vocabulary_size
41
  type: vocab
42
  value: 0
@@ -97,26 +97,26 @@ We analyze tokenizers, n-gram models, Markov chains, vocabulary statistics, and
97
 
98
  Below are sample sentences tokenized with each vocabulary size:
99
 
100
- **Sample 1:** `Ọrọn môkọt ire: Ebi Ọrọn (ife) Ido Ọrọn (ama ere) Ọrọn (Mkpulu-ija) Usem Ọrọn...`
101
 
102
  | Vocab | Tokens | Count |
103
  |-------|--------|-------|
104
- | 8k | `▁ọrọnmôkọtire : ebi ▁ọrọn( ife ) ido ... (+17 more)` | 27 |
105
- | 16k | `▁ọrọnmôkọtire : ebi ▁ọrọn( ife ) ido ... (+17 more)` | 27 |
106
 
107
- **Sample 2:** `Nde ìre oke mgbọ òsoso usen jaaba. Ekisa nde ifuk onyan̄ isa ifuk acha si. Nd...`
108
 
109
  | Vocab | Tokens | Count |
110
  |-------|--------|-------|
111
- | 8k | `▁nde ▁ìre okemgbọ ▁òsoso ▁usenjaaba . ekisande ... (+27 more)` | 37 |
112
- | 16k | `▁nde ▁ìreokemgbọ ▁òsoso usenjaaba .ekisande ... (+26 more)` | 36 |
113
 
114
- **Sample 3:** `Ọngari (òrere Hungary me usem Ebeke, ire Magyarország me usem Ọn̄gari) ìre id...`
115
 
116
  | Vocab | Tokens | Count |
117
  |-------|--------|-------|
118
- | 8k | `▁ọ n gari ( òrereh ungarymeusemebeke ... (+28 more)` | 38 |
119
- | 16k | `▁ọngari( òrere hungarymeusemebeke , ire ... (+19 more)` | 29 |
120
 
121
 
122
  ### Key Findings
@@ -266,27 +266,27 @@ Below are text samples generated from each word-based Markov chain model:
266
 
267
  **Context Size 1:**
268
 
269
- 1. `me lek kiban̄ ekitimbe akọn̄ ofirikosok gbọgbọ otu ifuk ikpọk ya ìre isilam inu`
270
- 2. `mè ijọn̄ ido ya me naijiria agan̄ ichep ura ogwu òkitaak chieen̄ ikpọmgbọ`
271
- 3. `agan̄ mkpulu uwu usọ ifuk ene ewabe ichit me emen mgbọ etiopia ìkup me agọọk nkween̄`
272
 
273
  **Context Size 2:**
274
 
275
- 1. `me lek adasi nkwukwuuk cha isa ikije isa me ikeya lesoto ìre ge me lek ijọn̄`
276
- 2. `me agan̄ inyọn̄ mbum ura emen awaji atik me agan̄ inyọn̄ ichep ura agan̄ ichep`
277
- 3. `me emen wire môkọtbe irọ inu due process odobe`
278
 
279
  **Context Size 3:**
280
 
281
- 1. `agan̄ ichep ura oniin̄ ikpọkpọk ikire ibot ikọ me ukpatu ebi uga ifuk ibot chereyi kperiọọn̄ owuw...`
282
- 2. `me ido ya ìnire usem furenchi mèlek usem wolof sa me ebi otoko wolof erebe ebi ìwawa ichit`
283
- 3. `me agan̄ osiki ruwanda burundi kongo kinshasa ekup me agan̄ inyọn̄ abia akwa ibom me`
284
 
285
  **Context Size 4:**
286
 
287
- 1. `me agan̄ mbum ura kan̄ emen awaji atilantik otap ikana ọmọ me agan̄ inyọn̄ afirika agan̄ inyọn̄ ò...`
288
- 2. `me agan̄ ichep ura isi ire iteke indus me agan̄ mbum ura me afirika ire si òso 20`
289
- 3. `me ido ya ìre furench ire îre akọp irek me efit si re akọp irek go me efit`
290
 
291
 
292
  ### Generated Text Samples (Subword-based)
@@ -295,27 +295,27 @@ Below are text samples generated from each subword-based Markov chain model:
295
 
296
  **Context Size 1:**
297
 
298
- 1. `_idem_ijan̄_masee`
299
- 2. `e_erelukp_mi_lup`
300
- 3. `irit_enyikp_n_si`
301
 
302
  **Context Size 2:**
303
 
304
- 1. `e_obageeleki_[cor`
305
- 2. `_ike_ubọ_erere_ik`
306
- 3. `_me_ebi_mè_mem_ya`
307
 
308
  **Context Size 3:**
309
 
310
- 1. `_me_emire_ge,_mè_d`
311
- 2. `me_jodan_ichechich`
312
- 3. `re_ge,_ìkigwook,_ò`
313
 
314
  **Context Size 4:**
315
 
316
- 1. `_me_si_inwàn_ikwaan̄`
317
- 2. `_mè_ikikaan̄ge;_me_<`
318
- 3. `lek_ebi_ìkike_eriọọ`
319
 
320
 
321
  ### Key Findings
@@ -420,18 +420,18 @@ Below are text samples generated from each subword-based Markov chain model:
420
 
421
  | Model | Dimension | Isotropy | Semantic Density | Alignment R@1 | Alignment R@10 |
422
  |-------|-----------|----------|------------------|---------------|----------------|
423
- | **mono_32d** | 32 | 0.1710 🏆 | 0.5365 | N/A | N/A |
424
- | **mono_64d** | 64 | 0.0323 | 0.5579 | N/A | N/A |
425
- | **mono_128d** | 128 | 0.0059 | 0.5505 | N/A | N/A |
426
- | **aligned_32d** | 32 | 0.1710 | 0.5499 | 0.0111 | 0.1274 |
427
- | **aligned_64d** | 64 | 0.0323 | 0.5740 | 0.0222 | 0.1717 |
428
- | **aligned_128d** | 128 | 0.0059 | 0.5600 | 0.0139 | 0.1634 |
429
 
430
  ### Key Findings
431
 
432
- - **Best Isotropy:** mono_32d with 0.1710 (more uniform distribution)
433
- - **Semantic Density:** Average pairwise similarity of 0.5548. Lower values indicate better semantic separation.
434
- - **Alignment Quality:** Aligned models achieve up to 2.2% R@1 in cross-lingual retrieval.
435
  - **Recommendation:** 128d aligned for best cross-lingual performance
436
 
437
  ---
@@ -453,14 +453,14 @@ These are the most productive prefixes and suffixes identified by sampling the v
453
  #### Productive Prefixes
454
  | Prefix | Examples |
455
  |--------|----------|
456
- | `-ek` | eket, ekpabe, ekibene |
457
- | `-ik` | ikpele, ikinen̄e, ikinyam |
458
 
459
  #### Productive Suffixes
460
  | Suffix | Examples |
461
  |--------|----------|
462
- | `-n̄` | ugwun̄, akwaan̄, esun̄ |
463
- | `-be` | ekpabe, îkwube, ejibibe |
464
 
465
  ### 6.3 Bound Stems (Lexical Roots)
466
 
@@ -468,18 +468,18 @@ Bound stems are high-frequency subword units that are semantically cohesive but
468
 
469
  | Stem | Cohesion | Substitutability | Examples |
470
  |------|----------|------------------|----------|
471
- | `gọọk` | 1.51x | 19 contexts | igọọk, agọọk, ogọọk |
472
- | `tumu` | 1.46x | 21 contexts | ntumu, etumu, itumu |
473
- | `kpul` | 1.58x | 16 contexts | ikpulu, îkpulu, òkpulu |
474
- | `sibi` | 1.51x | 18 contexts | nsibi, ìsibi, îsibi |
475
- | `kikp` | 1.44x | 19 contexts | òkikpa, òkikpọ, ekikpọ |
476
- | `kana` | 1.41x | 20 contexts | ekana, ìkana, nkana |
477
- | `chie` | 1.55x | 14 contexts | chief, ichieek, ìchieek |
478
- | `riọọ` | 1.63x | 12 contexts | nriọọk, riọọn̄, nriọọn̄ |
479
- | `kisa` | 1.43x | 17 contexts | òkisa, ìkisa, îkisa |
480
- | `gbaa` | 1.46x | 15 contexts | ògbaan̄, egbaan̄, ìgbaan̄ |
481
- | `ikaa` | 1.61x | 11 contexts | ikaan̄, enikaan̄, ebikaan̄ |
482
- | `kpọk` | 1.39x | 16 contexts | ukpọk, okpọk, ọkpọk |
483
 
484
  ### 6.4 Affix Compatibility (Co-occurrence)
485
 
@@ -487,9 +487,9 @@ This table shows which prefixes and suffixes most frequently co-occur on the sam
487
 
488
  | Prefix | Suffix | Frequency | Examples |
489
  |--------|--------|-----------|----------|
490
- | `-ik` | `-n̄` | 15 words | ikikpan̄, ikwaan̄ |
491
- | `-ek` | `-be` | 13 words | ekpabe, ekaan̄be |
492
- | `-ek` | `-n̄` | 10 words | ekijeen̄, ekitoon̄ |
493
 
494
  ### 6.5 Recursive Morpheme Segmentation
495
 
@@ -497,19 +497,19 @@ Using **Recursive Hierarchical Substitutability**, we decompose complex words in
497
 
498
  | Word | Suggested Split | Confidence | Stem |
499
  |------|-----------------|------------|------|
500
- | ekinyambe | **`ek-inyam-be`** | 6.0 | `inyam` |
501
  | ekitumube | **`ek-itumu-be`** | 6.0 | `itumu` |
 
502
  | ekigwenbe | **`ek-igwen-be`** | 6.0 | `igwen` |
503
  | ikichieek | **`ik-ichieek`** | 4.5 | `ichieek` |
504
- | ekichichini | **`ek-ichichini`** | 4.5 | `ichichini` |
505
  | echichinibe | **`echichini-be`** | 4.5 | `echichini` |
506
- | ekiweweek | **`ek-iweweek`** | 4.5 | `iweweek` |
507
- | ekikpulube | **`ek-ik-pulu-be`** | 4.5 | `pulu` |
508
  | ekekikpulu | **`ek-ek-ik-pulu`** | 4.5 | `pulu` |
 
509
  | echieekbe | **`echieek-be`** | 4.5 | `echieek` |
510
- | ekikpukpo | **`ek-ik-pukpo`** | 3.0 | `pukpo` |
511
- | ikpọchieen̄ | **`ik-pọchiee-n̄`** | 3.0 | `pọchiee` |
512
- | ikibieen̄ | **`ik-ibiee-n̄`** | 3.0 | `ibiee` |
 
 
513
  | eriọọn̄be | **`eriọọ-n̄-be`** | 3.0 | `eriọọ` |
514
  | egbaan̄be | **`egbaa-n̄-be`** | 3.0 | `egbaa` |
515
 
@@ -745,4 +745,4 @@ MIT License - Free for academic and commercial use.
745
  ---
746
  *Generated by Wikilangs Models Pipeline*
747
 
748
- *Report Date: 2026-01-03 14:12:13*
 
36
  value: 4.353
37
  - name: best_isotropy
38
  type: isotropy
39
+ value: 0.1716
40
  - name: vocabulary_size
41
  type: vocab
42
  value: 0
 
97
 
98
  Below are sample sentences tokenized with each vocabulary size:
99
 
100
+ **Sample 1:** `Ida Obolo ìre ikpa etip-usen eyi ebi Ogbo Ikwaan̄ Usem Obolo ekisan̄a isibi me e...`
101
 
102
  | Vocab | Tokens | Count |
103
  |-------|--------|-------|
104
+ | 8k | `▁idaobolo ▁ìre ikpaetip - usen eyi ▁ebiogbo ... (+22 more)` | 32 |
105
+ | 16k | `▁idaobolo ▁ìre ikpaetip - usen eyi ▁ebiogbo ... (+21 more)` | 31 |
106
 
107
+ **Sample 2:** `Jameni (òrere Deutschland me usem Jameni, ire Germany me usem Ebeke) ìre ido ...`
108
 
109
  | Vocab | Tokens | Count |
110
  |-------|--------|-------|
111
+ | 8k | `▁jameni( òrere de uts ch land meusemjameni ... (+15 more)` | 25 |
112
+ | 16k | `▁jameni ▁( òrere deutschlandmeusemjameni ,ire ... (+12 more)` | 22 |
113
 
114
+ **Sample 3:** `ìre ikọ ekisa ìjeen̄ uyok uyok ekiket kubọk nriki ònan̄a me inu ikeke ene chieen...`
115
 
116
  | Vocab | Tokens | Count |
117
  |-------|--------|-------|
118
+ | 8k | `▁ìre ▁ikọekisa ▁ìjeen̄uyok ▁uyokekiketkubọknriki ▁ònan̄a ... (+17 more)` | 27 |
119
+ | 16k | `▁ìreikọekisa ▁ìjeen̄ uyokuyokekiketkubọknriki ▁ònan̄a ... (+17 more)` | 27 |
120
 
121
 
122
  ### Key Findings
 
266
 
267
  **Context Size 1:**
268
 
269
+ 1. `me atasuk eyi akọp ìkigwat lek ogugo ijọn̄ afirika etete udun̄nde òrere dmitri mendeleev me esese`
270
+ 2. `mè anam ge me ere ònire agan̄ inyọn̄ me lek ijọn̄ sudan îgbuku igwookikisa`
271
+ 3. `agan̄ erumfaka kiristien itap mbit pọtugalu ekisabe ibọp ekwu òkukup me usem obolo usini ekilọk`
272
 
273
  **Context Size 2:**
274
 
275
+ 1. `me lek èwê sayara me emen ido yi usem komoros furenchi usem afarì igwen okwaan̄`
276
+ 2. `me agan̄ inyọn̄ sabum mgbọ keyi ebi ene ewa ichit ebi un enyibe me acha ifofo belin`
277
+ 3. `me emen kan̄ mgbọ îkanabe ogwu biriten îsan̄a nchọi iba me emen utikpa ya okisibi igwook me`
278
 
279
  **Context Size 3:**
280
 
281
+ 1. `agan̄ ichep ura otutuuk ekup me lek ogbọn̄ ikput lek ema ekitim akọn̄ me lek ido india me`
282
+ 2. `me ido ya efele oka agan̄ mkpulu gongola isa ichili taraba me 27 ọgọs me ukot mkpulu kè`
283
+ 3. `me agan̄ osiki ichep ura egwen agan̄ mkpulu rivas delita ijọ ijaw ore usem eikimalek itumu me`
284
 
285
  **Context Size 4:**
286
 
287
+ 1. `me agan̄ mbum ura me agan̄ ichep ura îre ido ini ingilan skọtilan weelis ailan agan̄ inyọn̄ me`
288
+ 2. `me agan̄ ichep ura isi ire lek emen awaji pàsifik me agan̄ mbum ura sudan me agan̄ osiki mbum`
289
+ 3. `me ido ya ìre mamoudizou me grande terree acho eyi ilile usem mkpulu ìre furenchi eyi owuwa ene ekit...`
290
 
291
 
292
  ### Generated Text Samples (Subword-based)
 
295
 
296
  **Context Size 1:**
297
 
298
+ 1. `_me_e_mọọn̄),_eki`
299
+ 2. `eranlatenctiko._`
300
+ 3. `ik_e_e_gbe_okire`
301
 
302
  **Context Size 2:**
303
 
304
+ 1. `e_nathe_in_iritim`
305
+ 2. `_igọọn̄_mîturusiny`
306
+ 3. `_me_ban̄_ge_ema_mf`
307
 
308
  **Context Size 3:**
309
 
310
+ 1. `_me_òbeluk_sọn_bro`
311
+ 2. `me_ere_<gdp>_òkike`
312
+ 3. `re_ere_òta_irọ_yi_`
313
 
314
  **Context Size 4:**
315
 
316
+ 1. `_me_emen_oka_akat._`
317
+ 2. `_mè_onan̄a_me_efit_e`
318
+ 3. `lek_ichit_me_agan̄_i`
319
 
320
 
321
  ### Key Findings
 
420
 
421
  | Model | Dimension | Isotropy | Semantic Density | Alignment R@1 | Alignment R@10 |
422
  |-------|-----------|----------|------------------|---------------|----------------|
423
+ | **mono_32d** | 32 | 0.1716 🏆 | 0.5548 | N/A | N/A |
424
+ | **mono_64d** | 64 | 0.0315 | 0.5662 | N/A | N/A |
425
+ | **mono_128d** | 128 | 0.0057 | 0.5736 | N/A | N/A |
426
+ | **aligned_32d** | 32 | 0.1716 | 0.5361 | 0.0083 | 0.1330 |
427
+ | **aligned_64d** | 64 | 0.0315 | 0.5580 | 0.0166 | 0.1745 |
428
+ | **aligned_128d** | 128 | 0.0057 | 0.5602 | 0.0139 | 0.1717 |
429
 
430
  ### Key Findings
431
 
432
+ - **Best Isotropy:** mono_32d with 0.1716 (more uniform distribution)
433
+ - **Semantic Density:** Average pairwise similarity of 0.5582. Lower values indicate better semantic separation.
434
+ - **Alignment Quality:** Aligned models achieve up to 1.7% R@1 in cross-lingual retrieval.
435
  - **Recommendation:** 128d aligned for best cross-lingual performance
436
 
437
  ---
 
453
  #### Productive Prefixes
454
  | Prefix | Examples |
455
  |--------|----------|
456
+ | `-ek` | ekijeje, ekpukpo, ekwukwu |
457
+ | `-ik` | ikira, ikikween̄, ikpọkpọ |
458
 
459
  #### Productive Suffixes
460
  | Suffix | Examples |
461
  |--------|----------|
462
+ | `-n̄` | kpekaan̄, utọn̄, ikikween̄ |
463
+ | `-be` | ojotbe, erọkọbe, egobobe |
464
 
465
  ### 6.3 Bound Stems (Lexical Roots)
466
 
 
468
 
469
  | Stem | Cohesion | Substitutability | Examples |
470
  |------|----------|------------------|----------|
471
+ | `tumu` | 1.48x | 21 contexts | etumu, îtumu, itumu |
472
+ | `gọọk` | 1.52x | 19 contexts | agọọk, egọọk, îgọọk |
473
+ | `kpul` | 1.59x | 16 contexts | ìkpulu, îkpulu, òkpulu |
474
+ | `sibi` | 1.51x | 18 contexts | ìsibi, îsibi, osibi |
475
+ | `kikp` | 1.46x | 19 contexts | ìkikpa, òkikpọ, òkikpa |
476
+ | `kana` | 1.42x | 20 contexts | nkana, îkana, ekana |
477
+ | `kisa` | 1.45x | 17 contexts | îkisa, ikisa, ìkisa |
478
+ | `chie` | 1.54x | 14 contexts | chief, nchieek, chieen̄ |
479
+ | `riọọ` | 1.62x | 12 contexts | riọọn̄, nriọọk, iriọọn̄ |
480
+ | `gbaa` | 1.47x | 15 contexts | igbaan̄, egbaan̄, ogbaan̄ |
481
+ | `ikaa` | 1.61x | 11 contexts | ikaan̄, ìkikaan̄, ebikaan̄ |
482
+ | `kpọk` | 1.39x | 16 contexts | okpọk, ukpọk, ikpọk |
483
 
484
  ### 6.4 Affix Compatibility (Co-occurrence)
485
 
 
487
 
488
  | Prefix | Suffix | Frequency | Examples |
489
  |--------|--------|-----------|----------|
490
+ | `-ik` | `-n̄` | 15 words | ikikween̄, ikpan̄ |
491
+ | `-ek` | `-be` | 13 words | ekpan̄be, ekpukbe |
492
+ | `-ek` | `-n̄` | 10 words | ekigbaan̄, ekimun̄ |
493
 
494
  ### 6.5 Recursive Morpheme Segmentation
495
 
 
497
 
498
  | Word | Suggested Split | Confidence | Stem |
499
  |------|-----------------|------------|------|
 
500
  | ekitumube | **`ek-itumu-be`** | 6.0 | `itumu` |
501
+ | ekinyambe | **`ek-inyam-be`** | 6.0 | `inyam` |
502
  | ekigwenbe | **`ek-igwen-be`** | 6.0 | `igwen` |
503
  | ikichieek | **`ik-ichieek`** | 4.5 | `ichieek` |
 
504
  | echichinibe | **`echichini-be`** | 4.5 | `echichini` |
 
 
505
  | ekekikpulu | **`ek-ek-ik-pulu`** | 4.5 | `pulu` |
506
+ | ekiweweek | **`ek-iweweek`** | 4.5 | `iweweek` |
507
  | echieekbe | **`echieek-be`** | 4.5 | `echieek` |
508
+ | ekikpulube | **`ek-ik-pulu-be`** | 4.5 | `pulu` |
509
+ | ekichichini | **`ek-ichichini`** | 4.5 | `ichichini` |
510
+ | ikikween̄ | **`ik-ik-ween̄`** | 3.0 | `ween̄` |
511
+ | ekigbaan̄ | **`ek-igbaa-n̄`** | 3.0 | `igbaa` |
512
+ | îriọọn̄be | **`îriọọ-n̄-be`** | 3.0 | `îriọọ` |
513
  | eriọọn̄be | **`eriọọ-n̄-be`** | 3.0 | `eriọọ` |
514
  | egbaan̄be | **`egbaa-n̄-be`** | 3.0 | `egbaa` |
515
 
 
745
  ---
746
  *Generated by Wikilangs Models Pipeline*
747
 
748
+ *Report Date: 2026-01-03 16:25:38*
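
The summary figures in the updated Key Findings can be cross-checked against the new embedding table. The following is a minimal sketch using the values from the `+` rows of that table. It assumes that the reported semantic density is a plain mean over all six models; the README does not state this, although the numbers are consistent with it. The `summarize` helper is a hypothetical name introduced for illustration.

```python
# Values copied from the "+" rows of the embedding table in this diff.
# r1 / r10 are None where the README reports N/A.
rows = {
    "mono_32d":     {"isotropy": 0.1716, "density": 0.5548, "r1": None,   "r10": None},
    "mono_64d":     {"isotropy": 0.0315, "density": 0.5662, "r1": None,   "r10": None},
    "mono_128d":    {"isotropy": 0.0057, "density": 0.5736, "r1": None,   "r10": None},
    "aligned_32d":  {"isotropy": 0.1716, "density": 0.5361, "r1": 0.0083, "r10": 0.1330},
    "aligned_64d":  {"isotropy": 0.0315, "density": 0.5580, "r1": 0.0166, "r10": 0.1745},
    "aligned_128d": {"isotropy": 0.0057, "density": 0.5602, "r1": 0.0139, "r10": 0.1717},
}


def summarize(rows):
    """Recompute the headline numbers (hypothetical helper, not from the pipeline)."""
    # max() keeps the first model on ties, so mono_32d wins over aligned_32d.
    best_isotropy = max(rows, key=lambda m: rows[m]["isotropy"])
    # Assumption: the reported density is an unweighted mean over all models.
    mean_density = sum(r["density"] for r in rows.values()) / len(rows)
    # Only aligned models have retrieval scores.
    aligned = {m: r for m, r in rows.items() if r["r1"] is not None}
    best_r1 = max(aligned, key=lambda m: aligned[m]["r1"])
    best_r10 = max(aligned, key=lambda m: aligned[m]["r10"])
    return {
        "best_isotropy": best_isotropy,
        "mean_density": mean_density,
        "best_r1_model": best_r1,
        "best_r1_percent": 100 * aligned[best_r1]["r1"],
        "best_r10_model": best_r10,
    }


summary = summarize(rows)
print(summary)
```

Under these assumptions the mean density comes out at roughly 0.558, in line with the reported 0.5582, and the best R@1 is about 1.7%. On these figures `aligned_64d` has the highest R@1 and R@10, which is worth keeping in mind when reading the unchanged 128d recommendation.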
models/embeddings/aligned/ann_128d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dc209d93d13b120c8c95cba7ed8ea3511e7f3a3be58c71d6b52f823aba7e3a58
3
  size 1025972707
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ec3b02a29131159583e1802e66a0d3a117ff648330d804416e1080ffea1596e
3
  size 1025972707
models/embeddings/aligned/ann_128d.projection.npy CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6ecbf9194a3110e9db3f94be28e4d970c89d4ae602cd76c5e4114d930138c369
3
  size 65664
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3873a81cc2944d94ca6990cc7557d6d55aea718409cedcb7289f53215b936c57
3
  size 65664
models/embeddings/aligned/ann_32d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:046109a4422a9a603a48c2abd0d192b758882631e72ac429557dc6f857e7b375
3
  size 256516579
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ef8cdd15f948d80be885263043c7688460fd182172f2154c015b80d3a886320
3
  size 256516579
models/embeddings/aligned/ann_32d.projection.npy CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9ac69b84943611045a295f4bb05aa98d313c747e3e3ff20df0a86d90596c100c
3
  size 4224
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:91d45dc35817a7af0f3e2baf4943e62863a7f004ecdf51c6c171ce0499b800eb
3
  size 4224
models/embeddings/aligned/ann_64d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b00d28e874e41679c615dce915575ad329279f02507e33677323ebb64d12a95e
3
  size 513001955
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:84916d177660159df0e2cd46c3284a2381f09279130834a1add5cb653d1676b5
3
  size 513001955
models/embeddings/aligned/ann_64d.projection.npy CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8417b28498d4ded38028edb9ebd5965b99df3b52961d1191e017b63208828a47
3
  size 16512
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:572e129219bb969b2237639b710483579e3d554be99bbf1f655d2d070f2f16a5
3
  size 16512
models/embeddings/monolingual/ann_128d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dc209d93d13b120c8c95cba7ed8ea3511e7f3a3be58c71d6b52f823aba7e3a58
3
  size 1025972707
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ec3b02a29131159583e1802e66a0d3a117ff648330d804416e1080ffea1596e
3
  size 1025972707
models/embeddings/monolingual/ann_32d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:046109a4422a9a603a48c2abd0d192b758882631e72ac429557dc6f857e7b375
3
  size 256516579
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ef8cdd15f948d80be885263043c7688460fd182172f2154c015b80d3a886320
3
  size 256516579
models/embeddings/monolingual/ann_64d.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b00d28e874e41679c615dce915575ad329279f02507e33677323ebb64d12a95e
3
  size 513001955
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:84916d177660159df0e2cd46c3284a2381f09279130834a1add5cb653d1676b5
3
  size 513001955
models/tokenizer/ann_tokenizer_16k.model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b74f59549b94c9bbb91cffef6bc44cacadb4f440a30004c5287afd482a9adcf2
3
  size 510998
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4bb1ade5597ab4a46c633e58eb5a704bbb6e44b9af4b87b84cd7063c2be7a82
3
  size 510998
models/tokenizer/ann_tokenizer_8k.model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:89d553eae70a0c59b46d72bb3b7a2e84791949a1b7ea431452fca72f093214e2
3
  size 374759
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:677dae64939a27b2e6d16b59a170e54e602e80ae6e57143d5f85c4f79bfd5219
3
  size 374759
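
Each binary entry above is a Git LFS pointer file: three `key value` lines giving the spec `version`, the `oid` (hash algorithm and digest), and the `size` in bytes. The sketch below parses two pointers and reports whether the content changed. The helper names `parse_lfs_pointer` and `describe_change` are hypothetical, and the sample values are copied from the `ann_tokenizer_8k.model` entry in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def describe_change(old, new):
    """Compare two parsed pointers (hypothetical helper)."""
    return {
        "content_changed": old["digest"] != new["digest"],
        "size_delta": new["size"] - old["size"],
    }


# Old and new pointers for models/tokenizer/ann_tokenizer_8k.model, from this diff.
old = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:89d553eae70a0c59b46d72bb3b7a2e84791949a1b7ea431452fca72f093214e2
size 374759
""")
new = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:677dae64939a27b2e6d16b59a170e54e602e80ae6e57143d5f85c4f79bfd5219
size 374759
""")

change = describe_change(old, new)
print(change)
```

For this entry the digest differs while the size is identical, which is the pattern for every LFS pointer in this commit: the files were regenerated with the same byte length but different content.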
visualizations/embedding_alignment_quality.png CHANGED
visualizations/embedding_isotropy.png CHANGED
visualizations/embedding_norms.png CHANGED
visualizations/embedding_similarity.png CHANGED

Git LFS Details

  • SHA256: 18c89a55fdcb08e3fe461cbfa1e51a759d65e5d5e3e882f7e231124b3539a983
  • Pointer size: 131 Bytes
  • Size of remote file: 153 kB

Git LFS Details

  • SHA256: 8aa992e09286cb62217073dd61249b3da0bc25fa1ef5b63bb8e5e25bd315e64b
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB
visualizations/embedding_tsne_multilingual.png CHANGED

Git LFS Details

  • SHA256: c3383573acb7bf6d46f85865bc1d7daba8d55a8a799cb149e767fc27045eec93
  • Pointer size: 131 Bytes
  • Size of remote file: 290 kB

Git LFS Details

  • SHA256: 4f54cb629d65dbf59414479491f3cf0e62b024bc6419110a75cd1dc2c3f816bf
  • Pointer size: 131 Bytes
  • Size of remote file: 272 kB
visualizations/performance_dashboard.png CHANGED

Git LFS Details

  • SHA256: 80e8df63f4854d85f7aefd5eefe48ae9ce7742e77c12e9df2a77f1fca24c892e
  • Pointer size: 131 Bytes
  • Size of remote file: 366 kB

Git LFS Details

  • SHA256: 424b92470631e55442d3f8d5aca5c2b73dce1fe05c150c26feb457f09c7c1731
  • Pointer size: 131 Bytes
  • Size of remote file: 373 kB
visualizations/position_encoding_comparison.png CHANGED

Git LFS Details

  • SHA256: 873ce43d441c2c4076d0a848e3f4f96b858ea86cc9337bbdfa7cd3b43cdfa133
  • Pointer size: 131 Bytes
  • Size of remote file: 112 kB

Git LFS Details

  • SHA256: 5d86bc2702d1d3f4ce4a53bb9bdc54f1e0812b496e67eecc084a8e29e3f7e8ea
  • Pointer size: 131 Bytes
  • Size of remote file: 112 kB
visualizations/tsne_sentences.png CHANGED

Git LFS Details

  • SHA256: 6a99d5a6a7c259b3fe08655fbc379e859fbfe060eeb632215620f56d89fb3022
  • Pointer size: 131 Bytes
  • Size of remote file: 278 kB

Git LFS Details

  • SHA256: 6b295c742481868c95d1c235e10a1d1716cd7e294b3f4472a1a5d3935c7d6df1
  • Pointer size: 131 Bytes
  • Size of remote file: 279 kB
visualizations/tsne_words.png CHANGED

Git LFS Details

  • SHA256: 535d2632086f5bfd85104fbc97ad1a77b33ce3dc4e19559afa1800b1fb5a60b5
  • Pointer size: 131 Bytes
  • Size of remote file: 704 kB

Git LFS Details

  • SHA256: a23f1e9cc523dcc386e4c269735a7f00b5fb340db30e4b306163790acb303c3a
  • Pointer size: 131 Bytes
  • Size of remote file: 700 kB