---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:2573632
- loss:MultipleNegativesRankingLoss
- dataset_size:22925
- loss:CosineSimilarityLoss
base_model: muchad/mdeberta-hybrid-30k
widget:
- source_sentence: Siapakah yang menciptakan telegraf ?
  sentences:
  - 'Teknologi berbasis listrik

    Pada sistem telegraf, suatu pesan akan diterjemahkan menjadi sederetan kode morse.
    Kode tersebut selanjutnya dikirim melalui kawat peghubung sebagai media perantara
    dan diterima oleh bagian penerima. Pada bagian penerima, kode-kode tersebut diterjemahkan
    kembali menjadi pesan seperti aslinya.'
  - 'Mintakat riparian

    Untuk melindungi keberadaan dan keberlangsungan fungsi wilayah riparian, tiap-tiap
    negara mengeluarkan peraturan yang berbeda-beda. Indonesia, misalnya, memiliki
    peraturan untuk memelihara dan mempertahankan apa yang disebut sebagai sempadan
    sungai. Peraturan ini pada dasarnya menganjurkan pengelola wilayah, umpamanya
    pemegang HPH, untuk memelihara kawasan dengan lebar tertentu, sejajar dan di sepanjang
    tepian kanan-kiri sungai. Lebar sempadan ini bergantung kepada ukuran sungai itu
    sendiri, kondisi tepiannya (apakah masih alami atau buatan), serta letaknya (apakah
    di hutan, kawasan perkebunan atau di perkotaan).'
  - 'Advent Bangun

    Film pertamanya adalah "Rajawali Sakti" pada tahun 1976, kemudian menjadi pemeran
    utama bersama aktris Enny Beatrice dalam film "Satria Bambu Kuning" 1985, "Anita"
    1984, dan "Dendam Jagoan" 1986. Film-filmnya sering disutradarai oleh Atok Suharto,
    Ratno Timoer, dan Sisworo Gautama yang sangat populer pada waktu itu.

    Pada tanggal 10 Februari 2018, Advent Bangun meninggal dunia di RSUP Fatmawati,
    Jakarta Selatan. Ia meninggal akibat penyakit diabetes yang kemudian menjadi komplikasi
    ke ginjal.'
- source_sentence: Kapan pertama kali telepon genggam ditemukan?
  sentences:
  - 'Olimpiade

    Indonesia pertama kali berpartisipasi dalam Olimpiade Helsinki 1952 di Finlandia.
    Setelah itu Indonesia sempat dua kali tidak ikut Olimpiade yaitu pada Olimpiade
    Tokyo 1964 dan Olimpiade Moskwa 1980 karena boikot sehubungan dengan perang Soviet-Afganistan.
    Sejak awal keikutsertaannya, tercatat Indonesia sudah mengumpulkan total 27 medali,
    dengan rincian: 6 medali emas, 10 medali perak dan 11 medali perunggu. Berikut
    pencapaian Indonesia selama mengikuti Olimpiade:'
  - 'Pelita Air Service

    PT. Pelita Air Service, biasanya disingkat menjadi Pelita Air atau PAS, adalah
    maskapai penerbangan nasional di Indonesia. PT PAS memiliki basis udara ("air
    base") di Bandar Udara Internasional Halim Perdanakusuma, dan memiliki Bandar
    Udara Pondok Cabe (Jakarta Selatan). Kantor pusatnya yang terletak di Jl. Abdul
    Muis, Jakarta Pusat, memiliki ratusan karyawan yang terdiri dari staf manajemen
    serta jajaran karyawan udara ("air crew") yang terlatih dan dapat diandalkan.'
  - 'Telepon genggam

    Telepon genggam generasi pertama disebut juga 1G. 1-G merupakan telepon genggam
    pertama yang sebenarnya. Tahun 1973, Martin Cooper dari Motorola Corp menemukan
    telepon genggam pertama dan diperkenalkan kepada publik pada 3 April 1973. Telepon
    genggam yang ditemukan oleh Cooper memiliki berat 30 ons atau sekitar 800 gram.
    Penemuan inilah yang telah mengubah dunia selamanya.

    Teknologi yang digunakan 1-G masih bersifat analog dan dikenal dengan istilah
    AMPS. AMPS menggunakan frekuensi antara 825 Mhz- 894 Mhz dan dioperasikan pada
    Band 800 Mhz. Karena bersifat analog, maka sistem yang digunakan masih bersifat
    regional.

    Salah satu kekurangan generasi 1-G adalah karena ukurannya yang terlalu besar
    untuk dipegang oleh tangan. Ukuran yang besar ini dikarenakan keperluan tenaga
    dan performa baterai yang kurang baik. Selain itu generasi 1-G masih memiliki
    masalah dengan mobilitas pengguna. Pada saat melakukan panggilan, mobilitas pengguna
    terbatas pada jangkauan area telepon genggam.'
- source_sentence: apakah jenis mesin yang digunakan Mikoyan-Gurevich MiG-23?
  sentences:
  - 'Nifas

    Bisa jadi waktu keluarnya lama/panjang, dan terkadang singkat. Tidak ada batasan
    minimal waktu nifas ini. Adapun waktu maksimalnya menurut mazhab Hambali adalah
    40 hari, dan bila lebih dari 40 hari darah masih keluar sementara tidak bertepatan
    dengan kebiasaan datangnya waktu haid maka darah tersebut adalah darah istihadhah.
    Namun menurut pendapat yang shahih, tidak ada pula batasan waktu maksimal dari
    nifas ini.'
  - 'Mikoyan-Gurevich MiG-19

    Di Uni Soviet MiG-19 telah berakhir produksinya pada akhir tahun 1950-an, karena
    fokus untuk produksi MiG-21. Namun pada tahun 1958 lisensi untuk produksi MiG-19
    yang telah disepakati dengan Cina tetapi, setelah persetujuain itu, hubungan antara
    kedua negara memburuk. Namun produksi MiG-19 Cina tetap berjalan dengan kode F-6
    (MiG-19S), terbang pertama kali Desember 1961. F-6 menjadi pesawat standar AU
    Cina dari pertengahan 1962.'
  - 'Peziarahan Kristen

    Pada abad ke-7, Tanah Suci jatuh ke penaklukan Muslim, dan karena peziarahan menuju
    Tanah Suci sekarang menjadi lebih sulit bagi Kristen Eropa, situs-situs peziarahan
    utama berkembang di Eropa Barat, yang paling terkenal Santiago de Compostela pada
    abad ke-9.'
- source_sentence: Apa yang dimaksud Hokage dalam serial komik Naruto ?
  sentences:
  - 'Daftar karakter Naruto

    Konoha memiliki tujuh sejak Hokage Pertama mendirikan desa Konoha. Para Hokage
    pada umumnya adalah pemimpin desa dan ninja terkuat Konoha. Hokage berasal dari
    kata "ho" yang berarti api dan "kage" yang berarti bayangan. Jadi "hokage" dapat
    berarti "bayangan api."'
  - 'Algoritme gabung

    Algoritme urut gabung membagi tabel menjadi dua tabel yang sama besar. Masing-masing
    tabel diurutkan secara rekursif, dan kemudian digabungkan kembali untuk membentuk
    tabel yang terurut. Implementasi dasar dari algoritme urut gabung memakai tiga
    buah tabel, dua untuk menyimpan elemen dari tabel yang telah di bagi dua dan satu
    untuk menyimpan elemen yang telah terurut. Namun algoritme ini dapat juga dilakukan
    langsung pada dua tabel, sehingga menghemat ruang atau memori yang dibutuhkan.'
  - 'Sangiang

    Dalam upacara kematian (Mandung) roh-roh leluhur atau Sangiang utama yang dipanggil
    ada dua, yaitu Sangiang Duhung Mama Tandang-Langkah Sawang Mama Bangai dan Rawing
    Tempon Telu. Kedua Sangiang Utusan Tuhan itu turun untuk melepaskan pali atau
    tabu dari pribadi atau keluarga atau desa yang melaksanakan upacara. Melepaskan
    pali juga untuk membebaskan orang yang mati dari kesalahan selama dia hidup. Dalam
    sastera suci suku Dayak Ngaju, Panaturan. Digambarkan disana, Raja Banjar yang
    bernama Raja Maruhum beserta Putri Dayak yang menjadi isterinya yang bernama Nyai
    Siti Diang Lawai adalah bagian leluhur orang Dayak Ngaju. Bahkan mereka berdua
    juga diproyeksikan sebagai sangiang (manusia illahi) yang tinggal di Lewu Tambak
    Raja, salah satu tempat di Lewu Sangiang (Perkampungan para Dewa). Karena Sang
    Raja beragama Islam maka disana disebutkan juga ada masjid.'
- source_sentence: Apakah kartu sim pertama di Indonesia?
  sentences:
  - 'Fusajiro Yamauchi

    Pada tanggal 6 November 1889, Fusajiro Yamauchi membuka "Hanafuda" (kartu bunga)
    pertama toko kartu yang disebut "Nintendo Koppai" di saat pemerintah Jepang melarang
    bermain kartu dari tangan masyarakat, karena mereka terikat dengan perjudian ,
    dengan pengecualian dari bermain kartu Yamauchi ini. Dengan sukses besar yang
    ada dalam penjualan kartu tersebut, hal itu dengan cepat mulai berkembang dan
    membuka toko kartu lain di Osaka. Dia kemudian melanjutkan untuk menciptakan permainan
    kartu lebih banyak lagi. Dia membuat Nintendo setelah kena "banned".'
  - 'Detektif Conan

    Selanjutnya dalam seri ini, tokoh utama lainnya, Ai Haibara, muncul. Ai adalah
    seorang mantan anggota Organisasi Hitam, yang memiliki nama sandi "Sherry". Nama
    aslinya adalah Shiho Miyano, seorang ilmuan yang mengembangkan racun APTX 4869
    yang membuat tubuh Shinichi mengecil. Setelah kakaknya secara kejam dibunuh oleh
    anggota Organisasi Hitam, ia mencoba keluar dari organisasi itu, namun ia ditangkap.
    Dia mencoba bunuh diri dengan menelan pil APTX 4869, namun ternyata tubuhnya mengecil,
    dan dia berhasil kabur dari organisasi tersebut. Dia kemudian bersekolah di SD
    Teitan dengan nama samaran "Ai Haibara". Dia mengetahui identitas asli Conan dan
    membantunya dalam perjuangan Conan untuk menjatuhkan Organisasi Hitam. Selain
    Ai Haibara, Profesor Agasa, dan kedua orangtua Shinichi, orang yang juga mengetahui
    identitas asli Conan adalah Heiji Hattori, seorang detektif SMA dari Osaka. Pada
    mulanya, Heiji dikenal sebagai detektif SMA dari Barat dan mengaku sebagai saingan
    Shinichi, detektif SMA dari Timur. Dengan berkembangnya cerita, Heiji pun kini
    menjadi sahabat dekat Shinichi (Conan).'
  - 'Hari Republik

    "

    India mendapatkan kemerdekaan pada 15 Agustus 1947 setelah undang-undang dasar
    dibuat. Undang-undang dasar India disahkan pada 26 November 1949 oleh Badan Konstituante.
    UUD tersebut mulai digunakan pada 26 Januari 1950 dengan sistem pemerintahan yang
    demokratis, ketika negara ini benar-benar menjadi republik. Pemilihan tanggal
    26 Januari didasarkan pada hari yang sama pada 1930 ketika Proklamasi Kemerdekaan
    India disetujui.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on muchad/mdeberta-hybrid-30k
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts combined val
      type: sts-combined-val
    metrics:
    - type: pearson_cosine
      value: 0.7664247640751756
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7703953644725257
      name: Spearman Cosine
---

# SentenceTransformer based on muchad/mdeberta-hybrid-30k

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [muchad/mdeberta-hybrid-30k](https://huggingface.co/muchad/mdeberta-hybrid-30k). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [muchad/mdeberta-hybrid-30k](https://huggingface.co/muchad/mdeberta-hybrid-30k) <!-- at revision 0ea6ba25471bcde44f652fa228e89dc89b10458f -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'DebertaV2Model'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
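The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): the transformer's token embeddings are averaged into one sentence vector, with padding positions excluded via the attention mask. A minimal NumPy sketch of that operation on toy tensors (not the real model's code, just an illustration of the idea):

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only."""
    # (batch, seq_len, dim) * (batch, seq_len, 1) zeroes out padding positions
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1)  # number of real tokens per sentence
    return summed / np.clip(counts, 1e-9, None)

# Toy batch: 1 sentence, 3 token slots, 2-dim embeddings; last slot is padding
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # [[2. 3.]] — the padding token is ignored
```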

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Apakah kartu sim pertama di Indonesia?',
    'Fusajiro Yamauchi\nPada tanggal 6 November 1889, Fusajiro Yamauchi membuka "Hanafuda" (kartu bunga) pertama toko kartu yang disebut "Nintendo Koppai" di saat pemerintah Jepang melarang bermain kartu dari tangan masyarakat, karena mereka terikat dengan perjudian , dengan pengecualian dari bermain kartu Yamauchi ini. Dengan sukses besar yang ada dalam penjualan kartu tersebut, hal itu dengan cepat mulai berkembang dan membuka toko kartu lain di Osaka. Dia kemudian melanjutkan untuk menciptakan permainan kartu lebih banyak lagi. Dia membuat Nintendo setelah kena "banned".',
    'Detektif Conan\nSelanjutnya dalam seri ini, tokoh utama lainnya, Ai Haibara, muncul. Ai adalah seorang mantan anggota Organisasi Hitam, yang memiliki nama sandi "Sherry". Nama aslinya adalah Shiho Miyano, seorang ilmuan yang mengembangkan racun APTX 4869 yang membuat tubuh Shinichi mengecil. Setelah kakaknya secara kejam dibunuh oleh anggota Organisasi Hitam, ia mencoba keluar dari organisasi itu, namun ia ditangkap. Dia mencoba bunuh diri dengan menelan pil APTX 4869, namun ternyata tubuhnya mengecil, dan dia berhasil kabur dari organisasi tersebut. Dia kemudian bersekolah di SD Teitan dengan nama samaran "Ai Haibara". Dia mengetahui identitas asli Conan dan membantunya dalam perjuangan Conan untuk menjatuhkan Organisasi Hitam. Selain Ai Haibara, Profesor Agasa, dan kedua orangtua Shinichi, orang yang juga mengetahui identitas asli Conan adalah Heiji Hattori, seorang detektif SMA dari Osaka. Pada mulanya, Heiji dikenal sebagai detektif SMA dari Barat dan mengaku sebagai saingan Shinichi, detektif SMA dari Timur. Dengan berkembangnya cerita, Heiji pun kini menjadi sahabat dekat Shinichi (Conan).',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0001, 0.2194, 0.0071],
#         [0.2194, 1.0000, 0.2776],
#         [0.0071, 0.2776, 1.0001]])
```
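The model's similarity function is cosine similarity, so `model.similarity` above returns the pairwise cosine matrix of the embeddings. A self-contained NumPy sketch of the same computation on hypothetical toy vectors (standing in for the model's 768-dim embeddings):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity: L2-normalize rows, then take dot products."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy 2-dim "embeddings": orthogonal pair plus one vector between them
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sim = cosine_similarity_matrix(vecs)
print(np.round(sim, 4))
# Diagonal is 1 (each vector vs. itself); off-diagonal entries are in [-1, 1]
```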

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `sts-combined-val`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.7664     |
| **spearman_cosine** | **0.7704** |

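Both metrics correlate the model's cosine similarity scores with the gold similarity labels: Pearson on the raw values, Spearman on their ranks. A small illustrative sketch with made-up scores (not the actual validation data), assuming no tied values:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def spearman(x, y):
    """Spearman = Pearson correlation of rank positions (ties not handled)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

# Hypothetical gold similarity labels vs. model cosine scores
gold = [0.0, 0.2, 0.5, 0.8, 1.0]
model_scores = [0.1, 0.3, 0.4, 0.9, 0.7]
print(round(pearson(gold, model_scores), 4),
      round(spearman(gold, model_scores), 4))
```

Spearman only cares about ordering, which is why model cards for retrieval-oriented embeddings typically bold it: a monotonic but non-linear score scale still gets full credit.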
+
<!--
|
| 295 |
+
## Bias, Risks and Limitations
|
| 296 |
+
|
| 297 |
+
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
|
| 298 |
+
-->
|
| 299 |
+
|
| 300 |
+
<!--
|
| 301 |
+
### Recommendations
|
| 302 |
+
|
| 303 |
+
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
|
| 304 |
+
-->
|
| 305 |
+
|
| 306 |
+
## Training Details
|
| 307 |
+
|
| 308 |
+
### Training Dataset
|
| 309 |
+
|
| 310 |
+
#### Unnamed Dataset
|
| 311 |
+
|
| 312 |
+
* Size: 22,925 training samples
|
| 313 |
+
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
|
| 314 |
+
* Approximate statistics based on the first 1000 samples:
|
| 315 |
+
| | sentence_0 | sentence_1 | label |
|
| 316 |
+
|:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
|
| 317 |
+
| type | string | string | float |
|
| 318 |
+
| details | <ul><li>min: 7 tokens</li><li>mean: 18.61 tokens</li><li>max: 82 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 18.75 tokens</li><li>max: 58 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
|
| 319 |
+
* Samples:
|
| 320 |
+
| sentence_0 | sentence_1 | label |
|
| 321 |
+
|:----------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:------------------|
|
| 322 |
+
| <code>Potensi Desa (Podes) adalah survei tiga tahunan yang mengumpulkan data mengenai potensi dan perkembangan desa.</code> | <code>Data Podes digunakan untuk perencanaan pembangunan wilayah dan evaluasi program desa.</code> | <code>0.8</code> |
|
| 323 |
+
| <code>Produksi batubara Indonesia tahun 2022 mencapai 687 juta ton.</code> | <code>Ekspor batubara Indonesia tahun 2022 mencapai 450 juta ton.</code> | <code>0.6</code> |
|
| 324 |
+
| <code>Tidak begitu Super Tuesday untuk Mitt Romney</code> | <code>Romney memimpin Santorum dalam kontes Super Tuesday GOP</code> | <code>0.32</code> |
|
| 325 |
+
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
|
| 326 |
+
```json
|
| 327 |
+
{
|
| 328 |
+
"loss_fct": "torch.nn.modules.loss.MSELoss"
|
| 329 |
+
}
|
| 330 |
+
```
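`CosineSimilarityLoss` with an `MSELoss` criterion minimizes the squared error between the cosine similarity of the two sentence embeddings and the gold label, matching the float labels in [0, 1] above. A toy NumPy sketch of the objective on hypothetical embeddings (an illustration of the math, not the library's training code):

```python
import numpy as np

def cosine_similarity_loss(emb1, emb2, labels):
    """MSE between per-pair cosine similarity and the gold similarity label."""
    def cos(a, b):
        return (a * b).sum(axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    preds = cos(emb1, emb2)
    return np.mean((preds - labels) ** 2)

# Two hypothetical pairs: one identical (label 1.0), one orthogonal (label 0.0)
e1 = np.array([[1.0, 0.0], [1.0, 0.0]])
e2 = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([1.0, 0.0])
print(cosine_similarity_loss(e1, e2, labels))  # 0.0 — predictions match labels
```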

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 48
- `per_device_eval_batch_size`: 48
- `num_train_epochs`: 8
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 48
- `per_device_eval_batch_size`: 48
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 8
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
|
| 464 |
+
|
| 465 |
+
### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | sts-combined-val_spearman_cosine |
|:------:|:-----:|:-------------:|:--------------------------------:|
| 0.0124 | 500 | 3.9542 | - |
| 0.0249 | 1000 | 2.4436 | - |
| 0.0373 | 1500 | 1.4335 | - |
| 0.0497 | 2000 | 1.0818 | - |
| 0.0622 | 2500 | 0.8748 | - |
| 0.0746 | 3000 | 0.7437 | - |
| 0.0870 | 3500 | 0.6698 | - |
| 0.0995 | 4000 | 0.5969 | - |
| 0.1119 | 4500 | 0.5466 | - |
| 0.1243 | 5000 | 0.513 | - |
| 0.1368 | 5500 | 0.4734 | - |
| 0.1492 | 6000 | 0.4369 | - |
| 0.1616 | 6500 | 0.4156 | - |
| 0.1741 | 7000 | 0.4056 | - |
| 0.1865 | 7500 | 0.3758 | - |
| 0.1989 | 8000 | 0.366 | - |
| 0.2114 | 8500 | 0.3599 | - |
| 0.2238 | 9000 | 0.3493 | - |
| 0.2362 | 9500 | 0.3417 | - |
| 0.2487 | 10000 | 0.331 | - |
| 0.2611 | 10500 | 0.3186 | - |
| 0.2735 | 11000 | 0.3182 | - |
| 0.2860 | 11500 | 0.3123 | - |
| 0.2984 | 12000 | 0.2948 | - |
| 0.3108 | 12500 | 0.2987 | - |
| 0.3233 | 13000 | 0.2907 | - |
| 0.3357 | 13500 | 0.2843 | - |
| 0.3481 | 14000 | 0.2791 | - |
| 0.3606 | 14500 | 0.2734 | - |
| 0.3730 | 15000 | 0.2664 | - |
| 0.3854 | 15500 | 0.2612 | - |
| 0.3979 | 16000 | 0.2594 | - |
| 0.4103 | 16500 | 0.2549 | - |
| 0.4227 | 17000 | 0.2503 | - |
| 0.4352 | 17500 | 0.2535 | - |
| 0.4476 | 18000 | 0.2481 | - |
| 0.4601 | 18500 | 0.237 | - |
| 0.4725 | 19000 | 0.2381 | - |
| 0.4849 | 19500 | 0.247 | - |
| 0.4974 | 20000 | 0.2297 | - |
| 0.5098 | 20500 | 0.2269 | - |
| 0.5222 | 21000 | 0.2298 | - |
| 0.5347 | 21500 | 0.2275 | - |
| 0.5471 | 22000 | 0.2259 | - |
| 0.5595 | 22500 | 0.2188 | - |
| 0.5720 | 23000 | 0.2176 | - |
| 0.5844 | 23500 | 0.2167 | - |
| 0.5968 | 24000 | 0.2175 | - |
| 0.6093 | 24500 | 0.2145 | - |
| 0.6217 | 25000 | 0.2113 | - |
| 0.6341 | 25500 | 0.2125 | - |
| 0.6466 | 26000 | 0.2086 | - |
| 0.6590 | 26500 | 0.2076 | - |
| 0.6714 | 27000 | 0.2023 | - |
| 0.6839 | 27500 | 0.2026 | - |
| 0.6963 | 28000 | 0.2011 | - |
| 0.7087 | 28500 | 0.1971 | - |
| 0.7212 | 29000 | 0.1968 | - |
| 0.7336 | 29500 | 0.2009 | - |
| 0.7460 | 30000 | 0.1858 | - |
| 0.7585 | 30500 | 0.1924 | - |
| 0.7709 | 31000 | 0.1902 | - |
| 0.7833 | 31500 | 0.1898 | - |
| 0.7958 | 32000 | 0.1884 | - |
| 0.8082 | 32500 | 0.1861 | - |
| 0.8206 | 33000 | 0.1844 | - |
| 0.8331 | 33500 | 0.1846 | - |
| 0.8455 | 34000 | 0.1791 | - |
| 0.8579 | 34500 | 0.1852 | - |
| 0.8704 | 35000 | 0.181 | - |
| 0.8828 | 35500 | 0.1818 | - |
| 0.8952 | 36000 | 0.1771 | - |
| 0.9077 | 36500 | 0.1791 | - |
| 0.9201 | 37000 | 0.172 | - |
| 0.9325 | 37500 | 0.1678 | - |
| 0.9450 | 38000 | 0.1713 | - |
| 0.9574 | 38500 | 0.1702 | - |
| 0.9698 | 39000 | 0.1692 | - |
| 0.9823 | 39500 | 0.1678 | - |
| 0.9947 | 40000 | 0.1691 | - |
| 1.0071 | 40500 | 0.1617 | - |
| 1.0196 | 41000 | 0.1482 | - |
| 1.0320 | 41500 | 0.1513 | - |
| 1.0444 | 42000 | 0.1506 | - |
| 1.0569 | 42500 | 0.1508 | - |
| 1.0693 | 43000 | 0.1493 | - |
| 1.0817 | 43500 | 0.1513 | - |
| 1.0942 | 44000 | 0.1482 | - |
| 1.1066 | 44500 | 0.1482 | - |
| 1.1190 | 45000 | 0.1495 | - |
| 1.1315 | 45500 | 0.1443 | - |
| 1.1439 | 46000 | 0.1433 | - |
| 1.1563 | 46500 | 0.1435 | - |
| 1.1688 | 47000 | 0.1407 | - |
| 1.1812 | 47500 | 0.1476 | - |
| 1.1936 | 48000 | 0.1434 | - |
| 1.2061 | 48500 | 0.1441 | - |
| 1.2185 | 49000 | 0.1442 | - |
| 1.2309 | 49500 | 0.1401 | - |
| 1.2434 | 50000 | 0.1439 | - |
| 1.2558 | 50500 | 0.1409 | - |
| 1.2682 | 51000 | 0.1395 | - |
| 1.2807 | 51500 | 0.1354 | - |
| 1.2931 | 52000 | 0.1396 | - |
| 1.3055 | 52500 | 0.1369 | - |
| 1.3180 | 53000 | 0.1362 | - |
| 1.3304 | 53500 | 0.1397 | - |
| 1.3428 | 54000 | 0.1402 | - |
| 1.3553 | 54500 | 0.1405 | - |
| 1.3677 | 55000 | 0.1305 | - |
| 1.3802 | 55500 | 0.1391 | - |
| 1.3926 | 56000 | 0.1371 | - |
| 1.4050 | 56500 | 0.1381 | - |
| 1.4175 | 57000 | 0.1388 | - |
| 1.4299 | 57500 | 0.1286 | - |
| 1.4423 | 58000 | 0.1352 | - |
| 1.4548 | 58500 | 0.1353 | - |
| 1.4672 | 59000 | 0.1304 | - |
| 1.4796 | 59500 | 0.1331 | - |
| 1.4921 | 60000 | 0.1324 | - |
| 1.5045 | 60500 | 0.133 | - |
| 1.5169 | 61000 | 0.1263 | - |
| 1.5294 | 61500 | 0.129 | - |
| 1.5418 | 62000 | 0.1276 | - |
| 1.5542 | 62500 | 0.1249 | - |
| 1.5667 | 63000 | 0.1278 | - |
| 1.5791 | 63500 | 0.1289 | - |
| 1.5915 | 64000 | 0.1278 | - |
| 1.6040 | 64500 | 0.1271 | - |
| 1.6164 | 65000 | 0.1286 | - |
| 1.6288 | 65500 | 0.1278 | - |
| 1.6413 | 66000 | 0.1242 | - |
| 1.6537 | 66500 | 0.123 | - |
| 1.6661 | 67000 | 0.1244 | - |
| 1.6786 | 67500 | 0.1247 | - |
| 1.6910 | 68000 | 0.1256 | - |
| 1.7034 | 68500 | 0.1279 | - |
| 1.7159 | 69000 | 0.1253 | - |
| 1.7283 | 69500 | 0.1314 | - |
| 1.7407 | 70000 | 0.1248 | - |
| 1.7532 | 70500 | 0.1266 | - |
| 1.7656 | 71000 | 0.1235 | - |
| 1.7780 | 71500 | 0.1223 | - |
| 1.7905 | 72000 | 0.1234 | - |
| 1.8029 | 72500 | 0.1249 | - |
| 1.8153 | 73000 | 0.1212 | - |
| 1.8278 | 73500 | 0.1232 | - |
| 1.8402 | 74000 | 0.1268 | - |
| 1.8526 | 74500 | 0.1235 | - |
| 1.8651 | 75000 | 0.1279 | - |
| 1.8775 | 75500 | 0.12 | - |
| 1.8899 | 76000 | 0.1212 | - |
| 1.9024 | 76500 | 0.1225 | - |
| 1.9148 | 77000 | 0.1254 | - |
| 1.9272 | 77500 | 0.1205 | - |
| 1.9397 | 78000 | 0.1255 | - |
| 1.9521 | 78500 | 0.1257 | - |
| 1.9645 | 79000 | 0.118 | - |
| 1.9770 | 79500 | 0.1245 | - |
| 1.9894 | 80000 | 0.1234 | - |
| 0.4184 | 200 | - | 0.7129 |
| 0.8368 | 400 | - | 0.7375 |
| 1.0 | 478 | - | 0.7452 |
| 1.0460 | 500 | 0.0663 | - |
| 1.2552 | 600 | - | 0.7493 |
| 1.6736 | 800 | - | 0.7518 |
| 2.0 | 956 | - | 0.7587 |
| 2.0921 | 1000 | 0.0326 | 0.7580 |
| 2.5105 | 1200 | - | 0.7623 |
| 2.9289 | 1400 | - | 0.7636 |
| 3.0 | 1434 | - | 0.7623 |
| 3.1381 | 1500 | 0.0248 | - |
| 3.3473 | 1600 | - | 0.7638 |
| 3.7657 | 1800 | - | 0.7642 |
| 4.0 | 1912 | - | 0.7666 |
| 4.1841 | 2000 | 0.0207 | 0.7667 |
| 4.6025 | 2200 | - | 0.7680 |
| 5.0 | 2390 | - | 0.7685 |
| 5.0209 | 2400 | - | 0.7679 |
| 5.2301 | 2500 | 0.018 | - |
| 5.4393 | 2600 | - | 0.7685 |
| 5.8577 | 2800 | - | 0.7692 |
| 6.0 | 2868 | - | 0.7704 |

</details>

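The evaluation column above, `sts-combined-val_spearman_cosine`, is a Spearman rank correlation between the model's cosine similarity scores and the gold STS labels. As a rough illustration of how that metric is computed (a minimal sketch with a simple argsort-based ranking and no tie correction, not the evaluator's actual implementation):

```python
import numpy as np

def spearman_rank_correlation(scores, labels):
    """Spearman's rho: Pearson correlation of the ranks.

    Assumes no tied values, so plain argsort-of-argsort ranking suffices.
    """
    rs = np.argsort(np.argsort(scores)).astype(float)
    rl = np.argsort(np.argsort(labels)).astype(float)
    rs -= rs.mean()
    rl -= rl.mean()
    return float((rs * rl).sum() / np.sqrt((rs ** 2).sum() * (rl ** 2).sum()))

# Any monotonic relation between scores and labels yields rho = 1.0,
# regardless of the absolute scale of either.
cos_scores = np.array([0.91, 0.12, 0.55, 0.78, 0.33])
gold_labels = np.array([4.8, 0.5, 2.9, 4.1, 1.7])
print(spearman_rank_correlation(cos_scores, gold_labels))  # 1.0 (same ordering)
```

In practice `scipy.stats.spearmanr` (which does handle ties) is what evaluation code typically uses.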
### Framework Versions
- Python: 3.10.18
- Sentence Transformers: 5.1.2
- Transformers: 4.46.3
- PyTorch: 2.5.1
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->

1_Pooling/config.json
ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
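The pooling config above enables mean-token pooling over the transformer's 768-dimensional word embeddings: the sentence embedding is the average of the token vectors, with padding positions excluded via the attention mask. A minimal sketch of that operation (illustrative 2-dimensional vectors, not the library's internals):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked mean pooling: average token vectors, ignoring padding.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 0/1 flags (0 = padding).
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # (dim,)
    count = max(mask.sum(), 1e-9)                   # avoid division by zero
    return summed / count

emb = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [9.0, 9.0]])   # last row is a padding token
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))    # [2. 3.] -- padding row is ignored
```

For this model the real `dim` is 768, matching `word_embedding_dimension` above.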
README.md
ADDED
@@ -0,0 +1,17 @@
---
base_model: muchad/mdeberta-hybrid-30k
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
- id
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: apache-2.0
---

# Embed-ID

_Note: This model is part of an ongoing research project._
config.json
ADDED
@@ -0,0 +1,37 @@
{
  "_name_or_path": "muchad/mdeberta-hybrid-30k",
  "architectures": [
    "DebertaV2Model"
  ],
  "attention_probs_dropout_prob": 0.1,
  "dtype": "float32",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "legacy": true,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "torch_dtype": "float32",
  "transformers_version": "4.46.3",
  "type_vocab_size": 0,
  "vocab_size": 30000
}
config_sentence_transformers.json
ADDED
@@ -0,0 +1,14 @@
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.1.2",
    "transformers": "4.46.3",
    "pytorch": "2.5.1"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
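`similarity_fn_name` is set to `cosine`, so similarity scores compare the direction of two embeddings, not their magnitude. A minimal sketch of cosine similarity (small hand-picked vectors for illustration; the library computes this in batch on tensors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two vectors after L2 normalization."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([2.0, 2.0], [1.0, 1.0]))  # 1.0 (scale-invariant)
```

Scale invariance is the design point: sentence embeddings of different norms still score 1.0 when they point the same way.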
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf8b35022970b62b1a8d6864cd34eef41315c8abae8c584c7ea3483949f631b8
size 433985616
modules.json
ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
sentence_bert_config.json
ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json
ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1,859 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "29999": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "30000": {
      "content": "▁<extra_id_99>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30001": {
      "content": "▁<extra_id_98>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30002": {
      "content": "▁<extra_id_97>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30003": {
      "content": "▁<extra_id_96>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30004": {
      "content": "▁<extra_id_95>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30005": {
      "content": "▁<extra_id_94>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30006": {
      "content": "▁<extra_id_93>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30007": {
      "content": "▁<extra_id_92>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30008": {
      "content": "▁<extra_id_91>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30009": {
      "content": "▁<extra_id_90>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30010": {
      "content": "▁<extra_id_89>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30011": {
      "content": "▁<extra_id_88>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30012": {
      "content": "▁<extra_id_87>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30013": {
      "content": "▁<extra_id_86>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30014": {
      "content": "▁<extra_id_85>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30015": {
      "content": "▁<extra_id_84>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30016": {
      "content": "▁<extra_id_83>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30017": {
      "content": "▁<extra_id_82>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30018": {
      "content": "▁<extra_id_81>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30019": {
      "content": "▁<extra_id_80>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30020": {
      "content": "▁<extra_id_79>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30021": {
      "content": "▁<extra_id_78>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30022": {
      "content": "▁<extra_id_77>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30023": {
      "content": "▁<extra_id_76>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30024": {
      "content": "▁<extra_id_75>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30025": {
      "content": "▁<extra_id_74>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30026": {
      "content": "▁<extra_id_73>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30027": {
      "content": "▁<extra_id_72>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30028": {
      "content": "▁<extra_id_71>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30029": {
      "content": "▁<extra_id_70>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30030": {
      "content": "▁<extra_id_69>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30031": {
      "content": "▁<extra_id_68>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30032": {
      "content": "▁<extra_id_67>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30033": {
      "content": "▁<extra_id_66>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30034": {
      "content": "▁<extra_id_65>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30035": {
      "content": "▁<extra_id_64>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30036": {
      "content": "▁<extra_id_63>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30037": {
      "content": "▁<extra_id_62>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30038": {
      "content": "▁<extra_id_61>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30039": {
      "content": "▁<extra_id_60>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30040": {
      "content": "▁<extra_id_59>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30041": {
      "content": "▁<extra_id_58>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30042": {
      "content": "▁<extra_id_57>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30043": {
      "content": "▁<extra_id_56>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30044": {
      "content": "▁<extra_id_55>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30045": {
      "content": "▁<extra_id_54>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30046": {
      "content": "▁<extra_id_53>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30047": {
      "content": "▁<extra_id_52>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30048": {
      "content": "▁<extra_id_51>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30049": {
      "content": "▁<extra_id_50>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30050": {
      "content": "▁<extra_id_49>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30051": {
      "content": "▁<extra_id_48>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30052": {
      "content": "▁<extra_id_47>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30053": {
      "content": "▁<extra_id_46>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30054": {
      "content": "▁<extra_id_45>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30055": {
      "content": "▁<extra_id_44>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30056": {
      "content": "▁<extra_id_43>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "30057": {
      "content": "▁<extra_id_42>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
|
| 506 |
+
},
|
| 507 |
+
"30058": {
|
| 508 |
+
"content": "β<extra_id_41>",
|
| 509 |
+
"lstrip": false,
|
| 510 |
+
"normalized": false,
|
| 511 |
+
"rstrip": false,
|
| 512 |
+
"single_word": false,
|
| 513 |
+
"special": false
|
| 514 |
+
},
|
| 515 |
+
"30059": {
|
| 516 |
+
"content": "β<extra_id_40>",
|
| 517 |
+
"lstrip": false,
|
| 518 |
+
"normalized": false,
|
| 519 |
+
"rstrip": false,
|
| 520 |
+
"single_word": false,
|
| 521 |
+
"special": false
|
| 522 |
+
},
|
| 523 |
+
"30060": {
|
| 524 |
+
"content": "β<extra_id_39>",
|
| 525 |
+
"lstrip": false,
|
| 526 |
+
"normalized": false,
|
| 527 |
+
"rstrip": false,
|
| 528 |
+
"single_word": false,
|
| 529 |
+
"special": false
|
| 530 |
+
},
|
| 531 |
+
"30061": {
|
| 532 |
+
"content": "β<extra_id_38>",
|
| 533 |
+
"lstrip": false,
|
| 534 |
+
"normalized": false,
|
| 535 |
+
"rstrip": false,
|
| 536 |
+
"single_word": false,
|
| 537 |
+
"special": false
|
| 538 |
+
},
|
| 539 |
+
"30062": {
|
| 540 |
+
"content": "β<extra_id_37>",
|
| 541 |
+
"lstrip": false,
|
| 542 |
+
"normalized": false,
|
| 543 |
+
"rstrip": false,
|
| 544 |
+
"single_word": false,
|
| 545 |
+
"special": false
|
| 546 |
+
},
|
| 547 |
+
"30063": {
|
| 548 |
+
"content": "β<extra_id_36>",
|
| 549 |
+
"lstrip": false,
|
| 550 |
+
"normalized": false,
|
| 551 |
+
"rstrip": false,
|
| 552 |
+
"single_word": false,
|
| 553 |
+
"special": false
|
| 554 |
+
},
|
| 555 |
+
"30064": {
|
| 556 |
+
"content": "β<extra_id_35>",
|
| 557 |
+
"lstrip": false,
|
| 558 |
+
"normalized": false,
|
| 559 |
+
"rstrip": false,
|
| 560 |
+
"single_word": false,
|
| 561 |
+
"special": false
|
| 562 |
+
},
|
| 563 |
+
"30065": {
|
| 564 |
+
"content": "β<extra_id_34>",
|
| 565 |
+
"lstrip": false,
|
| 566 |
+
"normalized": false,
|
| 567 |
+
"rstrip": false,
|
| 568 |
+
"single_word": false,
|
| 569 |
+
"special": false
|
| 570 |
+
},
|
| 571 |
+
"30066": {
|
| 572 |
+
"content": "β<extra_id_33>",
|
| 573 |
+
"lstrip": false,
|
| 574 |
+
"normalized": false,
|
| 575 |
+
"rstrip": false,
|
| 576 |
+
"single_word": false,
|
| 577 |
+
"special": false
|
| 578 |
+
},
|
| 579 |
+
"30067": {
|
| 580 |
+
"content": "β<extra_id_32>",
|
| 581 |
+
"lstrip": false,
|
| 582 |
+
"normalized": false,
|
| 583 |
+
"rstrip": false,
|
| 584 |
+
"single_word": false,
|
| 585 |
+
"special": false
|
| 586 |
+
},
|
| 587 |
+
"30068": {
|
| 588 |
+
"content": "β<extra_id_31>",
|
| 589 |
+
"lstrip": false,
|
| 590 |
+
"normalized": false,
|
| 591 |
+
"rstrip": false,
|
| 592 |
+
"single_word": false,
|
| 593 |
+
"special": false
|
| 594 |
+
},
|
| 595 |
+
"30069": {
|
| 596 |
+
"content": "β<extra_id_30>",
|
| 597 |
+
"lstrip": false,
|
| 598 |
+
"normalized": false,
|
| 599 |
+
"rstrip": false,
|
| 600 |
+
"single_word": false,
|
| 601 |
+
"special": false
|
| 602 |
+
},
|
| 603 |
+
"30070": {
|
| 604 |
+
"content": "β<extra_id_29>",
|
| 605 |
+
"lstrip": false,
|
| 606 |
+
"normalized": false,
|
| 607 |
+
"rstrip": false,
|
| 608 |
+
"single_word": false,
|
| 609 |
+
"special": false
|
| 610 |
+
},
|
| 611 |
+
"30071": {
|
| 612 |
+
"content": "β<extra_id_28>",
|
| 613 |
+
"lstrip": false,
|
| 614 |
+
"normalized": false,
|
| 615 |
+
"rstrip": false,
|
| 616 |
+
"single_word": false,
|
| 617 |
+
"special": false
|
| 618 |
+
},
|
| 619 |
+
"30072": {
|
| 620 |
+
"content": "β<extra_id_27>",
|
| 621 |
+
"lstrip": false,
|
| 622 |
+
"normalized": false,
|
| 623 |
+
"rstrip": false,
|
| 624 |
+
"single_word": false,
|
| 625 |
+
"special": false
|
| 626 |
+
},
|
| 627 |
+
"30073": {
|
| 628 |
+
"content": "β<extra_id_26>",
|
| 629 |
+
"lstrip": false,
|
| 630 |
+
"normalized": false,
|
| 631 |
+
"rstrip": false,
|
| 632 |
+
"single_word": false,
|
| 633 |
+
"special": false
|
| 634 |
+
},
|
| 635 |
+
"30074": {
|
| 636 |
+
"content": "β<extra_id_25>",
|
| 637 |
+
"lstrip": false,
|
| 638 |
+
"normalized": false,
|
| 639 |
+
"rstrip": false,
|
| 640 |
+
"single_word": false,
|
| 641 |
+
"special": false
|
| 642 |
+
},
|
| 643 |
+
"30075": {
|
| 644 |
+
"content": "β<extra_id_24>",
|
| 645 |
+
"lstrip": false,
|
| 646 |
+
"normalized": false,
|
| 647 |
+
"rstrip": false,
|
| 648 |
+
"single_word": false,
|
| 649 |
+
"special": false
|
| 650 |
+
},
|
| 651 |
+
"30076": {
|
| 652 |
+
"content": "β<extra_id_23>",
|
| 653 |
+
"lstrip": false,
|
| 654 |
+
"normalized": false,
|
| 655 |
+
"rstrip": false,
|
| 656 |
+
"single_word": false,
|
| 657 |
+
"special": false
|
| 658 |
+
},
|
| 659 |
+
"30077": {
|
| 660 |
+
"content": "β<extra_id_22>",
|
| 661 |
+
"lstrip": false,
|
| 662 |
+
"normalized": false,
|
| 663 |
+
"rstrip": false,
|
| 664 |
+
"single_word": false,
|
| 665 |
+
"special": false
|
| 666 |
+
},
|
| 667 |
+
"30078": {
|
| 668 |
+
"content": "β<extra_id_21>",
|
| 669 |
+
"lstrip": false,
|
| 670 |
+
"normalized": false,
|
| 671 |
+
"rstrip": false,
|
| 672 |
+
"single_word": false,
|
| 673 |
+
"special": false
|
| 674 |
+
},
|
| 675 |
+
"30079": {
|
| 676 |
+
"content": "β<extra_id_20>",
|
| 677 |
+
"lstrip": false,
|
| 678 |
+
"normalized": false,
|
| 679 |
+
"rstrip": false,
|
| 680 |
+
"single_word": false,
|
| 681 |
+
"special": false
|
| 682 |
+
},
|
| 683 |
+
"30080": {
|
| 684 |
+
"content": "β<extra_id_19>",
|
| 685 |
+
"lstrip": false,
|
| 686 |
+
"normalized": false,
|
| 687 |
+
"rstrip": false,
|
| 688 |
+
"single_word": false,
|
| 689 |
+
"special": false
|
| 690 |
+
},
|
| 691 |
+
"30081": {
|
| 692 |
+
"content": "β<extra_id_18>",
|
| 693 |
+
"lstrip": false,
|
| 694 |
+
"normalized": false,
|
| 695 |
+
"rstrip": false,
|
| 696 |
+
"single_word": false,
|
| 697 |
+
"special": false
|
| 698 |
+
},
|
| 699 |
+
"30082": {
|
| 700 |
+
"content": "β<extra_id_17>",
|
| 701 |
+
"lstrip": false,
|
| 702 |
+
"normalized": false,
|
| 703 |
+
"rstrip": false,
|
| 704 |
+
"single_word": false,
|
| 705 |
+
"special": false
|
| 706 |
+
},
|
| 707 |
+
"30083": {
|
| 708 |
+
"content": "β<extra_id_16>",
|
| 709 |
+
"lstrip": false,
|
| 710 |
+
"normalized": false,
|
| 711 |
+
"rstrip": false,
|
| 712 |
+
"single_word": false,
|
| 713 |
+
"special": false
|
| 714 |
+
},
|
| 715 |
+
"30084": {
|
| 716 |
+
"content": "β<extra_id_15>",
|
| 717 |
+
"lstrip": false,
|
| 718 |
+
"normalized": false,
|
| 719 |
+
"rstrip": false,
|
| 720 |
+
"single_word": false,
|
| 721 |
+
"special": false
|
| 722 |
+
},
|
| 723 |
+
"30085": {
|
| 724 |
+
"content": "β<extra_id_14>",
|
| 725 |
+
"lstrip": false,
|
| 726 |
+
"normalized": false,
|
| 727 |
+
"rstrip": false,
|
| 728 |
+
"single_word": false,
|
| 729 |
+
"special": false
|
| 730 |
+
},
|
| 731 |
+
"30086": {
|
| 732 |
+
"content": "β<extra_id_13>",
|
| 733 |
+
"lstrip": false,
|
| 734 |
+
"normalized": false,
|
| 735 |
+
"rstrip": false,
|
| 736 |
+
"single_word": false,
|
| 737 |
+
"special": false
|
| 738 |
+
},
|
| 739 |
+
"30087": {
|
| 740 |
+
"content": "β<extra_id_12>",
|
| 741 |
+
"lstrip": false,
|
| 742 |
+
"normalized": false,
|
| 743 |
+
"rstrip": false,
|
| 744 |
+
"single_word": false,
|
| 745 |
+
"special": false
|
| 746 |
+
},
|
| 747 |
+
"30088": {
|
| 748 |
+
"content": "β<extra_id_11>",
|
| 749 |
+
"lstrip": false,
|
| 750 |
+
"normalized": false,
|
| 751 |
+
"rstrip": false,
|
| 752 |
+
"single_word": false,
|
| 753 |
+
"special": false
|
| 754 |
+
},
|
| 755 |
+
"30089": {
|
| 756 |
+
"content": "β<extra_id_10>",
|
| 757 |
+
"lstrip": false,
|
| 758 |
+
"normalized": false,
|
| 759 |
+
"rstrip": false,
|
| 760 |
+
"single_word": false,
|
| 761 |
+
"special": false
|
| 762 |
+
},
|
| 763 |
+
"30090": {
|
| 764 |
+
"content": "β<extra_id_9>",
|
| 765 |
+
"lstrip": false,
|
| 766 |
+
"normalized": false,
|
| 767 |
+
"rstrip": false,
|
| 768 |
+
"single_word": false,
|
| 769 |
+
"special": false
|
| 770 |
+
},
|
| 771 |
+
"30091": {
|
| 772 |
+
"content": "β<extra_id_8>",
|
| 773 |
+
"lstrip": false,
|
| 774 |
+
"normalized": false,
|
| 775 |
+
"rstrip": false,
|
| 776 |
+
"single_word": false,
|
| 777 |
+
"special": false
|
| 778 |
+
},
|
| 779 |
+
"30092": {
|
| 780 |
+
"content": "β<extra_id_7>",
|
| 781 |
+
"lstrip": false,
|
| 782 |
+
"normalized": false,
|
| 783 |
+
"rstrip": false,
|
| 784 |
+
"single_word": false,
|
| 785 |
+
"special": false
|
| 786 |
+
},
|
| 787 |
+
"30093": {
|
| 788 |
+
"content": "β<extra_id_6>",
|
| 789 |
+
"lstrip": false,
|
| 790 |
+
"normalized": false,
|
| 791 |
+
"rstrip": false,
|
| 792 |
+
"single_word": false,
|
| 793 |
+
"special": false
|
| 794 |
+
},
|
| 795 |
+
"30094": {
|
| 796 |
+
"content": "β<extra_id_5>",
|
| 797 |
+
"lstrip": false,
|
| 798 |
+
"normalized": false,
|
| 799 |
+
"rstrip": false,
|
| 800 |
+
"single_word": false,
|
| 801 |
+
"special": false
|
| 802 |
+
},
|
| 803 |
+
"30095": {
|
| 804 |
+
"content": "β<extra_id_4>",
|
| 805 |
+
"lstrip": false,
|
| 806 |
+
"normalized": false,
|
| 807 |
+
"rstrip": false,
|
| 808 |
+
"single_word": false,
|
| 809 |
+
"special": false
|
| 810 |
+
},
|
| 811 |
+
"30096": {
|
| 812 |
+
"content": "β<extra_id_3>",
|
| 813 |
+
"lstrip": false,
|
| 814 |
+
"normalized": false,
|
| 815 |
+
"rstrip": false,
|
| 816 |
+
"single_word": false,
|
| 817 |
+
"special": false
|
| 818 |
+
},
|
| 819 |
+
"30097": {
|
| 820 |
+
"content": "β<extra_id_2>",
|
| 821 |
+
"lstrip": false,
|
| 822 |
+
"normalized": false,
|
| 823 |
+
"rstrip": false,
|
| 824 |
+
"single_word": false,
|
| 825 |
+
"special": false
|
| 826 |
+
},
|
| 827 |
+
"30098": {
|
| 828 |
+
"content": "β<extra_id_1>",
|
| 829 |
+
"lstrip": false,
|
| 830 |
+
"normalized": false,
|
| 831 |
+
"rstrip": false,
|
| 832 |
+
"single_word": false,
|
| 833 |
+
"special": false
|
| 834 |
+
},
|
| 835 |
+
"30099": {
|
| 836 |
+
"content": "β<extra_id_0>",
|
| 837 |
+
"lstrip": false,
|
| 838 |
+
"normalized": false,
|
| 839 |
+
"rstrip": false,
|
| 840 |
+
"single_word": false,
|
| 841 |
+
"special": false
|
| 842 |
+
}
|
| 843 |
+
},
|
| 844 |
+
"bos_token": "[CLS]",
|
| 845 |
+
"clean_up_tokenization_spaces": false,
|
| 846 |
+
"cls_token": "[CLS]",
|
| 847 |
+
"do_lower_case": false,
|
| 848 |
+
"eos_token": "[SEP]",
|
| 849 |
+
"extra_special_tokens": {},
|
| 850 |
+
"mask_token": "[MASK]",
|
| 851 |
+
"model_max_length": 512,
|
| 852 |
+
"pad_token": "[PAD]",
|
| 853 |
+
"sep_token": "[SEP]",
|
| 854 |
+
"sp_model_kwargs": {},
|
| 855 |
+
"split_by_punct": false,
|
| 856 |
+
"tokenizer_class": "DebertaV2Tokenizer",
|
| 857 |
+
"unk_token": "[UNK]",
|
| 858 |
+
"vocab_type": "spm"
|
| 859 |
+
}
|
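Every entry under `added_tokens_decoder` above maps a token ID to the same flag layout (`content`, `lstrip`, `normalized`, `rstrip`, `single_word`, `special`). A minimal sketch of how such a fragment can be parsed with the standard library, assuming only the structure shown above (the two entries below mirror the tail of the config; the fragment is inlined here rather than read from `tokenizer_config.json`):

```python
import json

# Inline fragment mirroring the tail of the "added_tokens_decoder" table
# and a few of the top-level keys from the config above.
config_fragment = json.loads("""
{
  "added_tokens_decoder": {
    "30098": {"content": "\\u2581<extra_id_1>", "lstrip": false, "normalized": false,
              "rstrip": false, "single_word": false, "special": false},
    "30099": {"content": "\\u2581<extra_id_0>", "lstrip": false, "normalized": false,
              "rstrip": false, "single_word": false, "special": false}
  },
  "tokenizer_class": "DebertaV2Tokenizer",
  "model_max_length": 512
}
""")

# Keys in the JSON are strings; convert to int so IDs sort numerically,
# then collect each sentinel token's surface form.
sentinels = {
    int(token_id): entry["content"]
    for token_id, entry in config_fragment["added_tokens_decoder"].items()
}
print(sentinels[30099])  # the last added token, "▁<extra_id_0>"
```

Note that `\u2581` is the SentencePiece word-boundary marker `▁` (consistent with `"vocab_type": "spm"` above), not an underscore.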