"""
CrossEncoder document processing explained
Answer: is a Document processed as a whole, or split into sentences?
"""
print("=" * 80)
print("How does CrossEncoder process a Document?")
print("=" * 80)

# ============================================================================
# Part 1: How a Document is actually processed
# ============================================================================
print("\n" + "=" * 80)
print("Part 1: How a Document is actually processed")
print("=" * 80)
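query = "什么是人工智能？"  # "What is artificial intelligence?"
# Sample Document, kept in Chinese because the demo below tokenizes per character:
# "AI is a branch of computer science. It aims to build intelligent systems.
#  These systems can perform tasks that require human intelligence.
#  AI includes subfields such as machine learning."
document = """人工智能是计算机科学的一个分支。它致力于创建智能系统。
这些系统可以执行需要人类智能的任务。人工智能包括机器学习等子领域。"""
print("\nOriginal input:")
print(f"Query: {query}")
print("\nDocument (contains several sentences):")
print(document)
print("\n" + "-" * 80)
print("Key question: the Document has several sentences. How does CrossEncoder handle them?")
print("-" * 80)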
print("""
Answer: CrossEncoder processes the entire Document as ONE unit!

How it works:
1. Input concatenation: [CLS] Query [SEP] Document [SEP]
   └─ all sentences of the Document are joined into one segment
2. Tokenization: the whole sequence is split into tokens
   └─ not sentence by sentence; the whole Document is tokenized together
3. Embeddings:
   └─ one vector per token (not one vector per sentence!)
   └─ a Document of 100 tokens = 100 vectors
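
Steps 1-3 can be sketched in plain Python. This is a simplified model, not the real tokenizer. It assumes one token per character, which is close to what BERT-style tokenizers do with Chinese text. `build_pair_input` is a hypothetical helper. The segment ids mirror BERT's token_type_ids: 0 for the query part and 1 for the Document part.

```python
from typing import List, Tuple

def build_pair_input(query: str, document: str) -> Tuple[List[str], List[int]]:
    # Hypothetical helper: one token per character, whitespace skipped
    q_tokens = [ch for ch in query if not ch.isspace()]
    d_tokens = [ch for ch in document if not ch.isspace()]
    # [CLS] query [SEP] document [SEP]: every Document sentence lands in ONE sequence
    tokens = ['[CLS]'] + q_tokens + ['[SEP]'] + d_tokens + ['[SEP]']
    # Segment ids: 0 for [CLS] + query + first [SEP], 1 for the Document + final [SEP]
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(d_tokens) + 1)
    return tokens, segment_ids

tokens, segment_ids = build_pair_input('什么是人工智能？', '人工智能是一个分支。它创建智能系统。')
print(len(tokens), 'tokens,', segment_ids.count(1), 'of them in the Document segment')
print(tokens[:12])
```

However many sentences the Document has, it contributes one token per character, all under segment id 1.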
""")

# ============================================================================
# Part 2: Token-level processing in detail
# ============================================================================
print("\n" + "=" * 80)
print("Part 2: Token-level processing (what actually happens)")
print("=" * 80)
# Simulate the real processing pipeline
concatenated = f"[CLS] {query} [SEP] {document} [SEP]"
print("\nStep 1: concatenate into a single sequence")
print("─" * 40)
print(f"{concatenated[:100]}...")
# Simplified tokenizer: one token per character. Real BERT tokenizers use WordPiece,
# but they also split Chinese text into single characters, so this is close for Chinese.
def tokenize_chinese(text):
    """Simplified Chinese tokenizer: keeps [CLS]/[SEP] whole and skips whitespace."""
    tokens = []
    i = 0
    while i < len(text):
        if text.startswith("[CLS]", i) or text.startswith("[SEP]", i):
            tokens.append(text[i:i + 5])
            i += 5
        elif text[i].isspace():
            # Skip spaces AND newlines (the sample Document contains a line break)
            i += 1
        else:
            tokens.append(text[i])
            i += 1
    return tokens
tokens = tokenize_chinese(concatenated)
print("\nStep 2: tokenization (each character/word becomes a token)")
print("─" * 40)
print(f"{len(tokens)} tokens in total")
print(f"First 30 tokens: {tokens[:30]}")
print("\nStep 3: each token gets its own vector")
print("─" * 40)
print(f"""
Token sequence (length={len(tokens)}):
tokens[0]  = '[CLS]' → embedding[0]  (768-dim vector)
tokens[1]  = '什'    → embedding[1]  (768-dim vector)
tokens[2]  = '么'    → embedding[2]  (768-dim vector)
...
tokens[9]  = '[SEP]' → embedding[9]  (768-dim vector)
tokens[10] = '人'    → embedding[10] (768-dim vector)  ← Document starts here
tokens[11] = '工'    → embedding[11] (768-dim vector)
tokens[12] = '智'    → embedding[12] (768-dim vector)
tokens[13] = '能'    → embedding[13] (768-dim vector)
...
tokens[{len(tokens)-1}] = '[SEP]' → embedding[{len(tokens)-1}] (768-dim vector)

Key points:
✓ The Document is NOT a single vector!
✓ Every character/word of the Document gets its own vector!
✓ Even with several sentences, the Document is one continuous token sequence
✓ 768 is BERT-base's hidden size; smaller models such as MiniLM-L-6 use 384
""")

# ============================================================================
# Part 3: How attention works across sentences
# ============================================================================
print("\n" + "=" * 80)
print("Part 3: Attention works across sentences")
print("=" * 80)
print("""
How attention is computed when the Document has several sentences:

Suppose Document = "S1。S2。S3。"

Token sequence:
[CLS]  Q-w1   Q-w2   [SEP]  S1-w1  S1-w2  。     S2-w1  S2-w2  。     S3-w1  [SEP]
t[0]   t[1]   t[2]   t[3]   t[4]   t[5]   t[6]   t[7]   t[8]   t[9]   t[10]  t[11]

Self-attention:
──────────────────────────────────────────────
Attention of Q-w1 (t[1]):
  - can attend to S1-w1 (t[4]) ✓
  - can attend to S2-w1 (t[7]) ✓
  - can attend to S3-w1 (t[10]) ✓
  → Query words can see every word of every sentence in the Document!

Attention of S1-w1 (t[4]):
  - can attend to Q-w1 (t[1]) ✓
  - can attend to S2-w1 (t[7]) ✓ (across sentences!)
  - can attend to S3-w1 (t[10]) ✓ (across sentences!)
  → Different sentences inside the Document can see each other too!

This is full bidirectional self-attention (sometimes loosely called "global attention"):
every token can see every other token in the whole sequence!
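
Here is a toy version in Python, under some loud simplifying assumptions: a single attention head, no learned query/key/value projections, and made-up 2-dimensional vectors (derived from sin/cos) instead of 768-dimensional ones. Nothing masks one sentence off from another, so the softmax row for t[1] gives a positive weight to t[4], t[7] and t[10].

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors: List[List[float]]) -> List[List[float]]:
    # Scaled dot-product attention with NO mask: every token scores every token
    d = len(vectors[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors])
        for q in vectors
    ]

# Made-up 2-d vectors for t[0] .. t[11] in the token layout above
vectors = [[math.sin(i), math.cos(i)] for i in range(12)]
weights = attention_weights(vectors)

# Row t[1] (Q-w1): its weights on S1-w1, S2-w1 and S3-w1
print({j: round(weights[1][j], 3) for j in (4, 7, 10)})
```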
""")

# ============================================================================
# Part 4: Why not split the Document into sentences?
# ============================================================================
print("\n" + "=" * 80)
print("Part 4: Why not split the Document into several sentences?")
print("=" * 80)
print("""
Plan A: treat the Document as a whole (what CrossEncoder actually does) ✅
────────────────────────────────────────
Input: [CLS] Query [SEP] S1 + S2 + S3 [SEP]
   ↓
One forward pass, one score: 8.5 (illustrative)

Pros:
✓ a single forward pass, so it is fast
✓ sentences can attend to each other, so context is understood
✓ better understanding of the overall meaning
Cons:
⚠️ limited input length (usually 512 tokens);
   a Document that is too long gets truncated

Plan B: split into sentences and score each one separately (not recommended) ❌
────────────────────────────────
Input 1: [CLS] Query [SEP] S1 [SEP] → score: 7.2
Input 2: [CLS] Query [SEP] S2 [SEP] → score: 8.1
Input 3: [CLS] Query [SEP] S3 [SEP] → score: 6.5
Then take the average, or the maximum?

Cons:
✗ 3 forward passes instead of 1 (each repeating the Query), so it is slower
✗ sentences cannot see each other
✗ context across sentences is lost
✗ how should the scores be combined? Average and max are both imperfect
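
Plan B can be sketched as follows. `split_sentences`, `score_pair` and `plan_b_score` are hypothetical names. `score_pair` is only a stand-in for one CrossEncoder call per sentence: it counts shared characters so that the example runs without a model. The aggregation rule must be chosen by hand, and different rules give different answers.

```python
from typing import Callable, List

def split_sentences(document: str) -> List[str]:
    # Split on the Chinese full stop and drop empty pieces
    return [s.strip() + '。' for s in document.split('。') if s.strip()]

def score_pair(query: str, sentence: str) -> float:
    # Hypothetical stand-in for ONE CrossEncoder forward pass: shared-character count
    return float(len(set(query) & set(sentence)))

def plan_b_score(query: str, document: str,
                 aggregate: Callable[[List[float]], float]) -> float:
    # One call per sentence; sentences never see each other
    scores = [score_pair(query, s) for s in split_sentences(document)]
    return aggregate(scores)

query = '什么是人工智能？'
document = '人工智能是计算机科学的一个分支。它致力于创建智能系统。这些系统可以执行需要人类智能的任务。'
print(split_sentences(document))
print(plan_b_score(query, document, max))
print(plan_b_score(query, document, lambda xs: sum(xs) / len(xs)))
```

For this text, max gives 5.0 and the mean gives about 3.33. This is the aggregation dilemma from the last con above.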
""")

# ============================================================================
# Part 5: Real code example
# ============================================================================
print("\n" + "=" * 80)
print("Part 5: Real code example")
print("=" * 80)
print("""
Real code using CrossEncoder:

```python
from sentence_transformers import CrossEncoder

# Note: this MS MARCO model was trained on English data; for Chinese text a
# multilingual or Chinese reranker is a better fit. The mechanics are the same.
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
query = "什么是人工智能？"  # "What is artificial intelligence?"

# The Document has several sentences
document = \"\"\"
人工智能是计算机科学的一个分支。
它致力于创建智能系统。
这些系统可以执行需要人类智能的任务。
\"\"\"

# Pass the whole Document in directly!
pairs = [[query, document]]  # ← note: the whole document is ONE string

# Inside, the model automatically:
# 1. Concatenates: [CLS] query [SEP] document [SEP]
# 2. Tokenizes: splits into tokens (roughly 50-100 here)
# 3. Encodes: one vector per token
# 4. Attention: all tokens attend to each other
# 5. Output: a single score
scores = model.predict(pairs)
print(f"Relevance score: {scores[0]}")  # e.g. 8.26 (illustrative; depends on model and text)
```

Key takeaways:
──────────────────────────────────
The Document is NOT split!
Every character/word of the Document becomes a vector!
All vectors are connected through attention!
The final output is one overall relevance score!
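
In practice, a reranker scores many candidate Documents for one query in a single `model.predict` call and then sorts them by score. In this sketch `rerank` is a hypothetical helper, and the scores are made-up placeholders for what `model.predict(pairs)` would return. Each Document is still passed as one whole string.

```python
from typing import List, Tuple

def rerank(documents: List[str], scores: List[float], top_k: int) -> List[Tuple[str, float]]:
    # Pair each document with its score and sort from most to least relevant
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

query = '什么是人工智能？'
documents = [
    '人工智能是计算机科学的一个分支。',  # relevant
    '今天天气很好。',                    # unrelated: "The weather is nice today."
    '人工智能包括机器学习等子领域。',    # relevant
]
pairs = [[query, doc] for doc in documents]  # one (query, whole document) pair per candidate
scores = [8.3, -10.9, 4.1]                   # hypothetical placeholders for model.predict(pairs)
for doc, score in rerank(documents, scores, top_k=2):
    print(score, doc)
```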
""")

# ============================================================================
# Part 6: The token limit
# ============================================================================
print("\n" + "=" * 80)
print("Part 6: What if the Document is too long?")
print("=" * 80)
print("""
CrossEncoder has a length limit (usually 512 tokens).
What if the Document is too long (say 1000 characters)?

Solution 1: truncation (most common)
───────────────────────
Keep only the first 512 tokens of the whole input:
[CLS] Query [SEP] Document[:N] [SEP], where N = 512 - (Query tokens) - 3 special tokens
Pros: simple and fast
Cons: important information later in the Document may be lost
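
Here is the token-budget arithmetic as a sketch. It assumes one token per Chinese character and three special tokens ([CLS] plus two [SEP]). `truncate_document` is a hypothetical helper. The real CrossEncoder tokenizer truncates on its own when `max_length` is set, and its exact strategy may differ.

```python
from typing import List

def truncate_document(query_tokens: List[str], doc_tokens: List[str],
                      max_length: int = 512) -> List[str]:
    # Tokens left for the Document after the query and [CLS], [SEP], [SEP]
    budget = max_length - len(query_tokens) - 3
    return doc_tokens[:max(budget, 0)]  # keep the head, drop the tail

query_tokens = list('什么是人工智能？')  # 8 tokens
doc_tokens = ['字'] * 1000               # a 1000-character Document
kept = truncate_document(query_tokens, doc_tokens)
print(len(kept), 'Document tokens kept,', len(doc_tokens) - len(kept), 'dropped')
```

With an 8-character query, 512 - 8 - 3 = 501 Document tokens survive, and the other 499 never reach the model.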

Solution 2: sliding window
─────────────────
Cut the Document into overlapping windows and score each window separately:
Window 1: [CLS] Query [SEP] Document[0:400] [SEP] → score: 7.2
Window 2: [CLS] Query [SEP] Document[200:600] [SEP] → score: 8.5
Window 3: [CLS] Query [SEP] Document[400:800] [SEP] → score: 6.8
Window 4: [CLS] Query [SEP] Document[600:1000] [SEP] → score: ...
Take the highest score (8.5 among the windows shown)
Pros: no part of the Document is dropped
Cons: more computation (one forward pass per window)
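
Window generation and max-aggregation can be sketched as follows, assuming 400-character windows and a stride of 200, as in the example above. `make_windows` and `max_window_score` are hypothetical names, and `score_fn` stands in for a CrossEncoder call. The last window is aligned to the end of the Document so that nothing is skipped. For 1000 characters, windows start at 0, 200, 400 and 600.

```python
from typing import Callable, List, Tuple

def make_windows(length: int, size: int = 400, stride: int = 200) -> List[Tuple[int, int]]:
    # Window starts every `stride` characters; the last window ends exactly at `length`
    if length <= size:
        return [(0, length)]
    starts = list(range(0, length - size, stride)) + [length - size]
    return [(s, s + size) for s in starts]

def max_window_score(query: str, document: str,
                     score_fn: Callable[[str, str], float]) -> float:
    # One (hypothetical) CrossEncoder call per window, keep the best score
    return max(score_fn(query, document[s:e]) for s, e in make_windows(len(document)))

print(make_windows(1000))

# Usage with a toy scorer that counts occurrences of '智能' (a stand-in, not a model)
doc = 'x' * 700 + '人工智能' + 'y' * 296
print(max_window_score('什么是人工智能？', doc, lambda q, text: float(text.count('智能'))))
```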

Solution 3: coarse ranking with a Bi-Encoder first
──────────────────────────
1. Split the long Document into paragraphs
2. Use a Bi-Encoder to quickly find the 1-2 most relevant paragraphs
3. Rerank only those paragraphs with the CrossEncoder
Pros: fast and accurate
Cons: a two-stage pipeline to build and maintain
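
A minimal sketch of the two-stage idea. Everything here is a stand-in: the paragraph vectors and CrossEncoder scores are made up, `two_stage` is a hypothetical helper, and stage 1 is plain cosine similarity, as a bi-encoder would use. 'para B' would get the best CrossEncoder score, but it never reaches stage 2. That is the price of a coarse first stage.

```python
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def two_stage(query_vec: List[float], para_vecs: List[List[float]], paragraphs: List[str],
              cross_score: Callable[[str], float], k: int = 2) -> List[str]:
    # Stage 1 (cheap, bi-encoder style): rank ALL paragraphs by cosine, keep the top k
    order = sorted(range(len(paragraphs)),
                   key=lambda i: cosine(query_vec, para_vecs[i]), reverse=True)
    candidates = [paragraphs[i] for i in order[:k]]
    # Stage 2 (expensive): rerank only the k candidates with the CrossEncoder stand-in
    return sorted(candidates, key=cross_score, reverse=True)

paragraphs = ['para A', 'para B', 'para C', 'para D']
para_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [0.9, 0.1]]  # made-up embeddings
query_vec = [1.0, 0.2]                                          # made-up query embedding
fake_cross = {'para A': 0.9, 'para B': 0.95, 'para C': 0.1, 'para D': 0.5}  # made-up scores
print(two_stage(query_vec, para_vecs, paragraphs, fake_cross.get, k=2))
```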

Your project uses Solution 1 (truncation)!
─────────────────────────
In reranker.py:
CrossEncoderReranker(max_length=512) ← input beyond 512 tokens is truncated automatically
""")

# ============================================================================
# Part 7: Visual summary
# ============================================================================
print("\n" + "=" * 80)
print("Part 7: Visual summary")
print("=" * 80)
print("""
The complete Document processing pipeline:
══════════════════════════════════════════════════

Input Document (several sentences):
    "人工智能是计算机科学的一个分支。" (sentence 1) + "它致力于创建智能系统。" (sentence 2)
                      ↓
          Concatenate into one sequence
                      ↓
    [CLS] 什么是人工智能？ [SEP] 人工智能是...智能系统。 [SEP]
    (special tokens separate the Query tokens from the Document tokens)
                      ↓
              Tokenization
                      ↓
    [CLS]   什    么   ...  [SEP]   人    工   ...   统     。    [SEP]
                      ↓
        Each token → one 768-dim vector
                      ↓
     V₀     V₁    V₂   ...   V₉    V₁₀   V₁₁  ...  Vₙ₋₃   Vₙ₋₂   Vₙ₋₁
                      ↓
    Self-Attention (12 layers in BERT-base; 6 in MiniLM-L-6)
    every vector can "see" every other vector
                      ↓
    V₀' (the updated [CLS] vector)
    now carries information from the whole sequence
                      ↓
    Fully connected layer (classification head)
                      ↓
              Relevance score
              8.26 (illustrative)
══════════════════════════════════════════════════

Key points:
1. The Document is processed as a whole ✓
   └─ not one vector, but a sequence of many vectors
2. One vector per character/word ✓
   └─ not one vector per sentence
3. Full self-attention ✓
   └─ Query words can see every word of every Document sentence
4. One final score ✓
   └─ computed from the [CLS] vector
══════════════════════════════════════════════════
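
The final steps of the pipeline (updated [CLS] vector → fully connected layer → score) reduce to a dot product plus a bias. This toy sketch uses a 4-dimensional vector and made-up weights. In a real BERT-style model, the vector has the hidden size (e.g. 768) and usually passes through a small pooler layer first, which is omitted here.

```python
from typing import List

def classification_head(cls_vector: List[float], weights: List[float], bias: float) -> float:
    # Single-output linear layer: score = w · h_cls + b
    return sum(w * h for w, h in zip(weights, cls_vector)) + bias

cls_vector = [0.5, -1.0, 2.0, 0.25]  # made-up final [CLS] vector
weights = [1.0, -2.0, 3.0, 4.0]      # made-up head weights
bias = 0.5
score = classification_head(cls_vector, weights, bias)
print(score)
```

Only [CLS] feeds the head. The other n - 1 vectors influence the score through attention, which has already mixed their information into [CLS].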
""")

# ============================================================================
# Part 8: Comparison with how a Bi-Encoder processes the Document
# ============================================================================
print("\n" + "=" * 80)
print("Part 8: Comparison with Bi-Encoder processing")
print("=" * 80)
print("""
Bi-Encoder (vector retrieval):
─────────────────────────────────────────────
Document: "S1。S2。S3。"
    ↓
Encoder (BERT)
    ↓
Pool the token vectors ([CLS] vector or mean pooling, depending on the model)
    ↓
A single vector (768-dim) ← the Document is compressed into one vector!
    ↓
Cosine similarity with the Query vector (the Query is encoded separately)
    ↓
Relevance score

CrossEncoder (deep interaction):
─────────────────────────────────────────────
Query + Document: "[CLS] Query [SEP] S1。S2。S3。 [SEP]"
    ↓
Encoder (BERT): self-attention lets every word attend to every other word
    ↓
A vector is kept for every token
    ↓
Vector sequence (n × 768) ← all details are kept!
    ↓
Classification head on the [CLS] vector
    ↓
Relevance score

Differences:
─────────────────────────────────────────────
Bi-Encoder:   Document → 1 vector (information compressed)
CrossEncoder: Document → n vectors (information kept)
Bi-Encoder:   Query and Document are processed separately (Document vectors can be precomputed)
CrossEncoder: Query and Document are processed together (one forward pass per pair)
Bi-Encoder:   fast but less accurate
CrossEncoder: slow but very accurate
─────────────────────────────────────────────
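
The shape difference in the table can be shown with a toy sketch. A bi-encoder pools its n × d token matrix into one d-dimensional vector and compares vectors with cosine similarity. Mean pooling is assumed here; some models use the [CLS] vector instead. The vectors are made up, with d = 3.

```python
import math
from typing import List

def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    # n x d token matrix -> one d-dimensional vector
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up token vectors: a 4-token Document and a 2-token Query
doc_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
query_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

doc_vec = mean_pool(doc_tokens)      # bi-encoder: the Document becomes 1 vector
query_vec = mean_pool(query_tokens)  # encoded separately, so either side can be cached
print(len(doc_tokens), 'token vectors -> 1 vector of size', len(doc_vec))
print(round(cosine(query_vec, doc_vec), 4))
```

After pooling, the distinct per-token vectors no longer exist. This is the "information compressed" row of the table. A CrossEncoder would keep all 4 + 2 token vectors, plus the special tokens, in one joint sequence.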
""")

print("\n" + "=" * 80)
print("Summary answer")
print("=" * 80)
print("""
Your question: does the Document become one embedding, or does each sentence become a bunch of vectors?

Answer: neither!

The correct picture:
─────────────────────────────────────────────
✓ The Document is input as a whole (its sentences are not split apart)
✓ But every character/word of the Document produces its own vector
✓ Not "one embedding" but "a sequence of vectors"
✓ Not "split by sentence" but "split by character/word"

Document (50 characters) → about 50 vectors (768 dims each), one per token
    not 1 vector
    and not 3 vectors (if there are 3 sentences)
─────────────────────────────────────────────
That is why a CrossEncoder can capture fine-grained semantic relationships!
""")
print("\nDoes it make sense now? If anything is still unclear, keep asking!\n")