Update README.md
README.md
CHANGED
@@ -52,18 +52,21 @@ print(embedding.shape) # (1, hidden_dim)
 
 ## 🧩 Model Overview
 
-| Property | Description |
-
-| **Architecture** | Sentence-BERT (all-MiniLM-L6-v2 backbone) |
-| **Pooling** | Weighted mean aggregation over transcript chunks |
-| **Max
-| **
-| **Objective** | Contrastive learning (DPR-style) using binary similarity loss |
+| **Property** | **Description** |
+|:-------------|:----------------|
+| **Architecture** | Sentence-BERT (`all-MiniLM-L6-v2` backbone) |
+| **Pooling Strategy** | Weighted mean aggregation over transcript chunks |
+| **Max Tokens per Chunk** | 512 |
+| **Input Representation** | Transcript + talk title + publication year |
+| **Training Objective** | Contrastive learning (DPR-style) using binary similarity loss |
+| **Training Data** | [Talk2Ref dataset](https://huggingface.co/datasets/s8frbroy/talk2ref) – transcripts of 6,279 scientific talks |
 | **Task** | Encode scientific talks into a shared semantic space with their cited papers |
+| **Paired Model** | [Talk2Ref Cited Paper Encoder](https://huggingface.co/s8frbroy/talk2ref_ref_key_cited_paper_encoder) |
 
 ---
 
 
+
 ## Citation
 
 If you use this dataset, please cite the following paper:
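
The Model Overview table lists weighted mean aggregation over transcript chunks, with at most 512 tokens per chunk, but the README does not say how the weights are chosen. The sketch below illustrates one possible reading, assuming each chunk is weighted by its token count. The helper names `chunk_tokens` and `weighted_mean` are hypothetical, and the per-chunk vectors are placeholders standing in for the encoder's output, not the model's actual embeddings.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], max_tokens: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def weighted_mean(vectors: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Return sum_i(w_i * v_i) / sum_i(w_i), computed per dimension."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector and at least one vector")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[d] for v, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]


# A toy transcript of 1,200 tokens splits into three chunks.
chunks = chunk_tokens(["tok"] * 1200, max_tokens=512)

# Placeholder 2-dimensional chunk embeddings; a real pipeline would obtain
# these by encoding each chunk with the talk encoder.
chunk_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: weight each chunk by its length in tokens.
weights = [len(c) for c in chunks]
talk_embedding = weighted_mean(chunk_embeddings, weights)
```

Length-based weights keep a short trailing chunk from counting as much as a full 512-token chunk; uniform weights would reduce this to a plain mean.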