s8frbroy/talk2ref_query_talk_encoder

Feature Extraction

scientific-retrieval

dense-passage-retrieval

sentence-embedding


s8frbroy committed on Oct 29, 2025

Commit bda8d80 (verified) · 1 Parent(s): c5ec30f

Update README.md

Files changed (1)

README.md +10 -7


@@ -52,18 +52,21 @@ print(embedding.shape)  # (1, hidden_dim)
 ## 🧩 Model Overview
-| Property | Description |
-|-----------|--------------|
-| **Architecture** | Sentence-BERT (all-MiniLM-L6-v2 backbone) |
-| **Pooling** | Weighted mean aggregation over transcript chunks |
-| **Max tokens per chunk** | 512 |
-| **Trained on** | Talk2Ref dataset — transcripts of 6,279 scientific talks |
-| **Objective** | Contrastive learning (DPR-style) using binary similarity loss |
 | **Task** | Encode scientific talks into a shared semantic space with their cited papers |
 ---
 ## Citation
 If you use this dataset, please cite the following paper:

 ## 🧩 Model Overview
+| **Property** | **Description** |
+|:-------------|:----------------|
+| **Architecture** | Sentence-BERT (`all-MiniLM-L6-v2` backbone) |
+| **Pooling Strategy** | Weighted mean aggregation over transcript chunks |
+| **Max Tokens per Chunk** | 512 |
+| **Input Representation** | Transcript + talk title + publication year |
+| **Training Objective** | Contrastive learning (DPR-style) using binary similarity loss |
+| **Training Data** | [Talk2Ref dataset](https://huggingface.co/datasets/s8frbroy/talk2ref) – transcripts of 6,279 scientific talks |
 | **Task** | Encode scientific talks into a shared semantic space with their cited papers |
+| **Paired Model** | [Talk2Ref Cited Paper Encoder](https://huggingface.co/s8frbroy/talk2ref_ref_key_cited_paper_encoder) |
 ---
 ## Citation
 If you use this dataset, please cite the following paper:
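
Note on the **Pooling Strategy** and **Max Tokens per Chunk** rows in the updated table: the README does not show how the chunk embeddings are weighted, so the following is only a minimal sketch of one plausible reading. It assumes each chunk is weighted by its token count; the helper names `split_into_chunks` and `weighted_mean_pool` are hypothetical, and random vectors stand in for the encoder's real chunk embeddings. The width of 384 matches the hidden size of `all-MiniLM-L6-v2`.

```python
import numpy as np

MAX_TOKENS_PER_CHUNK = 512  # value taken from the "Max Tokens per Chunk" row


def split_into_chunks(token_ids, max_tokens=MAX_TOKENS_PER_CHUNK):
    """Split a flat list of token ids into consecutive chunks of at most max_tokens."""
    return [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]


def weighted_mean_pool(chunk_embeddings, chunk_lengths):
    """Combine per-chunk embeddings into a single (1, hidden_dim) vector.

    ASSUMPTION: weights are proportional to chunk length in tokens.
    The model card only says "weighted mean", not which weights are used.
    """
    embeddings = np.asarray(chunk_embeddings, dtype=np.float64)  # (n_chunks, hidden_dim)
    weights = np.asarray(chunk_lengths, dtype=np.float64)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return (weights[:, None] * embeddings).sum(axis=0, keepdims=True)


# Toy transcript of 1200 token ids; a real pipeline would use the model's tokenizer.
token_ids = list(range(1200))
chunks = split_into_chunks(token_ids)
chunk_lengths = [len(chunk) for chunk in chunks]

# Placeholder embeddings standing in for the encoder output of each chunk.
rng = np.random.default_rng(0)
chunk_embeddings = rng.normal(size=(len(chunks), 384))

embedding = weighted_mean_pool(chunk_embeddings, chunk_lengths)
print(embedding.shape)  # (1, 384)
```

Under this assumption a short trailing chunk contributes less to the talk embedding than a full 512-token chunk; uniform weights would reduce to a plain mean over chunks.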