---
license: cc-by-nc-4.0
language:
- en
base_model:
- anferico/bert-for-patents
tags:
- patent
- embeddings
- contrastive-learning
- information-retrieval
pipeline_tag: feature-extraction
---

# PatentMap-V0-SecPair-SummaryDrawing

**PatentMap-V0-SecPair-SummaryDrawing** is a patent embedding model trained on abstract + summary + drawing sections with section-pair augmentation. It is part of the PatentMap V0 model collection.

## Model Details

- **Base Model:** [anferico/bert-for-patents](https://huggingface.co/anferico/bert-for-patents)
- **Training Objective:** Contrastive learning (InfoNCE loss)
- **Architecture:** BERT-large (340M parameters)
- **Embedding Dimension:** 1024
- **Max Sequence Length:** 512 tokens
- **Vocabulary Size:** 39860
- **Training Data:** USPTO patent applications (2010-2018) from [HUPD corpus](https://huggingface.co/datasets/HUPD/hupd)

### Training Configuration

- **Patent Sections Used:** abstract + summary + drawing
- **Data Augmentation:** dropout + section_pair
- **Batch Size:** 512
- **Learning Rate:** 1e-5

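The InfoNCE objective listed under Model Details can be illustrated without any dependencies. The sketch below is not the authors' training code. It assumes in-batch negatives, cosine similarity, and a temperature of 0.05, none of which are stated in this card, and it assumes that `section_pair` augmentation pairs two different sections of the same patent as positives. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    anchors[i] and positives[i] are embeddings of two views of patent i
    (assumed here: two different sections of the same patent). Every
    positives[j] with j != i serves as a negative for anchors[i].
    The temperature is an assumed value, not taken from the model card.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching pair (index i) as the target.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: matched pairs point in similar directions.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each anchor is closest to its own positive and grows when the pairs are mismatched, which is the behavior a contrastive objective of this kind rewards.
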
### Special Tokens

This model includes additional patent-specific special tokens:
- `[drawing]`

## Usage

### Input Format

This model expects patent text formatted with special tokens:

- **For abstract**: `Title [SEP] [abstract] Abstract text`
- **For other sections**: `[section] Section text` (no title prefix)

Example:
```python
# Abstract with title
text = "Smart thermostat system [SEP] [abstract] A thermostat system comprising..."

# Claim without title
text = "[claim] A method comprising: step 1, step 2..."
```

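The two formatting rules can be wrapped in a small helper. This is a sketch only: `format_patent_input` is a hypothetical name, not part of this repository, and raising an error when an abstract has no title is a design choice made here for illustration.

```python
def format_patent_input(section, text, title=None):
    """Build an input string following the format described above.

    section: tag name without brackets, e.g. "abstract" or "drawing".
    text:    the section body.
    title:   patent title; used only for the abstract.
    """
    tag = f"[{section}]"
    if section == "abstract":
        if title is None:
            raise ValueError("the abstract format requires a title")
        return f"{title} [SEP] {tag} {text}"
    # Other sections carry no title prefix.
    return f"{tag} {text}"


abstract_input = format_patent_input(
    "abstract",
    "A thermostat system comprising...",
    title="Smart thermostat system",
)
drawing_input = format_patent_input("drawing", "FIG. 1 is a block diagram...")

print(abstract_input)
print(drawing_input)
```
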
### Code Example

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ZoeYou/PatentMap-V0-SecPair-SummaryDrawing"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Format patent text
title = "Smart thermostat system"
abstract = "A thermostat system comprising a temperature sensor..."
patent_text = f"{title} [SEP] [abstract] {abstract}"

# Encode and get embeddings
inputs = tokenizer(patent_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token

print(embeddings.shape)  # torch.Size([1, 1024])
```

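Once embeddings are extracted, similarity search reduces to ranking candidates by their similarity to a query. The sketch below uses plain Python lists so that it runs without downloading the model; with real outputs, `embeddings[i].tolist()` would supply the vectors. Cosine similarity is assumed as the scoring function, and `rank_by_cosine` is a hypothetical helper, not part of this repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_cosine(query, candidates):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-D vectors standing in for 1024-dimensional patent embeddings.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_by_cosine(query, candidates)
print([index for index, _ in ranking])  # [1, 2, 0]
```

For large collections, a vectorized or approximate nearest-neighbor index would replace the Python loop, but the ranking logic stays the same.
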
## Evaluation

This model has been evaluated on multiple patent-specific tasks:

- **IPC Classification** (linear probe and KNN)
- **Prior Art Search** (recall@k, nDCG@k)
- **Embedding Quality Metrics** (uniformity, alignment, topology)

For detailed evaluation results, see the [PatentMap paper](https://arxiv.org/abs/2511.10657).

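For reference, the two prior-art search metrics named above can be computed as follows. This sketch assumes binary relevance and the standard definitions of recall@k and nDCG@k; the paper's exact evaluation protocol may differ, and the function names and document IDs are illustrative.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results.

    Assumes relevant_ids contains at least one document.
    """
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # The ideal ranking places all relevant documents first.
    ideal_hits = min(k, len(relevant_ids))
    ideal_dcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / ideal_dcg


# Hypothetical ranking for one query with two relevant prior-art documents.
ranked = ["p3", "p7", "p1", "p9", "p4"]
relevant = {"p7", "p4"}

print(recall_at_k(ranked, relevant, k=3))  # 0.5
print(ndcg_at_k(ranked, relevant, k=3))
```

Recall@k only checks whether relevant documents appear in the top k, while nDCG@k also rewards placing them nearer the top.
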
## Intended Use

This model is designed for:
- Patent document retrieval
- Patent similarity search
- Prior art discovery
- IPC classification
- Patent landscape analysis

## Citation

If you use this model, please cite:

```bibtex
@article{zuo2025patent,
  title={Patent Representation Learning via Self-supervision},
  author={Zuo, You and Gerdes, Kim and de La Clergerie, Eric Villemonte and Sagot, Beno{\^i}t},
  journal={arXiv preprint arXiv:2511.10657},
  year={2025}
}
```

## Model Collection

This model is part of the PatentMap V0 collection. For an overview of all models, see [PatentMap-V0](https://huggingface.co/ZoeYou/patentmapv0-models).

## License

This model is released under the CC BY-NC 4.0 license (non-commercial use only).

## Contact

For questions or issues, please open an issue on the [GitHub repository](https://github.com/ZoeYou/patentmapv0) or contact the authors.