PatentMap-V0-SecPair-Claim

PatentMap-V0-SecPair-Claim is a patent embedding model trained on the abstract and claim sections of patents, using section-pair augmentation. It is part of the PatentMap V0 model collection.

Model Details

  • Base Model: anferico/bert-for-patents
  • Training Objective: Contrastive learning (InfoNCE loss)
  • Architecture: BERT-large (340M parameters)
  • Embedding Dimension: 1024
  • Max Sequence Length: 512 tokens
  • Vocabulary Size: 39859
  • Training Data: USPTO patent grants (2010-2018) from the HUPD corpus

Training Configuration

  • Patent Sections Used: abstract + claim
  • Data Augmentation: dropout + section_pair
  • Batch Size: 512
  • Learning Rate: 1e-5
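
The contrastive objective named under Model Details can be sketched as follows. This is a minimal illustration of InfoNCE with in-batch negatives, not the authors' training code: the function name `info_nce_loss`, the temperature of 0.05, the use of cosine similarity, and the toy data are all assumptions made for the example. How section-pair augmentation actually builds positive pairs is described in the paper, not here.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives (illustrative; temperature is assumed)."""
    # L2-normalize so the dot product equals cosine similarity.
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    # logits[i, j] = scaled similarity between anchor i and positive j.
    logits = anchors @ positives.T / temperature
    # The matching pair sits on the diagonal; other columns act as negatives.
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)


# Toy batch: 4 pairs of 1024-dimensional vectors standing in for two views
# of the same patents (random data, not real embeddings).
torch.manual_seed(0)
view_a = torch.randn(4, 1024)
view_b = view_a + 0.01 * torch.randn(4, 1024)
loss = info_nce_loss(view_a, view_b)
print(loss.item())
```

With in-batch negatives, every other example in the batch serves as a negative for a given anchor, so a larger batch gives each anchor more negatives to be contrasted against.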

Usage

Input Format

This model expects patent text formatted with special tokens:

  • For abstract: Title [SEP] [abstract] Abstract text
  • For other sections: [section] Section text, where [section] is the section's tag, such as [claim] (no title prefix)

Example:

# Abstract with title
text = "Smart thermostat system [SEP] [abstract] A thermostat system comprising..."

# Claim without title
text = "[claim] A method comprising: step 1, step 2..."
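
The formatting rules above can be wrapped in a small helper. This is a sketch only: `format_patent_input` is a hypothetical name, not part of the model's API, and the choice to raise an error when an abstract has no title is an assumption. The helper does not check that the section tag is one the tokenizer knows.

```python
from typing import Optional


def format_patent_input(section: str, text: str, title: Optional[str] = None) -> str:
    """Build an input string in the format described above (hypothetical helper)."""
    tagged = f"[{section}] {text.strip()}"
    if section == "abstract":
        # Assumption: abstracts always carry the title prefix, so require one.
        if not title:
            raise ValueError("abstract inputs need a title")
        return f"{title.strip()} [SEP] {tagged}"
    # Other sections take no title prefix; any title passed in is ignored.
    return tagged


abstract_input = format_patent_input(
    "abstract", "A thermostat system comprising...", title="Smart thermostat system"
)
claim_input = format_patent_input("claim", "A method comprising: step 1, step 2...")
print(abstract_input)
print(claim_input)
```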

Code Example

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ZoeYou/PatentMap-V0-SecPair-Claim"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Format patent text
title = "Smart thermostat system"
abstract = "A thermostat system comprising a temperature sensor..."
patent_text = f"{title} [SEP] [abstract] {abstract}"

# Encode and get embeddings
inputs = tokenizer(patent_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
    
print(embeddings.shape)  # torch.Size([1, 1024])
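
For retrieval and similarity search, CLS embeddings like the one above are commonly compared with cosine similarity. The sketch below is an assumption about downstream use, since this card does not state which similarity measure was used in evaluation. `rank_by_cosine` is a hypothetical helper, and random tensors stand in for real embeddings so the snippet runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def rank_by_cosine(query: torch.Tensor, corpus: torch.Tensor, top_k: int = 3):
    """Return (scores, indices) of the top_k corpus rows most similar to query."""
    query = F.normalize(query, dim=-1)      # shape (1, dim)
    corpus = F.normalize(corpus, dim=-1)    # shape (n, dim)
    scores = (query @ corpus.T).squeeze(0)  # cosine similarity per corpus row
    return torch.topk(scores, k=min(top_k, corpus.size(0)))


torch.manual_seed(0)
corpus_embeddings = torch.randn(5, 1024)  # placeholder for encoded patents
# Placeholder query: a slightly perturbed copy of corpus row 2.
query_embedding = corpus_embeddings[2:3] + 0.01 * torch.randn(1, 1024)
scores, indices = rank_by_cosine(query_embedding, corpus_embeddings)
print(indices.tolist(), scores.tolist())
```

In practice, `corpus_embeddings` would be built by encoding each formatted patent text in batches and stacking the resulting CLS vectors.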

Evaluation

This model has been evaluated on multiple patent-specific tasks:

  • IPC Classification (linear probe and KNN)
  • Prior Art Search (recall@k, nDCG@k)
  • Embedding Quality Metrics (uniformity, alignment, topology)

For detailed evaluation results, see the PatentMap paper.
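
The retrieval metrics listed above, recall@k and nDCG@k, can be sketched in a few lines. This uses the common textbook definitions with binary relevance and a log2 rank discount, which is an assumption; the paper's exact evaluation protocol may differ. The document IDs are made up for illustration, and the ranked list is assumed to contain no duplicates.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """nDCG@k with binary relevance and a log2 rank discount."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k])
              if doc_id in relevant_ids)
    # Ideal ranking places all relevant documents first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["p3", "p7", "p1", "p9"]  # hypothetical ranking for one query
relevant = {"p1", "p3"}            # hypothetical prior-art set
print(recall_at_k(ranked, relevant, k=2))  # 0.5
print(ndcg_at_k(ranked, relevant, k=3))
```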

Intended Use

This model is designed for:

  • Patent document retrieval
  • Patent similarity search
  • Prior art discovery
  • IPC classification
  • Patent landscape analysis

Citation

If you use this model, please cite:

@article{zuo2025patent,
  title={Patent Representation Learning via Self-supervision},
  author={Zuo, You and Gerdes, Kim and de La Clergerie, Eric Villemonte and Sagot, Beno{\^i}t},
  journal={arXiv preprint arXiv:2511.10657},
  year={2025}
}

Model Collection

This model is part of the PatentMap V0 collection. For an overview of all models, see PatentMap-V0.

License

This model is released under the CC BY-NC 4.0 license (non-commercial use only).

Contact

For questions or issues, please open an issue on the GitHub repository or contact the authors.
