FAISS Index for Patent Retrieval
This repository contains FAISS index files created with the following parameters:
- Model: SPECTER2 (allenai/specter2_base)
- Index type: IVF100,PQ16 (FAISS factory string)
- Distance metric: L2
- Embedding dimension: 768
- Corpus: USPTO Patents
- PQ quantization: a PQ64 variant is also provided (higher precision than the default PQ16)
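To put the PQ16 and PQ64 settings in perspective, the sketch below works out the per-vector code size. It assumes 8 bits per sub-quantizer, which is the FAISS default when a factory string such as PQ16 gives no explicit bit width. The helper name `pq_code_size` is illustrative, and IVF overhead (vector IDs, coarse centroids) is ignored.

```python
def pq_code_size(dim: int, m: int, nbits: int = 8) -> dict:
    """Return sub-vector width and compressed size for a product quantizer.

    dim   : embedding dimension (768 for this index)
    m     : number of sub-quantizers (16 for PQ16, 64 for PQ64)
    nbits : bits per sub-quantizer code (assumed 8, the FAISS default)
    """
    if dim % m != 0:
        raise ValueError("dim must be divisible by the number of sub-quantizers")
    raw_bytes = dim * 4                 # uncompressed float32 storage
    code_bytes = (m * nbits + 7) // 8   # compressed code, rounded up to whole bytes
    return {
        "sub_dim": dim // m,
        "code_bytes": code_bytes,
        "compression": raw_bytes / code_bytes,
    }


pq16 = pq_code_size(768, 16)
pq64 = pq_code_size(768, 64)
print("PQ16:", pq16)
print("PQ64:", pq64)
```

Under these assumptions PQ16 stores each 768-dimensional vector in 16 bytes (48 dimensions per code) and PQ64 in 64 bytes (12 dimensions per code), so the PQ64 index is larger but approximates distances more closely.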
Files
- specter2_IVF100,PQ16.index: FAISS index file
- specter2_IVF100,PQ64.index: FAISS index file
- emb_specter2.memmap: Embedding memmap file
- patents_all.parquet: Corpus parquet file
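Because emb_specter2.memmap is read with `np.memmap` and reshaped to 768 columns in the Usage section, it is presumably a raw float32 buffer with no header. Under that assumption, the number of stored vectors can be checked from the file size before mapping. The sketch below demonstrates the idea on a small temporary file; `count_rows` is a hypothetical helper, not part of this repository.

```python
import os
import tempfile

import numpy as np

DIM = 768  # embedding dimension stated in the parameter list


def count_rows(path: str, dim: int = DIM, dtype=np.float32) -> int:
    """Infer the row count of a headerless memmap from its size on disk."""
    row_bytes = dim * np.dtype(dtype).itemsize
    size = os.path.getsize(path)
    if size % row_bytes != 0:
        raise ValueError("file size is not a multiple of one row; check dim/dtype")
    return size // row_bytes


# Demonstration on synthetic data (a stand-in for emb_specter2.memmap).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.memmap")
    np.random.rand(10, DIM).astype(np.float32).tofile(demo_path)

    n_rows = count_rows(demo_path)
    embeddings = np.memmap(demo_path, mode="r", dtype=np.float32, shape=(n_rows, DIM))
    demo_shape = embeddings.shape
    del embeddings  # release the mapping before the directory is removed

print(n_rows, demo_shape)
```

Passing an explicit `shape` to `np.memmap` fails early if the dimension or dtype is wrong, whereas `reshape(-1, 768)` only fails when the total element count is not divisible by 768.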
Usage
To use these files, download them and load with FAISS:
import faiss
import numpy as np
from huggingface_hub import hf_hub_download
# Download and load index
index_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="specter2_IVF100,PQ16.index")
index = faiss.read_index(index_path)
# Optionally download and load embeddings if needed
emb_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="emb_specter2.memmap")
embeddings = np.memmap(
    emb_path,
    mode="r",
    dtype=np.float32
).reshape(-1, 768)  # Adjust shape as needed
# Download and load corpus
import pandas as pd
corpus_path = hf_hub_download(repo_id="ErzhuoShao/USPTO-Specter2-faiss", filename="patents_all.parquet")
corpus = pd.read_parquet(corpus_path)
# Example query
from transformers import AutoTokenizer, AutoModel
import torch
# Load the same model used to build the index
tokenizer = AutoTokenizer.from_pretrained("allenai/specter2_base")
model = AutoModel.from_pretrained("allenai/specter2_base")
# Encode a query
query = "Machine learning techniques for computer vision"
inputs = tokenizer(query, return_tensors="pt", max_length=512, padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
    query_vector = outputs.last_hidden_state[:, 0].numpy().astype('float32')
# Search the index
distances, indices = index.search(query_vector, k=5)
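The search returns integer IDs rather than patent records, and distances computed from PQ codes are approximate. A minimal sketch of two possible follow-up steps is given below: re-scoring the candidates with exact squared L2 distances from the full-precision embeddings, then fetching the matching corpus rows. It assumes that index IDs are row positions in both emb_specter2.memmap and patents_all.parquet, which this card does not state explicitly. The function names and the synthetic data are illustrative only.

```python
import numpy as np
import pandas as pd


def rerank_exact(query_vector, candidate_ids, embeddings):
    """Re-score candidates with exact squared L2 distance; return sorted (ids, dists)."""
    ids = np.asarray(candidate_ids)
    ids = ids[ids >= 0]  # FAISS pads missing results with -1
    diffs = np.asarray(embeddings[ids], dtype=np.float32) - query_vector.reshape(1, -1)
    dists = np.einsum("ij,ij->i", diffs, diffs)  # squared L2, as FAISS reports for L2 indexes
    order = np.argsort(dists, kind="stable")
    return ids[order], dists[order]


def lookup_rows(corpus, ids):
    """Fetch corpus rows by position (ASSUMPTION: index ID == row position)."""
    return corpus.iloc[ids]


# Synthetic stand-ins for the real embeddings, corpus and search output.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((20, 768)).astype(np.float32)
corpus = pd.DataFrame({"doc": [f"patent_{i}" for i in range(20)]})
query_vector = embeddings[7:8].copy()     # shape (1, 768), identical to row 7
indices = np.array([[3, 12, 7, 5, -1]])   # shaped like the output of index.search

ids, dists = rerank_exact(query_vector[0], indices[0], embeddings)
top = lookup_rows(corpus, ids)
print(top.assign(distance=dists))
```

With the real files, `embeddings`, `corpus` and `indices` would come from the Usage snippet above. Requesting a larger `k` from the index before re-ranking gives the exact pass more candidates to choose from.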
For more details, refer to the original repository.