---
language: en
license: mit
library_name: transformers
tags:
- bert
- scientific-text
- embeddings
- fine-tuned
pipeline_tag: feature-extraction
---

# scibert-citation-model

This model is a fine-tuned version of SciBERT, optimized for generating embeddings from scientific papers.

## Model Details

- **Base Model**: SciBERT (BERT pretrained on scientific text)
- **Fine-tuning Task**: Scientific paper understanding and embedding generation
- **Language**: English (scientific/academic)
- **Vocabulary**: SciBERT's scivocab, a WordPiece vocabulary built from scientific corpora

## Usage

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")
model.eval()  # disable dropout for deterministic embeddings

# Generate embeddings
text = "Your scientific text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
print(f"Embeddings shape: {embeddings.shape}")
```

For batched, mean-pooled embeddings and a similarity example, see the sketches at the end of this card.

## Performance

No quantitative benchmark results have been reported for this fine-tuned model yet; evaluate it on your own retrieval or similarity task before relying on it.

## Training Details

- **Training Framework**: PyTorch/Transformers
- **Fine-tuning Objective**: Scientific text understanding

## Citation

If you use this model in your research, please cite this repository as well as the underlying SciBERT paper (Beltagy et al., 2019).
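
## Extended Usage (Sketches)

The usage snippet above takes the `[CLS]` token embedding. Depending on how the model was fine-tuned, mean pooling over all token embeddings (with padding masked out) often works as well or better for sentence-level similarity. The sketch below illustrates that pooling strategy; it is not a documented recipe for this model, and the `embed` helper is a hypothetical name introduced here.

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")
model.eval()

def embed(texts):
    """Return one mean-pooled embedding per input text (hypothetical helper)."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (batch, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                 # padding zeroed out
    counts = mask.sum(dim=1).clamp(min=1)               # real tokens per text
    return summed / counts                              # (batch, dim)
```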
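
With embeddings in hand, downstream tasks such as paper retrieval or deduplication usually reduce to cosine similarity. A minimal usage example, assuming the hypothetical `embed` helper above (the two abstracts are invented for illustration):

```python
import torch.nn.functional as F

papers = [
    "We present a transformer-based model for protein structure prediction.",
    "A deep learning approach to predicting protein folding from sequence data.",
]
emb = F.normalize(embed(papers), p=2, dim=1)  # unit-length rows
similarity = emb[0] @ emb[1]                  # dot product of unit vectors = cosine
print(f"Cosine similarity: {similarity.item():.3f}")
```

Values closer to 1.0 indicate more semantically similar texts.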