BSG CyLlama v2.0.0
Corpus-level scientific summarization using soft-prompt conditioned language generation.
BSG CyLlama generates structured summaries of scientific research clusters, that is, groups of related publications on a shared topic. Unlike document-level summarizers, it takes the combined abstracts of an entire cluster as input and produces multi-field output that captures the cluster's collective findings.
Architecture
Source Abstracts (concatenated text)
|
v
SBERT Encoder (thenlper/gte-large, 1024-dim)
|
v
Sbert2Prompt (Linear -> LayerNorm -> GELU -> Dropout -> Linear -> LayerNorm)
|
v
16 Soft Prompt Tokens (2048-dim each)
|
v
LoRA-adapted Llama-3.2-1B-Instruct (rank=64, alpha=128)
|
v
4 Structured Output Fields
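To make the shapes in the diagram concrete, the sketch below tallies the parameters of the Sbert2Prompt projection from the dimensions stated in this card (a 1024-dim embedding mapped to 16 prompt tokens of 2048 dims each). The 2x intermediate width is taken from the Sbert2Prompt definition in the Usage section; the helper names are illustrative and not part of the released code.

```python
# Dimensions taken from this model card; helper names are illustrative.
SBERT_DIM = 1024          # thenlper/gte-large embedding size
LLAMA_HIDDEN_DIM = 2048   # size of each soft prompt token
PROMPT_LENGTH = 16        # number of soft prompt tokens
HIDDEN_WIDTH = LLAMA_HIDDEN_DIM * 2  # intermediate width used by Sbert2Prompt


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


def layernorm_params(features: int) -> int:
    """Per-feature scale and shift of a LayerNorm."""
    return 2 * features


def sbert2prompt_param_count() -> int:
    """Total trainable parameters in the two-layer projection."""
    # The final layer emits 16 * 2048 values, later reshaped to (16, 2048).
    out_width = LLAMA_HIDDEN_DIM * PROMPT_LENGTH
    return (
        linear_params(SBERT_DIM, HIDDEN_WIDTH)
        + layernorm_params(HIDDEN_WIDTH)
        + linear_params(HIDDEN_WIDTH, out_width)
        + layernorm_params(out_width)
    )


total = sbert2prompt_param_count()
print(f"Sbert2Prompt parameters: {total:,}")
```

Nearly all of these parameters sit in the second linear layer, which expands a single pooled embedding into all 16 prompt tokens at once.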
Output Fields
| Field | Label | Description |
|---|---|---|
| Abstract | ABSTRACT | Multi-sentence synthesis of the cluster's research findings |
| Overview | OVERVIEW | Concise 2-3 sentence summary of the cluster theme |
| Title | TITLE | Descriptive research area title (8-15 words) |
| Headline | HEADLINE | Short punchy label (3-7 words) |
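The length targets in the table can be checked mechanically. The sketch below is a hypothetical validator, not the Format Compliance metric reported under Performance (this card does not define that metric). The word and sentence counting rules and the names `check_field` and `count_sentences` are assumptions made for illustration.

```python
import re

# Length targets from the Output Fields table. ABSTRACT has no numeric target
# in the table, so it is only required to contain at least two sentences.
WORD_LIMITS = {"TITLE": (8, 15), "HEADLINE": (3, 7)}
SENTENCE_LIMITS = {"OVERVIEW": (2, 3)}


def count_sentences(text: str) -> int:
    """Naive count: split on ., ! or ? when followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def check_field(label: str, text: str) -> bool:
    """Return True if `text` meets the table's length target for `label`."""
    text = text.strip()
    if not text:
        return False
    if label in WORD_LIMITS:
        low, high = WORD_LIMITS[label]
        return low <= len(text.split()) <= high
    if label in SENTENCE_LIMITS:
        low, high = SENTENCE_LIMITS[label]
        return low <= count_sentences(text) <= high
    if label == "ABSTRACT":
        return count_sentences(text) >= 2
    raise ValueError(f"unknown label: {label}")


print(check_field("HEADLINE", "Gut Microbiome and Bowel Disease"))
```

A check like this could be used to flag generations for resampling, since the Usage example samples with a nonzero temperature.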
Training
- Data: 19,172 scientific research clusters with human-validated and DeepSeek-generated summaries
- Method: Format-gated checkpoint selection (format score >= 0.85, then maximize semantic similarity), prompt norm regularization, LoRA freeze at epoch 3
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Encoder: thenlper/gte-large (1024-dim sentence embeddings)
- LoRA: rank=64, alpha=128, targeting all attention + MLP projections
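Format-gated checkpoint selection, as described in the Method bullet, can be sketched as follows. This is a minimal illustration under assumptions: the `Checkpoint` record, the example scores, and the fallback used when no checkpoint clears the gate are hypothetical and not taken from the training code.

```python
from dataclasses import dataclass
from typing import Sequence

FORMAT_GATE = 0.85  # threshold stated in this card


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    format_score: float
    semantic_similarity: float


def select_checkpoint(
    checkpoints: Sequence[Checkpoint], gate: float = FORMAT_GATE
) -> Checkpoint:
    """Among checkpoints passing the format gate, pick the most semantically similar."""
    if not checkpoints:
        raise ValueError("no checkpoints to choose from")
    passing = [c for c in checkpoints if c.format_score >= gate]
    if passing:
        return max(passing, key=lambda c: c.semantic_similarity)
    # Assumed fallback: if nothing passes the gate, take the best-formatted checkpoint.
    return max(checkpoints, key=lambda c: c.format_score)


# Made-up scores, for illustration only.
history = [
    Checkpoint(epoch=1, format_score=0.80, semantic_similarity=0.78),
    Checkpoint(epoch=2, format_score=0.86, semantic_similarity=0.74),
    Checkpoint(epoch=3, format_score=0.90, semantic_similarity=0.75),
]
best = select_checkpoint(history)
print(f"selected epoch: {best.epoch}")
```

In this toy history, epoch 1 has the highest similarity but is excluded because its format score falls below the gate, so the choice is made between epochs 2 and 3.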
Performance
| Metric | Score |
|---|---|
| Semantic Similarity | 0.755 |
| Format Compliance | 0.875 |
| Coherence | 0.994 |
| Composite | 0.863 |
Usage
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from huggingface_hub import snapshot_download
import json
# Download model files
model_dir = snapshot_download("jimnoneill/BSG_CyLlama")
# Load config
with open(f"{model_dir}/config.json") as f:
    config = json.load(f)
# Load SBERT encoder
sbert = SentenceTransformer(config["sbert_model_name"])
# Load prompt generator (Sbert2Prompt with LayerNorm)
class Sbert2Prompt(nn.Module):
    def __init__(self, sbert_dim, llama_hidden_dim, prompt_length=16):
        super().__init__()
        self.prompt_length = prompt_length
        self.llama_hidden_dim = llama_hidden_dim
        self.projection = nn.Sequential(
            nn.Linear(sbert_dim, llama_hidden_dim * 2),
            nn.LayerNorm(llama_hidden_dim * 2),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(llama_hidden_dim * 2, llama_hidden_dim * prompt_length),
            nn.LayerNorm(llama_hidden_dim * prompt_length),
        )

    def forward(self, sbert_emb):
        B = sbert_emb.size(0)
        out = self.projection(sbert_emb)
        return out.view(B, self.prompt_length, self.llama_hidden_dim)

device = "cuda" if torch.cuda.is_available() else "cpu"
prompt_gen = Sbert2Prompt(
    config["embedding_dim"],
    config["llama_hidden_dim"],
    config["prompt_length"]
)
prompt_gen.load_state_dict(torch.load(f"{model_dir}/prompt_generator.pt", map_location=device))
prompt_gen = prompt_gen.to(device).eval()
# Load LoRA-adapted LLM
tokenizer = AutoTokenizer.from_pretrained(f"{model_dir}/model")
base_model = AutoModelForCausalLM.from_pretrained(
    config["model_name"], torch_dtype=torch.float16, device_map=device
)
model = PeftModel.from_pretrained(base_model, f"{model_dir}/model")
model.eval()
# Generate summaries for a cluster of abstracts
abstracts = [
    "We studied the role of gut microbiota in inflammatory bowel disease...",
    "Our findings demonstrate that fecal microbiota transplantation can...",
    "Metagenomic analysis revealed significant dysbiosis patterns in..."
]
combined_text = " ".join(abstracts)
# Encode with SBERT
embedding = sbert.encode([combined_text], convert_to_tensor=True).to(device)
# Generate soft prompts
with torch.no_grad():
    soft_prompts = prompt_gen(embedding.float())
# Build generation prompt with theme instruction
theme_instruction = (
    "Provide a comprehensive overview covering key findings, "
    "methodology, significance, and broader context."
)
for label in config["labels"]:
    generation_prompt = (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"You are a scientific summarization assistant. {theme_instruction}\n"
        f"<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"Summarize the following research cluster.\n"
        f"Source: {combined_text[:2000]}\n"
        f"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
        f"{label}: "
    )
    input_ids = tokenizer(generation_prompt, return_tensors="pt").input_ids.to(device)
    input_embeds = model.get_input_embeddings()(input_ids)
    # Prepend soft prompts
    input_embeds = torch.cat([soft_prompts.half(), input_embeds], dim=1)
    attention_mask = torch.ones(input_embeds.shape[:2], device=device)
    with torch.no_grad():
        outputs = model.generate(
            inputs_embeds=input_embeds,
            attention_mask=attention_mask,
            max_new_tokens=200 if label == "ABSTRACT" else 80,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            repetition_penalty=1.15,
        )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"{label}: {result}")
File Structure
BSG_CyLlama/
  bsg_cyllama_logo.png          # Logo
  config.json                   # Model configuration
  prompt_generator.pt           # Sbert2Prompt weights (265 MB)
  model/
    adapter_config.json         # LoRA adapter configuration
    adapter_model.safetensors   # LoRA weights (173 MB)
    tokenizer.json              # Tokenizer
    tokenizer_config.json       # Tokenizer config
    special_tokens_map.json     # Special tokens
    chat_template.jinja         # Chat template
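The adapter size listed above is roughly what rank-64 LoRA over every attention and MLP projection would produce. The sketch below estimates it. The Llama-3.2-1B dimensions used here (16 decoder layers, hidden size 2048, MLP width 8192, 512-dim key/value projections) and the 32-bit storage are assumptions; they should be checked against the base model's config.json and the adapter file itself.

```python
# Assumed Llama-3.2-1B dimensions; verify against the base model's config.json.
NUM_LAYERS = 16
HIDDEN = 2048
KV_DIM = 512      # assumed: 8 key/value heads x 64-dim heads
MLP_WIDTH = 8192
RANK = 64         # LoRA rank from this card
ALPHA = 128       # LoRA alpha from this card

# (in_features, out_features) of each adapted projection in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_WIDTH),
    "up_proj": (HIDDEN, MLP_WIDTH),
    "down_proj": (MLP_WIDTH, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    return rank * (in_features + out_features)


def adapter_param_count() -> int:
    """Total LoRA parameters across all layers under the assumed dimensions."""
    per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


params = adapter_param_count()
size_mib = params * 4 / 2**20  # assumes 32-bit adapter weights
print(f"LoRA parameters: {params:,} (~{size_mib:.0f} MiB at 4 bytes each)")
print(f"LoRA scaling (alpha / rank): {ALPHA / RANK}")
```

Under these assumptions the estimate comes to about 172 MiB, which is in line with the 173 MB listed for adapter_model.safetensors.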
Requirements
torch>=2.0
transformers>=4.40
peft>=0.10
sentence-transformers>=2.0
huggingface-hub
License
This model is released under the Llama 3.2 Community License.
Citation
@software{bsg_cyllama_2026,
  title={BSG CyLlama: Corpus-Level Scientific Summarization},
  author={O'Neill, Jim},
  year={2026},
  url={https://huggingface.co/jimnoneill/BSG_CyLlama},
  version={2.0.0}
}
Related
- Training Data: jimnoneill/BSG_CyLlama-training
- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Encoder: thenlper/gte-large