---
license: llama3.2
tags:
  - scientific-summarization
  - lora
  - llama
  - sentence-transformers
  - corpus-level
  - research-clusters
language:
  - en
pipeline_tag: text-generation
library_name: peft
base_model: meta-llama/Llama-3.2-1B-Instruct
---

<p align="center">
  <img src="bsg_cyllama_logo.png" alt="BSG CyLlama" width="400"/>
</p>

# BSG CyLlama v2.0.0

**Corpus-level scientific summarization using soft-prompt conditioned language generation.**

BSG CyLlama generates structured summaries of scientific research clusters -- groups of topically related publications. Unlike document-level summarizers, it takes the concatenated abstracts of an entire cluster as input and produces a multi-field summary of the cluster's collective findings.

## Architecture

```
Source Abstracts (concatenated text)
        |
        v
  SBERT Encoder (thenlper/gte-large, 1024-dim)
        |
        v
  Sbert2Prompt (Linear -> LayerNorm -> GELU -> Linear -> LayerNorm)
        |
        v
  16 Soft Prompt Tokens (2048-dim each)
        |
        v
  LoRA-adapted Llama-3.2-1B-Instruct (rank=64, alpha=128)
        |
        v
  4 Structured Output Fields
```
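
The prompt generator maps one 1024-dim GTE embedding to 16 soft-prompt vectors in the Llama hidden space. A quick shape walk-through of that projection, assuming the dimensions shown above (a standalone sketch -- the shipped weights use the `Sbert2Prompt` class in the Usage section):

```python
import torch
import torch.nn as nn

# Dimensions from the diagram above.
sbert_dim, hidden, prompt_len = 1024, 2048, 16

# Same Linear -> LayerNorm -> GELU -> Linear -> LayerNorm stack (dropout omitted here).
proj = nn.Sequential(
    nn.Linear(sbert_dim, hidden * 2),            # 1024 -> 4096
    nn.LayerNorm(hidden * 2),
    nn.GELU(),
    nn.Linear(hidden * 2, hidden * prompt_len),  # 4096 -> 32768
    nn.LayerNorm(hidden * prompt_len),
)

emb = torch.randn(1, sbert_dim)                       # one cluster embedding
soft_prompts = proj(emb).view(1, prompt_len, hidden)  # -> (1, 16, 2048)
print(soft_prompts.shape)  # torch.Size([1, 16, 2048])
```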

## Output Fields

| Field | Label | Description |
|-------|-------|-------------|
| Abstract | `ABSTRACT` | Multi-sentence synthesis of the cluster's research findings |
| Overview | `OVERVIEW` | Concise 2-3 sentence summary of the cluster theme |
| Title | `TITLE` | Descriptive research area title (8-15 words) |
| Headline | `HEADLINE` | Short punchy label (3-7 words) |

## Training

- **Data**: 19,172 scientific research clusters with human-validated and DeepSeek-generated summaries
- **Method**: Format-gated checkpoint selection (format score >= 0.85, then maximize semantic similarity; see the sketch after this list), prompt norm regularization, LoRA freeze at epoch 3
- **Base model**: `meta-llama/Llama-3.2-1B-Instruct`
- **Encoder**: `thenlper/gte-large` (1024-dim sentence embeddings)
- **LoRA**: rank=64, alpha=128, targeting all attention + MLP projections
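
A minimal sketch of the format-gated selection rule, using hypothetical per-checkpoint metric records (field names and values are illustrative, not the actual training logs):

```python
# Hypothetical evaluation records for three checkpoints.
checkpoints = [
    {"path": "ckpt_1", "format_score": 0.82, "semantic_similarity": 0.74},
    {"path": "ckpt_2", "format_score": 0.88, "semantic_similarity": 0.73},
    {"path": "ckpt_3", "format_score": 0.91, "semantic_similarity": 0.76},
]

# Gate on format compliance first, then maximize semantic similarity among survivors.
eligible = [c for c in checkpoints if c["format_score"] >= 0.85]
best = max(eligible, key=lambda c: c["semantic_similarity"])
print(best["path"])  # ckpt_3
```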

## Performance

| Metric | Score |
|--------|-------|
| Semantic Similarity | 0.755 |
| Format Compliance | 0.875 |
| Coherence | 0.994 |
| Composite | 0.863 |

## Usage

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from huggingface_hub import snapshot_download
import json

# Download model files
model_dir = snapshot_download("jimnoneill/BSG_CyLlama")

# Load config
with open(f"{model_dir}/config.json") as f:
    config = json.load(f)

# Load SBERT encoder
sbert = SentenceTransformer(config["sbert_model_name"])

# Load prompt generator (Sbert2Prompt with LayerNorm)
class Sbert2Prompt(nn.Module):
    def __init__(self, sbert_dim, llama_hidden_dim, prompt_length=16):
        super().__init__()
        self.prompt_length = prompt_length
        self.llama_hidden_dim = llama_hidden_dim
        self.projection = nn.Sequential(
            nn.Linear(sbert_dim, llama_hidden_dim * 2),
            nn.LayerNorm(llama_hidden_dim * 2),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(llama_hidden_dim * 2, llama_hidden_dim * prompt_length),
            nn.LayerNorm(llama_hidden_dim * prompt_length),
        )

    def forward(self, sbert_emb):
        B = sbert_emb.size(0)
        out = self.projection(sbert_emb)
        return out.view(B, self.prompt_length, self.llama_hidden_dim)

device = "cuda" if torch.cuda.is_available() else "cpu"

prompt_gen = Sbert2Prompt(
    config["embedding_dim"],
    config["llama_hidden_dim"],
    config["prompt_length"]
)
prompt_gen.load_state_dict(torch.load(f"{model_dir}/prompt_generator.pt", map_location=device))
prompt_gen = prompt_gen.to(device).eval()

# Load LoRA-adapted LLM
tokenizer = AutoTokenizer.from_pretrained(f"{model_dir}/model")
base_model = AutoModelForCausalLM.from_pretrained(
    config["model_name"], torch_dtype=torch.float16, device_map=device
)
model = PeftModel.from_pretrained(base_model, f"{model_dir}/model")
model.eval()

# Generate summaries for a cluster of abstracts
abstracts = [
    "We studied the role of gut microbiota in inflammatory bowel disease...",
    "Our findings demonstrate that fecal microbiota transplantation can...",
    "Metagenomic analysis revealed significant dysbiosis patterns in..."
]
combined_text = " ".join(abstracts)

# Encode with SBERT
embedding = sbert.encode([combined_text], convert_to_tensor=True).to(device)

# Generate soft prompts
with torch.no_grad():
    soft_prompts = prompt_gen(embedding.float())

# Build generation prompt with theme instruction
theme_instruction = (
    "Provide a comprehensive overview covering key findings, "
    "methodology, significance, and broader context."
)

for label in config["labels"]:
    generation_prompt = (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"You are a scientific summarization assistant. {theme_instruction}\n"
        f"<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"Summarize the following research cluster.\n"
        f"Source: {combined_text[:2000]}\n"
        f"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
        f"{label}: "
    )

    input_ids = tokenizer(generation_prompt, return_tensors="pt").input_ids.to(device)
    input_embeds = model.get_input_embeddings()(input_ids)

    # Prepend soft prompts
    input_embeds = torch.cat([soft_prompts.half(), input_embeds], dim=1)
    attention_mask = torch.ones(input_embeds.shape[:2], device=device)

    with torch.no_grad():
        outputs = model.generate(
            inputs_embeds=input_embeds,
            attention_mask=attention_mask,
            max_new_tokens=200 if label == "ABSTRACT" else 80,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            repetition_penalty=1.15,
        )

    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"{label}: {result}")
```

## File Structure

```
BSG_CyLlama/
  bsg_cyllama_logo.png      # Logo
  config.json                # Model configuration
  prompt_generator.pt        # Sbert2Prompt weights (265 MB)
  model/
    adapter_config.json      # LoRA adapter configuration
    adapter_model.safetensors # LoRA weights (173 MB)
    tokenizer.json           # Tokenizer
    tokenizer_config.json    # Tokenizer config
    special_tokens_map.json  # Special tokens
    chat_template.jinja      # Chat template
```
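
For reference, `config.json` carries the fields read by the Usage snippet above. An illustrative example with values inferred from this card (the actual file may contain additional keys):

```json
{
  "model_name": "meta-llama/Llama-3.2-1B-Instruct",
  "sbert_model_name": "thenlper/gte-large",
  "embedding_dim": 1024,
  "llama_hidden_dim": 2048,
  "prompt_length": 16,
  "labels": ["ABSTRACT", "OVERVIEW", "TITLE", "HEADLINE"]
}
```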

## Requirements

```
torch>=2.0
transformers>=4.40
peft>=0.10
sentence-transformers>=2.0
huggingface-hub
```

## License

This model is released under the [Llama 3.2 Community License](https://ai.meta.com/llama/license/).

## Citation

```bibtex
@software{bsg_cyllama_2026,
  title={BSG CyLlama: Corpus-Level Scientific Summarization},
  author={O'Neill, Jim},
  year={2026},
  url={https://huggingface.co/jimnoneill/BSG_CyLlama},
  version={2.0.0}
}
```

## Related

- **Training Data**: [jimnoneill/BSG_CyLlama-training](https://huggingface.co/datasets/jimnoneill/BSG_CyLlama-training)
- **Base Model**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Encoder**: [thenlper/gte-large](https://huggingface.co/thenlper/gte-large)