Tags: Text Generation, Transformers, Safetensors, Upper Grand Valley Dani, llama, genomic, text-generation-inference
Instructions to use HuggingFaceBio/Carbon-3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HuggingFaceBio/Carbon-3B with Transformers:
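```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The tokenizer ships DNA-specific custom code, so remote code must be trusted
pipe = pipeline("text-generation", model="HuggingFaceBio/Carbon-3B", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

# Tokenizer needs trust_remote_code for the DNA-specific logic; the model itself is standard Llama-family
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceBio/Carbon-3B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("HuggingFaceBio/Carbon-3B")
```
- Notebooks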
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceBio/Carbon-3B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceBio/Carbon-3B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceBio/Carbon-3B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/HuggingFaceBio/Carbon-3B
```
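The curl call above sends a request to the server's OpenAI-compatible `/v1/completions` endpoint. Below is a minimal standard-library client sketch that sends the same JSON body. The helper names `build_completion_request` and `send_completion` are hypothetical. The sketch also assumes the server replies with the usual OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # vLLM default; the SGLang example uses port 30000
    model: str = "HuggingFaceBio/Carbon-3B",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST /v1/completions request that the curl example sends."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's text (requires a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    req = build_completion_request("Once upon a time,")
    print(req.get_method(), req.full_url)  # POST http://localhost:8000/v1/completions
    # print(send_completion(req))  # uncomment once `vllm serve` is running
```

The SGLang server described below exposes the same endpoint on port 30000. Passing `base_url="http://localhost:30000"` should be the only change needed.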
- SGLang
How to use HuggingFaceBio/Carbon-3B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HuggingFaceBio/Carbon-3B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceBio/Carbon-3B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "HuggingFaceBio/Carbon-3B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceBio/Carbon-3B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use HuggingFaceBio/Carbon-3B with Docker Model Runner:
```shell
docker model run hf.co/HuggingFaceBio/Carbon-3B
```
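Whichever runtime serves the model, a common downstream step for a DNA language model is likelihood scoring. The README's `score` helper, whose tail appears in the diff that follows, works in two steps. It takes `log_softmax` over the shifted logits and then averages the log-probabilities of the observed tokens. The example removed in this commit turned such a mean into perplexity via `exp(-mean)`.

Below is a dependency-free sketch of that arithmetic. Plain Python lists stand in for the `torch` tensors. The function names are hypothetical and not part of the Carbon API.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def mean_token_logprob(logits: Sequence[Sequence[float]], ids: Sequence[int]) -> float:
    """Mean log-probability of each observed token given the tokens before it.

    logits[t] is the model's prediction for ids[t + 1], mirroring
    `targets = ids[:, 1:]` in the README's `score` helper.
    """
    targets = ids[1:]
    if not targets:
        raise ValueError("need at least two token ids")
    if len(logits) < len(targets):
        raise ValueError("need one row of logits per predicted position")
    # zip stops after len(targets) rows, i.e. the final logit row is unused
    logps = [log_softmax(row)[tok] for row, tok in zip(logits, targets)]
    return sum(logps) / len(logps)


def perplexity(mean_logp: float) -> float:
    """exp(-mean log-probability), the same formula the removed example used."""
    return math.exp(-mean_logp)


if __name__ == "__main__":
    # Toy 4-symbol vocabulary: all-zero logits are a uniform prediction
    uniform = [[0.0] * 4 for _ in range(4)]
    m = mean_token_logprob(uniform, [0, 1, 2, 3])
    print(round(perplexity(m), 6))  # 4.0
```

A uniform guess over four symbols gives a perplexity of 4. Lower values mean the model finds the sequence more predictable.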
docs: fix dtype, remove trust_remote_code for model, clean up internal comments
README.md (changed)

````diff
@@ -57,13 +57,10 @@ import torch
 
 repo = "HuggingFaceBio/Carbon-3B"
 
-# Tokenizer needs trust_remote_code for the DNA-specific logic
 tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
-
-# Model is standard Llama-family - no trust_remote_code needed
 model = AutoModelForCausalLM.from_pretrained(
     repo,
-
+    dtype=torch.bfloat16,
 ).cuda().eval()
 
 # Wrap a DNA prompt with the <dna> tag (the model is trained with this format).
@@ -133,41 +130,6 @@ def score(seq: str) -> float:
     targets = ids[:, 1:]
     logp = F.log_softmax(logits.float(), dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
     return logp.mean().item()
-
-# QY: not sure if we still want to keep this per-token log-probabilities score function,
-# because we now have a more elegant one in modeling_carbon.py:
-import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-repo = "HuggingFaceBio/Carbon-3B"
-
-# Load tokenizer and model
-tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    repo,
-    torch_dtype=torch.bfloat16,
-    trust_remote_code=True
-).cuda().eval()
-
-# Setup tokenizer for bp-level scoring (required for score_sequence)
-model.setup_tokenizer(tok)
-
-# Score sequences - automatically handles BOS token and padding
-sequences = ["ATCG" * 1024, "ACAT" * 2048]
-bp_probs_list, actual_probs_list = model.score_sequence(sequences)
-
-# bp_probs_list: list of [seq_len_i, 4] tensors - probability distribution over A/T/C/G at each position
-# actual_probs_list: list of [seq_len_i] tensors - probability of the actual base at each position
-
-# Compute metrics for each sequence
-for i, (seq, actual_probs) in enumerate(zip(sequences, actual_probs_list)):
-    log_likelihood = actual_probs.log().mean().item() # Total log-likelihood
-    perplexity = torch.exp(-actual_probs.log().mean()).item() # Perplexity
-
-    print(f"Sequence {i+1} (length {len(seq)}):")
-    print(f" Mean log-likelihood: {log_likelihood:.2f}")
-    print(f" Perplexity: {perplexity:.4f}")
-    print(f" Mean probability: {actual_probs.mean().item():.4f}")
 ```
 
 For batched scoring with attention masking and full reproducible evaluation pipelines (sequence recovery, ClinVar / BRCA2 / TraitGym VEP, TATA / synonymous-codon perturbation, Genome-NIAH), use the official scripts in the [Carbon evaluation directory](https://github.com/huggingface/carbon/tree/main/evaluation); see [`perturbation_tasks.py`](https://github.com/huggingface/carbon/blob/main/evaluation/perturbation_tasks.py) for the canonical `score_hf` implementation and [`README.md`](https://github.com/huggingface/carbon/blob/main/evaluation/README.md) for run instructions across all tasks.
@@ -187,7 +149,7 @@ config.rope_scaling = {
     "original_max_position_embeddings": 32768,
 }
 model = AutoModelForCausalLM.from_pretrained(
-    repo, config=config,
+    repo, config=config, dtype=torch.bfloat16
 ).cuda().eval()
 ```
 
@@ -202,8 +164,8 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
 draft = AutoModelForCausalLM.from_pretrained(
-    "HuggingFaceBio/
-
+    "HuggingFaceBio/Carbon-500M",
+    dtype=torch.bfloat16,
 ).cuda().eval()
 target = model # Carbon-3B, loaded above
 
@@ -256,8 +218,8 @@ Below we highlight the three short-context probes for which we report headline n
 | SYN v2 | <u>82.78</u> | 74.08 | **84.90** |
 
 Carbon-3B is competitive with Evo2-7B while being much faster to run.
-> TODO update TATA v2 and SYN v2 scores with
-
+> TODO: update TATA v2 and SYN v2 scores with the new results
+
 ### Long-context retrieval (Genome-NIAH)
 
 [Genome-NIAH](https://huggingface.co/datasets/HuggingFaceBio/genome-niah) is a long-context benchmark inspired by the NIAH and RULER benchmarks for English. The model must retrieve a random 24 bp VALUE planted at one of five depths in a real-genome haystack; evaluation covers six context lengths from 24 kbp to 786 kbp. The benchmark contains 500 examples per (task, context) cell.
@@ -270,7 +232,7 @@ Below are the scores on `niah`:
 | 64 k tokens (393 kbp) | – / 0.79 | – | 0.80 |
 
 Sample sizes: Carbon and GENERator, n=500; Evo2-7B, n=150 at 16k, n=100 at 32k, and n=20 at 64k because of its slow inference.
-> TODO
+> TODO: run more 64k samples for Evo2 7B
 
 - **4× longer effective context than Generator-v2-3B.** Generator-v2-3B caps at 16 k tokens (≈ 98 kbp). Carbon-3B has a native context of 32 k tokens (≈ 197 kbp) and extends to 64 k tokens (≈ 393 kbp) at inference time with YaRN. It matches Generator-v2-3B on `niah` at 98 kbp.
 - **Matches Evo2-7B (1 M context) on `niah` at 393 kbp** (64 k tokens) under YaRN, despite being substantially smaller.
````
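The context figures in this README line up with roughly 6 bp per token: 16 k tokens ≈ 98 kbp, 32 k ≈ 197 kbp, and the table's 64 k tokens (393 kbp). Extending the native 32 k context to 64 k tokens with YaRN implies a RoPE scaling factor of 2. The small sketch below covers both calculations, under two assumptions:

- The 6 bp/token constant is inferred from those figures, not documented.
- The `rope_type` and `factor` keys follow the Transformers YaRN convention. The diff shows only the `original_max_position_embeddings` entry of the README's `rope_scaling` dict, so this is an assumption.

The helper names are hypothetical.

```python
from typing import Dict, Union

# Assumption: about 6 bp per token, inferred from the README's own pairs
# (16 k tokens ~ 98 kbp, 32 k ~ 197 kbp, 64 k ~ 393 kbp); not an official constant.
BP_PER_TOKEN = 6.0
NATIVE_CONTEXT = 32 * 1024  # Carbon-3B's native context in tokens ("32 k tokens")


def tokens_to_kbp(n_tokens: int, bp_per_token: float = BP_PER_TOKEN) -> float:
    """Approximate genomic span, in kbp, covered by n_tokens."""
    return n_tokens * bp_per_token / 1000


def yarn_rope_scaling(
    target_tokens: int, native_tokens: int = NATIVE_CONTEXT
) -> Dict[str, Union[str, float, int]]:
    """Hypothetical helper: a Transformers-style YaRN `rope_scaling` dict.

    The YaRN factor is the ratio of the target context to the native one.
    """
    if target_tokens <= native_tokens:
        raise ValueError("YaRN scaling is only needed beyond the native context")
    return {
        "rope_type": "yarn",
        "factor": target_tokens / native_tokens,
        "original_max_position_embeddings": native_tokens,
    }


if __name__ == "__main__":
    for k in (16, 32, 64):
        print(f"{k} k tokens ~ {tokens_to_kbp(k * 1024):.0f} kbp")  # 98, 197, 393
    print(yarn_rope_scaling(64 * 1024)["factor"])  # 2.0
```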