Instructions for using NucleusAI/nucleus-22B-token-500B with libraries, notebooks, and local apps.

## Libraries

### Transformers

How to use NucleusAI/nucleus-22B-token-500B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NucleusAI/nucleus-22B-token-500B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NucleusAI/nucleus-22B-token-500B")
model = AutoModelForCausalLM.from_pretrained("NucleusAI/nucleus-22B-token-500B")
```

## Notebooks

- Google Colab
- Kaggle

## Local Apps

### vLLM

How to use NucleusAI/nucleus-22B-token-500B with vLLM.

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "NucleusAI/nucleus-22B-token-500B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NucleusAI/nucleus-22B-token-500B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

### SGLang

How to use NucleusAI/nucleus-22B-token-500B with SGLang.

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "NucleusAI/nucleus-22B-token-500B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NucleusAI/nucleus-22B-token-500B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "NucleusAI/nucleus-22B-token-500B" \
    --host 0.0.0.0 \
    --port 30000
```

### Docker Model Runner

How to use NucleusAI/nucleus-22B-token-500B with Docker Model Runner:

```shell
docker model run hf.co/NucleusAI/nucleus-22B-token-500B
```
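The curl calls above can also be made from Python. The sketch below uses only the standard library and assumes a vLLM server is already listening on localhost:8000; the endpoint, model name, and sampling parameters simply mirror the curl example, and `build_completion_request` is a hypothetical helper introduced here for illustration.

```python
import json
import urllib.request

def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build an OpenAI-compatible /v1/completions request as (url, headers, body)."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return f"{base_url}/v1/completions", headers, body

url, headers, body = build_completion_request(
    "http://localhost:8000",
    "NucleusAI/nucleus-22B-token-500B",
    "Once upon a time,",
)

# Sending the request requires a running server (e.g. `vllm serve ...` above):
# req = urllib.request.Request(url, data=body, headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

The same payload works against the SGLang server on port 30000, since both expose the OpenAI-compatible completions API.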
# Nucleus-22B-token-500B

Nucleus-22B-token-500B is a 22B-parameter causal decoder-only model built by Nucleus.AI and trained on 500B tokens of RefinedWeb along with curated corpora. It is made available under the MIT license.

A 1T-token model is coming soon.
## Why use Nucleus-22B-token-500B?

- It performs well compared to open-source models of similar size (e.g., MPT-7B, StableLM, RedPajama), thanks to being trained on 500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard.
- It is made available under the MIT license.
- It was trained by a small team of four people passionate about open source.
⚠️ This is a raw, pretrained model, which should be further finetuned for most use cases.
## Model Card for Nucleus-22B-token-500B

### Model Details

#### Model Description

- Developed by: NucleusAI
- Model type: Causal decoder-only
- Language(s) (NLP): English
- License: MIT
#### Model Source

- Paper: coming soon.
### Uses

#### Direct Use

Research on large language models, and use as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbots).

#### Out-of-Scope Use

Production use without adequate assessment of risks and mitigations; any use cases which may be considered irresponsible or harmful.
### Bias, Risks, and Limitations

Nucleus-22B-token-500B is trained on English data only and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

#### Recommendations

We recommend that users of Nucleus-22B-token-500B finetune it for their specific tasks of interest, and put guardrails and appropriate precautions in place for any production use.
### How to Get Started with the Model
### Training Details

#### Training Data

Nucleus-22B-token-500B was trained on 500B tokens of RefinedWeb, along with other curated corpora.
| Data source | Fraction | Tokens | Sources |
|---|---|---|---|
| RefinedWeb-English | 75% | 200B | massive web crawl |
| Books | 7% | 21B | |
| Code | 7% | 21B | Big Code, CodeNet |
| Technical | 6% | 19B | arXiv |
| Math | 5% | 17B | Mathematica, Khan Academy |
The data was tokenized with a tokenizer similar to that of Llama-7B.
#### Training Procedure

Nucleus-22B-token-500B was trained on 256 A100 80GB GPUs, using FSDP (Fully Sharded Data Parallel).
#### Training Hyperparameters

| Hyperparameter | Value | Comment |
|---|---|---|
| Precision | bfloat16 | |
| Optimizer | AdamW | |
| Learning rate | 2e-4 | 8B tokens warm-up, cosine decay to 1e-5 |
| Weight decay | 1e-1 | |
| Batch size | 2048 | constant |
| Context length | 2048 | constant |
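The learning-rate schedule in the table (linear warm-up over the first 8B tokens to a peak of 2e-4, then cosine decay to 1e-5) can be sketched as a function of tokens seen. The following is an illustrative reconstruction from the table values, not the team's actual training code; with a batch of 2048 sequences of 2048 tokens, each step consumes about 4.2M tokens, so 500B tokens is roughly 119k steps.

```python
import math

PEAK_LR = 2e-4        # peak learning rate, from the table
MIN_LR = 1e-5         # cosine decay floor, from the table
WARMUP_TOKENS = 8e9   # 8B-token warm-up, from the table
TOTAL_TOKENS = 500e9  # total training tokens

def learning_rate(tokens_seen):
    """Linear warm-up followed by cosine decay, parameterized by tokens seen."""
    if tokens_seen < WARMUP_TOKENS:
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    # Progress through the decay phase, clamped to [0, 1].
    progress = (tokens_seen - WARMUP_TOKENS) / (TOTAL_TOKENS - WARMUP_TOKENS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

tokens_per_step = 2048 * 2048  # batch size x context length, about 4.19M tokens

print(learning_rate(0))      # 0.0 at the start of warm-up
print(learning_rate(8e9))    # peak, about 2e-4, at the end of warm-up
print(learning_rate(500e9))  # floor, about 1e-5, at the end of training
```

Keeping the schedule a pure function of tokens seen (rather than optimizer steps) makes it independent of any batch-size changes, which is a common convenience in large-scale pretraining setups.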
#### Speeds, Sizes, Times

Training took place in early August 2023 and lasted about two weeks.
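A rough back-of-the-envelope estimate of the compute budget from the figures above (the 14-day figure is an assumption derived from "about two weeks"):

```python
gpus = 256        # A100 80GB GPUs, from the training procedure above
days = 14         # "about two weeks" -- approximate
gpu_hours = gpus * days * 24
print(gpu_hours)  # 86016, i.e. roughly 86k A100-hours
```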