Instructions for using AIFS/Prometh-MOEM-24B with libraries and local apps.

Libraries

How to use AIFS/Prometh-MOEM-24B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AIFS/Prometh-MOEM-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIFS/Prometh-MOEM-24B")
model = AutoModelForCausalLM.from_pretrained("AIFS/Prometh-MOEM-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use AIFS/Prometh-MOEM-24B with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AIFS/Prometh-MOEM-24B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIFS/Prometh-MOEM-24B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
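The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the step above is listening on localhost:8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "AIFS/Prometh-MOEM-24B"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply. For the SGLang server described below, pass `base_url="http://localhost:30000/v1/chat/completions"` to `build_chat_request` instead.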

Use Docker

```bash
docker model run hf.co/AIFS/Prometh-MOEM-24B
```

SGLang

How to use AIFS/Prometh-MOEM-24B with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIFS/Prometh-MOEM-24B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIFS/Prometh-MOEM-24B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIFS/Prometh-MOEM-24B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIFS/Prometh-MOEM-24B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use AIFS/Prometh-MOEM-24B with Docker Model Runner:
```
docker model run hf.co/AIFS/Prometh-MOEM-24B
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Prometh-MOEM-24B Model Card

Prometh-MOEM-24B is a Mixture of Experts (MoE) model that combines multiple foundation models, with the aim of improving accuracy, speed, and versatility across a range of tasks.

Model Sources and Components

This MoE model incorporates models specialized for the following tasks:

- Language translation
- Question answering
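This card does not describe how Prometh-MOEM-24B routes inputs between its component models. As a generic illustration of the MoE idea only, the sketch below shows top-k gating: a gate scores each expert, the k highest-scoring experts are selected, and their outputs are blended with softmax weights. The function names, scores, and expert outputs are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, expert_outputs, k=2):
    """Blend the outputs of the k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_scores[i] for i in top])
    mixed = sum(w * expert_outputs[i] for w, i in zip(weights, top))
    return top, weights, mixed


# Hypothetical gate scores and scalar expert outputs for a single input.
gate_scores = [1.0, -2.0, 1.0]
expert_outputs = [10.0, 20.0, 30.0]
top, weights, mixed = route_top_k(gate_scores, expert_outputs, k=2)
print(top, weights, mixed)
```

Because only k experts run per input, an MoE model can hold many parameters while using a fraction of them for each token.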

💻Usage Instructions

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("AIFS/Prometh-MOEM-24B")
model = AutoModelForCausalLM.from_pretrained("AIFS/Prometh-MOEM-24B")

# Set up the pipeline
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text. Sampling is enabled because greedy decoding cannot
# return more than one sequence per prompt.
prompt = "The future of AI in healthcare is"
generated_texts = text_generator(
    prompt,
    max_new_tokens=50,
    do_sample=True,
    num_return_sequences=3,
)

for generated_text in generated_texts:
    print(generated_text["generated_text"])
```

Technical Specifications

Advanced Optimization

Quantization and Fine-Tuning: Prometh-MOEM-24B supports both post-training quantization and fine-tuning. Quantization improves the model's efficiency, while fine-tuning improves its performance on a target task, so the model can be adapted to the demands of a given deployment environment.

Quantization

Quantization is a technique for reducing the compute and memory cost of model inference. It does so by converting values from high-precision data types, such as 32-bit floating point (float32), to more compact formats, such as 8-bit integers (int8). This shrinks the model's memory footprint and can speed up inference, making the model more practical for embedded systems or devices with limited computational resources.
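To make the float-to-int8 conversion concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is one common scheme, not necessarily the one a given quantization toolkit applies to this model, and the function names and sample weights are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Illustrative weights; a real tensor would hold millions of values.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

print(codes)     # [50, -127, 3, 90]
print(restored)  # close to the original weights, within half a quantization step
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value.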

Benefits:
- Smaller memory footprint and faster inference, as described above.
Application:
- Prometh-MOEM-24B can be quantized post-training to int8 without retraining from scratch. This approach aims to preserve the model's capabilities while fitting the practical constraints of the deployment environment.
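As a rough guide to what quantization saves, the sketch below estimates weight storage for the 24B parameters listed on the model page at different precisions. It treats "24B" as exactly 24e9 parameters and counts weights only; activations, the KV cache, and quantization metadata are ignored, so real memory use will be higher.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}
NUM_PARAMS = 24_000_000_000  # assumption: "24B params" taken as exactly 24e9


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
# float32: 96 GB
# bfloat16: 48 GB
# int8: 24 GB
```

Under these assumptions, moving from the published BF16 weights to int8 roughly halves the weight storage.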

Fine-Tuning

Beyond quantization, the model can be fine-tuned to adapt it to specific tasks or datasets. Fine-tuning runs additional training cycles on new data, which improves the model's performance on the target application.

- Customization: Tailors the model to specialized needs, optimizing its performance on tasks it was not originally designed for.
- Versatility: Ensures the model remains relevant and effective across a diverse array of use cases.
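The core idea of fine-tuning, continuing training from existing parameters on new data, can be shown with a toy example. The sketch below fine-tunes a one-variable linear model with gradient descent; the "pretrained" parameters and the new dataset are hypothetical stand-ins, and fine-tuning a 24B-parameter model in practice would rely on a training framework rather than a loop like this.

```python
def mse_loss(weight, bias, data):
    """Mean squared error of the model y = weight * x + bias on the data."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


def fine_tune(weight, bias, data, learning_rate=0.05, epochs=200):
    """Continue gradient descent from existing parameters on new data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical "pretrained" parameters, standing in for a model trained on y = 2x.
pretrained_w, pretrained_b = 2.0, 0.0
# Hypothetical task-specific data drawn from y = 2x + 1.
new_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

loss_before = mse_loss(pretrained_w, pretrained_b, new_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, new_data)
loss_after = mse_loss(tuned_w, tuned_b, new_data)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
```

Starting from the pretrained values rather than from random ones means only the part of the model that is wrong for the new task (here, the bias) has to change.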

Model Details and Attribution

Developed by: Iago Gaspar
Shared by: AI Flow Solutions
Model type: Mixture of Experts Model
Language(s) (NLP): en-en
License: Apache-2.0

Environmental Impact

Out-of-Scope Use

The model is not intended for generating harmful or biased content.

Bias, Risks, and Limitations

Recommendations

Users should evaluate the model for biases and other ethical considerations before deploying it for real-world applications.

Safetensors

Model size: 24B params
Tensor type: BF16