Error while using the model with the Transformers library
Sorry, I am still discovering MLX, but I tried to run the quantized version of gemma-2b-it:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "mlx-community/quantized-gemma-2b-it"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
Running this code with the transformers library raises this error:
AttributeError Traceback (most recent call last)
Cell In[20], line 8
5 model_id = "mlx-community/quantized-gemma-2b-it"
6 # model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
7 # okenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)
----> 8 model = AutoModelForCausalLM.from_pretrained(model_id)
9 tokenizer = AutoTokenizer.from_pretrained(model_id)
File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:561, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
559 elif type(config) in cls._model_mapping.keys():
560 model_class = _get_model_class(config, cls._model_mapping)
--> 561 return model_class.from_pretrained(
562 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
563 )
564 raise ValueError(
565 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
566 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
567 )
File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3287, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3284 with safe_open(resolved_archive_file, framework="pt") as f:
3285 metadata = f.metadata()
-> 3287 if metadata.get("format") == "pt":
3288 pass
3289 elif metadata.get("format") == "tf":
AttributeError: 'NoneType' object has no attribute 'get'
How can I solve this, please?
Hello, how did you solve this?
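The traceback above ends at `metadata.get("format")`, which suggests that `f.metadata()` returned `None` for this checkpoint, i.e. the safetensors file likely carries no `__metadata__` entry in its header. The sketch below reproduces that mechanism with the standard library only. It relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_metadata` and `write_tiny_safetensors` and the tiny one-tensor files are hypothetical, made up for illustration, and this is not the code path Transformers itself uses.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_metadata(path):
    """Return the optional `__metadata__` dict from a safetensors header, or None."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header.get("__metadata__")


def write_tiny_safetensors(path, metadata=None):
    """Write a one-tensor safetensors file (hypothetical helper for this demo)."""
    data = struct.pack("<f", 1.0)  # a single float32 value
    header = {"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, len(data)]}}
    if metadata is not None:
        header["__metadata__"] = metadata
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


tmp_dir = Path(tempfile.mkdtemp())
no_meta = tmp_dir / "no_meta.safetensors"
with_meta = tmp_dir / "with_meta.safetensors"
write_tiny_safetensors(no_meta)                      # no __metadata__ entry
write_tiny_safetensors(with_meta, {"format": "pt"})  # PyTorch-style metadata

meta_missing = read_safetensors_metadata(no_meta)
meta_present = read_safetensors_metadata(with_meta)

# Calling .get on a missing metadata block fails the same way as in the traceback.
try:
    meta_missing.get("format")
    failure = None
except AttributeError as exc:
    failure = str(exc)

print(meta_missing)
print(meta_present)
print(failure)
```

Running `read_safetensors_metadata` on the downloaded weight file of the model would show whether its metadata is really missing. If it is, the checkpoint was probably not saved in the form Transformers expects, and loading it with `mlx_lm.load` instead of `AutoModelForCausalLM` may be the intended route; that is an assumption to verify, not a confirmed fix.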