# eurollm:9b-teletype -- Ollama Modelfile
#
# Applies the LoRA adapter over the base as a GGUF adapter, so the base is
# pulled (not redistributed) and this artifact stays small. The base's own
# default ChatML template and EOS are used: ccpty serves via the OpenAI protocol
# and the endpoint renders with the model's default template, so training and
# serving must both use that default.
#
# Convert the PEFT adapter to GGUF (llama.cpp); --base points at a local copy of
# the base repo, of which only the config/tokenizer files are read:
#   python llama.cpp/convert_lora_to_gguf.py . \
#     --base utter-project/EuroLLM-9B-Instruct \
#     --outfile teletype-lora-f16.gguf
# then:
#   ollama create eurollm:9b-teletype -f Modelfile
#
# Alternative: merge first (PeftModel.merge_and_unload over the fp16 base),
# convert to a single quantized GGUF, and use `FROM ./merged-q4_k_m.gguf` with
# no ADAPTER line. That build is standalone but a full ~5.5GB upload, and since
# it is a bare GGUF it must also copy the base Modelfile's TEMPLATE and
# <|im_end|> stop parameters (see below).

# Build eurollm:9b-instruct first from the base repo (tiararodney/EuroLLM-9B-Instruct)
# Modelfile, which sets EuroLLM's ChatML template + <|im_end|> stops. A bare
# `FROM ./gguf` base does NOT carry them, and the model never stops. This adapter
# Modelfile inherits the base's template/stops; do not override them here.
FROM eurollm:9b-instruct
ADAPTER ./teletype-lora-f16.gguf

# Low temperature for near-deterministic output -- this is a shell driver,
# not a chat partner.
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
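
# Sketch of the merge alternative described in the header (untested; assumes
# transformers, peft and torch are installed, the PEFT adapter sits in the
# current directory, and "merged" / "merged-f16.gguf" are placeholder names):
#
#   import torch
#   from peft import PeftModel
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#
#   base_id = "utter-project/EuroLLM-9B-Instruct"
#   # Load the fp16 base, attach the adapter, and fold the LoRA weights in.
#   base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
#   merged = PeftModel.from_pretrained(base, ".").merge_and_unload()
#   merged.save_pretrained("merged")
#   # The GGUF converter also needs the tokenizer files next to the weights.
#   AutoTokenizer.from_pretrained(base_id).save_pretrained("merged")
#
# then convert and quantize with llama.cpp (binary path depends on your build):
#   python llama.cpp/convert_hf_to_gguf.py merged --outfile merged-f16.gguf --outtype f16
#   llama-quantize merged-f16.gguf merged-q4_k_m.gguf Q4_K_M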