Tiara Rodney

release(2.0.1): Ollama serving-docs fix (base needs ChatML + stops)

80c0c1a unverified 8 days ago

1.35 kB

	# eurollm:9b-teletype -- Ollama Modelfile
	#
	# Applies the LoRA adapter over the base as a GGUF adapter, so the base is
	# pulled (not redistributed) and this artifact stays small. The base's own
	# default ChatML template and EOS are used: ccpty serves via the OpenAI protocol
	# and the endpoint renders with the model's default template, so train and serve
	# must both use that default.
	#
	# Convert the PEFT adapter to GGUF (llama.cpp):
	# python llama.cpp/convert_lora_to_gguf.py . \
	# --base utter-project/EuroLLM-9B-Instruct \
	# --outfile teletype-lora-f16.gguf
	# then:
	# ollama create eurollm-teletype -f Modelfile
	#
	# Alternative: merge first (PeftModel.merge_and_unload over the fp16 base),
	# convert to a single quantized GGUF, and use `FROM ./merged-q4_k_m.gguf` with
	# no ADAPTER line. Standalone but a full ~5.5GB upload.

	# Build eurollm:9b-instruct first from the base repo (tiararodney/EuroLLM-9B-Instruct)
	# Modelfile, which sets EuroLLM's ChatML template + <\|im_end\|> stops. A bare
	# `FROM ./gguf` base does NOT carry them and the model never stops. This adapter
	# Modelfile inherits the base's template/stops; do not override them here.
	FROM eurollm:9b-instruct
	ADAPTER ./teletype-lora-f16.gguf

	# Operate deterministically -- this is a shell driver, not a chat partner.
	PARAMETER temperature 0.2
	PARAMETER num_ctx 4096