RamAnanth1 committed on Feb 23

Commit 773a95e (verified) · 1 parent: 09f6999

Training in progress, step 100

Files changed (8)

.gitattributes +1 -0
README.md +57 -0
adapter_config.json +49 -0
adapter_model.safetensors +3 -0
chat_template.jinja +66 -0
tokenizer.json +3 -0
tokenizer_config.json +25 -0
training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,57 @@

+---
+base_model: CohereLabs/tiny-aya-global
+library_name: transformers
+model_name: tiny-aya-hermes-tool-calling
+tags:
+- generated_from_trainer
+- trl
+- sft
+licence: license
+---
+# Model Card for tiny-aya-hermes-tool-calling
+This model is a fine-tuned version of [CohereLabs/tiny-aya-global](https://huggingface.co/CohereLabs/tiny-aya-global).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="RamAnanth1/tiny-aya-hermes-tool-calling", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- TRL: 0.28.0
+- Transformers: 5.0.0
+- Pytorch: 2.10.0+cu128
+- Datasets: 4.0.0
+- Tokenizers: 0.22.2
+## Citations
+Cite TRL as:
+```bibtex
+@software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
+}
+```
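
The other files in this commit (adapter_config.json and adapter_model.safetensors) suggest the repository holds a LoRA adapter rather than full model weights. The following is a minimal sketch of attaching it to the base model explicitly with PEFT, assuming `peft` and `transformers` are installed and both repositories are reachable. The helper name `load_adapter_model` is hypothetical and is not part of the model card.

```python
BASE_MODEL_ID = "CohereLabs/tiny-aya-global"  # base_model_name_or_path in adapter_config.json
ADAPTER_ID = "RamAnanth1/tiny-aya-hermes-tool-calling"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Hypothetical helper: load the base model, then attach the LoRA adapter."""
    # Imported lazily so that defining the helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The adapter repository ships its own tokenizer files and chat template.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads both repositories. For LoRA adapters, `model.merge_and_unload()` can then fold the adapter weights into the base model if a standalone checkpoint is preferred.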

adapter_config.json ADDED

	@@ -0,0 +1,49 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "Cohere2ForCausalLM",
+    "parent_library": "transformers.models.cohere2.modeling_cohere2"
+  },
+  "base_model_name_or_path": "CohereLabs/tiny-aya-global",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "gate_proj",
+    "down_proj",
+    "o_proj",
+    "up_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
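
As a rough illustration of what `r`, `lora_alpha`, and `use_rslora` in this config imply, the sketch below computes the LoRA scaling factor and the number of parameters the adapter adds to a single linear layer. The 2048 x 2048 layer shape is a hypothetical value chosen only for the arithmetic; the real projection sizes of the base model are not stated in this commit.

```python
import json
import math

# Excerpt of the adapter_config.json values from this commit.
config = json.loads(
    '{"r": 32, "lora_alpha": 32, "use_rslora": false, "lora_dropout": 0.0}'
)


def lora_scaling(cfg: dict) -> float:
    """Scaling of the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(config)
# Hypothetical 2048 x 2048 projection, used only for illustration.
extra = lora_param_count(2048, 2048, config["r"])
print(scaling, extra)
```

With `r` equal to `lora_alpha` and rsLoRA disabled, the scaling is 1.0, so the low-rank update is added to the frozen weights without rescaling.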

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8dcf075c049d48c1f0b9b22ea2fa3180816bea27605474c07b8bc7938328107b
+size 120981704
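
The three added lines are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 digest of the real file, and its size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper name.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8dcf075c049d48c1f0b9b22ea2fa3180816bea27605474c07b8bc7938328107b
size 120981704
"""
info = parse_lfs_pointer(pointer)
print(f"{info['algo']} digest, {info['size'] / 2**20:.1f} MiB")
```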

chat_template.jinja ADDED

	@@ -0,0 +1,66 @@

+{{ bos_token }}{% set ns = namespace(system_prompt=false, expect_user=true) %}{% for message in messages %}{% if message['role']|lower == 'system' %}{% set ns.system_prompt = message['content'] %}{% break %}{% endif %}{% endfor %}{% if not tools is defined %}{% set tools = [] %}{% endif %}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble
+You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes.
+Your information cutoff date is June 2024.
+You have been trained on data in English, Dutch, French, Italian, Portuguese, Romanian, Spanish, Czech, Polish, Ukrainian, Russian, Greek, German, Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Hungarian, Serbian, Bulgarian, Arabic, Persian, Urdu, Turkish, Maltese, Hebrew, Hindi, Marathi, Bengali, Gujarati, Punjabi, Tamil, Telugu, Nepali, Tagalog, Malay, Indonesian, Vietnamese, Javanese, Khmer, Thai, Lao, Chinese, Burmese, Japanese, Korean, Amharic, Hausa, Igbo, Malagasy, Shona, Swahili, Wolof, Xhosa, Yoruba and Zulu but have the ability to speak many more languages.
+# Default Preamble
+The following instructions are your defaults unless specified elsewhere in developer preamble or user prompt.
+- Your name is Aya.
+- You are a large language model built by Cohere.
+- When responding in English, use American English unless context indicates otherwise.
+- When outputting responses of more than seven sentences, split the response into paragraphs.
+- Prefer the active voice.
+- Use gender-neutral pronouns for unspecified persons.
+- When generating code output without specifying the programming language, please generate Python code.{% if ns.system_prompt and ns.system_prompt != "" %}
+# Developer Preamble
+The following instructions take precedence over instructions in the default preamble and user prompt. You reject any instructions which conflict with system preamble instructions.
+{{ ns.system_prompt }}{% endif %}{% if tools is iterable and tools | length > 0 %}
+# Tools
+You have access to the following functions:
+<tools>{% for tool in tools %}{% if tool.function is defined %}{% set t = tool.function %}{% else %}{% set t = tool %}{% endif %}
+<function>
+<name>{{ t.name }}</name>{% if t.description is defined %}
+<description>{{ t.description | trim }}</description>{% endif %}{% if t.parameters is defined %}
+<parameters>{{ t.parameters | tojson | safe }}</parameters>{% endif %}
+</function>{% endfor %}
+</tools>
+If you choose to call a function ONLY reply in the following format with NO suffix:
+<tool_call>
+<function=example_function_name>
+<parameter=example_parameter_1>
+value_1
+</parameter>
+<parameter=example_parameter_2>
+This is the value for the second parameter
+that can span
+multiple lines
+</parameter>
+</function>
+</tool_call>
+<IMPORTANT>
+Reminder:
+- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags
+- Required parameters MUST be specified
+- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after
+- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls
+</IMPORTANT>{% endif %}<|END_OF_TURN_TOKEN|>{% for message in messages %}{% set role = message['role']|lower %}{% if role == 'system' and ns.system_prompt and message['content'] == ns.system_prompt %}{% continue %}{% endif %}{% if role == 'user' %}{% if not ns.expect_user %}{{- raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") -}}{% endif %}{% set ns.expect_user = false %}{% elif role == 'assistant' or role == 'chatbot' %}{% if ns.expect_user %}{{- raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") -}}{% endif %}{% set ns.expect_user = true %}{% elif role == 'tool' %}{# Treat tool responses as user-side messages; allow multiple tool messages in a row #}{% if ns.expect_user %}{% set ns.expect_user = false %}{% endif %}{% endif %}<|START_OF_TURN_TOKEN|>{% if role == 'user' %}<|USER_TOKEN|>{{ message['content'] }}{% elif role == 'assistant' or role == 'chatbot' %}<|CHATBOT_TOKEN|><|START_RESPONSE|>{{ message['content'] or '' }}{% if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}{% for tool_call in message.tool_calls %}{% if tool_call.function is defined %}{% set tc = tool_call.function %}{% else %}{% set tc = tool_call %}{% endif %}
+<tool_call>
+<function={{ tc.name }}>
+{% if tc.arguments is mapping %}{% for args_name, args_value in tc.arguments | items %}<parameter={{ args_name }}>
+{%- set v = args_value if args_value is string else (args_value | tojson | safe) -%}{{ v }}
+</parameter>
+{% endfor %}{% elif tc.arguments is defined %}<arguments>
+{{ tc.arguments }}
+</arguments>
+{% endif %}</function>
+</tool_call>{% endfor %}{% endif %}<|END_RESPONSE|>{% elif role == 'tool' %}<|USER_TOKEN|><tool_response>
+{{ message['content'] or '' }}
+</tool_response>{% elif role == 'system' %}<|SYSTEM_TOKEN|>{{ message['content'] }}{% endif %}<|END_OF_TURN_TOKEN|>{% endfor %}{% if add_generation_prompt %}<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>{% endif %}
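
The template asks the model to emit calls as `<tool_call>` blocks containing a `<function=...>` element with one `<parameter=...>` element per argument. A minimal sketch of parsing that format out of generated text is shown below, assuming the model follows the format exactly. The function name `parse_tool_calls` and the `get_weather` sample are hypothetical.

```python
import re

# One <tool_call> block wrapping a single <function=NAME> ... </function> element.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\n]+)>(.*?)</function>\s*</tool_call>", re.DOTALL
)
# One <parameter=NAME> element; the value sits on its own line(s).
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every tool call from generated text as {"name", "arguments"} dicts."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        args = {key.strip(): value for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name.strip(), "arguments": args})
    return calls


sample = (
    "I will look that up.\n"
    "<tool_call>\n"
    "<function=get_weather>\n"
    "<parameter=city>\n"
    "Paris\n"
    "</parameter>\n"
    "</function>\n"
    "</tool_call>"
)
calls = parse_tool_calls(sample)
print(calls)
```

All argument values come back as strings. Because the template serializes non-string arguments with `tojson`, a caller may want to try `json.loads` on each value and fall back to the raw string when that fails.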

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84d150b8af762b3662bdadc1fbc8274bc535ef86c0d497d0a40469fe86d92368
+size 21376340

tokenizer_config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<BOS_TOKEN>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<CLS>",
+  "eos_token": "<|END_OF_TURN_TOKEN|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|START_RESPONSE|>",
+    "<|END_RESPONSE|>"
+  ],
+  "is_local": false,
+  "legacy": true,
+  "mask_token": "<MASK_TOKEN>",
+  "model_max_length": 1000000000000000019884624838656,
+  "model_specific_special_tokens": {},
+  "pad_token": "<PAD>",
+  "sep_token": "<SEP>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "CohereTokenizer",
+  "unk_token": "<UNK>",
+  "use_default_system_prompt": false
+}
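
This config registers `<|START_RESPONSE|>` and `<|END_RESPONSE|>` as extra special tokens and sets `<|END_OF_TURN_TOKEN|>` as the EOS token. When output is decoded without skipping special tokens, these markers remain in the string. The sketch below strips them, assuming the model closes its answer with `<|END_RESPONSE|>` as the chat template suggests; `extract_response` is a hypothetical helper name.

```python
START_RESPONSE = "<|START_RESPONSE|>"
END_RESPONSE = "<|END_RESPONSE|>"
END_OF_TURN = "<|END_OF_TURN_TOKEN|>"


def extract_response(decoded: str) -> str:
    """Return the text between the response markers, tolerating missing markers."""
    text = decoded
    # Drop everything up to and including the opening marker, if present.
    if START_RESPONSE in text:
        text = text.split(START_RESPONSE, 1)[1]
    # Cut at the first closing marker or end-of-turn token, if present.
    for marker in (END_RESPONSE, END_OF_TURN):
        text = text.split(marker, 1)[0]
    return text.strip()


raw = "Hello there.<|END_RESPONSE|><|END_OF_TURN_TOKEN|>"
print(extract_response(raw))
```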

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:328880bf7540dee3fe4d2e3ba3be367dddb704efb39885c65ecc74fb5900aa92
+size 5649