--- license: apache-2.0 base_model: EssentialAI/rnj-1-instruct --- # hex-toolcall-base-v2 WIP derivative checkpoint of [EssentialAI/rnj-1-instruct](https://huggingface.co/EssentialAI/rnj-1-instruct). Baseline for tool-calling experiments. ## Training Config ```toml max_steps = 140 seq_len = 8192 [model] name = "EssentialAI/rnj-1-instruct" [wandb] project = "xml-tool-thinking" name = "hex-v6-bs512-rollouts16" [orchestrator.wandb.log_extras] samples = true interval = 1 [orchestrator] batch_size = 512 rollouts_per_example = 16 [orchestrator.sampling] max_tokens = 512 temperature = 1.0 [[orchestrator.env]] id = "hex_encode_xml" args = { max_turns = 5, max_chunk = 128, strict_format = true, user_prompt = """Here is a document with semantic XML tags: {doc} Your task: Encode the content of the <{tag_name}> tag to hexadecimal. You must encode in chunks of at most {max_chunk_size} characters at a time. Available tools: - get_tag_content: Get the target tag's text and length. No arguments. - encode_chunk: Encode a character range to hex. Args: start (int), end (int) - Every response must begin with [think] - After [/think], include your tool call - No text outside of [think]...[/think] and ... Tool format: tool_name value When done, output ONLY the final hex string with no tool calls. Example: [think]I need to get the content first.[/think] get_tag_content """ } [trainer.model] ac = { freq = 1 } [trainer.optim] lr = 1e-6 max_norm = 0.001 [trainer.scheduler] type = "linear" warmup_steps = 30 decay_steps = 30 min_lr = 0 [inference.parallel] tp = 4 [ckpt] ```