[
    {
        "id": 0,
        "name": "load_image_from_string",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef load_image_from_string(\n    reference: str,\n    max_height: Optional[int] = None,\n    max_width: Optional[int] = None,\n) -> Tuple[str, Optional[float]]:\n    if uri_is_http_link(uri=reference):\n        return load_image_from_url(\n            url=reference, max_height=max_height, max_width=max_width\n        )\n    if os.path.exists(reference):\n        if max_height is None or max_width is None:\n            with open(reference, \"rb\") as f:\n                img_bytes = f.read()\n            img_base64_str = encode_base_64(payload=img_bytes)\n            return img_base64_str, None\n        local_image = cv2.imread(reference)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        if local_image is None:\n            raise EncodingError(f\"Could not load image from {reference}\")\n        local_image, scaling_factor = resize_opencv_image(\n            image=local_image,\n            max_height=max_height,\n            max_width=max_width,\n        )\n        return numpy_array_to_base64_jpeg(image=local_image), scaling_factor\n    if max_height is not None and max_width is not None:\n        image_bytes = base64.b64decode(reference)\n        image = bytes_to_opencv_image(payload=image_bytes)\n        image, scaling_factor = resize_opencv_image(\n            image=image,\n            max_height=max_height,\n            max_width=max_width,\n        )\n        return numpy_array_to_base64_jpeg(image=image), scaling_factor\n    return reference, None",
        "completions": [
            "  ```\ndef load_image_from_string(\n    reference: str,\n    max_height: Optional[int] = None,\n    max_width: Optional[int] = None,\n) -> Tuple[str, Optional[float]]:\n    if uri_is_http_link(uri=reference):\n        return load_image_from_url(\n            url=reference, max_height=max_height, max_width=max_width\n        )\n    if os.path.exists(reference):\n        if max_height is None or max_width is None:\n            with open(reference, \"rb\") as f:\n                img_bytes = f.read()\n            img_base64_str = encode_base_64(payload=img_bytes)\n            return img_base64_str, None\n        local_image = cv2.imread(reference)\n        if max_height is not None and max_width is not None:\n            local_image = cv2.resize(local_image, (max_height, max_width))\n            return encode_base_64(payload=local_image), None\n```"
        ],
        "line": 29,
        "token": 249,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 1,
        "name": "open_lm_config_to_gpt2_config",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef open_lm_config_to_gpt2_config(open_lm_config: AutoConfig) -> GPT2Config:\n    # NOTE: rotary is applied to the head dimension instead of the sequence dimension (accident by open_lm team)\n    return GPT2Config(\n        vocab_size=open_lm_config.vocab_size,\n        n_positions=0,  # No absolute position embedding\n        n_embd=open_lm_config.hidden_dim,\n        n_layer=open_lm_config.n_layers,\n        n_head=open_lm_config.n_heads,\n        n_inner=256 * ((int(2 * 4 * open_lm_config.hidden_dim / 3) + 256 - 1) // 256),\n        activation_function=\"swiglu\",\n        resid_pdrop=0.0,  # No dropout\n        embd_pdrop=0.0,\n        attn_pdrop=0.0,\n        layer_norm_epsilon=1e-5,\n        # NOTE: this isn't correct, should look at code as it's scaled by depth according to: https://arxiv.org/abs/1908.11365\n        initializer_range=0.02,\n        bos_token_id=0,\n        eos_token_id=0,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        # These are new arguments not in the original GPT2Config\n        prenorm=True,\n        parallel_block=False,\n        parallel_block_tied_norm=False,\n        rotary_emb_fraction=1,\n        tie_word_embeddings=open_lm_config.weight_tying,\n        fused_dropout_add_ln=True,\n        fused_bias_fc=True,\n        use_flash_attn=True,\n        qkv_proj_bias=getattr(open_lm_config, \"qkv_proj_bias\", False),\n        rotary_emb_base=10000,\n        rotary_emb_scale_base=getattr(open_lm_config, \"rotary_emb_scale_base\", None),\n        rotary_emb_interleaved=getattr(open_lm_config, \"rotary_emb_interleaved\", False),\n        rotary_head_dim=getattr(open_lm_config, \"rotary_old\", False),\n        mlp_fc1_bias=getattr(open_lm_config, \"mlp_fc1_bias\", False),\n        mlp_fc2_bias=getattr(open_lm_config, \"mlp_fc2_bias\", False),\n        use_rms_norm=False,\n        causal=True,\n    )",
        "completions": [
            "  ```\ndef open_lm_config_to_gpt2_config(open_lm_config: AutoConfig) -> GPT2Config:\n    # NOTE: rotary is applied to the head dimension instead of the sequence dimension (accident by open_lm team)\n    return GPT2Config(\n        vocab_size=open_lm_config.vocab_size,\n        n_positions=0,  # No absolute position embedding\n        n_embd=open_lm_config.hidden_dim,\n        n_layer=open_lm_config.n_layers,\n        n_head=open_lm_config.n_heads,\n        n_inner=256 * ((int(2 * 4 * open_lm_config.hidden_dim / 3) + 256 - 1) // 256),\n        activation_function=\"swiglu\",\n        resid_pdrop=0.0,  # No dropout\n        embd_pdrop=0.0,\n        attn_pdrop=0.0,\n        layer_norm_epsilon=1e-5,\n        # NOTE: this isn't correct, should look at code as it's scaled by depth according to: https://arxiv.org/abs/1908.11365\n        initializer_range=0.02,\n        bos_token_id=0,\n        eos_token_id=0,\n    )\n```"
        ],
        "line": 31,
        "token": 267,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 2,
        "name": "power",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef power(a, b, third=None):\n    if third is not None:\n        raise MaskError(\"3-argument power not supported.\")\n    # Get the masks\n    ma = getmask(a)\n    mb = getmask(b)\n    m = mask_or(ma, mb)\n    # Get the rawdata\n    fa = getdata(a)\n    fb = getdata(b)\n    # Get the type of the result (so that we preserve subclasses)\n    if isinstance(a, MaskedArray):\n        basetype = type(a)\n    else:\n        basetype = MaskedArray\n    # Get the result and view it as a (subclass of) MaskedArray\n    with np.errstate(divide='ignore', invalid='ignore'):\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        result = np.where(m, fa, umath.power(fa, fb)).view(basetype)\n    result._update_from(a)\n    # Find where we're in trouble w/ NaNs and Infs\n    invalid = np.logical_not(np.isfinite(result.view(ndarray)))\n    # Add the initial mask\n    if m is not nomask:\n        if not result.ndim:\n            return masked\n        result._mask = np.logical_or(m, invalid)\n    # Fix the invalid parts\n    if invalid.any():\n        if not result.ndim:\n            return masked\n        elif result._mask is nomask:\n            result._mask = invalid\n        result._data[invalid] = result.fill_value\n    return result",
        "completions": [
            "  ```\n    # Get the result and view it as a (subclass of) MaskedArray\n    with np.errstate(divide='ignore', invalid='ignore'):\n        result = np.power(fa, fb)\n    return basetype(result, mask=m)\n```"
        ],
        "line": 30,
        "token": 272,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 3,
        "name": "get_enemy_pos",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef get_enemy_pos(mm_info: MiniMapInfo) -> List[Point]:\n    enemy_mask = get_enemy_mask(mm_info)\n    # cv2_utils.show_image(enemy_mask, win_name='get_enemy_pos', wait=0)\n    cx = mm_info.origin.shape[1] // 2\n    cy = mm_info.origin.shape[0] // 2\n\n    # 膨胀一下找连通块\n    to_check = cv2_utils.dilate(enemy_mask, 5)\n    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(to_check, connectivity=8)\n\n    pos_list: List[Point] = []\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    if num_labels <= 1:  # 没有连通块 走到敌人附近了\n        return pos_list\n\n    # 找到最大的连通区域\n    for label in range(1, num_labels):\n        # 找到各个连通区域的中心点\n        center_x = int(centroids[label, 0])\n        center_y = int(centroids[label, 1])\n        pos_list.append(Point(center_x - cx, center_y - cy))\n\n    return pos_list",
        "completions": [
            "  ```\n    for i in range(1, num_labels):\n        x, y, w, h, area = stats[i]\n        if area > 100:\n            pos_list.append(Point(cx + x, cy + y))\n    return pos_list\n```"
        ],
        "line": 22,
        "token": 227,
        "line_diff": 0,
        "token_diff": 1
    },
    {
        "id": 4,
        "name": "plot_turbine_points",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef plot_turbine_points(\n    fmodel: FlorisModel,\n    ax: plt.Axes = None,\n    turbine_indices: List[int] = None,\n    plotting_dict: Dict[str, Any] = {},\n) -> plt.Axes:\n\n    # Generate axis, if needed\n    if ax is None:\n        _, ax = plt.subplots()\n\n    # If turbine_indices is not none, make sure all elements correspond to real indices\n    if turbine_indices is not None:\n        try:\n            fmodel.layout_x[turbine_indices]\n        except IndexError:\n            raise IndexError(\"turbine_indices does not correspond to turbine indices in fi\")\n    else:\n        turbine_indices = list(range(len(fmodel.layout_x)))\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    # Generate plotting dictionary\n    default_plotting_dict = {\n        \"color\": \"black\",\n        \"marker\": \".\",\n        \"markersize\": 10,\n        \"label\": None,\n    }\n    plotting_dict = {**default_plotting_dict, **plotting_dict}\n\n    # Plot\n    ax.plot(\n        fmodel.layout_x[turbine_indices],\n        fmodel.layout_y[turbine_indices],\n        linestyle=\"None\",\n        **plotting_dict,\n    )\n\n    # Make sure axis set to equal\n    ax.axis(\"equal\")\n\n    return ax",
        "completions": [
            "  ```\n    # Plot turbine points\n    for i in turbine_indices:\n        ax.plot(fmodel.layout_x[i], fmodel.layout_y[i], **plotting_dict)\n    return ax\n```"
        ],
        "line": 30,
        "token": 264,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 5,
        "name": "build_shuffling_sampling_fn",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef build_shuffling_sampling_fn(\n    real_client_ids: list[str],\n    canary_client_ids: list[str],\n    total_rounds: int,\n    num_real_epochs: int,\n    canary_repeats: int,\n    seed: int,\n) -> tuple[Callable[[int], list[str]], float]:\n  if not real_client_ids:\n    raise ValueError('`real_client_ids` cannot be empty.')\n  if total_rounds <= 0:\n    raise ValueError(\n        f'`total_rounds` must be greater than 0. Found `{total_rounds}`.'\n    )\n  if num_real_epochs <= 0:\n    raise ValueError(\n        f'`num_real_epochs` must be greater than 0. Found `{num_real_epochs}`.'\n    )\n  if canary_repeats < 0:\n    raise ValueError(\n        f'`canary_repeats` must be nonnegative. Found `{canary_repeats}`.'\n    )\n\n  rng = np.random.default_rng(seed)\n\n  num_real = len(real_client_ids)\n  real_order = rng.permutation(num_real)\n  num_canary = len(canary_client_ids)\n  canary_order = [rng.permutation(num_canary) for _ in range(canary_repeats)]\n\n  def sampling_fn(round_num: int) -> list[str]:\n    if not 1 <= round_num <= total_rounds:\n      raise ValueError(\n          f'round_num ({round_num}) must be between 1 and total_rounds'\n          f' ({total_rounds}).'\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "      )\n    # tff.simulation.run_training_process uses rounds 1 ... total_rounds.\n    # For the math here to work we need 0 ... (total_rounds - 1).\n    round_num -= 1\n\n    begin_real = math.ceil(\n        round_num * num_real * num_real_epochs / total_rounds\n    )\n    end_real = math.ceil(\n        (round_num + 1) * num_real * num_real_epochs / total_rounds\n    )\n    real_sample = [\n        real_client_ids[real_order[i % num_real]]\n        for i in range(begin_real, end_real)\n    ]\n\n    begin_canary = math.ceil(\n        round_num * num_canary * canary_repeats / total_rounds\n    )\n    end_canary = math.ceil(\n        (round_num + 1) * num_canary * canary_repeats / total_rounds\n    )\n    canary_sample = [\n        canary_client_ids[canary_order[i // num_canary][i % num_canary]]\n        for i in range(begin_canary, end_canary)\n    ]\n\n    return real_sample + canary_sample\n\n  mean_clients_per_round = (\n      len(real_client_ids) * num_real_epochs\n      + len(canary_client_ids) * canary_repeats\n  ) / total_rounds\n\n  return sampling_fn, mean_clients_per_round",
        "completions": [
            "  ```\ndef sampling_fn(round_num: int) -> list[str]:\n    if not 1 <= round_num <= total_rounds:\n        raise ValueError(\n            f'round_num ({round_num}) must be between 1 and total_rounds'\n            f' ({total_rounds}).'\n        )\n    real_idx = (round_num - 1) % num_real\n    canary_idx = (round_num - 1) // num_real\n    real_client_id = real_client_ids[real_order[real_idx]]\n    canary_client_ids = canary_client_ids[canary_order[canary_idx]]\n    return [real_client_id] + canary_client_ids\n```"
        ],
        "line": 45,
        "token": 391,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 6,
        "name": "linear_to_mel_weight_matrix",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef linear_to_mel_weight_matrix(\n    *,\n    num_filters: int,\n    num_spectrogram_bins: int,\n    sample_rate: float,\n    lower_edge_hertz: float,\n    upper_edge_hertz: float,\n    dtype: np.dtype = np.float64,\n) -> ArrayLike:\n    freq_scale = FrequencyScale.MEL_LN\n\n    # Compute mel spectrogram bins up to nyquist frequency. Drop the 0th bin.\n    linear_freqs = np.linspace(0, sample_rate // 2, num_spectrogram_bins)[1:].astype(dtype)\n    spectrogram_bins_mel = hertz_to_warped_scale(linear_freqs, freq_scale=freq_scale)[:, None]\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    # Compute lower and upper bound of the output mel spectrum.\n    lower_edge_mel = hertz_to_warped_scale(lower_edge_hertz, freq_scale=freq_scale)\n    upper_edge_mel = hertz_to_warped_scale(upper_edge_hertz, freq_scale=freq_scale)\n\n    # Compute num_filters triples of (lower_edge, center, upper_edge).\n    idx = np.arange(num_filters)[:, None] + np.arange(3)[None, :]\n    band_edges_mel = np.linspace(lower_edge_mel, upper_edge_mel, num_filters + 2)[idx].astype(dtype)\n    # Split the triples up and reshape them into [1, num_filters] tensors.\n    lower_edge_mel, center_mel, upper_edge_mel = tuple(\n        np.reshape(t, [1, num_filters]) for t in np.split(band_edges_mel, 3, axis=1)\n    )\n    upper_slopes = (upper_edge_mel - spectrogram_bins_mel) / (upper_edge_mel - center_mel)\n    lower_slopes = (spectrogram_bins_mel - lower_edge_mel) / (center_mel - lower_edge_mel)\n\n    # Add back the 0th bin dropped above.\n    return np.pad(np.maximum(np.minimum(upper_slopes, lower_slopes), 0.0), [[1, 0], [0, 0]])",
        "completions": [
            "  ```\ndef hertz_to_warped_scale(freqs, freq_scale=FreqScale.MEL_LN):\n    if freq_scale == FreqScale.MEL_LN:\n        return np.log(1 + freqs / 700)\n    elif freq_scale == FreqScale.MEL_LN_SCALED:\n        return np.log(1 + freqs / 700) * 100\n    else:\n        raise ValueError(f\"Invalid frequency scale: {freq_scale}\")\n```"
        ],
        "line": 26,
        "token": 235,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 7,
        "name": "iram",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef iram(A: LinearOperator, start_vector: Array = None, max_iters: int = 100, tol: float = 1e-7, pbar: bool = False):\n    xnp = A.xnp\n    del pbar\n    del start_vector\n\n    def matvec(x):\n        X = xnp.array(x, dtype=A.dtype, device=A.device)\n        out = A @ X\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        return np.array(out, dtype=np.float32)\n\n    A2 = LO(shape=A.shape, dtype=np.float32, matvec=matvec)\n    k = min(A.shape[0] - 1, max_iters)\n    eigvals, eigvecs = eigsh(A2, k=k, M=None, sigma=None, which=\"LM\", v0=None, tol=tol)\n    eigvals, eigvecs = xnp.array(eigvals, dtype=A.dtype,\n                                 device=A.device), xnp.array(eigvecs, dtype=A.dtype, device=A.device)\n    info = {}\n    return eigvals, Dense(eigvecs), info",
        "completions": [
            "  \n[PYTHON]\ndef iram(A: LinearOperator, start_vector: Array = None, max_iters: int = 100, tol: float = 1e-7, pbar: bool = False):\n    xnp = A.xnp\n    del pbar\n    del start_vector\n\n    def matvec(x):\n        X = xnp.array(x, dtype=A.dtype, device=A.device)\n        out = A @ X\n        return out\n[/PYTHON]\n"
        ],
        "line": 20,
        "token": 208,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 8,
        "name": "model_name",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef model_name(clazz: Union[type, Tuple[Any], None]) -> str:\n    if clazz is None:\n        return \"any\"\n    to_check = get_args(clazz)[0] if is_optional(clazz) else clazz\n    if is_collection(to_check):\n        return f\"{model_name(type_arg(to_check))}[]\"\n    elif is_dict(to_check):\n        key_type, value_type = dict_types(to_check)\n        return f\"dictionary[{model_name(key_type)}, {model_name(value_type)}]\"\n    elif is_enum(to_check):\n        # camel case to snake case\n        return re.sub(r\"(?<!^)(?=[A-Z])\", \"_\", to_check.__name__).lower()\n    elif get_origin(to_check) == Union:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        # this is a union of different types other than none.\n        # since union types are not supported, we fallback to any here\n        return \"any\"\n    elif isinstance(to_check, TypeVar):\n        return model_name(get_args(to_check))\n    elif isinstance(to_check, type) and issubclass(to_check, simple_type):\n        return lookup[to_check]\n    elif attrs.has(to_check):\n        name = getattr(to_check, \"kind\", None)\n        if not name:\n            raise AttributeError(f\"dataclass {to_check} need to define a ClassVar kind!\")\n        return name  # type: ignore\n    else:\n        return \"any\"",
        "completions": [
            "  \n[PYTHON]\n```\ndef model_name(clazz: Union[type, Tuple[Any], None]) -> str:\n    if clazz is None:\n        return \"any\"\n    to_check = get_args(clazz)[0] if is_optional(clazz) else clazz\n    if is_collection(to_check):\n        return f\"{model_name(type_arg(to_check))}[]\"\n    elif is_dict(to_check):\n        key_type, value_type = dict_types(to_check)\n        return f\"dictionary[{model_name(key_type)}, {model_name(value_type)}]\"\n    elif is_enum(to_check):\n        # camel case to snake case\n        return re.sub(r\"(?<!^)(?=[A-Z])\", \"_\", to_check.__name__).lower()\n    elif get_origin(to_check) == Union:\n        return \" | \".join(model_name(arg) for arg in get_args(to_check))\n```\n[/PYTHON]\n"
        ],
        "line": 26,
        "token": 298,
        "line_diff": 1,
        "token_diff": 2
    },
    {
        "id": 9,
        "name": "dedup_by_vid",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef dedup_by_vid(\n    logits: np.ndarray,\n    labels: np.ndarray,\n    batch_masks: np.ndarray,\n    vids: np.ndarray,\n    frame_masks: Optional[np.ndarray] = None\n) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n\n  batch_masks = batch_masks.astype(bool)\n  vids = vids[batch_masks]\n  logits = logits[batch_masks]\n  labels = labels[batch_masks]\n  if frame_masks is not None:\n    frame_masks = frame_masks.astype(bool)\n    frame_masks = frame_masks[batch_masks]\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "  vid_set = set()\n  deduped_logits, deduped_labels, deduped_vids = [], [], []\n  for idx, vid in enumerate(vids):\n    if vid in vid_set:\n      continue\n    if frame_masks is None:\n      deduped_logits.append(logits[idx][np.newaxis, :])\n      deduped_labels.append(labels[idx][np.newaxis, :])\n    else:\n      frame_mask = frame_masks[idx]\n      deduped_logits.append(logits[idx][frame_mask])\n      deduped_labels.append(labels[idx][frame_mask])\n    vid_set.add(vid)\n    deduped_vids.append(vid)\n  return (np.concatenate(deduped_logits, axis=0),\n          np.concatenate(deduped_labels, axis=0), np.array(deduped_vids))",
        "completions": [
            "  ```\ndef dedup_by_vid(\n    logits: np.ndarray,\n    labels: np.ndarray,\n    batch_masks: np.ndarray,\n    vids: np.ndarray,\n    frame_masks: Optional[np.ndarray] = None\n) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n\n  batch_masks = batch_masks.astype(bool)\n  vids = vids[batch_masks]\n  logits = logits[batch_masks]\n  labels = labels[batch_masks]\n  if frame_masks is not None:\n    frame_masks = frame_masks.astype(bool)\n    frame_masks = frame_masks[batch_masks]\n  return vids, logits, labels\n```"
        ],
        "line": 27,
        "token": 229,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 10,
        "name": "informedness_statistic",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef informedness_statistic(\n        y_true,\n        y_pred,\n        sample_weight=None):\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    return metrics.balanced_accuracy_score(\n        y_true=y_true,\n        y_pred=y_pred,\n        sample_weight=sample_weight,\n        adjusted=True)",
        "completions": [
            "  \n[PYTHON]\ndef informedness_statistic(y_true, y_pred, sample_weight=None):\n    # Calculate the informedness statistic\n    informedness = np.sum(y_true * y_pred) / np.sum(y_true * (1 - y_pred))\n\n    # Return the informedness statistic\n    return informedness\n[/PYTHON]\n"
        ],
        "line": 17,
        "token": 158,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 11,
        "name": "run",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def run(self, model: BaseModel, dataset: Dataset, features: Sequence[str]):\n        logger.info(f\"{self.__class__.__name__}: Running\")\n\n        # Dataset prediction\n        ds_predictions = pd.Series(model.predict(dataset).prediction, dataset.df.index)\n\n        # Warm up text metadata\n        for f in features:\n            if dataset.column_types[f] == \"text\":\n                dataset.column_meta[f, \"text\"]\n\n        # Prepare dataset for slicing\n        df = dataset.df.copy()\n        if dataset.target is not None:\n            df.drop(columns=dataset.target, inplace=True)\n        df[\"__gsk__target\"] = pd.Categorical(ds_predictions)\n        wdata = Dataset(df, target=\"__gsk__target\", column_types=dataset.column_types)\n        wdata.load_metadata_from_instance(dataset.column_meta)\n\n        # Find slices\n        sliced_cols = SliceFinder(\"tree\").run(wdata, features, target=wdata.target, min_slice_size=self.min_slice_size)\n\n        measure_fn, measure_name = self._get_measure_fn()\n        issues = []\n        for col, slices in sliced_cols.items():\n            if not slices:\n                continue\n\n            for slice_fn in slices:\n                data_slice = dataset.slice(slice_fn)\n\n                # Skip small slices\n                if len(data_slice) < 20 or len(data_slice) < 0.05 * len(dataset):\n                    continue\n\n                dx = pd.DataFrame(\n                    {\n                        \"feature\": dataset.df.index.isin(data_slice.df.index).astype(int),\n                        \"prediction\": ds_predictions,\n                    },\n                    index=dataset.df.index,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                )\n                dx.dropna(inplace=True)\n\n                metric_value = measure_fn(dx.feature, dx.prediction)\n                logger.info(f\"{self.__class__.__name__}: {slice_fn}\\tAssociation = {metric_value:.3f}\")\n\n                if metric_value > self.threshold:\n                    predictions = dx[dx.feature > 0].prediction.value_counts(normalize=True)\n                    plabel, p = predictions.index[0], predictions.iloc[0]\n\n                    description = \"Data slice {slicing_fn} seems to be highly associated to prediction {target} = `{plabel}` ({p_perc:.2f}% of predictions in the data slice).\"\n\n                    issue = Issue(\n                        model,\n                        dataset,\n                        group=SpuriousCorrelation,\n                        level=IssueLevel.MINOR,\n                        slicing_fn=slice_fn,\n                        meta={\n                            \"metric\": f\"Nominal association ({measure_name})\",\n                            \"metric_value\": metric_value,\n                            \"method\": self.method,\n                            \"deviation\": f\"Prediction {dataset.target} = `{plabel}` for {p * 100:.2f}% of samples in the slice\",\n                            \"target\": dataset.target,\n                            \"plabel\": plabel,\n                            \"p\": p,\n                            \"p_perc\": p * 100,\n                            \"threshold\": self.threshold,\n                        },\n                        description=description,\n                        importance=metric_value,\n                        tests=_generate_spurious_corr_tests,\n                        taxonomy=[\"avid-effect:performance:P0103\"],\n                    )\n\n                    extractor = ExampleExtractor(issue)\n                    examples = extractor.get_examples_dataframe(20, with_prediction=1)\n                    issue.add_examples(examples)\n\n                    issues.append(issue)\n\n        return issues",
        "completions": [
            "  ```\n    for col, slices in sliced_cols.items():\n        if not slices:\n            continue\n\n        for slice_fn in slices:\n            data_slice = dataset.slice(slice_fn)\n\n            # Skip small slices\n            if len(data_slice) < 20 or len(data_slice) < 0.05 * len(dataset):\n                continue\n\n            dx = pd.DataFrame(\n                {\n                    \"feature\": dataset.df.index.isin(data_slice.df.index).astype(int),\n                    \"prediction\": ds_predictions,\n                },\n                index=dataset.df.index,\n            )\n\n            # Calculate the measure\n            measure = measure_fn(dx)\n\n            # Check if the measure is above the threshold\n            if measure > self.threshold:\n                issues.append(\n                    {\n                        \"feature\": col,\n                        \"slice\": slice_fn,\n                        \"measure\": measure,\n                        \"measure_name\": measure_name,\n                    }\n                )\n```"
        ],
        "line": 46,
        "token": 383,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 12,
        "name": "f1_score",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef f1_score(\n        y_true,\n        y_pred,\n        sample_weight=None):\n    actual_category = pd.unique(y_true)\n    pred_category = pd.unique(y_pred)\n    labels = pd.unique(np.concatenate([actual_category, pred_category]))\n    f1s_array = metrics.f1_score(\n        y_true=y_true,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        y_pred=y_pred,\n        average=None,\n        labels=labels,\n        sample_weight=sample_weight,\n        zero_division=0\n    )\n    f1_scores = {}\n    for label, f1 in zip(labels, f1s_array):\n        f1_scores[label] = f1\n    return f1_scores",
        "completions": [
            "  \n[PYTHON]\n```\ndef f1_score(\n        y_true,\n        y_pred,\n        sample_weight=None):\n    actual_category = pd.unique(y_true)\n    pred_category = pd.unique(y_pred)\n    labels = pd.unique(np.concatenate([actual_category, pred_category]))\n    f1s_array = metrics.f1_score(\n        y_true=y_true,\n        y_pred=y_pred,\n        labels=labels,\n        average='weighted',\n        sample_weight=sample_weight)\n    return f1s_array\n```\n[/PYTHON]\n"
        ],
        "line": 22,
        "token": 189,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 13,
        "name": "get_segments_from_frame_predictions",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef get_segments_from_frame_predictions(\n    class_probs: np.ndarray,\n    displacements: np.ndarray,\n    input_mask: np.ndarray,\n    total_frames: int,\n    stride: int,\n    sampling_strategy: str = 'random',\n    displacement_normalizer: str = 'duration',\n    secs_per_timestep: float = 1.0,\n    score_threshold: float = float('-inf'),\n    feature_pyramid_config: Optional[ml_collections.ConfigDict] = None,\n) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n  assert sampling_strategy in {'random', 'linspace'}\n\n  if feature_pyramid_config is None:\n    num_frames = class_probs.shape[0]\n  else:\n    num_frames = feature_pyramid_config.num_features_level0\n  num_classes = class_probs.shape[-1]\n  if sampling_strategy == 'random':\n    # The default setting is to sample the center clip at test time.\n    start_frame_offset = np.maximum(0, (total_frames - num_frames * stride) //\n                                    2).astype(np.float32)\n  else:  # 'linspace'\n    start_frame_offset = 0.0\n  if displacement_normalizer == 'duration':\n    displacement_multiplier = total_frames\n  elif displacement_normalizer == 'sampled_span':\n    displacement_multiplier = num_frames * stride\n  else:\n    displacement_multiplier = 1\n  displacements[..., 0] *= -1.0\n\n  if feature_pyramid_config is not None:\n    feature_pyramid_downsample_stride = (\n        feature_pyramid_config.feature_pyramid_downsample_stride\n    )\n    displacements_per_level = model_utils.split_pyramid_features(\n        displacements,\n        feature_pyramid_config.num_features_level0,\n        len(feature_pyramid_config.feature_pyramid_levels),\n        feature_pyramid_downsample_stride,\n        axis=0,\n        np_backend=np)\n    normalize_displacements_by_downsample_stride = feature_pyramid_config.get(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        'normalize_displacements_by_downsample_stride', False)\n  else:\n    displacements_per_level = [displacements]\n    feature_pyramid_downsample_stride = 1\n    normalize_displacements_by_downsample_stride = False\n\n  segments = []\n  linspace_frame_indices = np.arange(0, num_frames, dtype=np.float32)\n  for level, cur_displacements in enumerate(displacements_per_level):\n    cur_downsample_stride = feature_pyramid_downsample_stride**level\n    if normalize_displacements_by_downsample_stride:\n      cur_displacements *= cur_downsample_stride\n    cur_stride = stride * cur_downsample_stride\n    if sampling_strategy == 'random':\n      frame_indices = np.arange(\n          0, num_frames * stride, cur_stride, dtype=np.float32\n      )[:, None, None]\n      cur_segments = (\n          frame_indices\n          + cur_displacements * displacement_multiplier\n          + start_frame_offset\n      )\n    else:  # 'linspace'\n      frame_indices = linspace_frame_indices[\n          range(0, num_frames, cur_downsample_stride)\n      ][:, None, None]\n      cur_segments = (\n          (frame_indices + cur_displacements)\n          * (total_frames - 1)\n          / (num_frames - 1)\n      )\n    segments.append(cur_segments)\n  segments = np.concatenate(segments, axis=0)\n  input_mask = input_mask.astype(bool)\n  total_frames = np.full((segments.shape[0], num_classes), total_frames)[\n      input_mask\n  ]\n  segments = segments[input_mask]\n  segments[..., 0] = np.maximum(segments[..., 0], 0)\n  segments[..., 1] = np.minimum(segments[..., 1], total_frames)\n  segments = segments * secs_per_timestep\n  class_probs = class_probs[input_mask]\n  mask = class_probs >= score_threshold\n  class_indices = mask.nonzero()[1]\n\n  return class_indices, class_probs[mask], segments[mask]",
        "completions": [
            "  ```\ndef get_segments_from_frame_predictions(\n    class_probs: np.ndarray,\n    displacements: np.ndarray,\n    input_mask: np.ndarray,\n    total_frames: int,\n    stride: int,\n    sampling_strategy: str = 'random',\n    displacement_normalizer: str = 'duration',\n    secs_per_timestep: float = 1.0,\n    score_threshold: float = float('-inf'),\n    feature_pyramid_config: Optional[ml_collections.ConfigDict] = None,\n) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n  assert sampling_strategy in {'random', 'linspace'}\n\n  if feature_pyramid_config is None:\n    num_frames = class_probs.shape[0]\n  else:\n    num_frames = feature_pyramid_config.num_features_level0\n  num_classes = class_probs.shape[-1]\n  if sampling_strategy == 'random':\n    # The default setting is to sample the center clip at test time.\n    start_frame_offset = np.maximum(0, (total_frames - num_frames * stride) //\n                                    2).astype(np.float32)\n  else:  # 'linspace'\n    start_frame_offset = 0.0\n  if displacement_normalizer == 'duration':\n    displacement_multiplier = total_frames\n  elif displacement_normalizer =='sampled_span':\n    displacement_multiplier = num_frames * stride\n  else:\n    displacement_multiplier = 1\n  displacements[..., 0] *= -1.0\n\n  if feature_pyramid_config is not None:\n    feature_pyramid_downsample_stride = (\n        feature_pyramid_config.feature_pyramid_downsample_stride\n    )\n    displacements_per_level = model_utils.split_pyramid_features(\n        displacements,\n        feature_pyramid_config.num_features_level0,\n        len(feature_pyramid_config.feature_pyramid_levels),\n        feature_pyramid_downsample_stride,\n        axis=0,\n        np_backend=np)\n    normalize_displacements_by_downsample_stride = feature_pyramid_config.get(\n        'normalize_displacements_by_downsample_stride', False)\n    if normalize_displacements_by_downsample_stride:\n      displacements_per_level /= feature_pyramid_downsample_stride\n    displacements_per_level = np.stack(displacements_per_level, axis=0)\n    displacements_per_level = np.transpose(displacements_per_level, (1, 2, 0))\n    displacements_per_level = np.reshape(displacements_per_level, (num_frames, num_classes, 2))\n    displacements_per_level = np.transpose(displacements_per_level, (2, 0, 1))\n    displacements_per_level = np.reshape(displacements_per_level, (num_classes, num_frames, 2))\n    displacements_per_level = np.transpose(displacements_per_level, (1, 2, 0))\n    displacements_per_level = np.reshape(displacements_per_level, (num_frames, num_classes, 2))\n    displacements_per_level = np.transpose(displacements_per_level, (2, 0, 1))\n    displacements_per_level = np.reshape(displacements_per_level, (num_classes, num_frames, 2))\n    displacements_per_level = np.transpose(displacements_per_level, (1, 2, 0))\n    displac"
        ],
        "line": 56,
        "token": 370,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 14,
        "name": "plot_hierarchical_distribution",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef plot_hierarchical_distribution(self):\n    # Determine the number of plots\n    plot_count = 1\n    if \"subdatasets\" in self._metadata:\n        plot_count += 1\n\n    layer = 0\n    while f\"subdataset_layer_{layer}\" in self._metadata:\n        plot_count += 1\n        layer += 1\n\n    plt.figure(figsize=(6 * plot_count, 4))\n    axes = [plt.subplot(1, plot_count, i + 1) for i in range(plot_count)]\n\n    # Plot Event Distribution\n    events = []\n    for clip_id in self._index[\"clips\"]:\n        clip = self.clip(clip_id)\n        if hasattr(clip, \"tags\") and hasattr(clip.tags, \"labels\"):\n            events.extend(clip.tags.labels)\n        elif hasattr(clip, \"events\") and hasattr(clip.events, \"labels\"):\n            events.extend(clip.events.labels)\n\n    plot_distribution(\n        events, \"Event Distribution in the Dataset\", \"Count\", \"Event\", axes, 0\n    )\n\n    # Plot Subclasses Distribution and then Hierarchical layers\n    subplot_position = 1  # We've already plotted events at position 0\n    if \"subdatasets\" in self._metadata:\n        subclasses = [\n            self._metadata[clip_id][\"subdataset\"] for clip_id in self._index[\"clips\"]\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        ]\n        plot_distribution(\n            subclasses,\n            \"Subclass Distribution in the Dataset\",\n            \"Count\",\n            \"Subclass\",\n            axes,\n            subplot_position,\n        )\n        subplot_position += 1\n    else:\n        print(\"Subclass information not available.\")\n\n    layer = 0\n    while f\"subdataset_layer_{layer}\" in self._metadata:\n        layer_data = [\n            self._metadata[clip_id][f\"subdataset_layer_{layer}\"]\n            for clip_id in self._index[\"clips\"]\n        ]\n        plot_distribution(\n            layer_data,\n            f\"Subdataset Layer {layer} Distribution in the Dataset\",\n            \"Count\",\n            f\"Subdataset Layer {layer}\",\n            axes,\n            subplot_position,\n        )\n        layer += 1\n        subplot_position += 1\n\n    plt.tight_layout()\n    plt.show()\n    print(\"\\n\")",
        "completions": [
            "  ```\ndef plot_hierarchical_distribution(self):\n    # Plot Subclasses Distribution and then Hierarchical layers\n    subplot_position = 1  # We've already plotted events at position 0\n    if \"subdatasets\" in self._metadata:\n        subclasses = [\n            self._metadata[clip_id][\"subdataset\"] for clip_id in self._index[\"clips\"]\n        ]\n        plot_distribution(\n            subclasses,\n            \"Subclass Distribution in the Dataset\",\n            \"Count\",\n            \"Subclass\",\n            axes,\n            subplot_position,\n        )\n        subplot_position += 1\n\n    layer = 0\n    while f\"subdataset_layer_{layer}\" in self._metadata:\n        subclasses = [\n            self._metadata[clip_id][f\"subdataset_layer_{layer}\"]\n            for clip_id in self._index[\"clips\"]\n        ]\n        plot_distribution(\n            subclasses,\n            f\"Subclass Distribution in Layer {layer}\",\n            \"Count\",\n            \"Subclass\",\n            axes,\n            subplot_position,\n        )\n        subplot_position += 1\n        layer += 1\n```"
        ],
        "line": 40,
        "token": 367,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 15,
        "name": "list",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def list(self) -> List[Dict]:\n        session = self._get_session()\n        try:\n            response = session.get(self.zapier_nla_api_base + \"exposed/\")\n            response.raise_for_status()\n        except requests.HTTPError as http_err:\n            if response.status_code == 401:\n                if self.zapier_nla_oauth_access_token:\n                    raise requests.HTTPError(\n                        f\"An unauthorized response occurred. Check that your \"\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                        f\"access token is correct and doesn't need to be \"\n                        f\"refreshed. Err: {http_err}\",\n                        response=response,\n                    )\n                raise requests.HTTPError(\n                    f\"An unauthorized response occurred. Check that your api \"\n                    f\"key is correct. Err: {http_err}\",\n                    response=response,\n                )\n            raise http_err\n        return response.json()[\"results\"]",
        "completions": [
            "  ```\n    def list(self) -> List[Dict]:\n        session = self._get_session()\n        try:\n            response = session.get(self.zapier_nla_api_base + \"exposed/\")\n            response.raise_for_status()\n        except requests.HTTPError as http_err:\n            if response.status_code == 401:\n                if self.zapier_nla_oauth_access_token:\n                    raise requests.HTTPError(\n                        f\"An unauthorized response occurred. Check that your \"\n                        f\"Zapier NLA OAuth access token is valid and has not \"\n                        f\"expired.\"\n                    )\n                else:\n                    raise requests.HTTPError(\n                        f\"An unauthorized response occurred. Check that your \"\n                        f\"Zapier NLA OAuth access token is valid and has not \"\n                        f\"expired.\"\n                    )\n            else:\n                raise requests.HTTPError(\n                    f\"An unexpected response occurred. Check the response \"\n                    f\"status code and the response body for more information.\"\n                )\n        return response.json()\n```"
        ],
        "line": 23,
        "token": 207,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 16,
        "name": "frame_accuracy",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef frame_accuracy(\n    logits: np.ndarray,\n    label: np.ndarray,\n    background_logit_threshold: float = 0.0,\n) -> float:\n  top1_idx = np.argmax(logits, axis=-1)\n  background_label = np.sum(label, axis=-1) == 0\n  pred_background = (\n      np.max(logits, axis=-1) <= background_logit_threshold\n  ).astype(np.int32)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n  # Extracts the label at the highest logit index for each input.\n  top1_correct = np.take_along_axis(label, top1_idx[..., None], axis=-1)\n  top1_correct = np.squeeze(top1_correct)\n  foreground_correct = ~pred_background.astype(bool) * top1_correct\n\n  # Count correctly classified background samples.\n  background_correct = pred_background * background_label\n  correct = foreground_correct + background_correct\n  return np.sum(correct) / len(logits)",
        "completions": [
            "  ```\n  return (top1_idx == label).mean()\n```"
        ],
        "line": 23,
        "token": 205,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 17,
        "name": "julian2num",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef julian2num(j):\n    ep = np.datetime64(get_epoch(), 'h').astype(float) / 24.\n    ep0 = np.datetime64('0000-12-31T00:00:00', 'h').astype(float) / 24.\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    # Julian offset defined above is relative to 0000-12-31, but we need\n    # relative to our current epoch:\n    dt = __getattr__(\"JULIAN_OFFSET\") - ep0 + ep\n    return np.subtract(j, dt)  # Handles both scalar & nonscalar j.",
        "completions": [
            "  ```\ndef julian2num(j):\n    return (j - ep0) * 24. + ep\n```"
        ],
        "line": 16,
        "token": 190,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 18,
        "name": "format_template",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef format_template(\n    path: Path, base_path: Path, project_metadata: ProjectMetadata\n) -> TemplateOutput:\n    if not path.exists():\n        raise FileNotFoundError(f\"Template file {path} does not exist\")\n\n    metadata_variables = project_metadata.model_dump()\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    template = Template(path.read_text())\n    content = template.render(metadata_variables)\n\n    output_name = str(path.relative_to(base_path))\n    for key, value in metadata_variables.items():\n        output_name = output_name.replace(f\"[{key}]\", str(value))\n\n    return TemplateOutput(content=content, path=output_name)",
        "completions": [
            "  ```\ndef format_template(path: Path, base_path: Path, project_metadata: ProjectMetadata) -> TemplateOutput:\n    if not path.exists():\n        raise FileNotFoundError(f\"Template file {path} does not exist\")\n\n    metadata_variables = project_metadata.model_dump()\n\n    template_output = TemplateOutput(\n        path=path,\n        base_path=base_path,\n        metadata_variables=metadata_variables,\n    )\n\n    return template_output\n```"
        ],
        "line": 19,
        "token": 193,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 19,
        "name": "res_block",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef res_block(\n    in_channels: int,\n    out_channels: int,\n    dropout: float = 0.0,\n    norm_groups: int = 32,\n    norm_eps: float = 1e-6,\n) -> ResBlock:\n    res_block_partial = partial(\n        ResBlock,\n        in_channels=in_channels,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        out_channels=out_channels,\n        pre_outconv_dropout=dropout,\n        scale_shift_conditional=False,\n        norm_groups=norm_groups,\n        norm_eps=norm_eps,\n    )\n    if in_channels != out_channels:\n        return res_block_partial(\n            skip_conv=nn.Conv2d(in_channels, out_channels, kernel_size=1)\n        )\n    else:\n        return res_block_partial()",
        "completions": [
            "  \n[PYTHON]\n```\ndef res_block(\n    in_channels: int,\n    out_channels: int,\n    dropout: float = 0.0,\n    norm_groups: int = 32,\n    norm_eps: float = 1e-6,\n) -> ResBlock:\n    res_block_partial = partial(\n        ResBlock,\n        in_channels=in_channels,\n        out_channels=out_channels,\n        dropout=dropout,\n        norm_groups=norm_groups,\n        norm_eps=norm_eps,\n    )\n    return res_block_partial\n```\n[/PYTHON]\n"
        ],
        "line": 23,
        "token": 190,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 20,
        "name": "process",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def process(self, offset: int = 0) -> \"GeneratorStepOutput\":  # type: ignore\n        instructions = []\n        mutation_no = 0\n\n        iter_no = 0\n        while len(instructions) < self.num_instructions:\n            prompts = self._apply_random_mutation(iter_no=iter_no)\n\n            generated_prompts = flatten_responses(\n                self.llm.generate(prompts, **self.llm.generation_kwargs)  # type: ignore\n            )\n            for idx, generated_prompt in enumerate(generated_prompts):\n                generated_prompt = generated_prompt.split(\"Prompt#:\")[-1].strip()\n                if self.max_length >= len(generated_prompt) >= self.min_length:  # type: ignore\n                    instructions.append(generated_prompt)\n                    self._prompts[idx] = np.random.choice(self._seed_texts)  # type: ignore\n                else:\n                    self._prompts[idx] = generated_prompt  # type: ignore\n\n            self._logger.info(\n                f\"🔄 Ran iteration {iter_no} with {len(instructions)} instructions already evolved!\"\n            )\n            iter_no += 1\n\n            if len(instructions) > self.num_instructions:\n                instructions = instructions[: self.num_instructions]\n            if len(instructions) > mutation_no:\n                mutation_no = len(instructions) - mutation_no\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            if not self.generate_answers and len(instructions[-mutation_no:]) > 0:\n                yield (\n                    [\n                        self.format_output(mutated_instruction)\n                        for mutated_instruction in instructions[-mutation_no:]\n                    ],\n                    len(instructions) >= self.num_instructions,\n                )\n\n        self._logger.info(f\"🎉 Finished evolving {len(instructions)} instructions!\")\n\n        if self.generate_answers:\n            self._logger.info(\n                f\"🧠 Generating answers for the {len(instructions)} evolved instructions!\"\n            )\n\n            answers = self._generate_answers(instructions)\n\n            self._logger.info(\n                f\"🎉 Finished generating answers for the {len(instructions)} evolved instructions!\"\n            )\n\n            yield (\n                [\n                    self.format_output(instruction, answer)\n                    for instruction, answer in zip(instructions, answers)\n                ],\n                True,\n            )",
        "completions": [
            "  ```\ndef process(self, offset: int = 0) -> \"GeneratorStepOutput\":  # type: ignore\n    instructions = []\n    mutation_no = 0\n\n    iter_no = 0\n    while len(instructions) < self.num_instructions:\n        prompts = self._apply_random_mutation(iter_no=iter_no)\n\n        generated_prompts = flatten_responses(\n            self.llm.generate(prompts, **self.llm.generation_kwargs)  # type: ignore\n        )\n        for idx, generated_prompt in enumerate(generated_prompts):\n            generated_prompt = generated_prompt.split(\"Prompt#:\")[-1].strip()\n            if self.max_length >= len(generated_prompt) >= self.min_length:  # type: ignore\n                instructions.append(generated_prompt)\n                self._prompts[idx] = np.random.choice(self._seed_texts)  # type: ignore\n            else:\n                self._prompts[idx] = generated_prompt  # type: ignore\n\n        self._logger.info(\n            f\"🔄 Ran iteration {iter_no} with {len(instructions)} instructions already evolved!\"\n        )\n        iter_no += 1\n\n        if len(instructions) > self.num_instructions:\n            instructions = instructions[: self.num_instructions]\n        if len(instructions) > mutation_no:\n            mutation_no = len(instructions) - mutation_no\n\n    return GeneratorStepOutput(instructions, mutation_no)\n```"
        ],
        "line": 37,
        "token": 338,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 21,
        "name": "query_tags_from_docker_hub",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef query_tags_from_docker_hub(\n    filter_func: Callable[[str], bool],\n    namespace: str,\n    repository: str,\n    docker_hub_token: str,\n    num_tags: Optional[int] = None,\n) -> List[str]:\n    filtered_tags = []\n    headers = {\n        \"Authorization\": f\"Bearer {docker_hub_token}\",\n    }\n\n    page_count = 1\n    while page_count:\n        logger.info(f\"Querying page {page_count}\")\n        url = f\"https://hub.docker.com/v2/namespaces/{namespace}/repositories/{repository}/tags?page={page_count}&page_size=100\"  # noqa E501\n\n        response = requests.get(url, headers=headers)\n        response_json = response.json()\n\n        # Stop querying if Docker Hub API returns an error\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        if response.status_code != 200:\n            logger.info(f\"Failed to query tags from Docker Hub: Error: {response_json}\")\n            return sorted([f\"{namespace}/{repository}:{t}\" for t in filtered_tags])\n\n        result = response_json[\"results\"]\n        tags = [tag[\"name\"] for tag in result]\n        filtered_tags_page = list(filter(filter_func, tags))  # Filter tags\n\n        # Add enough tags to not exceed num_tags if num_tags is specified\n        if num_tags:\n            if len(filtered_tags) + len(filtered_tags_page) > num_tags:\n                filtered_tags.extend(\n                    filtered_tags_page[: num_tags - len(filtered_tags)]\n                )\n                break\n        filtered_tags.extend(filtered_tags_page)\n\n        logger.info(f\"Tag count: {len(filtered_tags)}\")\n        if not response_json[\"next\"]:\n            break\n        page_count += 1\n    return sorted([f\"{namespace}/{repository}:{t}\" for t in filtered_tags])",
        "completions": [
            "  \n[PYTHON]\nfiltered_tags = [tag[\"name\"] for tag in response_json[\"results\"] if filter_func(tag[\"name\"])]\n[/PYTHON]"
        ],
        "line": 31,
        "token": 277,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 22,
        "name": "evaluate",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def evaluate(self, points):\n        points = np.atleast_2d(points)\n\n        dim, num_m = np.array(points).shape\n        if dim != self.dim:\n            raise ValueError(\"points have dimension {}, dataset has dimension \"\n                             \"{}\".format(dim, self.dim))\n\n        result = np.zeros(num_m)\n\n        if num_m >= self.num_dp:\n            # there are more points than data, so loop over data\n            for i in range(self.num_dp):\n                diff = self.dataset[:, i, np.newaxis] - points\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                tdiff = np.dot(self.inv_cov, diff)\n                energy = np.sum(diff * tdiff, axis=0) / 2.0\n                result = result + np.exp(-energy)\n        else:\n            # loop over points\n            for i in range(num_m):\n                diff = self.dataset - points[:, i, np.newaxis]\n                tdiff = np.dot(self.inv_cov, diff)\n                energy = np.sum(diff * tdiff, axis=0) / 2.0\n                result[i] = np.sum(np.exp(-energy), axis=0)\n\n        result = result / self.norm_factor\n\n        return result",
        "completions": [
            "  ```\n        for i in range(self.num_dp):\n            diff = self.dataset[:, i, np.newaxis] - points\n            result[i] = np.sum(np.square(diff), axis=1)\n```"
        ],
        "line": 24,
        "token": 246,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 23,
        "name": "_get_device_type_from_env",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef _get_device_type_from_env() -> str:\n    if torch.cuda.is_available():\n        device = \"cuda\"\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    else:\n        device = \"cpu\"\n    return device",
        "completions": [
            "  ```\ndef _get_device_type_from_env() -> str:\n    if torch.cuda.is_available():\n        device = \"cuda\"\n    else:\n        device = \"cpu\"\n    return device\n```"
        ],
        "line": 16,
        "token": 166,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 24,
        "name": "edge_mask_causal",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef edge_mask_causal(edge_idx: torch.LongTensor, mask_ij: torch.Tensor) -> torch.Tensor:\n    idx = torch.arange(edge_idx.size(1), device=edge_idx.device)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    idx_expand = idx.reshape([1, -1, 1])\n    mask_ij_causal = (edge_idx < idx_expand).float() * mask_ij\n    return mask_ij_causal",
        "completions": [
            "  \n[PYTHON]\ndef edge_mask_causal(edge_idx: torch.LongTensor, mask_ij: torch.Tensor) -> torch.Tensor:\n    idx = torch.arange(edge_idx.size(1), device=edge_idx.device)\n    mask = torch.zeros(edge_idx.size(0), edge_idx.size(1), device=edge_idx.device)\n    mask[edge_idx[0], idx] = mask_ij\n    return mask\n[/PYTHON]\n"
        ],
        "line": 15,
        "token": 174,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 25,
        "name": "run_calc",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef run_calc(\n    atoms: Atoms,\n    geom_file: str | None = None,\n    copy_files: SourceDirectory | dict[SourceDirectory, Filenames] | None = None,\n    get_forces: bool = False,\n) -> Atoms:\n    # Copy atoms so we don't modify it in-place\n    atoms = copy_atoms(atoms)\n\n    # Perform staging operations\n    tmpdir, job_results_dir = calc_setup(atoms, copy_files=copy_files)\n\n    # Run calculation\n    if get_forces:\n        atoms.get_forces()\n    else:\n        atoms.get_potential_energy()\n\n    # Most ASE calculators do not update the atoms object in-place with a call\n    # to .get_potential_energy(), which is important if an internal optimizer is\n    # used. This section is done to ensure that the atoms object is updated to\n    # the final geometry if `geom_file` is provided.\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    # Note: We have to be careful to make sure we don't lose the calculator\n    # object, as this contains important information such as the parameters\n    # and output properties (e.g. final magnetic moments).\n    if geom_file:\n        atoms_new = read(zpath(tmpdir / geom_file))\n        if isinstance(atoms_new, list):\n            atoms_new = atoms_new[-1]\n\n        # Make sure the atom indices didn't get updated somehow (sanity check).\n        # If this happens, there is a serious problem.\n        if (\n            np.array_equal(atoms_new.get_atomic_numbers(), atoms.get_atomic_numbers())\n            is False\n        ):\n            raise ValueError(\"Atomic numbers do not match between atoms and geom_file.\")\n\n        atoms.positions = atoms_new.positions\n        atoms.cell = atoms_new.cell\n\n    # Perform cleanup operations\n    calc_cleanup(atoms, tmpdir, job_results_dir)\n\n    return atoms",
        "completions": [
            "  ```\ndef run_calc(\n    atoms: Atoms,\n    geom_file: str | None = None,\n    copy_files: SourceDirectory | dict[SourceDirectory, Filenames] | None = None,\n    get_forces: bool = False,\n) -> Atoms:\n    # Copy atoms so we don't modify it in-place\n    atoms = copy_atoms(atoms)\n\n    # Perform staging operations\n    tmpdir, job_results_dir = calc_setup(atoms, copy_files=copy_files)\n\n    # Run calculation\n    if get_forces:\n        atoms.get_forces()\n    else:\n        atoms.get_potential_energy()\n\n    # Most ASE calculators do not update the atoms object in-place with a call\n    # to.get_potential_energy(), which is important if an internal optimizer is\n    # used. This section is done to ensure that the atoms object is updated to\n    # the final geometry if `geom_file` is provided.\n    if geom_file is not None:\n        atoms.write(geom_file)\n\n    return atoms\n```"
        ],
        "line": 32,
        "token": 288,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 26,
        "name": "plan",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def plan(self, task: str):\n        agent_pool_description = self.agent_pool.get_agent_pool_description()\n        feedback = self.get_last_feedback()\n        last_plan = self.get_last_plan()\n\n        # run the planning\n        plan = self.planning.execute(task, agent_pool_description, last_plan, feedback)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n        logger.info(f\"Plan: {plan}\")\n\n        self.shared_memory.add(\n            event_type=EventType.planning, agent=self.name, content=str(plan)\n        )\n\n        return plan",
        "completions": [
            "  ```\ndef planning.execute(task, agent_pool_description, last_plan, feedback):\n    # implement the planning algorithm here\n    return plan\n```"
        ],
        "line": 19,
        "token": 189,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 27,
        "name": "load",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def load(cls, config_filepath: str) -> \"AIConfigRuntime\":\n\n        with open(config_filepath) as file:\n            if is_yaml_ext(config_filepath):\n                yaml_data = yaml.safe_load(file)\n                data = json.dumps(yaml_data)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            else:\n                data = file.read()\n\n        config_runtime = cls.load_json(data)\n        # set the file path. This is used when saving the config\n        config_runtime.file_path = config_filepath\n        return config_runtime",
        "completions": [
            "  ```\n    def load(cls, config_filepath: str) -> \"AIConfigRuntime\":\n\n        with open(config_filepath) as file:\n            if is_yaml_ext(config_filepath):\n                yaml_data = yaml.safe_load(file)\n                data = json.dumps(yaml_data)\n                return cls(data)\n```"
        ],
        "line": 18,
        "token": 189,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 28,
        "name": "unique",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef unique(ar1, return_index=False, return_inverse=False):\n    output = np.unique(ar1,\n                       return_index=return_index,\n                       return_inverse=return_inverse)\n    if isinstance(output, tuple):\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        output = list(output)\n        output[0] = output[0].view(MaskedArray)\n        output = tuple(output)\n    else:\n        output = output.view(MaskedArray)\n    return output",
        "completions": [
            "  \n[PYTHON]\ndef unique(ar1, return_index=False, return_inverse=False):\n    output = np.unique(ar1,\n                       return_index=return_index,\n                       return_inverse=return_inverse)\n    if isinstance(output, tuple):\n        return output[0]\n    else:\n        return output\n[/PYTHON]\n"
        ],
        "line": 18,
        "token": 176,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 29,
        "name": "metadata_and_cell_to_header",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef metadata_and_cell_to_header(\n    notebook, metadata, text_format, fmt, unsupported_keys=None\n):\n\n    header = []\n\n    lines_to_next_cell = None\n    root_level_metadata = {}\n    root_level_metadata_as_raw_cell = fmt.get(\"root_level_metadata_as_raw_cell\", True)\n\n    if not root_level_metadata_as_raw_cell:\n        root_level_metadata = metadata.get(\"jupytext\", {}).pop(\n            \"root_level_metadata\", {}\n        )\n    elif notebook.cells:\n        cell = notebook.cells[0]\n        if cell.cell_type == \"raw\":\n            lines = cell.source.strip(\"\\n\\t \").splitlines()\n            if (\n                len(lines) >= 2\n                and _HEADER_RE.match(lines[0])\n                and _HEADER_RE.match(lines[-1])\n            ):\n                header = lines[1:-1]\n                lines_to_next_cell = cell.metadata.get(\"lines_to_next_cell\")\n                notebook.cells = notebook.cells[1:]\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    metadata = insert_jupytext_info_and_filter_metadata(\n        metadata, fmt, text_format, unsupported_keys\n    )\n\n    if metadata:\n        root_level_metadata[\"jupyter\"] = metadata\n\n    if root_level_metadata:\n        header.extend(\n            yaml.safe_dump(root_level_metadata, default_flow_style=False).splitlines()\n        )\n\n    if header:\n        header = [\"---\"] + header + [\"---\"]\n\n        if (\n            fmt.get(\"hide_notebook_metadata\", False)\n            and text_format.format_name == \"markdown\"\n        ):\n            header = [\"<!--\", \"\"] + header + [\"\", \"-->\"]\n\n    return (\n        comment_lines(header, text_format.header_prefix, text_format.header_suffix),\n        lines_to_next_cell,\n    )",
        "completions": [
            "  \n[PYTHON]\ndef metadata_and_cell_to_header(notebook, metadata, text_format, fmt, unsupported_keys=None):\n    header = []\n    lines_to_next_cell = None\n    root_level_metadata = {}\n    root_level_metadata_as_raw_cell = fmt.get(\"root_level_metadata_as_raw_cell\", True)\n    if not root_level_metadata_as_raw_cell:\n        root_level_metadata = metadata.get(\"jupytext\", {}).pop(\"root_level_metadata\", {})\n    elif notebook.cells:\n        cell = notebook.cells[0]\n        if cell.cell_type == \"raw\":\n            lines = cell.source.strip(\"\\n\\t \").splitlines()\n            if len(lines) >= 2 and _HEADER_RE.match(lines[0]) and _HEADER_RE.match(lines[-1]):\n                header = lines[1:-1]\n                lines_to_next_cell = cell.metadata.get(\"lines_to_next_cell\")\n                notebook.cells = notebook.cells[1:]\n    return header, lines_to_next_cell, root_level_metadata\n[/PYTHON]\n"
        ],
        "line": 36,
        "token": 284,
        "line_diff": 2,
        "token_diff": 1
    },
    {
        "id": 30,
        "name": "spectrogram",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef spectrogram(\n    waveform: np.ndarray,\n    window: np.ndarray,\n    frame_length: int,\n    hop_length: int,\n    fft_length: Optional[int] = None,\n    power: Optional[float] = 1.0,\n    center: bool = True,\n    pad_mode: str = \"reflect\",\n    onesided: bool = True,\n    preemphasis: Optional[float] = None,\n    mel_filters: Optional[np.ndarray] = None,\n    mel_floor: float = 1e-10,\n    log_mel: Optional[str] = None,\n    reference: float = 1.0,\n    min_value: float = 1e-10,\n    db_range: Optional[float] = None,\n    remove_dc_offset: Optional[bool] = None,\n    dtype: np.dtype = np.float32,\n) -> np.ndarray:\n    window_length = len(window)\n\n    if fft_length is None:\n        fft_length = frame_length\n\n    if frame_length > fft_length:\n        raise ValueError(f\"frame_length ({frame_length}) may not be larger than fft_length ({fft_length})\")\n\n    if window_length != frame_length:\n        raise ValueError(f\"Length of the window ({window_length}) must equal frame_length ({frame_length})\")\n\n    if hop_length <= 0:\n        raise ValueError(\"hop_length must be greater than zero\")\n\n    if waveform.ndim != 1:\n        raise ValueError(f\"Input waveform must have only one dimension, shape is {waveform.shape}\")\n\n    if np.iscomplexobj(waveform):\n        raise ValueError(\"Complex-valued input waveforms are not currently supported\")\n\n    if power is None and mel_filters is not None:\n        raise ValueError(\n            \"You have provided `mel_filters` but `power` is `None`. Mel spectrogram computation is not yet supported for complex-valued spectrogram.\"\n            \"Specify `power` to fix this issue.\"\n        )\n\n    # center pad the waveform\n    if center:\n        padding = [(int(frame_length // 2), int(frame_length // 2))]\n        waveform = np.pad(waveform, padding, mode=pad_mode)\n\n    # promote to float64, since np.fft uses float64 internally\n    waveform = waveform.astype(np.float64)\n    window = window.astype(np.float64)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    # split waveform into frames of frame_length size\n    num_frames = int(1 + np.floor((waveform.size - frame_length) / hop_length))\n\n    num_frequency_bins = (fft_length // 2) + 1 if onesided else fft_length\n    spectrogram = np.empty((num_frames, num_frequency_bins), dtype=np.complex64)\n\n    # rfft is faster than fft\n    fft_func = np.fft.rfft if onesided else np.fft.fft\n    buffer = np.zeros(fft_length)\n\n    timestep = 0\n    for frame_idx in range(num_frames):\n        buffer[:frame_length] = waveform[timestep : timestep + frame_length]\n\n        if remove_dc_offset:\n            buffer[:frame_length] = buffer[:frame_length] - buffer[:frame_length].mean()\n\n        if preemphasis is not None:\n            buffer[1:frame_length] -= preemphasis * buffer[: frame_length - 1]\n            buffer[0] *= 1 - preemphasis\n\n        buffer[:frame_length] *= window\n\n        spectrogram[frame_idx] = fft_func(buffer)\n        timestep += hop_length\n\n    # note: ** is much faster than np.power\n    if power is not None:\n        spectrogram = np.abs(spectrogram, dtype=np.float64) ** power\n\n    spectrogram = spectrogram.T\n\n    if mel_filters is not None:\n        spectrogram = np.maximum(mel_floor, np.dot(mel_filters.T, spectrogram))\n\n    if power is not None and log_mel is not None:\n        if log_mel == \"log\":\n            spectrogram = np.log(spectrogram)\n        elif log_mel == \"log10\":\n            spectrogram = np.log10(spectrogram)\n        elif log_mel == \"dB\":\n            if power == 1.0:\n                spectrogram = amplitude_to_db(spectrogram, reference, min_value, db_range)\n            elif power == 2.0:\n                spectrogram = power_to_db(spectrogram, reference, min_value, db_range)\n            else:\n                raise ValueError(f\"Cannot use log_mel option '{log_mel}' with power {power}\")\n        else:\n            raise ValueError(f\"Unknown log_mel option: {log_mel}\")\n\n        spectrogram = np.asarray(spectrogram, dtype)\n\n    return spectrogram",
        "completions": [
            "  ```\ndef spectrogram(\n    waveform: np.ndarray,\n    window: np.ndarray,\n    frame_length: int,\n    hop_length: int,\n    fft_length: Optional[int] = None,\n    power: Optional[float] = 1.0,\n    center: bool = True,\n    pad_mode: str = \"reflect\",\n    onesided: bool = True,\n    preemphasis: Optional[float] = None,\n    mel_filters: Optional[np.ndarray] = None,\n    mel_floor: float = 1e-10,\n    log_mel: Optional[str] = None,\n    reference: float = 1.0,\n    min_value: float = 1e-10,\n    db_range: Optional[float] = None,\n    remove_dc_offset: Optional[bool] = None,\n    dtype: np.dtype = np.float32,\n) -> np.ndarray:\n    # Your code here\n```"
        ],
        "line": 58,
        "token": 527,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 31,
        "name": "dump_yaml",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef dump_yaml(data: Any, stream: TextIO):\n    yaml.dump(\n        data,\n        stream,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        Dumper=_YAMLDumper,\n        default_flow_style=False,\n        allow_unicode=True,\n        sort_keys=False,\n    )",
        "completions": [
            "  ```\nyaml.dump(\n    data,\n    stream,\n    default_flow_style=False,\n    allow_unicode=True\n)\n```"
        ],
        "line": 17,
        "token": 166,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 32,
        "name": "load",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef load(top, traj, name='oxDNA', setup_nodes=True, world_scale=0.01):\n\n    # the scale of the oxDNA files seems to be based on nanometres rather than angstrongs\n    # like most structural biology files, so currently adjusting the world_scale to\n    # compensate\n    scale_dna = world_scale * 10\n\n    # read in the topology and trajectory files\n    is_new_top = is_new_topology(top)\n    if is_new_top:\n        topology = read_topology_new(top)\n    else:\n        topology = read_topology_old(top)\n\n    trajectory = read_trajectory(traj)\n    n_frames = trajectory.shape[0]\n\n    # creat toplogy object with positions of the first frame, and the bonds from the\n    # topology object\n    mol = obj.create_object(\n        name=name,\n        collection=coll.mn(),\n        vertices=trajectory[0][:, 0:3] * scale_dna,\n        edges=toplogy_to_bond_idx_pairs(topology)\n    )\n\n    # adding additional toplogy information from the topology and frames objects\n    obj.set_attribute(mol, 'res_name', topology[:, 1], \"INT\")\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    obj.set_attribute(mol, 'chain_id', topology[:, 0], \"INT\")\n    obj.set_attribute(mol, 'Color', data=color.color_chains_equidistant(\n        topology[:, 0]), type='FLOAT_COLOR')\n    set_attributes_to_dna_mol(mol, trajectory[0], scale_dna=scale_dna)\n\n    # if the 'frames' file only contained one timepoint, return the object without creating\n    # any kind of collection for storing multiple frames from a trajectory, and a None\n    # object in place of the frames collection\n    if n_frames == 1:\n        if setup_nodes:\n            nodes.create_starting_node_tree(\n                mol, style=\"oxdna\", set_color=False)\n        return mol, None\n\n    # create a collection to store all of the frame objects that are part of the trajectory\n    # they will contain all of the possible attributes which can be interpolated betewen\n    # frames such as position, base_vector, base_normal, velocity, angular_velocity\n    collection = coll.frames(f\"{name}_frames\", parent=coll.data())\n    for i, frame in enumerate(trajectory):\n        fill_n = int(np.ceil(np.log10(n_frames)))\n        frame_name = f\"{name}_frame_{str(i).zfill(fill_n)}\"\n        frame_mol = obj.create_object(\n            frame[:, 0:3] * scale_dna, name=frame_name, collection=collection)\n        set_attributes_to_dna_mol(frame_mol, frame, scale_dna)\n\n    if setup_nodes:\n        nodes.create_starting_node_tree(\n            mol, coll_frames=collection, style=\"oxdna\", set_color=False)\n\n    return mol, collection",
        "completions": [
            "  ```\ndef read_topology_new(top):\n    # read in the topology file\n    with open(top, 'r') as f:\n        lines = f.readlines()\n\n    # parse the topology file\n    topology = []\n    for line in lines:\n        if line.startswith('ATOM'):\n            fields = line.split()\n            topology.append([fields[1], fields[2], fields[3]])\n\n    return topology\n```"
        ],
        "line": 36,
        "token": 317,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 33,
        "name": "collate_fn",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef collate_fn(batch):\n    if len(batch) != 1:\n        raise ValueError(\n            \"This trainer is not designed to handle multiple batches in a single collate.\"\n        )\n    debug_log(\"Begin collate_fn on batch\")\n\n    # SDXL Dropout\n    dropout_probability = StateTracker.get_args().caption_dropout_probability\n    examples = batch[0]\n\n    # Randomly drop captions/conditioning based on dropout_probability\n    for example in examples:\n        data_backend_id = example[\"data_backend_id\"]\n        if (\n            dropout_probability > 0\n            and dropout_probability is not None\n            and np.random.rand() < dropout_probability\n        ):\n            example[\"instance_prompt_text\"] = \"\"  # Drop caption\n            example[\"drop_conditioning\"] = True  # Flag to drop conditioning\n        else:\n            example[\"drop_conditioning\"] = False\n\n    debug_log(\"Collect luminance values\")\n    if \"luminance\" in examples[0]:\n        batch_luminance = [example[\"luminance\"] for example in examples]\n    else:\n        batch_luminance = [0] * len(examples)\n    # average it\n    batch_luminance = sum(batch_luminance) / len(batch_luminance)\n    debug_log(\"Extract filepaths\")\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    filepaths = extract_filepaths(examples)\n    debug_log(\"Compute latents\")\n    latent_batch = compute_latents(filepaths, data_backend_id)\n    if \"deepfloyd\" not in StateTracker.get_args().model_type:\n        debug_log(\"Check latents\")\n        latent_batch = check_latent_shapes(\n            latent_batch, filepaths, data_backend_id, batch\n        )\n\n    # Compute embeddings and handle dropped conditionings\n    debug_log(\"Extract captions\")\n    captions = [example[\"instance_prompt_text\"] for example in examples]\n    debug_log(\"Pull cached text embeds\")\n    text_embed_cache = StateTracker.get_data_backend(data_backend_id)[\n        \"text_embed_cache\"\n    ]\n    prompt_embeds_all, add_text_embeds_all = compute_prompt_embeddings(\n        captions, text_embed_cache\n    )\n    batch_time_ids = None\n    if add_text_embeds_all is not None:\n        debug_log(\"Compute and stack SDXL time ids\")\n        batch_time_ids = gather_conditional_size_features(\n            examples, latent_batch, StateTracker.get_weight_dtype()\n        )\n        debug_log(f\"Time ids stacked to {batch_time_ids.shape}: {batch_time_ids}\")\n\n    return {\n        \"latent_batch\": latent_batch,\n        \"prompt_embeds\": prompt_embeds_all,\n        \"add_text_embeds\": add_text_embeds_all,\n        \"batch_time_ids\": batch_time_ids,\n        \"batch_luminance\": batch_luminance,\n    }",
        "completions": [
            "  ```\ndef collate_fn(batch):\n    if len(batch)!= 1:\n        raise ValueError(\n            \"This trainer is not designed to handle multiple batches in a single collate.\"\n        )\n    debug_log(\"Begin collate_fn on batch\")\n\n    # SDXL Dropout\n    dropout_probability = StateTracker.get_args().caption_dropout_probability\n    examples = batch[0]\n\n    # Randomly drop captions/conditioning based on dropout_probability\n    for example in examples:\n        data_backend_id = example[\"data_backend_id\"]\n        if (\n            dropout_probability > 0\n            and dropout_probability is not None\n            and np.random.rand() < dropout_probability\n        ):\n            example[\"instance_prompt_text\"] = \"\"  # Drop caption\n            example[\"drop_conditioning\"] = True  # Flag to drop conditioning\n        else:\n            example[\"drop_conditioning\"] = False\n\n    debug_log(\"Collect luminance values\")\n    if \"luminance\" in examples[0]:\n        batch_luminance = [example[\"luminance\"] for example in examples]\n    else:\n        batch_luminance = [0] * len(examples)\n    # average it\n    batch_luminance = sum(batch_luminance) / len(batch_luminance)\n    debug_log(\"Extract filepaths\")\n\n    return batch_luminance\n```"
        ],
        "line": 42,
        "token": 345,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 34,
        "name": "compute_parameter_entropy",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef compute_parameter_entropy(\n    parameter_config: vz.ParameterConfig,\n    parameter_values: Iterable[Optional[vz.ParameterValue]],\n) -> float:\n  values = [pv.value for pv in parameter_values if pv is not None]\n  if not values:\n    return 0.0\n  if parameter_config.type in [\n      vz.ParameterType.CATEGORICAL,\n      vz.ParameterType.DISCRETE,\n  ] and hasattr(parameter_config, 'feasible_values'):\n    if any([value not in parameter_config.feasible_values for value in values]):\n      raise ValueError(\n          f'Parameter values: {parameter_values} contain out-of-bound values.'\n          f' Feasible values: {parameter_config.feasible_values}'\n      )\n    _, counts = np.unique(values, return_counts=True)\n  elif hasattr(parameter_config, 'bounds'):\n    min_val = parameter_config.bounds[0]\n    max_val = parameter_config.bounds[1]\n    if any([value < min_val or value > max_val for value in values]):\n      raise ValueError(\n          f'Parameter values: {parameter_values} contain out-of-bound values.'\n          f' Bound: [{min_val}, {max_val}]'\n      )\n    if parameter_config.type == vz.ParameterType.INTEGER:\n      _, counts = np.unique(values, return_counts=True)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    else:\n      # Sets the number of fixed-width bins as c * sample_size ** (1.0 / 3.0).\n      # The cubic-root dependency on the sample size appears in several common\n      # bin-size selection strategies, e.g.\n      # https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width\n      # The multiplier `c` is chosen such that for a small sample size, say 100,\n      # we still get a reasonable number of bins, say 30.\n      alpha = 1.0 / 3.0\n      c = 30.0 / (100**alpha)\n      n_sample = len(values)\n      # We also ensure that the number of bins is at most the sample size.\n      num_bins = np.min((int(c * n_sample**alpha), n_sample))\n      counts, _ = np.histogram(\n          values,\n          bins=np.linspace(\n              min_val,\n              max_val,\n              num=num_bins + 1,\n              dtype=np.float32,\n          ),\n      )\n  else:\n    raise ValueError(\n        'Invalid parameter config: either `feasible_values` or'\n        '`bounds` is expected to be set, but both are unset. '\n        f'Parameter config: {parameter_config}'\n    )\n  return float(scipy.stats.entropy(counts))",
        "completions": [
            "  ```\ndef compute_parameter_entropy(\n    parameter_config: vz.ParameterConfig,\n    parameter_values: Iterable[Optional[vz.ParameterValue]],\n) -> float:\n  values = [pv.value for pv in parameter_values if pv is not None]\n  if not values:\n    return 0.0\n  if parameter_config.type in [\n      vz.ParameterType.CATEGORICAL,\n      vz.ParameterType.DISCRETE,\n  ] and hasattr(parameter_config, 'feasible_values'):\n    if any([value not in parameter_config.feasible_values for value in values]):\n      raise ValueError(\n          f'Parameter values: {parameter_values} contain out-of-bound values.'\n          f' Feasible values: {parameter_config.feasible_values}'\n      )\n    _, counts = np.unique(values, return_counts=True)\n  elif hasattr(parameter_config, 'bounds'):\n    min_val = parameter_config.bounds[0]\n    max_val = parameter_config.bounds[1]\n    if any([value < min_val or value > max_val for value in values]):\n      raise ValueError(\n          f'Parameter values: {parameter_values} contain out-of-bound values.'\n          f' Bound: [{min_val}, {max_val}]'\n      )\n    if parameter_config.type == vz.ParameterType.INTEGER:\n      _, counts = np.unique(values, return_counts=True)\n  else:\n    raise ValueError(\n        f'Parameter type: {parameter_config.type} is not supported.'\n    )\n  return -np.sum(counts * np.log2(counts))\n```"
        ],
        "line": 40,
        "token": 340,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 35,
        "name": "download_profile",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    async def download_profile(self, manufacturer: str, model: str, storage_path: str) -> None:\n\n        _LOGGER.info(\"Downloading profile: %s/%s from github\", manufacturer, model)\n\n        endpoint = f\"{ENDPOINT_DOWNLOAD}/{manufacturer}/{model}\"\n\n        def _save_file(data: bytes, directory: str) -> None:\n            \"\"\"Save file from Github to local storage directory\"\"\"\n            path = os.path.join(storage_path, directory)\n            os.makedirs(os.path.dirname(path), exist_ok=True)\n            with open(path, \"wb\") as f:\n                f.write(data)\n\n        async with aiohttp.ClientSession() as session:\n            try:\n                async with session.get(endpoint) as resp:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                    if resp.status != 200:\n                        raise ProfileDownloadError(f\"Failed to download profile: {manufacturer}/{model}\")\n                    resources = await resp.json()\n\n                await self.hass.async_add_executor_job(lambda: os.makedirs(storage_path, exist_ok=True))  # type: ignore\n\n                # Download the files\n                for resource in resources:\n                    url = resource.get(\"url\")\n                    async with session.get(url) as resp:\n                        if resp.status != 200:\n                            raise ProfileDownloadError(f\"Failed to download github URL: {url}\")\n\n                        contents = await resp.read()\n                        await self.hass.async_add_executor_job(_save_file, contents, resource.get(\"path\"))  # type: ignore\n            except aiohttp.ClientError as e:\n                raise ProfileDownloadError(f\"Failed to download profile: {manufacturer}/{model}\") from e",
        "completions": [
            "  ```\nasync with session.get(endpoint) as resp:\n    data = await resp.read()\n    _save_file(data, manufacturer)\n```"
        ],
        "line": 25,
        "token": 285,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 36,
        "name": "iterate_segmented_dataset",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef iterate_segmented_dataset(\n    data,\n    segmentation=None,\n    feature_processor=None,\n    feature_processor_kwargs=None,\n    feature_processor_segment_name_mapping=None,\n):\n    if feature_processor is None:\n        feature_processor = filter_zero_weights_feature_processor\n\n    if feature_processor_kwargs is None:\n        feature_processor_kwargs = {}\n\n    if feature_processor_segment_name_mapping is None:\n        feature_processor_segment_name_mapping = {}\n\n    def _apply_feature_processor(segment_name, segment_data):\n        feature_processor_segment_name = feature_processor_segment_name_mapping.get(\n            segment_name, segment_name\n        )\n\n        if feature_processor is not None:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            segment_data = feature_processor(\n                feature_processor_segment_name, segment_data, **feature_processor_kwargs\n            )\n        return segment_data\n\n    def _add_weights(data, weights):\n        return pd.merge(data, weights, left_index=True, right_index=True)\n\n    if segmentation is None:\n        # spoof segment name and weights column\n        segment_name = None\n        weights = pd.DataFrame({\"weight\": 1}, index=data.index)\n        segment_data = _add_weights(data, weights)\n\n        segment_data = _apply_feature_processor(segment_name, segment_data)\n        yield segment_name, segment_data\n    else:\n        for segment_name, segment_weights in segmentation.items():\n            weights = segment_weights.to_frame(\"weight\")\n            segment_data = _add_weights(data, weights)\n            segment_data = _apply_feature_processor(segment_name, segment_data)\n            yield segment_name, segment_data",
        "completions": [
            "  \n[PYTHON]\ndef iterate_segmented_dataset(\n    data,\n    segmentation=None,\n    feature_processor=None,\n    feature_processor_kwargs=None,\n    feature_processor_segment_name_mapping=None,\n):\n    if feature_processor is None:\n        feature_processor = filter_zero_weights_feature_processor\n\n    if feature_processor_kwargs is None:\n        feature_processor_kwargs = {}\n\n    if feature_processor_segment_name_mapping is None:\n        feature_processor_segment_name_mapping = {}\n\n    def _apply_feature_processor(segment_name, segment_data):\n        feature_processor_segment_name = feature_processor_segment_name_mapping.get(\n            segment_name, segment_name\n        )\n\n        if feature_processor is not None:\n            segment_data = feature_processor(\n                segment_data, **feature_processor_kwargs\n            )\n\n        return segment_data\n[/PYTHON]\n"
        ],
        "line": 31,
        "token": 211,
        "line_diff": 1,
        "token_diff": 0
    },
    {
        "id": 37,
        "name": "plot_trend",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def plot_trend(\n        self,\n        series=None,\n        vline=None,\n        colors=[\"#d4f74f\", \"#82ab5a\", \"#ff6c05\", \"#c12600\"],\n        title=None,\n        start_date=None,\n        **kwargs,\n    ):\n        # YMAX from PLOT ONLY\n        if series is None:\n            series = random.choice(self.column_names)\n        if title is None:\n            title = f\"Trend Breakdown for {series}\"\n        p_indx = self.column_names.get_loc(series)\n        cur_trend = self.predicted_trend[series].copy()\n        tls = len(cur_trend)\n        plot_df = pd.DataFrame(\n            {\n                'decline_accelerating': cur_trend[\n                    (\n                        np.hstack((np.signbit(self.accel[:, p_indx]), False))\n                        & self.slope_sign[:, p_indx]\n                    )[-tls:]\n                ],\n                'decline_decelerating': cur_trend[\n                    (\n                        (~np.hstack((np.signbit(self.accel[:, p_indx]), False)))\n                        & self.slope_sign[:, p_indx]\n                    )[-tls:]\n                ],\n                'growth_decelerating': cur_trend[\n                    (\n                        np.hstack((np.signbit(self.accel[:, p_indx]), False))\n                        & (~self.slope_sign[:, p_indx])\n                    )[-tls:]\n                ],\n                'growth_accelerating': cur_trend[\n                    (\n                        (~np.hstack((np.signbit(self.accel[:, p_indx]), False)))\n                        & (~self.slope_sign[:, p_indx])\n                    )[-tls:]\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                ],\n            },\n            index=cur_trend.index,\n        )\n        if start_date == \"auto\":\n            slx = self.forecast_length * 3\n            if slx > len(plot_df.index):\n                slx = 0\n            start_date = plot_df.index[-slx]\n        if start_date is not None:\n            plot_df = plot_df[plot_df.index >= start_date]\n        ax = plot_df.plot(title=title, color=colors, **kwargs)\n        handles, labels = ax.get_legend_handles_labels()\n        # ax.scatter(cur_trend.index[self.changepoints[:, p_indx]], cur_trend[self.changepoints[:, p_indx]], c='#fdcc09', s=4.0)\n        # ax.scatter(cur_trend.index[self.zero_crossings[:, p_indx]], cur_trend[self.zero_crossings[:, p_indx]], c='#512f74', s=4.0)\n        if self.trend_anomaly_detector is not None:\n            if self.trend_anomaly_detector.output == \"univariate\":\n                i_anom = self.trend_anomaly_detector.anomalies.index[\n                    self.anomaly_detector.anomalies.iloc[:, 0] == -1\n                ]\n            else:\n                series_anom = self.trend_anomaly_detector.anomalies[series]\n                i_anom = series_anom[series_anom == -1].index\n            if start_date is not None:\n                i_anom = i_anom[i_anom >= start_date]\n            i_anom = i_anom[i_anom >= cur_trend.index[0]]\n            # only plot if some anomalies, and not way too many anomalies\n            if len(i_anom) > 0 and len(i_anom) < len(plot_df) * 0.5:\n                scat1 = ax.scatter(\n                    i_anom.tolist(), cur_trend.loc[i_anom], c=\"red\", s=16.0\n                )\n                handles += [scat1]\n                labels += ['trend anomalies']\n        if vline is not None:\n            ax.vlines(\n                x=vline,\n                ls='--',\n                lw=1,\n                colors='darkred',\n                ymin=cur_trend[cur_trend.index >= start_date].min(),\n                ymax=cur_trend[cur_trend.index >= start_date].max(),\n            )\n        ax.legend(handles, labels)\n        return ax",
        "completions": [
            "  ```\ndef plot_trend(\n    self,\n    series=None,\n    vline=None,\n    colors=[\"#d4f74f\", \"#82ab5a\", \"#ff6c05\", \"#c12600\"],\n    title=None,\n    start_date=None,\n    **kwargs,\n):\n    # YMAX from PLOT ONLY\n    if series is None:\n        series = random.choice(self.column_names)\n    if title is None:\n        title = f\"Trend Breakdown for {series}\"\n    p_indx = self.column_names.get_loc(series)\n    cur_trend = self.predicted_trend[series].copy()\n    tls = len(cur_trend)\n    plot_df = pd.DataFrame(\n        {\n            'decline_accelerating': cur_trend[\n                (\n                    np.hstack((np.signbit(self.accel[:, p_indx]), False))\n                    & self.slope_sign[:, p_indx]\n                )[-tls:]\n            ],\n            'decline_decelerating': cur_trend[\n                (\n                    (~np.hstack((np.signbit(self.accel[:, p_indx]), False)))\n                    & self.slope_sign[:, p_indx]\n                )[-tls:]\n            ],\n            'growth_decelerating': cur_trend[\n                (\n                    np.hstack((np.signbit(self.accel[:, p_indx]), False))\n                    & (~self.slope_sign[:, p_indx])\n                )[-tls:]\n            ],\n            'growth_accelerating': cur_trend[\n                (\n                    (~np.hstack((np.signbit(self.accel[:, p_indx]), False)))\n                    & (~self.slope_sign[:, p_indx])\n                )[-tls:]\n            ],\n        }\n    )\n    ymax = plot_df.max().max()\n    return ymax\n```"
        ],
        "line": 55,
        "token": 398,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 38,
        "name": "spectrogram",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef spectrogram(\n    waveform: np.ndarray,\n    window: np.ndarray,\n    frame_length: int,\n    hop_length: int,\n    fft_length: Optional[int] = None,\n    power: Optional[float] = 1.0,\n    center: bool = True,\n    pad_mode: str = \"reflect\",\n    onesided: bool = True,\n    preemphasis: Optional[float] = None,\n    mel_filters: Optional[np.ndarray] = None,\n    mel_floor: float = 1e-10,\n    log_mel: Optional[str] = None,\n    reference: float = 1.0,\n    min_value: float = 1e-10,\n    db_range: Optional[float] = None,\n    dtype: np.dtype = np.float32,\n) -> np.ndarray:\n    window_length = len(window)\n\n    if fft_length is None:\n        fft_length = frame_length\n\n    if frame_length > fft_length:\n        raise ValueError(f\"frame_length ({frame_length}) may not be larger than fft_length ({fft_length})\")\n\n    if window_length != frame_length:\n        raise ValueError(f\"Length of the window ({window_length}) must equal frame_length ({frame_length})\")\n\n    if hop_length <= 0:\n        raise ValueError(\"hop_length must be greater than zero\")\n\n    if waveform.ndim != 1:\n        raise ValueError(f\"Input waveform must have only one dimension, shape is {waveform.shape}\")\n\n    if np.iscomplexobj(waveform):\n        raise ValueError(\"Complex-valued input waveforms are not currently supported\")\n\n    # center pad the waveform\n    if center:\n        padding = [(int(frame_length // 2), int(frame_length // 2))]\n        waveform = np.pad(waveform, padding, mode=pad_mode)\n\n    # promote to float64, since np.fft uses float64 internally\n    waveform = waveform.astype(np.float64)\n    window = window.astype(np.float64)\n\n    # split waveform into frames of frame_length size\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    num_frames = int(1 + np.floor((waveform.size - frame_length) / hop_length))\n\n    num_frequency_bins = (fft_length // 2) + 1 if onesided else fft_length\n    spectrogram = np.empty((num_frames, num_frequency_bins), dtype=np.complex64)\n\n    # rfft is faster than fft\n    fft_func = np.fft.rfft if onesided else np.fft.fft\n    buffer = np.zeros(fft_length)\n\n    timestep = 0\n    for frame_idx in range(num_frames):\n        buffer[:frame_length] = waveform[timestep : timestep + frame_length]\n\n        if preemphasis is not None:\n            buffer[1:frame_length] -= preemphasis * buffer[: frame_length - 1]\n            buffer[0] *= 1 - preemphasis\n\n        buffer[:frame_length] *= window\n\n        spectrogram[frame_idx] = fft_func(buffer)\n        timestep += hop_length\n\n    # note: ** is much faster than np.power\n    if power is not None:\n        spectrogram = np.abs(spectrogram, dtype=np.float64) ** power\n\n    spectrogram = spectrogram.T\n\n    if mel_filters is not None:\n        spectrogram = np.maximum(mel_floor, np.dot(mel_filters.T, spectrogram))\n\n    if power is not None and log_mel is not None:\n        if log_mel == \"log\":\n            spectrogram = np.log(spectrogram)\n        elif log_mel == \"log10\":\n            spectrogram = np.log10(spectrogram)\n        elif log_mel == \"dB\":\n            if power == 1.0:\n                spectrogram = amplitude_to_db(spectrogram, reference, min_value, db_range)\n            elif power == 2.0:\n                spectrogram = power_to_db(spectrogram, reference, min_value, db_range)\n            else:\n                raise ValueError(f\"Cannot use log_mel option '{log_mel}' with power {power}\")\n        else:\n            raise ValueError(f\"Unknown log_mel option: {log_mel}\")\n\n        spectrogram = np.asarray(spectrogram, dtype)\n\n    return spectrogram",
        "completions": [
            "  ```\n    # split waveform into frames of frame_length size\n    frames = np.lib.stride_tricks.as_strided(\n        waveform,\n        shape=(waveform.shape[0] - frame_length + 1, frame_length),\n        strides=(waveform.itemsize, waveform.itemsize)\n    )\n\n    # apply window function to each frame\n    frames = frames * window\n\n    # compute the FFT of each frame\n    fft_frames = np.fft.rfft(frames, n=fft_length)\n\n    # compute the power spectrum of each frame\n    power_spectrum = np.abs(fft_frames) ** power\n\n    # apply mel filterbank to power spectrum\n    if mel_filters is not None:\n        power_spectrum = np.dot(power_spectrum, mel_filters.T)\n\n    # apply logarithm to power spectrum\n    if log_mel is not None:\n        power_spectrum = np.log(power_spectrum + mel_floor)\n\n    # normalize power spectrum\n    power_spectrum = power_spectrum / np.max(power_spectrum)\n\n    # convert to decibels\n    power_spectrum = power_spectrum * db_range\n\n    # convert to dtype\n    power_spectrum = power_spectrum.astype(dtype)\n\n    # return the spectrogram\n    return power_spectrum\n```"
        ],
        "line": 53,
        "token": 473,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 39,
        "name": "_spatial_matching",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef _spatial_matching(\n    x,\n    y=None,\n    n_matches=5,\n    metric=\"euclidean\",\n    solver=None,\n    return_mip=False,\n    allow_partial_match=False,\n    **metric_kwargs,\n):\n    try:\n        import pulp\n    except ImportError as error:\n        raise ImportError(\"spatial matching requires the pulp library\") from error\n    if metric == \"precomputed\":\n        distance_matrix = x\n        match_between = y is not None\n    elif y is not None:\n        x, x_ids, _ = _validate_geometry_input(\n            x, ids=None, valid_geometry_types=_VALID_GEOMETRY_TYPES\n        )\n        y, y_ids, _ = _validate_geometry_input(\n            y, ids=None, valid_geometry_types=_VALID_GEOMETRY_TYPES\n        )\n        distance_matrix = pairwise_distances(x, y, metric=metric)\n        match_between = True\n    else:\n        x, x_ids, _ = _validate_geometry_input(\n            x, ids=None, valid_geometry_types=_VALID_GEOMETRY_TYPES\n        )\n        y_ids = x_ids\n        distance_matrix = pairwise_distances(x, metric=metric, **metric_kwargs)\n\n        match_between = False\n\n    n_targets, n_sources = distance_matrix.shape\n\n    if match_between:\n        row, col = numpy.meshgrid(\n            numpy.arange(n_targets), numpy.arange(n_sources), indexing=\"ij\"\n        )\n        row = row.flatten()\n        col = col.flatten()\n    else:\n        # if we are matching within, we need to\n        row, col = numpy.triu_indices(\n            n=n_targets, m=n_sources, k=int(not match_between)\n        )\n\n    mp = pulp.LpProblem(\"optimal-neargraph\", sense=pulp.LpMinimize)\n    # a match is as binary decision variable, connecting observation i to observation j\n    match_vars = pulp.LpVariable.dicts(\n        \"match\",\n        lowBound=0,\n        upBound=1,\n        indices=zip(row, col, strict=True),\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        cat=\"Continuous\" if allow_partial_match else \"Binary\",\n    )\n    # we want to minimize the geographic distance of links in the graph\n    mp.objective = pulp.lpSum(\n        [\n            match_vars[i, j] * distance_matrix[i, j]\n            for i, j in zip(row, col, strict=True)\n        ]\n    )\n\n    # for each observation\n    for j in range(n_targets):\n        # there must be exactly k other matched observations, which might live\n        if match_between:\n            summand = pulp.lpSum(\n                [\n                    # over the whole match matrix\n                    match_vars[j, i]\n                    for i in range(n_sources)\n                ]\n            )\n            sense = 1\n        else:\n            summand = pulp.lpSum(\n                [\n                    # in the \"upper\" triangle, or \"lower\" triangle\n                    match_vars.get((i, j), match_vars.get((j, i)))\n                    for i in range(n_sources)\n                    if (i != j)\n                ]\n            )\n            sense = int(not allow_partial_match)\n        mp += pulp.LpConstraint(summand, sense=sense, rhs=n_matches)\n    if match_between:  # but, we may choose to ignore some sources\n        for i in range(n_sources):\n            summand = pulp.lpSum([match_vars[j, i] for j in range(n_targets)])\n            mp += pulp.LpConstraint(summand, sense=-1, rhs=n_matches)\n\n    status = mp.solve(solver)\n\n    if (status != 1) & (not allow_partial_match):\n        warnings.warn(\n            f\"Problem is {pulp.LpStatus[status]}, so edge weights may be non-integer!\",\n            stacklevel=2,\n        )\n\n    edges = [\n        (*key, value.value()) for key, value in match_vars.items() if value.value() > 0\n    ]\n    if not match_between:\n        edges.extend([(*tuple(reversed(edge[:-1])), edge[-1]) for edge in edges])\n\n    heads, tails, weights = map(numpy.asarray, zip(*sorted(edges), strict=True))\n\n    if return_mip:\n        return x_ids[heads], y_ids[tails], weights, mp\n    return x_ids[heads], y_ids[tails], weights",
        "completions": [
            "  ```\n    match_vars = pulp.LpVariable.dicts(\n        \"match\",\n        lowBound=0,\n        upBound=1,\n        indices=zip(row, col, strict=True),\n    )\n    # add constraints\n    mp.add_constraints(\n        pulp.LpConstraint.dicts(\n            \"match_constraint\",\n            {\n                (i, j): pulp.LpConstraint(\n                    pulp.LpAffineExpression(match_vars[i, j]),\n                    sense=pulp.LpConstraint.Sense.LE,\n                    rhs=1,\n                )\n                for i, j in match_vars\n            },\n        )\n    )\n    # add objective function\n    mp.set_objective(\n        pulp.LpAffineExpression(\n            {\n                (i, j): distance_matrix[i, j] * match_vars[i, j]\n                for i, j in match_vars\n            }\n        )\n    )\n    # solve the problem\n    mp.solve()\n    # get the results\n    match_results = {\n        (i, j): match_vars[i, j].varValue\n        for i, j in match_vars\n        if match_vars[i, j].varValue > 0\n    }\n    return match_results\n```"
        ],
        "line": 65,
        "token": 407,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 40,
        "name": "load",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def load(target: Union[str, dict], dictionary: Dictionary, mmap: bool = True) -> \"LinkGraph\":\n        if not isinstance(target, dict):\n            if mmap:\n                target = joblib.load(target, mmap_mode=\"r\")\n            else:\n                target = joblib.load(target)\n\n        if target[\"build_params\"][\"dictionary\"] != dictionary.uuid:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            raise RuntimeError(\"The specified dictionary is different from the one used to build this link graph\")\n\n        indices = target.pop(\"indices\")\n        indptr = target.pop(\"indptr\")\n        if not isinstance(indices, np.ndarray):\n            indices = np.frombuffer(indices, dtype=np.int32)\n            indptr = np.frombuffer(indptr, dtype=np.int32)\n\n        return LinkGraph(dictionary, indices, indptr, **target)",
        "completions": [
            "  \n[PYTHON]\ndef load(target: Union[str, dict], dictionary: Dictionary, mmap: bool = True) -> \"LinkGraph\":\n    if not isinstance(target, dict):\n        if mmap:\n            target = joblib.load(target, mmap_mode=\"r\")\n        else:\n            target = joblib.load(target)\n\n    if target[\"build_params\"][\"dictionary\"]!= dictionary.uuid:\n        raise ValueError(\"The dictionary UUID in the target file does not match the provided dictionary.\")\n\n    return LinkGraph(target, dictionary)\n[/PYTHON]\n"
        ],
        "line": 20,
        "token": 223,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 41,
        "name": "spherical_bessel_smooth",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef spherical_bessel_smooth(r: Tensor, cutoff: float = 5.0, max_n: int = 10) -> Tensor:\n    n = torch.arange(max_n).type(dtype=matgl.float_th)[None, :]\n    r = r[:, None]\n    fnr = (\n        (-1) ** n\n        * sqrt(2.0)\n        * pi\n        / cutoff**1.5\n        * (n + 1)\n        * (n + 2)\n        / torch.sqrt(2 * n**2 + 6 * n + 5)\n        * (_sinc(r * (n + 1) * pi / cutoff) + _sinc(r * (n + 2) * pi / cutoff))\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    )\n    en = n**2 * (n + 2) ** 2 / (4 * (n + 1) ** 4 + 1)\n    dn = [torch.tensor(1.0)]\n    for i in range(1, max_n):\n        dn_value = 1 - en[0, i] / dn[-1]\n        dn.append(dn_value)\n    dn = torch.stack(dn)  # type: ignore\n    gn = [fnr[:, 0]]\n    for i in range(1, max_n):\n        gn_value = 1 / torch.sqrt(dn[i]) * (fnr[:, i] + torch.sqrt(en[0, i] / dn[i - 1]) * gn[-1])\n        gn.append(gn_value)\n\n    return torch.t(torch.stack(gn))",
        "completions": [
            "  ```\ndef _sinc(x):\n    return torch.sin(x) / x\n```"
        ],
        "line": 25,
        "token": 275,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 42,
        "name": "losses",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def losses(self, predictions, proposals):\n        scores, proposal_deltas = predictions\n\n        # parse classification outputs\n        gt_classes = (\n            cat([p.gt_classes for p in proposals], dim=0) if len(proposals) else torch.empty(0)\n        )\n        _log_classification_stats(scores, gt_classes)\n\n        # parse box regression outputs\n        if len(proposals):\n            proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0)  # Nx4\n            assert not proposal_boxes.requires_grad, \"Proposals should not require gradients!\"\n            # If \"gt_boxes\" does not exist, the proposals must be all negative and\n            # should not be included in regression loss computation.\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            # Here we just use proposal_boxes as an arbitrary placeholder because its\n            # value won't be used in self.box_reg_loss().\n            gt_boxes = cat(\n                [(p.gt_boxes if p.has(\"gt_boxes\") else p.proposal_boxes).tensor for p in proposals],\n                dim=0,\n            )\n        else:\n            proposal_boxes = gt_boxes = torch.empty((0, 4), device=proposal_deltas.device)\n\n        losses = {\n            \"loss_cls\": cross_entropy(scores, gt_classes, reduction=\"mean\"),\n            \"loss_box_reg\": self.box_reg_loss(\n                proposal_boxes, gt_boxes, proposal_deltas, gt_classes\n            ),\n        }\n        return {k: v * self.loss_weight.get(k, 1.0) for k, v in losses.items()}",
        "completions": [
            "  ```\ndef losses(self, predictions, proposals):\n    scores, proposal_deltas = predictions\n\n    # parse classification outputs\n    gt_classes = (\n        cat([p.gt_classes for p in proposals], dim=0) if len(proposals) else torch.empty(0)\n    )\n    _log_classification_stats(scores, gt_classes)\n\n    # parse box regression outputs\n    if len(proposals):\n        proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0)  # Nx4\n        assert not proposal_boxes.requires_grad, \"Proposals should not require gradients!\"\n        # If \"gt_boxes\" does not exist, the proposals must be all negative and\n        # should not be included in regression loss computation.\n\n        # Compute the regression loss\n        regression_loss = self.box_reg_loss_func(\n            proposal_boxes,\n            gt_boxes,\n            proposal_deltas,\n            gt_classes,\n        )\n\n        return regression_loss\n```"
        ],
        "line": 26,
        "token": 264,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 43,
        "name": "eye",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef eye(N, M=None, k=None, dtype=None):\n    dtype = to_torch_dtype(dtype or config.floatx())\n    M = N if M is None else M\n    k = 0 if k is None else k\n    if k == 0:\n        # TODO: torch.eye doesn't support bfloat16 with cpu\n        if get_device() == \"cpu\" and dtype == torch.bfloat16:\n            return cast(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                torch.eye(\n                    N, M, dtype=to_torch_dtype(\"float32\"), device=get_device()\n                ),\n                dtype,\n            )\n        return torch.eye(N, M, dtype=dtype, device=get_device())\n    diag_length = builtins.max(N, M)\n    diag = torch.ones(diag_length, dtype=dtype, device=get_device())\n    return torch.diag(diag, diagonal=k)[:N, :M]",
        "completions": [
            "  \n[PYTHON]\n```\ndef eye(N, M=None, k=None, dtype=None):\n    dtype = to_torch_dtype(dtype or config.floatx())\n    M = N if M is None else M\n    k = 0 if k is None else k\n    if k == 0:\n        # TODO: torch.eye doesn't support bfloat16 with cpu\n        if get_device() == \"cpu\" and dtype == torch.bfloat16:\n            return cast(torch.eye(N, M, dtype=torch.float32), dtype=dtype)\n        else:\n            return torch.eye(N, M, dtype=dtype)\n```\n[/PYTHON]\n"
        ],
        "line": 21,
        "token": 219,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 44,
        "name": "parse_form_metadata",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef parse_form_metadata(data: MultiDict) -> Metadata:\n    # We construct a RawMetdata using the form data, which we will later pass\n    # to Metadata to get a validated metadata.\n    #\n    # NOTE: Form data is very similiar to the email format where the only difference\n    #       between a list and a single value is whether or not the same key is used\n    #       multiple times. Thus we will handle things in a similiar way, always\n    #       fetching things as a list and then determining what to do based on the\n    #       field type and how many values we found.\n    #\n    #       In general, large parts of this have been taken directly from\n    #       packaging.metadata and adjusted to work with form data.\n    raw: dict[str, str | list[str] | dict[str, str]] = {}\n    unparsed: dict[str, list[str]] = {}\n\n    for name in frozenset(data.keys()):\n        # We have to be lenient in the face of \"extra\" data, because the data\n        # value here might contain unrelated form data, so we'll skip thing for\n        # fields that aren't in our list of values.\n        raw_name = _FORM_TO_RAW_MAPPING.get(name)\n        if raw_name is None:\n            continue\n\n        # We use getall() here, even for fields that aren't multiple use,\n        # because otherwise someone could have e.g. two Name fields, and we\n        # would just silently ignore it rather than doing something about it.\n        value = data.getall(name) or []\n\n        # An empty string is invalid for all fields, treat it as if it wasn't\n        # provided in the first place\n        if value == [\"\"]:\n            continue\n\n        # If this is one of our string fields, then we'll check to see if our\n        # value is a list of a single item. If it is then we'll assume that\n        # it was emitted as a single string, and unwrap the str from inside\n        # the list.\n        #\n        # If it's any other kind of data, then we haven't the faintest clue\n        # what we should parse it as, and we have to just add it to our list\n        # of unparsed stuff.\n        if raw_name in _STRING_FIELDS and len(value) == 1:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            raw[raw_name] = value[0]\n        # If this is one of our list of string fields, then we can just assign\n        # the value, since forms *only* have strings, and our getall() call\n        # above ensures that this is a list.\n        elif raw_name in _LIST_FIELDS:\n            raw[raw_name] = value\n        # Special Case: Keywords\n        # The keywords field is implemented in the metadata spec as a str,\n        # but it conceptually is a list of strings, and is serialized using\n        # \", \".join(keywords), so we'll do some light data massaging to turn\n        # this into what it logically is.\n        elif raw_name == \"keywords\" and len(value) == 1:\n            raw[raw_name] = _parse_keywords(value[0])\n        # Special Case: Project-URL\n        # The project urls is implemented in the metadata spec as a list of\n        # specially-formatted strings that represent a key and a value, which\n        # is fundamentally a mapping, however the email format doesn't support\n        # mappings in a sane way, so it was crammed into a list of strings\n        # instead.\n        #\n        # We will do a little light data massaging to turn this into a map as\n        # it logically should be.\n        elif raw_name == \"project_urls\":\n            try:\n                raw[raw_name] = _parse_project_urls(value)\n            except KeyError:\n                unparsed[name] = value\n        # Nothing that we've done has managed to parse this, so it'll just\n        # throw it in our unparseable data and move on.\n        else:\n            unparsed[name] = value\n\n    # If we have any unparsed data, then we treat that as an error\n    if unparsed:\n        raise ExceptionGroup(\n            \"unparsed\",\n            [InvalidMetadata(key, f\"{key!r} has invalid data\") for key in unparsed],\n        )\n\n    # We need to cast our `raw` to a metadata, because a TypedDict only support\n    # literal key names, but we're computing our key names on purpose, but the\n    # way this function is implemented, our `TypedDict` can only have valid key\n    # names.\n    return Metadata.from_raw(typing.cast(RawMetadata, raw))",
        "completions": [
            "  ```\nif raw_name in _STRING_FIELDS and len(value) == 1:\n    value = value[0]\n```"
        ],
        "line": 51,
        "token": 577,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 45,
        "name": "aggregate",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def aggregate(self, pairs: List[Tuple[Model, float]]) -> Model:\n        \n        model = Model()\n\n        sample_model = pairs[0][0]\n        \n        if sample_model.encrypted:\n            raise RuntimeError(\"Cannot aggregate encrypted tensors using Federated Average.\")\n        \n        for tensor in sample_model.tensors:\n            model.tensors.append(copy.deepcopy(tensor))\n\n        total_tensors = len(model.tensors)\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        for var_idx in range(total_tensors):\n            var_num_values = model.tensors[var_idx].length\n\n            aggregated_tensor = np.zeros(var_num_values)\n            \n            for pair in pairs:\n                local_model = pair[0]\n                scaling_factor = pair[1]\n                local_tensor = local_model.tensors[var_idx]\n                \n                t2_r = np.frombuffer(local_tensor.value, dtype=MODEL_WEIGHTS_DTYPE) * scaling_factor\n                aggregated_tensor += t2_r\n                \n            model.tensors[var_idx].value = serialize_tensor(aggregated_tensor)\n\n        return model",
        "completions": [
            "  \n[PYTHON]\ndef aggregate(self, pairs: List[Tuple[Model, float]]) -> Model:\n    model = Model()\n    sample_model = pairs[0][0]\n    if sample_model.encrypted:\n        raise RuntimeError(\"Cannot aggregate encrypted tensors using Federated Average.\")\n    for tensor in sample_model.tensors:\n        model.tensors.append(copy.deepcopy(tensor))\n    total_tensors = len(model.tensors)\n    for pair in pairs:\n        model_update = pair[0]\n        update_weight = pair[1]\n        for i in range(total_tensors):\n            model.tensors[i] += update_weight * (model_update.tensors[i] - model.tensors[i])\n    return model\n[/PYTHON]\n"
        ],
        "line": 21,
        "token": 219,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 46,
        "name": "_prepare_prompt",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def _prepare_prompt(\n        self,\n        seq_group_metadata_list: List[SequenceGroupMetadata],\n    ) -> PreparePromptMetadata:\n        input_tokens: List[int] = []\n        input_positions: List[int] = []\n        slot_mapping: List[int] = []\n        lora_index_mapping: List[int] = []\n        lora_prompt_mapping: List[int] = []\n        lora_requests: Set[LoRARequest] = set()\n\n        prompt_lens: List[int] = []\n        context_lens: List[int] = []\n        subquery_lens: List[int] = []\n        prefix_block_tables: List[List[int]] = []\n        multi_modal_input_list: List[torch.Tensor] = []\n\n        if len(seq_group_metadata_list) == 0:\n            return PreparePromptMetadata.empty()\n\n        for seq_group_metadata in seq_group_metadata_list:\n            assert seq_group_metadata.is_prompt\n            seq_ids = list(seq_group_metadata.seq_data.keys())\n            assert len(seq_ids) == 1\n            seq_id = seq_ids[0]\n\n            computed_block_nums = seq_group_metadata.computed_block_nums\n            if (self.scheduler_config is not None\n                    and self.scheduler_config.chunked_prefill_enabled\n                    and not (computed_block_nums is None\n                             or computed_block_nums == [])):\n                raise RuntimeError(\n                    \"chunked prefill cannot be used with prefix caching \"\n                    \"now.\")\n\n            token_chunk_size = seq_group_metadata.token_chunk_size\n            seq_data = seq_group_metadata.seq_data[seq_id]\n            computed_len = seq_data.get_num_computed_tokens()\n            # We should use get_len here because in case of preemption\n            # it contains output tokens.\n            prefill_end = min(seq_data.get_len(),\n                              computed_len + token_chunk_size)\n            prompt_tokens = seq_data.get_token_ids()[computed_len:prefill_end]\n            prompt_len = prefill_end\n            prompt_lens.append(prompt_len)\n\n            # NOTE: This only works for oooooooxxx style attention.\n            if computed_block_nums is not None and len(\n                    computed_block_nums) > 0 and self.sliding_window is None:\n                # Prefix is not supported with sliding_window\n                computed_len = len(computed_block_nums) * self.block_size\n                prompt_tokens = prompt_tokens[computed_len:]\n                prefix_block_tables.append(computed_block_nums)\n            elif self.scheduler_config.chunked_prefill_enabled:\n                if seq_group_metadata.block_tables is not None:\n                    # Prefill has chunked before.\n                    block_table = seq_group_metadata.block_tables[seq_id]\n                    prefix_block_tables.append(block_table)\n                else:\n                    # The first prefill.\n                    prefix_block_tables.append([])\n            else:\n                prefix_block_tables.append([])\n                # Right now, prefill start is always 0. However, this\n                # assumption can be changed once chunked prefill is introduced.\n                assert computed_len == 0\n\n            # actual prompt lens\n            context_lens.append(computed_len)\n            subquery_lens.append(prompt_len - computed_len)\n\n            input_tokens.extend(prompt_tokens)\n            # NOTE(woosuk): Here we assume that the first token in the prompt\n            # is always the first token in the sequence.\n            input_positions.extend(list(range(computed_len, prefill_end)))\n            lora_id = seq_group_metadata.lora_int_id\n\n            if lora_id > 0:\n                lora_requests.add(seq_group_metadata.lora_request)\n\n            lora_index_mapping += [lora_id] * (prompt_len - computed_len)\n            lora_prompt_mapping.extend(\n                [lora_id] *\n                (prompt_len - computed_len\n                 if seq_group_metadata.sampling_params.prompt_logprobs else 1))\n\n            if seq_group_metadata.multi_modal_data:\n                multi_modal_input_list.append(\n                    seq_group_metadata.multi_modal_data.data)\n\n            if seq_group_metadata.block_tables is None:\n                # During memory profiling, the block tables are not initialized\n                # yet. In this case, we just use a dummy slot mapping.\n                slot_mapping.extend([_PAD_SLOT_ID] * prompt_len)\n                continue\n\n            # Compute the slot mapping.\n            block_table = seq_group_metadata.block_tables[seq_id]\n            # Mask the [0, start_idx) tokens of the prompt with _PAD_SLOT_ID,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            # where start_idx is max(0, prompt_len - sliding_window).\n            # For example, if the prompt len is 10, sliding window is 8, and\n            # block size is 4, the first two tokens are masked and the slot\n            # mapping will be [-1, -1, 2, 3, 4, 5, 6, 7, 0, 1].\n            start_idx = 0\n            if self.sliding_window is not None:\n                assert computed_len == 0, (\n                    \"Prefix caching is currently not supported with \"\n                    \"sliding window attention\")\n                start_idx = max(0, prompt_len - self.sliding_window)\n\n            for i in range(computed_len, prefill_end):\n                if i < start_idx:\n                    slot_mapping.append(_PAD_SLOT_ID)\n                    continue\n\n                block_number = block_table[i // self.block_size]\n                block_offset = i % self.block_size\n                slot = block_number * self.block_size + block_offset\n                slot_mapping.append(slot)\n\n        max_subquery_len = max(subquery_lens)\n        max_prompt_len = max(prompt_lens)\n        assert max_subquery_len > 0\n\n        context_lens_tensor = torch.tensor(context_lens,\n                                           dtype=torch.int,\n                                           device=self.device)\n\n        if multi_modal_input_list:\n            assert self.vision_language_config, (\n                \"Multi-modal inputs are only supported by \"\n                \"vision language models.\")\n            multi_modal_input = torch.cat(multi_modal_input_list,\n                                          dim=0).to(self.device)\n        else:\n            multi_modal_input = None\n\n        # Prepare prefix block tables\n        max_prompt_block_table_len = max(len(t) for t in prefix_block_tables)\n        block_tables = make_tensor_with_pad(\n            prefix_block_tables,\n            max_len=max_prompt_block_table_len,\n            pad=0,\n            dtype=torch.int,\n            device=self.device,\n        )\n\n        # Query length can be shorter than key (i.e., prompt) when prefill\n        # is chunked or prefix cached.\n        subquery_lens_tensor = torch.tensor(subquery_lens,\n                                            dtype=torch.long,\n                                            device=self.device)\n        subquery_start_loc = torch.zeros(subquery_lens_tensor.shape[0] + 1,\n                                         dtype=torch.int32,\n                                         device=self.device)\n\n        prompt_lens_tensor = torch.tensor(prompt_lens,\n                                          dtype=torch.long,\n                                          device=self.device)\n        seq_start_loc = torch.zeros(prompt_lens_tensor.shape[0] + 1,\n                                    dtype=torch.int32,\n                                    device=self.device)\n\n        torch.cumsum(subquery_lens_tensor,\n                     dim=0,\n                     dtype=subquery_start_loc.dtype,\n                     out=subquery_start_loc[1:])\n\n        torch.cumsum(prompt_lens_tensor,\n                     dim=0,\n                     dtype=seq_start_loc.dtype,\n                     out=seq_start_loc[1:])\n\n        attn_metadata = self.attn_backend.make_metadata(\n            is_prompt=True,\n            prompt_lens=prompt_lens,\n            prompt_lens_tensor=prompt_lens_tensor,\n            max_subquery_len=max_subquery_len,\n            max_context_len=None,\n            max_prompt_len=max_prompt_len,\n            subquery_start_loc=subquery_start_loc,\n            seq_start_loc=seq_start_loc,\n            context_lens=context_lens_tensor,\n            block_tables=block_tables,\n            use_cuda_graph=False,\n        )\n\n        return PreparePromptMetadata(\n            input_tokens=input_tokens,\n            input_positions=input_positions,\n            attn_metadata=attn_metadata,\n            prompt_lens=prompt_lens,\n            subquery_lens=subquery_lens,\n            lora_index_mapping=lora_index_mapping,\n            lora_prompt_mapping=lora_prompt_mapping,\n            lora_requests=lora_requests,\n            multi_modal_input=multi_modal_input,\n            slot_mapping=slot_mapping,\n        )",
        "completions": [
            "  ```\ndef _compute_slot_mapping(block_table, prompt_len):\n    slot_mapping = []\n    for i in range(prompt_len):\n        slot_id = block_table[i]\n        slot_mapping.append(slot_id)\n    return slot_mapping\n```"
        ],
        "line": 99,
        "token": 707,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 47,
        "name": "create",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef create(\n    supply_settings: Callable[[], config.ResilienceSettings] = config.ResilienceSettings, include_post: bool = False\n) -> adapters.HTTPAdapter:\n    settings = supply_settings()\n    allowed_methods = set(util.Retry.DEFAULT_ALLOWED_METHODS)\n    if include_post:\n        allowed_methods.add(\"POST\")\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    retries = util.Retry(\n        total=settings.total,\n        backoff_factor=settings.backoff_factor,\n        status_forcelist=settings.statuses,\n        allowed_methods=allowed_methods,\n    )\n    return adapters.HTTPAdapter(max_retries=retries)",
        "completions": [
            "  ```\ndef create(\n    supply_settings: Callable[[], config.ResilienceSettings] = config.ResilienceSettings, include_post: bool = False\n) -> adapters.HTTPAdapter:\n    settings = supply_settings()\n    allowed_methods = set(util.Retry.DEFAULT_ALLOWED_METHODS)\n    if include_post:\n        allowed_methods.add(\"POST\")\n    return adapters.HTTPAdapter(\n        settings=settings,\n        allowed_methods=allowed_methods,\n    )\n```"
        ],
        "line": 20,
        "token": 193,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 48,
        "name": "fit",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def fit(\n        self,\n        timedelta_unit: DATETIME_UNITS = \"s\",\n        log_scale: bool | tuple[bool, bool] | None = None,\n        lower_cutoff_quantile: float | None = None,\n        upper_cutoff_quantile: float | None = None,\n        bins: int | Literal[BINS_ESTIMATORS] = 20,\n    ) -> None:\n        called_params = {\n            \"timedelta_unit\": timedelta_unit,\n            \"log_scale\": log_scale,\n            \"lower_cutoff_quantile\": lower_cutoff_quantile,\n            \"upper_cutoff_quantile\": upper_cutoff_quantile,\n            \"bins\": bins,\n        }\n        not_hash_values = [\"timedelta_unit\"]\n\n        self.log_scale, self.upper_cutoff_quantile, self.lower_cutoff_quantile = self.__validate_input(\n            log_scale,\n            lower_cutoff_quantile,\n            upper_cutoff_quantile,\n        )\n\n        self.timedelta_unit = timedelta_unit\n        self.bins = bins\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        data = self.__eventstream.to_dataframe(copy=True).groupby(self.user_col)[self.time_col].agg([\"min\", \"max\"])\n        data[\"time_passed\"] = data[\"max\"] - data[\"min\"]\n        values_to_plot = (data[\"time_passed\"] / np.timedelta64(1, self.timedelta_unit)).reset_index(  # type: ignore\n            drop=True\n        )\n\n        if self._remove_cutoff_values:  # type: ignore\n            values_to_plot = self._remove_cutoff_values(values_to_plot).to_numpy()\n        if self.log_scale[0]:\n            log_adjustment = np.timedelta64(100, \"ms\") / np.timedelta64(1, self.timedelta_unit)\n            values_to_plot = np.where(values_to_plot != 0, values_to_plot, values_to_plot + log_adjustment)  # type: ignore\n            bins_to_show = np.power(10, np.histogram_bin_edges(np.log10(values_to_plot), bins=self.bins))\n        else:\n            bins_to_show = np.histogram_bin_edges(values_to_plot, bins=self.bins)\n        if len(values_to_plot) == 0:\n            bins_to_show = np.array([])\n\n        self.bins_to_show = bins_to_show\n        self.values_to_plot = values_to_plot  # type: ignore\n        collect_data_performance(\n            scope=\"user_lifetime_hist\",\n            event_name=\"metadata\",\n            called_params=called_params,\n            not_hash_values=not_hash_values,\n            performance_data={},\n            eventstream_index=self.__eventstream._eventstream_index,\n        )",
        "completions": [
            "  ```\ndef __validate_input(\n    log_scale: bool | tuple[bool, bool] | None = None,\n    lower_cutoff_quantile: float | None = None,\n    upper_cutoff_quantile: float | None = None,\n) -> tuple[bool, float, float]:\n    if log_scale is None:\n        log_scale = (False, False)\n    elif isinstance(log_scale, bool):\n        log_scale = (log_scale, log_scale)\n    elif isinstance(log_scale, tuple):\n        if len(log_scale)!= 2:\n            raise ValueError(\"log_scale must be a tuple of length 2\")\n    else:\n        raise ValueError(\"log_scale must be a bool or a tuple of length 2\")\n\n    if lower_cutoff_quantile is None:\n        lower_cutoff_quantile = 0.0\n    elif not isinstance(lower_cutoff_quantile, float):\n        raise ValueError(\"lower_cutoff_quantile must be a float\")\n\n    if upper_cutoff_quantile is None:\n        upper_cutoff_quantile = 1.0\n    elif not isinstance(upper_cutoff_quantile, float):\n        raise ValueError(\"upper_cutoff_quantile must be a float\")\n\n    return log_scale, lower_cutoff_quantile, upper_cutoff_quantile\n```"
        ],
        "line": 36,
        "token": 270,
        "line_diff": 2,
        "token_diff": 1
    },
    {
        "id": 49,
        "name": "transform_grid_cell",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef transform_grid_cell(\n    grid: Grid,\n    cell: npt.NDArray[np.float_] = np.eye(3),\n    output_shape: Optional[Tuple[int, int, int]] = None,\n    mode: str = \"constant\",\n    order: int = 1,\n    **kwargs,\n) -> Grid:\n    # Take the current shape of the grid if no output shape was provided\n    if output_shape is None:\n        output_shape = grid.shape\n\n    # Make sure the cell has type float\n    cell = np.asarray(cell, dtype=float)\n\n    # Get the current cell in coordinates of the destination axes\n    inv_cell = cell_invert(cell).T\n    projected_cell = grid.cell.dot(inv_cell)\n\n    # From that, infere how long will the bounding box of the cell be\n    lengths = abs(projected_cell).sum(axis=0)\n\n    # Create the transformation matrix. Since we want to control the shape\n    # of the output, we can not use grid.dcell directly, we need to modify it.\n    scales = output_shape / lengths\n    forward_t = (grid.dcell.dot(inv_cell) * scales).T\n\n    # Scipy's affine transform asks for the inverse transformation matrix, to\n    # map from output pixels to input pixels. By taking the inverse of our\n    # transformation matrix, we get exactly that.\n    tr = cell_invert(forward_t).T\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    # Calculate the offset of the image so that all points of the grid \"fall\" inside\n    # the output array.\n    # For this we just calculate the centers of the input and output images\n    center_input = 0.5 * (_a.asarrayd(grid.shape) - 1)\n    center_output = 0.5 * (_a.asarrayd(output_shape) - 1)\n\n    # And then make sure that the input center that is interpolated from the output\n    # falls in the actual input's center\n    offset = center_input - tr.dot(center_output)\n\n    # We pass all the parameters to scipy's affine_transform\n    transformed_image = affine_transform(\n        grid.grid,\n        tr,\n        order=1,\n        offset=offset,\n        output_shape=output_shape,\n        mode=mode,\n        **kwargs,\n    )\n\n    # Create a new grid with the new shape and the new cell (notice how the cell\n    # is rescaled from the input cell to fit the actual coordinates of the system)\n    new_grid = grid.__class__((1, 1, 1), lattice=cell * lengths.reshape(3, 1))\n    new_grid.grid = transformed_image\n    new_grid.geometry = grid.geometry\n    new_grid.lattice.origin = grid.origin + new_grid.dcell.dot(forward_t.dot(offset))\n\n    # Find the offset between the origin before and after the transformation\n    return new_grid",
        "completions": [
            "  ```\ndef transform_grid_cell(\n    grid: Grid,\n    cell: npt.NDArray[np.float_] = np.eye(3),\n    output_shape: Optional[Tuple[int, int, int]] = None,\n    mode: str = \"constant\",\n    order: int = 1,\n    **kwargs,\n) -> Grid:\n    # Take the current shape of the grid if no output shape was provided\n    if output_shape is None:\n        output_shape = grid.shape\n\n    # Make sure the cell has type float\n    cell = np.asarray(cell, dtype=float)\n\n    # Get the current cell in coordinates of the destination axes\n    inv_cell = cell_invert(cell).T\n    projected_cell = grid.cell.dot(inv_cell)\n\n    # From that, infere how long will the bounding box of the cell be\n    lengths = abs(projected_cell).sum(axis=0)\n\n    # Create the transformation matrix. Since we want to control the shape\n    # of the output, we can not use grid.dcell directly, we need to modify it.\n    scales = output_shape / lengths\n    forward_t = (grid.dcell.dot(inv_cell) * scales).T\n\n    # Scipy's affine transform asks for the inverse transformation matrix, to\n    # map from output pixels to input pixels. By taking the inverse of our\n    # transformation matrix, we get exactly that.\n    tr = cell_invert(forward_t).T\n\n    # Apply the transformation to the grid\n    return grid.transform(tr, output_shape=output_shape, mode=mode, order=order, **kwargs)\n```"
        ],
        "line": 39,
        "token": 384,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 50,
        "name": "draw_dataset_dict",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def draw_dataset_dict(self, dic):\n        annos = dic.get(\"annotations\", None)\n        if annos:\n            if \"segmentation\" in annos[0]:\n                masks = [x[\"segmentation\"] for x in annos]\n            else:\n                masks = None\n            if \"keypoints\" in annos[0]:\n                keypts = [x[\"keypoints\"] for x in annos]\n                keypts = np.array(keypts).reshape(len(annos), -1, 3)\n            else:\n                keypts = None\n\n            boxes = [BoxMode.convert(x[\"bbox\"], x[\"bbox_mode\"], BoxMode.XYXY_ABS) for x in annos]\n\n            labels = [x[\"category_id\"] for x in annos]\n            colors = None\n            if self._instance_mode == ColorMode.SEGMENTATION and self.metadata.get(\"thing_colors\"):\n                colors = [\n                    self._jitter([x / 255 for x in self.metadata.thing_colors[c]]) for c in labels\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                ]\n            names = self.metadata.get(\"thing_classes\", None)\n            if names:\n                labels = [names[i] for i in labels]\n            labels = [\n                \"{}\".format(i) + (\"|crowd\" if a.get(\"iscrowd\", 0) else \"\")\n                for i, a in zip(labels, annos)\n            ]\n            self.overlay_instances(\n                labels=labels, boxes=boxes, masks=masks, keypoints=keypts, assigned_colors=colors\n            )\n\n        sem_seg = dic.get(\"sem_seg\", None)\n        if sem_seg is None and \"sem_seg_file_name\" in dic:\n            with PathManager.open(dic[\"sem_seg_file_name\"], \"rb\") as f:\n                sem_seg = Image.open(f)\n                sem_seg = np.asarray(sem_seg, dtype=\"uint8\")\n        if sem_seg is not None:\n            self.draw_sem_seg(sem_seg, area_threshold=0, alpha=0.5)\n        return self.output",
        "completions": [
            "  ```\n    def draw_dataset_dict(self, dic):\n        annos = dic.get(\"annotations\", None)\n        if annos:\n            if \"segmentation\" in annos[0]:\n                masks = [x[\"segmentation\"] for x in annos]\n            else:\n                masks = None\n            if \"keypoints\" in annos[0]:\n                keypts = [x[\"keypoints\"] for x in annos]\n                keypts = np.array(keypts).reshape(len(annos), -1, 3)\n            else:\n                keypts = None\n\n            boxes = [BoxMode.convert(x[\"bbox\"], x[\"bbox_mode\"], BoxMode.XYXY_ABS) for x in annos]\n\n            labels = [x[\"category_id\"] for x in annos]\n            colors = None\n            if self._instance_mode == ColorMode.SEGMENTATION and self.metadata.get(\"thing_colors\"):\n                colors = [\n                    self._jitter([x / 255 for x in self.metadata.thing_colors[c]]) for c in labels\n                ]\n            else:\n                colors = None\n\n            return masks, keypts, boxes, labels, colors\n```"
        ],
        "line": 31,
        "token": 321,
        "line_diff": 1,
        "token_diff": 2
    },
    {
        "id": 51,
        "name": "clip",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def clip(self, box_size: Boxes.BoxSizeType, clip_angle_threshold: float = 1.0) -> None:\n        h, w = box_size\n\n        # normalize angles to be within (-180, 180] degrees\n        self.normalize_angles()\n\n        idx = torch.where(torch.abs(self.tensor[:, 4]) <= clip_angle_threshold)[0]\n\n        # convert to (x1, y1, x2, y2)\n        x1 = self.tensor[idx, 0] - self.tensor[idx, 2] / 2.0\n        y1 = self.tensor[idx, 1] - self.tensor[idx, 3] / 2.0\n        x2 = self.tensor[idx, 0] + self.tensor[idx, 2] / 2.0\n        y2 = self.tensor[idx, 1] + self.tensor[idx, 3] / 2.0\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n        # clip\n        x1.clamp_(min=0, max=w)\n        y1.clamp_(min=0, max=h)\n        x2.clamp_(min=0, max=w)\n        y2.clamp_(min=0, max=h)\n\n        # convert back to (xc, yc, w, h)\n        self.tensor[idx, 0] = (x1 + x2) / 2.0\n        self.tensor[idx, 1] = (y1 + y2) / 2.0\n        # make sure widths and heights do not increase due to numerical errors\n        self.tensor[idx, 2] = torch.min(self.tensor[idx, 2], x2 - x1)\n        self.tensor[idx, 3] = torch.min(self.tensor[idx, 3], y2 - y1)",
        "completions": [
            "  ```\n    def clip(self, box_size: Boxes.BoxSizeType, clip_angle_threshold: float = 1.0) -> None:\n        h, w = box_size\n\n        # normalize angles to be within (-180, 180] degrees\n        self.normalize_angles()\n\n        idx = torch.where(torch.abs(self.tensor[:, 4]) <= clip_angle_threshold)[0]\n\n        # convert to (x1, y1, x2, y2)\n        x1 = self.tensor[idx, 0] - self.tensor[idx, 2] / 2.0\n        y1 = self.tensor[idx, 1] - self.tensor[idx, 3] / 2.0\n        x2 = self.tensor[idx, 0] + self.tensor[idx, 2] / 2.0\n        y2 = self.tensor[idx, 1] + self.tensor[idx, 3] / 2.0\n\n        # clip to the box size\n        x1 = torch.clip(x1, 0, w)\n        y1 = torch.clip(y1, 0, h)\n        x2 = torch.clip(x2, 0, w)\n        y2 = torch.clip(y2, 0, h)\n\n        # update the tensor\n        self.tensor[idx, 0] = x1\n        self.tensor[idx, 1] = y1\n        self.tensor[idx, 2] = x2 - x1\n        self.tensor[idx, 3] = y2 - y1\n```"
        ],
        "line": 23,
        "token": 287,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 52,
        "name": "create_app_from_config",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef create_app_from_config(config: GatewayConfig) -> GatewayAPI:\n    limiter = Limiter(\n        key_func=get_remote_address, storage_uri=MLFLOW_GATEWAY_RATE_LIMITS_STORAGE_URI.get()\n    )\n    app = GatewayAPI(\n        config=config,\n        limiter=limiter,\n        title=\"MLflow Deployments Server\",\n        description=\"The core deployments API for reverse proxy interface using remote inference \"\n        \"endpoints within MLflow\",\n        version=VERSION,\n        docs_url=None,\n    )\n\n    @app.get(\"/\", include_in_schema=False)\n    async def index():\n        return RedirectResponse(url=\"/docs\")\n\n    @app.get(\"/favicon.ico\", include_in_schema=False)\n    async def favicon():\n        for directory in [\"build\", \"public\"]:\n            favicon = Path(__file__).parent.parent.parent.joinpath(\n                \"server\", \"js\", directory, \"favicon.ico\"\n            )\n            if favicon.exists():\n                return FileResponse(favicon)\n        raise HTTPException(status_code=404, detail=\"favicon.ico not found\")\n\n    @app.get(\"/docs\", include_in_schema=False)\n    async def docs():\n        return get_swagger_ui_html(\n            openapi_url=\"/openapi.json\",\n            title=\"MLflow Deployments Server\",\n            swagger_favicon_url=\"/favicon.ico\",\n        )\n\n    @app.get(MLFLOW_DEPLOYMENTS_HEALTH_ENDPOINT)\n    # TODO: Remove Gateway server URLs after deprecation window elapses\n    @app.get(MLFLOW_GATEWAY_HEALTH_ENDPOINT, include_in_schema=False)\n    async def health() -> HealthResponse:\n        return {\"status\": \"OK\"}\n\n    @app.get(MLFLOW_DEPLOYMENTS_CRUD_ENDPOINT_BASE + \"{endpoint_name}\")\n    async def get_endpoint(endpoint_name: str) -> Endpoint:\n        if matched := app.get_dynamic_route(endpoint_name):\n            return matched.to_endpoint()\n\n        raise HTTPException(\n            status_code=404,\n            detail=f\"The endpoint '{endpoint_name}' is not present or active on the server. Please \"\n            \"verify the endpoint name.\",\n        )\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n    # TODO: Remove Gateway server URLs after deprecation window elapses\n    @app.get(MLFLOW_GATEWAY_CRUD_ROUTE_BASE + \"{route_name}\", include_in_schema=False)\n    async def get_route(route_name: str) -> Route:\n        if matched := app.get_dynamic_route(route_name):\n            return matched\n\n        raise HTTPException(\n            status_code=404,\n            detail=f\"The route '{route_name}' is not present or active on the server. Please \"\n            \"verify the route name.\",\n        )\n\n    @app.get(MLFLOW_DEPLOYMENTS_CRUD_ENDPOINT_BASE)\n    async def list_endpoints(page_token: Optional[str] = None) -> ListEndpointsResponse:\n        start_idx = SearchRoutesToken.decode(page_token).index if page_token is not None else 0\n\n        end_idx = start_idx + MLFLOW_DEPLOYMENTS_LIST_ENDPOINTS_PAGE_SIZE\n        routes = list(app.dynamic_routes.values())\n        result = {\"endpoints\": [route.to_endpoint() for route in routes[start_idx:end_idx]]}\n        if len(routes[end_idx:]) > 0:\n            next_page_token = SearchRoutesToken(index=end_idx)\n            result[\"next_page_token\"] = next_page_token.encode()\n\n        return result\n\n    # TODO: Remove Gateway server URLs after deprecation window elapses\n    @app.get(MLFLOW_GATEWAY_CRUD_ROUTE_BASE, include_in_schema=False)\n    async def search_routes(page_token: Optional[str] = None) -> SearchRoutesResponse:\n        start_idx = SearchRoutesToken.decode(page_token).index if page_token is not None else 0\n\n        end_idx = start_idx + MLFLOW_GATEWAY_SEARCH_ROUTES_PAGE_SIZE\n        routes = list(app.dynamic_routes.values())\n        result = {\"routes\": routes[start_idx:end_idx]}\n        if len(routes[end_idx:]) > 0:\n            next_page_token = SearchRoutesToken(index=end_idx)\n            result[\"next_page_token\"] = next_page_token.encode()\n\n        return result\n\n    @app.get(MLFLOW_DEPLOYMENTS_LIMITS_BASE + \"{endpoint}\")\n    # TODO: Remove Gateway server URLs after deprecation window elapses\n    @app.get(MLFLOW_GATEWAY_LIMITS_BASE + \"{endpoint}\", include_in_schema=False)\n    async def get_limits(endpoint: str) -> LimitsConfig:\n        raise HTTPException(status_code=501, detail=\"The get_limits API is not available yet.\")\n\n    @app.post(MLFLOW_DEPLOYMENTS_LIMITS_BASE)\n    # TODO: Remove Gateway server URLs after deprecation window elapses\n    @app.post(MLFLOW_GATEWAY_LIMITS_BASE, include_in_schema=False)\n    async def set_limits(payload: SetLimitsModel) -> LimitsConfig:\n        raise HTTPException(status_code=501, detail=\"The set_limits API is not available yet.\")\n\n    return app",
        "completions": [
            "  ```\nasync def get_endpoint(endpoint_name: str) -> Endpoint:\n    if matched := app.get_dynamic_route(endpoint_name):\n        return matched.to_endpoint()\n\n    raise HTTPException(\n        status_code=404,\n        detail=f\"The endpoint '{endpoint_name}' is not present or active on the server. Please \"\n        \"verify the endpoint name.\",\n    )\n```"
        ],
        "line": 59,
        "token": 452,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 53,
        "name": "obj_fcn_decorator",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef obj_fcn_decorator(\n    model_fcn_full,\n    weight_fcn,\n    TSS_fcn,\n    T,\n    obs,\n    settings,\n    alpha=2.0,\n    coef_id=[],\n    initial_fit=True,\n):\n\n    N = np.shape(obs)[0]\n    N_min = settings.segment_minimum_count  # N minimum for a sloped segment\n    sigma = 2.698  # 1.5 IQR\n    quantile = 0.25\n    min_weight = 0.00\n\n    T_fit_bnds = np.array([np.min(T), np.max(T)])\n    T_range = T_fit_bnds[1] - T_fit_bnds[0]\n\n    model_fcn = model_fcn_dec(model_fcn_full, T_fit_bnds, T)\n\n    lasso_a = settings.regularization_percent_lasso * settings.regularization_alpha\n    ridge_a = (\n        1 - settings.regularization_percent_lasso\n    ) * settings.regularization_alpha\n\n    idx_k = get_idx([\"dd_k\"], coef_id)\n    idx_beta = get_idx([\"dd_beta\"], coef_id)\n    idx_bp = get_idx([\"dd_bp\"], coef_id)\n    # idx_reg = get_idx([\"dd_beta\", \"dd_k\"], coef_id) # drop bps and intercept from regularization\n    idx_reg = get_idx(\n        [\"dd_beta\", \"dd_k\", \"dd_bp\"], coef_id\n    )  # drop intercept from regularization\n\n    def elastic_net_penalty(X, T_sorted, obs_sorted, weight_sorted, wRMSE):\n        \"\"\"\n        Calculates the elastic net penalty for a given set of inputs. The elastic net is a regularized\n        regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.\n\n        Parameters:\n        X (array-like): The input array.\n        T_sorted (array-like): The sorted temperature array.\n        obs_sorted (array-like): The sorted observation array.\n        weight_sorted (array-like): The sorted weight array.\n        wRMSE (float): The weighted root mean squared error.\n\n        Returns:\n        penalty (float): The elastic net penalty.\n        \"\"\"\n\n        # Elastic net\n        X_enet = np.array(X).copy()\n\n        ## Scale break points ##\n        if len(idx_bp) > 0:\n            X_enet[idx_bp] = [\n                np.min(np.abs(X_enet[idx] - T_fit_bnds)) for idx in idx_bp\n            ]\n\n            if len(idx_bp) == 2:\n                X_enet[idx_bp] += (X[idx_bp][1] - X[idx_bp][0]) / 2\n\n            X_enet[idx_bp] *= wRMSE / T_range\n\n        # Find idx for regions\n        if len(idx_bp) == 2:\n            [hdd_bp, cdd_bp] = X[idx_bp]\n\n            idx_hdd = np.argwhere(T_sorted < hdd_bp).flatten()\n            idx_tidd = np.argwhere(\n                (hdd_bp <= T_sorted) & (T_sorted <= cdd_bp)\n            ).flatten()\n            idx_cdd = np.argwhere(cdd_bp < T_sorted).flatten()\n\n        elif len(idx_bp) == 1:\n            bp = X[idx_bp]\n            if X_enet[idx_beta] < 0:  # HDD_TIDD\n                idx_hdd = np.argwhere(T_sorted <= bp).flatten()\n                idx_tidd = np.argwhere(bp < T_sorted).flatten()\n                idx_cdd = np.array([])\n\n            else:\n                idx_hdd = np.array([])  # CDD_TIDD\n                idx_tidd = np.argwhere(T_sorted < bp).flatten()\n                idx_cdd = np.argwhere(bp <= T_sorted).flatten()\n\n        else:\n            idx_hdd = np.array([])\n            idx_tidd = np.arange(0, len(T_sorted))\n            idx_cdd = np.array([])\n\n        len_hdd = len(idx_hdd)\n        len_tidd = len(idx_tidd)\n        len_cdd = len(idx_cdd)\n\n        # combine tidd with hdd/cdd if cdd/hdd are large enough to get stdev\n        if (len_hdd < N_min) and (len_cdd >= N_min):\n            idx_hdd = np.hstack([idx_hdd, idx_tidd])\n        elif (len_hdd >= N_min) and (len_cdd < N_min):\n            idx_cdd = np.hstack([idx_tidd, idx_cdd])\n\n        # change to idx_hdd and idx_cdd to int arrays\n        idx_hdd = idx_hdd.astype(int)\n        idx_cdd = idx_cdd.astype(int)\n\n        ## Normalize slopes ##\n        # calculate stdevs\n        if (len(idx_bp) == 2) and (len(idx_hdd) >= N_min) and (len(idx_cdd) >= N_min):\n            N_beta = np.array([len_hdd, len_cdd])\n            T_stdev = np.array(\n                [\n                    stdev(T_sorted[idx_hdd], weights=weight_sorted[idx_hdd]),\n                    stdev(T_sorted[idx_cdd], weights=weight_sorted[idx_cdd]),\n                ]\n            )\n            obs_stdev = np.array(\n                [\n                    stdev(obs_sorted[idx_hdd], weights=weight_sorted[idx_hdd]),\n                    stdev(obs_sorted[idx_cdd], weights=weight_sorted[idx_cdd]),\n                ]\n            )\n\n        elif (len(idx_bp) == 1) and (len(idx_hdd) >= N_min):\n            N_beta = np.array([len_hdd])\n            T_stdev = stdev(T_sorted[idx_hdd], weights=weight_sorted[idx_hdd])\n            obs_stdev = stdev(obs_sorted[idx_hdd], weights=weight_sorted[idx_hdd])\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "\n        elif (len(idx_bp) == 1) and (len(idx_cdd) >= N_min):\n            N_beta = np.array([len_cdd])\n            T_stdev = stdev(T_sorted[idx_cdd], weights=weight_sorted[idx_cdd])\n            obs_stdev = stdev(obs_sorted[idx_cdd], weights=weight_sorted[idx_cdd])\n\n        else:\n            N_beta = np.array([len_tidd])\n            T_stdev = stdev(T_sorted, weights=weight_sorted)\n            obs_stdev = stdev(obs_sorted, weights=weight_sorted)\n\n        X_enet[idx_beta] *= T_stdev / obs_stdev\n\n        # add penalty to slope for not having enough datapoints\n        X_enet[idx_beta] = np.where(\n            N_beta < N_min, X_enet[idx_beta] * 1e30, X_enet[idx_beta]\n        )\n\n        ## Scale smoothing parameter ##\n        if len(idx_k) > 0:  # reducing X_enet size allows for more smoothing\n            X_enet[idx_k] = X[idx_k]\n\n            if (len(idx_k) == 2) and (np.sum(X_enet[idx_k]) > 1):\n                X_enet[idx_k] /= np.sum(X_enet[idx_k])\n\n            X_enet[idx_k] *= (\n                X_enet[idx_beta] / 2\n            )  # uncertain what to divide by, this seems to work well\n\n        X_enet = X_enet[idx_reg]\n\n        if ridge_a == 0:\n            penalty = lasso_a * np.linalg.norm(X_enet, 1)\n        else:\n            penalty = lasso_a * np.linalg.norm(X_enet, 1) + ridge_a * np.linalg.norm(\n                X_enet, 2\n            )\n\n        return penalty\n\n    def obj_fcn(X, grad=[], optimize_flag=True):\n        \"\"\"\n        Creates an objective function having the required inputs for optimization via SciPy and NLopt. If the optimize_flag is true,\n        only return the loss. If the optimize_flag is false, return the loss and the model output parameters.\n\n        Parameters:\n        - X: array-like\n            Array of coefficients.\n        - grad: array-like, optional\n            Gradient array. Default is an empty list.\n        - optimize_flag: bool, optional\n            Whether to optimize. Default is True.\n\n        Returns:\n        - obj: float\n            Objective function value.\n        \"\"\"\n        X = np.array(X)\n\n        model = model_fcn(X)\n        idx_sorted = np.argsort(T).flatten()\n        idx_initial = np.argsort(idx_sorted).flatten()\n        resid = model - obs\n\n        T_sorted = T[idx_sorted]\n        # model_sorted = model[idx_sorted]\n        obs_sorted = obs[idx_sorted]\n        resid_sorted = resid[idx_sorted]\n        weight_sorted, c, a = weight_fcn(\n            *X, T_sorted, resid_sorted, sigma, quantile, alpha, min_weight\n        )\n\n        weight = weight_sorted[idx_initial]\n        wSSE = np.sum(weight * resid**2)\n        loss = wSSE / N\n\n        if settings.regularization_alpha != 0:\n            loss += elastic_net_penalty(\n                X, T_sorted, obs_sorted, weight_sorted, np.sqrt(loss)\n            )\n\n        if optimize_flag:\n            return loss\n\n        else:\n            if (\"r_squared\" in settings.split_selection_criteria) and callable(TSS_fcn):\n                TSS = TSS_fcn(*X, T_sorted, obs_sorted)\n            else:\n                TSS = wSSE\n\n            if initial_fit:\n                jac = None\n            else:\n                eps = 10 ** (OoM(X, method=\"floor\") - 2)\n                X_lower = X - eps\n                X_upper = X + eps\n\n                # select correct finite difference scheme based on variable type and value\n                # NOTE: finite differencing was not returning great results. Looking into switching to JAX autodiff\n                fd_type = [\"central\"] * len(X)\n                for i in range(len(X)):\n                    if i in idx_k:\n                        if X_lower[i] < 0:\n                            fd_type[i] = \"forward\"\n                        elif X_upper[i] > 1:\n                            fd_type[i] = \"backward\"\n                    elif i in idx_beta:\n                        if (X[i] > 0) and (X_lower[i] < 0):\n                            fd_type[i] = \"forward\"\n                        elif (X[i] < 0) and (X_upper[i] > 0):\n                            fd_type[i] = \"backward\"\n                    elif i in idx_bp:\n                        if X_lower[i] < T_sorted[0]:\n                            fd_type[i] = \"forward\"\n                        elif X_lower[i] > T_sorted[-1]:\n                            fd_type[i] = \"backward\"\n\n                # https://stackoverflow.com/questions/70572362/compute-efficiently-hessian-matrices-in-jax\n                # hess = jit(jacfwd(jacrev(no_weights_obj_fcn), has_aux=True), has_aux=True)(X, [model_fcn, obs, idx_bp])\n                # print(hess)\n                # obj_grad_fcn = lambda X: no_weights_obj_fcn(X, [model_fcn, obs, idx_bp])\n\n                # jac = numerical_jacobian(obj_grad_fcn, X, dx=eps, fd_type=fd_type)\n                jac = None\n\n            return X, loss, TSS, T, model, weight, resid, jac, np.mean(a), c\n\n    return obj_fcn",
        "completions": [
            "  ```\ndef elastic_net_penalty(X, T_sorted, obs_sorted, weight_sorted, wRMSE):\n    \"\"\"\n    Calculates the elastic net penalty for a given set of inputs. The elastic net is a regularized\n    regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.\n\n    Parameters:\n    X (array-like): The input array.\n    T_sorted (array-like): The sorted temperature array.\n    obs_sorted (array-like): The sorted observation array.\n    weight_sorted (array-like): The sorted weight array.\n    wRMSE (float): The weighted root mean squared error.\n\n    Returns:\n    penalty (float): The elastic net penalty.\n    \"\"\"\n\n    # Elastic net\n    X_enet = np.array(X).copy()\n\n    ## Scale break points ##\n    if len(idx_bp) > 0:\n        X_enet[idx_bp] = [\n            np.min(np.abs(X_enet[idx] - T_fit_bnds)) for idx in idx_bp\n        ]\n\n        if len(idx_bp) == 2:\n            X_enet[idx_bp] += (X[idx_bp][1] - X[idx_bp][0]) / 2\n\n        X_enet[idx_bp] *= wRMSE / T_range\n\n    # Find idx for regions\n    if len(idx_bp) == 2:\n        [hdd_bp, cdd_bp] = X[idx_bp]\n\n        idx_hdd = np.argwhere(T_sorted < hdd_bp).flatten()\n        idx_tidd = np.argwhere(\n            (hdd_bp <= T_sorted) & (T_sorted <= cdd_bp)\n        ).flatten()\n        idx_cdd = np.argwhere(cdd_bp < T_sorted).flatten()\n\n    elif len(idx_bp) == 1:\n        bp = X[idx_bp]\n        if X_enet[idx_beta] < 0:  # HDD_TIDD\n            idx_hdd = np.argwhere(T_sorted <= bp).flatten()\n            idx_tidd = np.argwhere(bp < T_sorted).flatten()\n            idx_cdd = np.array([])\n\n        else:\n            idx_hdd = np.array([])  # CDD_TIDD\n            idx_tidd = np.argwhere(T_sorted < bp).flatten()\n            idx_cdd = np.argwhere(bp <= T_sorted).flatten()\n\n    else:\n        idx_hdd = np.array([])\n        idx_tidd = np.arange(0, len(T_sorted))\n        idx_cdd = np.array([])\n\n    len_hdd = len(idx_hdd)\n    len_tidd = len(idx_tidd)\n    len_cdd = len(idx_cdd)\n\n    # combine tidd with hdd/cdd if cdd/hdd are large enough to get stdev\n    if (len_hdd < N_min) and (len_cdd >= N_min):\n        idx_hdd = np.hstack([idx_hdd, idx_tidd])\n    elif (len_hdd >= N_min) and (len_cdd < N_min):\n        idx_cdd = np.hstack([idx_tidd, idx_cdd])\n\n    # change to idx_hdd and idx_cdd to int arrays\n    idx_hdd = idx_hdd.astype(int)\n    idx_cdd = idx_cdd.astype(int)\n\n    ## Normalize slopes ##\n    # calculate stdevs\n    if (len(idx_bp) == 2) and (len(idx_hdd) >= N_min) and (len(idx_cdd) >= N_min):\n        N_beta = np.array([len_hdd, len_cdd])\n        T_stdev = np."
        ],
        "line": 119,
        "token": 1002,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 54,
        "name": "_setup_cuda_device",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef _setup_cuda_device(device: torch.device) -> torch.device:\n    local_rank = _get_local_rank() or 0\n    if device.index is None:\n        device = torch.device(type=\"cuda\", index=local_rank)\n\n    # Ensure index is available before setting device\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    if device.index >= torch.cuda.device_count():\n        raise RuntimeError(\n            \"The local rank is larger than the number of available GPUs.\"\n        )\n\n    torch.cuda.set_device(device)\n    return device",
        "completions": [
            "  \n[PYTHON]\ndef _setup_cuda_device(device: torch.device) -> torch.device:\n    local_rank = _get_local_rank() or 0\n    if device.index is None:\n        device = torch.device(type=\"cuda\", index=local_rank)\n    return device\n[/PYTHON]\n"
        ],
        "line": 18,
        "token": 190,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 55,
        "name": "to_numpy",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef to_numpy(\n    evset: EventSet,\n    timestamp_to_datetime: bool = True,\n    timestamps: bool = True,\n) -> Dict[str, ndarray]:\n    timestamp_key = \"timestamp\"\n    index_names = evset.schema.index_names()\n    feature_names = evset.schema.feature_names()\n\n    column_names = index_names + feature_names\n    if timestamps:\n        column_names += [timestamp_key]\n\n    dst = {column_name: [] for column_name in column_names}\n    for index, data in evset.data.items():\n        assert isinstance(index, tuple)\n\n        if timestamps:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            # Timestamps\n            if evset.schema.is_unix_timestamp and timestamp_to_datetime:\n                dst[timestamp_key].append(\n                    data.timestamps.astype(\"datetime64[s]\")\n                )\n            else:\n                dst[timestamp_key].append(data.timestamps)\n\n        # Features\n        for feature_name, feature in zip(feature_names, data.features):\n            dst[feature_name].append(feature)\n\n        # Indexes\n        num_timestamps = len(data.timestamps)\n        for index_name, index_item in zip(index_names, index):\n            dst[index_name].append(np.repeat(index_item, num_timestamps))\n\n    dst = {k: np.concatenate(v) for k, v in dst.items()}\n    return dst",
        "completions": [
            "  \n[PYTHON]\n```\n        if timestamps:\n            dst[timestamp_key].append(data[timestamp_key])\n        for feature_name in feature_names:\n            dst[feature_name].append(data[feature_name])\n        for index_name in index_names:\n            dst[index_name].append(index[index_name])\n```\n[/PYTHON]\n"
        ],
        "line": 28,
        "token": 236,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 56,
        "name": "_get_user_settings",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def _get_user_settings(self, session: HubSession) -> HubUserSettings:\n        access_token = session.access_token.get_secret_value()\n        token_type = session.token_type\n        authorization = f\"{token_type.title()} {access_token}\"\n\n        response = get(\n            url=self._base_url + \"/terminal/user\",\n            headers={\"Authorization\": authorization},\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            timeout=self.TIMEOUT,\n        )\n        if response.status_code == 200:\n            user_settings = response.json()\n            filtered = {k: v for k, v in user_settings.items() if v is not None}\n            return HubUserSettings.model_validate(filtered)\n        status_code = response.status_code\n        detail = response.json().get(\"detail\", None)\n        raise HTTPException(status_code, detail)",
        "completions": [
            "  ```\n    def _get_user_settings(self, session: HubSession) -> HubUserSettings:\n        access_token = session.access_token.get_secret_value()\n        token_type = session.token_type\n        authorization = f\"{token_type.title()} {access_token}\"\n\n        response = get(\n            url=self._base_url + \"/terminal/user\",\n            headers={\"Authorization\": authorization},\n        )\n\n        return response.json()\n```"
        ],
        "line": 20,
        "token": 201,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 57,
        "name": "get_demo_language",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef get_demo_language(filename):\n    with open(filename, \"r\", encoding=\"utf-8\") as file:\n        d = yaml.safe_load(file)\n\n    lang = Language()\n\n    def load(key, method):\n        if key in d:\n            val = d[key]\n            # Handle boolean values\n            if isinstance(val, str):\n                temp = val.lower()\n                if temp == \"true\":\n                    val = True\n                elif temp == \"false\":\n                    val = False\n            setattr(lang, method, val)\n\n    # Define mappings for fields\n    mappings = {\n        \"name\": \"name\",\n        \"show_romanization\": \"show_romanization\",\n        \"right_to_left\": \"right_to_left\",\n        \"parser_type\": \"parser_type\",\n        \"character_substitutions\": \"character_substitutions\",\n        \"split_sentences\": \"regexp_split_sentences\",\n        \"split_sentence_exceptions\": \"exceptions_split_sentences\",\n        \"word_chars\": \"word_characters\",\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    }\n\n    for key in d.keys():\n        funcname = mappings.get(key, \"\")\n        if funcname:\n            load(key, funcname)\n\n    ld_sort = 1\n    for ld_data in d[\"dictionaries\"]:\n        dtype = ld_data[\"type\"]\n        if dtype == \"embedded\":\n            dtype = \"embeddedhtml\"\n        elif dtype == \"popup\":\n            dtype = \"popuphtml\"\n        else:\n            raise ValueError(f\"Invalid dictionary type {dtype}\")\n\n        ld = LanguageDictionary()\n        # ld.language = lang -- if you do this, the dict is added twice.\n        ld.usefor = ld_data[\"for\"]\n        ld.dicttype = dtype\n        ld.dicturi = ld_data[\"url\"]\n        ld.is_active = True\n        ld.sort_order = ld_sort\n        ld_sort += 1\n        lang.dictionaries.append(ld)\n\n    return lang",
        "completions": [
            "  ```\ndef load(key, method):\n    if key in d:\n        val = d[key]\n        # Handle boolean values\n        if isinstance(val, str):\n            temp = val.lower()\n            if temp == \"true\":\n                val = True\n            elif temp == \"false\":\n                val = False\n        setattr(lang, method, val)\n```"
        ],
        "line": 38,
        "token": 318,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 58,
        "name": "_prepare_sample",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n    def _prepare_sample(\n        self,\n        seq_group_metadata_list: List[SequenceGroupMetadata],\n        prompt_lens: List[int],\n        subquery_lens: Optional[List[int]],\n    ) -> SamplingMetadata:\n        seq_groups: List[Tuple[List[int], SamplingParams]] = []\n        selected_token_indices: List[int] = []\n        generators: List[torch.Generator] = []\n        selected_token_start_idx = 0\n        categorized_sample_indices = {t: [] for t in SamplingType}\n        categorized_sample_indices_start_idx = 0\n\n        max_subquery_len = max(subquery_lens) if subquery_lens else 1\n        for i, seq_group_metadata in enumerate(seq_group_metadata_list):\n            seq_ids = list(seq_group_metadata.seq_data.keys())\n            sampling_params = seq_group_metadata.sampling_params\n            seq_groups.append((seq_ids, sampling_params))\n\n            if seq_group_metadata.is_prompt:\n                assert len(seq_ids) == 1\n                assert subquery_lens is not None\n                subquery_len = subquery_lens[i]\n                if sampling_params.prompt_logprobs is not None:\n                    # NOTE: prompt token positions do not need sample, skip\n                    categorized_sample_indices_start_idx += subquery_len - 1\n\n                categorized_sample_indices[\n                    sampling_params.sampling_type].append(\n                        categorized_sample_indices_start_idx)\n                categorized_sample_indices_start_idx += 1\n\n                if sampling_params.prompt_logprobs is not None:\n                    selected_token_indices.extend(\n                        range(\n                            selected_token_start_idx,\n                            selected_token_start_idx + subquery_len - 1,\n                        ))\n                selected_token_indices.append(selected_token_start_idx +\n                                              subquery_len - 1)\n                selected_token_start_idx += max_subquery_len\n                if sampling_params.seed is not None:\n                    seq_group_metadata.state.generator = torch.Generator(\n                        device=\"cuda\").manual_seed(sampling_params.seed)\n            else:\n                num_seqs = len(seq_ids)\n                selected_token_indices.extend(\n                    range(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                        selected_token_start_idx,\n                        selected_token_start_idx + num_seqs,\n                    ))\n                selected_token_start_idx += num_seqs\n\n                categorized_sample_indices[\n                    sampling_params.sampling_type].extend(\n                        range(\n                            categorized_sample_indices_start_idx,\n                            categorized_sample_indices_start_idx + num_seqs,\n                        ))\n                categorized_sample_indices_start_idx += num_seqs\n\n            if sampling_params.seed is not None:\n                generators.append(seq_group_metadata.state.generator)\n\n        selected_token_indices = _async_h2d(\n            selected_token_indices,\n            dtype=torch.long,\n            target_device=self.device,\n            pin_memory=not self.in_wsl,\n        )\n        categorized_sample_indices = {\n            t: _async_h2d(\n                seq_ids,\n                dtype=torch.int,\n                target_device=self.device,\n                pin_memory=not self.in_wsl,\n            )\n            for t, seq_ids in categorized_sample_indices.items()\n        }\n\n        seq_data: Dict[int, SequenceData] = {}\n        for seq_group_metadata in seq_group_metadata_list:\n            seq_data.update(seq_group_metadata.seq_data)\n\n        seq_persistence_data: Dict[int, dict] = {}\n        for grp in seq_group_metadata_list:\n            seq_persistence_data.update(grp.persistent_data)\n\n        sampling_metadata = SamplingMetadata(\n            seq_groups=seq_groups,\n            seq_data=seq_data,\n            prompt_lens=prompt_lens,\n            selected_token_indices=selected_token_indices,\n            categorized_sample_indices=categorized_sample_indices,\n            generators=generators,\n            persistent_metadata=PersistentMetadata(seq_persistence_data),\n        )\n        return sampling_metadata",
        "completions": [
            "  ```\ndef _prepare_sample(\n        self,\n        seq_group_metadata_list: List[SequenceGroupMetadata],\n        prompt_lens: List[int],\n        subquery_lens: Optional[List[int]],\n    ) -> SamplingMetadata:\n        seq_groups: List[Tuple[List[int], SamplingParams]] = []\n        selected_token_indices: List[int] = []\n        generators: List[torch.Generator] = []\n        selected_token_start_idx = 0\n        categorized_sample_indices = {t: [] for t in SamplingType}\n        categorized_sample_indices_start_idx = 0\n\n        max_subquery_len = max(subquery_lens) if subquery_lens else 1\n        for i, seq_group_metadata in enumerate(seq_group_metadata_list):\n            seq_ids = list(seq_group_metadata.seq_data.keys())\n            sampling_params = seq_group_metadata.sampling_params\n            seq_groups.append((seq_ids, sampling_params))\n\n            if seq_group_metadata.is_prompt:\n                assert len(seq_ids) == 1\n                assert subquery_lens is not None\n                subquery_len = subquery_lens[i]\n                if sampling_params.prompt_logprobs is not None:\n                    # NOTE: prompt token positions do not need sample, skip\n                    categorized_sample_indices_start_idx += subquery_len - 1\n\n                categorized_sample_indices[\n                    sampling_params.sampling_type].append(\n                        categorized_sample_indices_start_idx)\n                categorized_sample_indices_start_idx += 1\n\n                if sampling_params.prompt_logprobs is not None:\n                    selected_token_indices.extend(\n                        range(\n                            selected_token_start_idx,\n                            selected_token_start_idx + subquery_len - 1,\n                        ))\n                selected_token_indices.append(selected_token_start_idx +\n                                              subquery_len - 1)\n                selected_token_start_idx += max_subquery_len\n                if sampling_params.seed is not None:\n                    seq_group_metadata.state.generator = torch.Generator(\n                        device=\"cuda\").manual_seed(sampling_params.seed)\n            else:\n                num_seqs = len(seq_ids)\n                selected_token_indices.extend(\n                    range(\n                        selected_token_start_idx,\n                        selected_token_start_idx + num_seqs,\n                    ))\n                selected_token_start_idx += num_seqs\n                if sampling_params.seed is not None:\n                    seq_group_metadata.state.generator = torch.Generator(\n                        device=\"cuda\").manual_seed(sampling_params.seed)\n\n        return SamplingMetadata(\n            seq_groups=seq_groups,\n            selected_token_indices=selected_token_indices,\n            generators=generators,\n            categorized_sample_indices=categorized_sample_indices,\n        )\n```"
        ],
        "line": 57,
        "token": 391,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 59,
        "name": "get_stress",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def get_stress(\n        self,\n        strain: float,\n    ) -> float:\n        # create interpolation function\n        stress_function = interp1d(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            x=self.strains,\n            y=self.stresses,\n            kind=\"linear\",\n            fill_value=\"extrapolate\",\n        )\n\n        return stress_function(strain)",
        "completions": [
            "  \n[PYTHON]\nstress_function = interp1d(\n    x, y, kind='cubic')\n[/PYTHON]\n"
        ],
        "line": 19,
        "token": 170,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 60,
        "name": "get_graph",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def get_graph(self, atoms: Atoms) -> tuple[dgl.DGLGraph, torch.Tensor, list]:\n        numerical_tol = 1.0e-8\n        pbc = np.array([1, 1, 1], dtype=int)\n        element_types = self.element_types\n        lattice_matrix = np.array(atoms.get_cell()) if atoms.pbc.all() else np.expand_dims(np.identity(3), axis=0)\n        cart_coords = atoms.get_positions()\n        if atoms.pbc.all():\n            src_id, dst_id, images, bond_dist = find_points_in_spheres(\n                cart_coords,\n                cart_coords,\n                r=self.cutoff,\n                pbc=pbc,\n                lattice=lattice_matrix,\n                tol=numerical_tol,\n            )\n            exclude_self = (src_id != dst_id) | (bond_dist > numerical_tol)\n            src_id, dst_id, images, bond_dist = (\n                src_id[exclude_self],\n                dst_id[exclude_self],\n                images[exclude_self],\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                bond_dist[exclude_self],\n            )\n        else:\n            dist = np.linalg.norm(cart_coords[:, None, :] - cart_coords[None, :, :], axis=-1)\n            adj = sp.csr_matrix(dist <= self.cutoff) - sp.eye(len(atoms.get_positions()), dtype=np.bool_)\n            adj = adj.tocoo()\n            src_id = adj.row\n            dst_id = adj.col\n        g, lat, state_attr = super().get_graph_from_processed_structure(\n            atoms,\n            src_id,\n            dst_id,\n            images if atoms.pbc.all() else np.zeros((len(adj.row), 3)),\n            [lattice_matrix] if atoms.pbc.all() else lattice_matrix,\n            element_types,\n            atoms.get_scaled_positions(False) if atoms.pbc.all() else cart_coords,\n            is_atoms=True,\n        )\n\n        return g, lat, state_attr",
        "completions": [
            "  ```\ndef find_points_in_spheres(\n    src_coords,\n    dst_coords,\n    r=self.cutoff,\n    pbc=pbc,\n    lattice=lattice_matrix,\n    tol=numerical_tol,\n):\n    # Your code here\n```"
        ],
        "line": 33,
        "token": 281,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 61,
        "name": "permute_graph_embeddings",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef permute_graph_embeddings(\n    node_h: torch.Tensor,\n    edge_h: torch.Tensor,\n    edge_idx: torch.LongTensor,\n    mask_i: torch.Tensor,\n    mask_ij: torch.Tensor,\n    permute_idx: torch.LongTensor,\n) -> Tuple[torch.Tensor, torch.Tensor, torch.LongTensor, torch.Tensor, torch.Tensor]:\n\n    # Permuting one-dimensional objects is straightforward gathering\n    node_h_permute = permute_tensor(node_h, 1, permute_idx)\n    edge_h_permute = permute_tensor(edge_h, 1, permute_idx)\n    mask_i_permute = permute_tensor(mask_i, 1, permute_idx)\n    mask_ij_permute = permute_tensor(mask_ij, 1, permute_idx)\n\n    \"\"\"\n    For edge_idx, there are two-dimensions set each edge idx that\n    previously pointed to j to now point to the new location\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    of j which is p^(-1)[j]\n    edge^(p)[i,k] = p^(-1)[edge[p(i),k]]\n    \"\"\"\n    # First, permute on the i dimension\n    edge_idx_permute_1 = permute_tensor(edge_idx, 1, permute_idx)\n    # Second, permute on the j dimension by using the inverse\n    permute_idx_inverse = torch.argsort(permute_idx, dim=-1)\n    edge_idx_1_flat = edge_idx_permute_1.reshape([edge_idx.shape[0], -1])\n    edge_idx_permute_flat = torch.gather(permute_idx_inverse, 1, edge_idx_1_flat)\n    edge_idx_permute = edge_idx_permute_flat.reshape(edge_idx.shape)\n\n    return (\n        node_h_permute,\n        edge_h_permute,\n        edge_idx_permute,\n        mask_i_permute,\n        mask_ij_permute,\n    )",
        "completions": [
            "  ```\ndef permute_graph_embeddings(\n    node_h: torch.Tensor,\n    edge_h: torch.Tensor,\n    edge_idx: torch.LongTensor,\n    mask_i: torch.Tensor,\n    mask_ij: torch.Tensor,\n    permute_idx: torch.LongTensor,\n) -> Tuple[torch.Tensor, torch.Tensor, torch.LongTensor, torch.Tensor, torch.Tensor]:\n\n    # Permuting one-dimensional objects is straightforward gathering\n    node_h_permute = permute_tensor(node_h, 1, permute_idx)\n    edge_h_permute = permute_tensor(edge_h, 1, permute_idx)\n    mask_i_permute = permute_tensor(mask_i, 1, permute_idx)\n    mask_ij_permute = permute_tensor(mask_ij, 1, permute_idx)\n\n    # For edge_idx, there are two-dimensions set each edge idx that\n    # previously pointed to j to now point to the new location\n    edge_idx_permute = edge_idx.clone()\n    edge_idx_permute[edge_idx_permute == j] = permute_idx[edge_idx_permute == j]\n\n    return node_h_permute, edge_h_permute, edge_idx_permute, mask_i_permute, mask_ij_permute\n```"
        ],
        "line": 29,
        "token": 263,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 62,
        "name": "pca_calc",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef pca_calc(\n    X: list[torch.Tensor], ignore_masks: list[torch.Tensor] | None = None\n) -> tuple[torch.Tensor, torch.Tensor]:\n    # Run GC and cleanup GPU memory\n    cleanup_memory()\n\n    H = None\n    for idx, X_batch in enumerate(X):\n        if ignore_masks:\n            X_batch[ignore_masks[idx] == 0] = 0\n\n        X_batch = X_batch.double().to(device=config.device)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        H_batch = torch.sum(X_batch.mT @ X_batch, dim=0)  # sum over the batch dimension.\n        H = H_batch if H is None else H + H_batch\n\n    damp = 0.01 * torch.mean(torch.diag(H))\n    diag = torch.arange(H.shape[-1]).to(device=config.device)\n    H[diag, diag] = H[diag, diag] + damp\n    X_eig = torch.linalg.eigh(H)\n    del H\n    index = torch.argsort(X_eig[0], descending=True)\n    eig_val = X_eig[0][index]\n    eigen_vec = X_eig[1][:, index]\n    return eig_val, eigen_vec",
        "completions": [
            "  \n[PYTHON]\ndef pca_calc(\n    X: list[torch.Tensor], ignore_masks: list[torch.Tensor] | None = None\n) -> tuple[torch.Tensor, torch.Tensor]:\n    # Run GC and cleanup GPU memory\n    cleanup_memory()\n\n    H = None\n    for idx, X_batch in enumerate(X):\n        if ignore_masks:\n            X_batch[ignore_masks[idx] == 0] = 0\n\n        X_batch = X_batch.double().to(device=config.device)\n\n        if H is None:\n            H = X_batch.T @ X_batch\n        else:\n            H += X_batch.T @ X_batch\n\n    return H\n[/PYTHON]\n"
        ],
        "line": 23,
        "token": 224,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 63,
        "name": "initialize_config",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef initialize_config(\n    model_path: Union[str, Path], trust_remote_code: bool = False, **config_args\n) -> AutoConfig:\n    config = AutoConfig.from_pretrained(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        model_path,\n        trust_remote_code=trust_remote_code,\n        **config_args,\n    )\n    return config",
        "completions": [
            "  ```\ndef initialize_config(\n    model_path: Union[str, Path], trust_remote_code: bool = False, **config_args\n) -> AutoConfig:\n    config = AutoConfig.from_pretrained(\n        model_path, trust_remote_code=trust_remote_code, **config_args\n    )\n    return config\n```"
        ],
        "line": 17,
        "token": 178,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 64,
        "name": "find_downloaded_firmware",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef find_downloaded_firmware(\n    *,\n    board_id: str,\n    version: str = \"\",\n    port: str = \"\",\n    variants: bool = False,\n    fw_folder: Optional[Path] = None,\n    trie: int = 1,\n    selector: Optional[Dict[str, str]] = None,\n) -> List[FWInfo]:\n    if selector is None:\n        selector = {}\n    fw_folder = fw_folder or config.firmware_folder\n    # Use the information in firmwares.jsonl to find the firmware file\n    fw_list = downloaded_firmwares(fw_folder)\n    if not fw_list:\n        log.error(\"No firmware files found. Please download the firmware first.\")\n        return []\n    # filter by version\n    version = clean_version(version, drop_v=True)\n    fw_list = filter_downloaded_fwlist(fw_list, board_id, version, port, variants, selector)\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    if not fw_list and trie < 3:\n        log.info(f\"Try ({trie+1}) to find a firmware for the board {board_id}\")\n        if trie == 1:\n            # ESP board naming conventions have changed by adding a PORT refix\n            if port.startswith(\"esp\") and not board_id.startswith(port.upper()):\n                board_id = f\"{port.upper()}_{board_id}\"\n            # RP2 board naming conventions have changed by adding a _RPIprefix\n            if port == \"rp2\" and not board_id.startswith(\"RPI_\"):\n                board_id = f\"RPI_{board_id}\"\n        elif trie == 2:\n            board_id = board_id.replace(\"_\", \"-\")\n\n        fw_list = find_downloaded_firmware(\n            fw_folder=fw_folder,\n            board_id=board_id,\n            version=version,\n            port=port,\n            trie=trie + 1,\n            selector=selector,\n        )\n        # hope we have a match now for the board\n    # sort by filename\n    fw_list.sort(key=lambda x: x[\"filename\"])\n    return fw_list",
        "completions": [
            "  ```\ndef filter_downloaded_fwlist(fw_list, board_id, version, port, variants, selector):\n    # Your code here\n    return fw_list\n```"
        ],
        "line": 34,
        "token": 296,
        "line_diff": 1,
        "token_diff": 2
    },
    {
        "id": 65,
        "name": "draw_gaps",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef draw_gaps(\n    bands_data: xr.Dataset,\n    gap: bool,\n    gap_info: dict,\n    gap_tol: float,\n    gap_color: Optional[str],\n    gap_marker: Optional[dict],\n    direct_gaps_only: bool,\n    custom_gaps: Sequence[dict],\n    E_axis: Literal[\"x\", \"y\"],\n) -> List[dict]:\n    draw_actions = []\n\n    # Draw gaps\n    if gap:\n        gapKs = [np.atleast_1d(k) for k in gap_info[\"k\"]]\n\n        # Remove \"equivalent\" gaps\n        def clear_equivalent(ks):\n            if len(ks) == 1:\n                return ks\n\n            uniq = [ks[0]]\n            for k in ks[1:]:\n                if abs(min(np.array(uniq) - k)) > gap_tol:\n                    uniq.append(k)\n            return uniq\n\n        all_gapKs = itertools.product(*[clear_equivalent(ks) for ks in gapKs])\n\n        for gap_ks in all_gapKs:\n            if direct_gaps_only and abs(gap_ks[1] - gap_ks[0]) > gap_tol:\n                continue\n\n            ks, Es = get_gap_coords(\n                bands_data,\n                gap_info[\"bands\"],\n                *gap_ks,\n                spin=gap_info.get(\"spin\", [0])[0],\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            )\n            name = \"Gap\"\n\n            draw_actions.append(\n                draw_gap(\n                    ks, Es, color=gap_color, name=name, marker=gap_marker, E_axis=E_axis\n                )\n            )\n\n    # Draw the custom gaps. These are gaps that do not necessarily represent\n    # the maximum and the minimum of the VB and CB.\n    for custom_gap in custom_gaps:\n        requested_spin = custom_gap.get(\"spin\", None)\n        if requested_spin is None:\n            requested_spin = [0, 1]\n\n        avail_spins = bands_data.get(\"spin\", [0])\n\n        for spin in avail_spins:\n            if spin in requested_spin:\n                from_k = custom_gap[\"from\"]\n                to_k = custom_gap.get(\"to\", from_k)\n                color = custom_gap.get(\"color\", None)\n                name = f\"Gap ({from_k}-{to_k})\"\n                ks, Es = get_gap_coords(\n                    bands_data, gap_info[\"bands\"], from_k, to_k, spin=spin\n                )\n\n                draw_actions.append(\n                    draw_gap(\n                        ks,\n                        Es,\n                        color=color,\n                        name=name,\n                        marker=custom_gap.get(\"marker\", {}),\n                        E_axis=E_axis,\n                    )\n                )\n\n    return draw_actions",
        "completions": [
            "  ```\ndef get_gap_coords(\n    bands_data: xr.Dataset,\n    gap_info: dict,\n    *gap_ks: int,\n    spin: int,\n) -> List[dict]:\n    # Implement the function\n    #...\n    return [{\"k\": k, \"E\": E} for k, E in zip(gap_ks, Es)]\n```"
        ],
        "line": 46,
        "token": 373,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 66,
        "name": "find_dominant_color",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef find_dominant_color(\n    image: np.ndarray, black_threshold: int = 50\n) -> Tuple[Union[int, str], Union[int, str], Union[int, str]]:\n  pixels = image.reshape(-1, 3)\n\n  # Filter out black pixels based on the threshold\n  non_black_pixels = pixels[(pixels > black_threshold).any(axis=1)]\n\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "  if non_black_pixels.size != 0:\n    kmeans = sklearn_cluster.KMeans(\n        n_clusters=1, n_init=10, random_state=0\n    ).fit(non_black_pixels)\n    dominant_color = kmeans.cluster_centers_[0].astype(int)\n\n  else:\n    dominant_color = ['Na', 'Na', 'Na']\n  return tuple(dominant_color)",
        "completions": [
            "  ```\n  # Calculate the dominant color\n  dominant_color = non_black_pixels.mean(axis=0)\n\n  # Convert the dominant color to RGB\n  dominant_color_rgb = tuple(int(255 * x) for x in dominant_color)\n\n  # Return the dominant color in RGB format\n  return dominant_color_rgb\n```"
        ],
        "line": 19,
        "token": 218,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 67,
        "name": "body_from_data",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef body_from_data(\n    *,\n    data: oai.Operation,\n    schemas: Schemas,\n    config: Config,\n    endpoint_name: str,\n) -> Tuple[List[Union[Body, ParseError]], Schemas]:\n    if data.request_body is None or isinstance(data.request_body, oai.Reference):\n        return [], schemas\n\n    bodies: List[Union[Body, ParseError]] = []\n    body_content = data.request_body.content\n    prefix_type_names = len(body_content) > 1\n\n    for content_type, media_type in body_content.items():\n        simplified_content_type = get_content_type(content_type, config)\n        if simplified_content_type is None:\n            bodies.append(\n                ParseError(\n                    detail=\"Invalid content type\",\n                    data=data.request_body,\n                    level=ErrorLevel.WARNING,\n                )\n            )\n            continue\n        media_type_schema = media_type.media_type_schema\n        if media_type_schema is None:\n            bodies.append(\n                ParseError(\n                    detail=\"Missing schema\",\n                    data=data.request_body,\n                    level=ErrorLevel.WARNING,\n                )\n            )\n            continue\n        if simplified_content_type == \"application/x-www-form-urlencoded\":\n            body_type = BodyType.DATA\n        elif simplified_content_type == \"multipart/form-data\":\n            body_type = BodyType.FILES\n        elif simplified_content_type == \"application/octet-stream\":\n            body_type = BodyType.CONTENT\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        elif simplified_content_type == \"application/json\" or simplified_content_type.endswith(\"+json\"):\n            body_type = BodyType.JSON\n        else:\n            bodies.append(\n                ParseError(\n                    detail=f\"Unsupported content type {simplified_content_type}\",\n                    data=data.request_body,\n                    level=ErrorLevel.WARNING,\n                )\n            )\n            continue\n        prop, schemas = property_from_data(\n            name=\"body\",\n            required=True,\n            data=media_type_schema,\n            schemas=schemas,\n            parent_name=f\"{endpoint_name}_{body_type}\" if prefix_type_names else endpoint_name,\n            config=config,\n        )\n        if isinstance(prop, ParseError):\n            bodies.append(prop)\n            continue\n        if isinstance(prop, ModelProperty) and body_type == BodyType.FILES:\n            # Regardless of if we just made this property or found it, it now needs the `to_multipart` method\n            prop = attr.evolve(prop, is_multipart_body=True)\n            schemas = attr.evolve(\n                schemas,\n                classes_by_name={\n                    **schemas.classes_by_name,\n                    prop.class_info.name: prop,\n                },\n            )\n        bodies.append(\n            Body(\n                content_type=content_type,\n                prop=prop,\n                body_type=body_type,\n            )\n        )\n\n    return bodies, schemas",
        "completions": [
            "  ```\ndef get_content_type(content_type: str, config: Config) -> Optional[str]:\n    if content_type in config.content_types:\n        return config.content_types[content_type]\n    return None\n```"
        ],
        "line": 52,
        "token": 324,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 68,
        "name": "convert_files",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef convert_files(pt_files: List[Path], sf_files: List[Path], discard_names: List[str]):\n    assert len(pt_files) == len(sf_files)\n\n    N = len(pt_files)\n    # We do this instead of using tqdm because we want to parse the logs with the launcher\n\n    for i, (pt_file, sf_file) in enumerate(zip(pt_files, sf_files)):\n        # Skip blacklisted files\n        if (\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            \"arguments\" in pt_file.name\n            or \"args\" in pt_file.name\n            or \"training\" in pt_file.name\n        ):\n            continue\n\n        start = datetime.datetime.now()\n        convert_file(pt_file, sf_file, discard_names)\n        elapsed = datetime.datetime.now() - start\n        logger.info(f\"Convert: [{i + 1}/{N}] -- Took: {elapsed}\")",
        "completions": [
            "  \n[PYTHON]\n```\nfor i, (pt_file, sf_file) in enumerate(zip(pt_files, sf_files)):\n    # Skip blacklisted files\n    if pt_file.stem in discard_names:\n        continue\n    # Convert the files\n    convert_file(pt_file, sf_file)\n```\n[/PYTHON]\n"
        ],
        "line": 20,
        "token": 232,
        "line_diff": 0,
        "token_diff": 1
    },
    {
        "id": 69,
        "name": "compute_temperature_features",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef compute_temperature_features(\n    meter_data_index,\n    temperature_data,\n    heating_balance_points=None,\n    cooling_balance_points=None,\n    data_quality=False,\n    temperature_mean=True,\n    degree_day_method=\"daily\",\n    percent_hourly_coverage_per_day=0.5,\n    percent_hourly_coverage_per_billing_period=0.9,\n    use_mean_daily_values=True,\n    tolerance=None,\n    keep_partial_nan_rows=False,\n):\n    if temperature_data.index.freq != \"H\":\n        raise ValueError(\n            \"temperature_data.index must have hourly frequency (freq='H').\"\n            \" Found: {}\".format(temperature_data.index.freq)\n        )\n\n    if not temperature_data.index.tz:\n        raise ValueError(\n            \"temperature_data.index must be timezone-aware. You can set it with\"\n            \" temperature_data.tz_localize(...).\"\n        )\n\n    if meter_data_index.freq is None and meter_data_index.inferred_freq == \"H\":\n        raise ValueError(\n            \"If you have hourly data explicitly set the frequency\"\n            \" of the dataframe by setting\"\n            \"``meter_data_index.freq =\"\n            \" pd.tseries.frequencies.to_offset('H').\"\n        )\n\n    if not meter_data_index.tz:\n        raise ValueError(\n            \"meter_data_index must be timezone-aware. You can set it with\"\n            \" meter_data.tz_localize(...).\"\n        )\n\n    if meter_data_index.duplicated().any():\n        raise ValueError(\"Duplicates found in input meter trace index.\")\n\n    temp_agg_funcs = []\n    temp_agg_column_renames = {}\n\n    if heating_balance_points is None:\n        heating_balance_points = []\n    if cooling_balance_points is None:\n        cooling_balance_points = []\n\n    if meter_data_index.freq is not None:\n        try:\n            freq_timedelta = pd.Timedelta(meter_data_index.freq)\n        except ValueError:  # freq cannot be converted to timedelta\n            freq_timedelta = None\n    else:\n        freq_timedelta = None\n\n    if tolerance is None:\n        tolerance = freq_timedelta\n\n    if not (heating_balance_points == [] and cooling_balance_points == []):\n        if degree_day_method == \"hourly\":\n            pass\n        elif degree_day_method == \"daily\":\n            if meter_data_index.freq == \"H\":\n                raise ValueError(\n                    \"degree_day_method='daily' must be used with\"\n                    \" daily meter data. Found: 'hourly'\".format(degree_day_method)\n                )\n        else:\n            raise ValueError(\"method not supported: {}\".format(degree_day_method))\n\n    if freq_timedelta == pd.Timedelta(\"1H\"):\n        # special fast route for hourly data.\n        df = temperature_data.to_frame(\"temperature_mean\").reindex(meter_data_index)\n\n        if use_mean_daily_values:\n            n_days = 1\n        else:\n            n_days = 1.0 / 24.0\n\n        df = df.assign(\n            **{\n                \"cdd_{}\".format(bp): np.maximum(df.temperature_mean - bp, 0) * n_days\n                for bp in cooling_balance_points\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            }\n        )\n        df = df.assign(\n            **{\n                \"hdd_{}\".format(bp): np.maximum(bp - df.temperature_mean, 0) * n_days\n                for bp in heating_balance_points\n            }\n        )\n        df = df.assign(\n            n_hours_dropped=df.temperature_mean.isnull().astype(int),\n            n_hours_kept=df.temperature_mean.notnull().astype(int),\n        )\n        # TODO(philngo): bad interface or maybe this is just wrong for some reason?\n        if data_quality:\n            df = df.assign(\n                temperature_null=df.n_hours_dropped,\n                temperature_not_null=df.n_hours_kept,\n            )\n        if not temperature_mean:\n            del df[\"temperature_mean\"]\n    else:\n        # daily/billing route\n        # heating/cooling degree day aggregations. Needed for n_days fields as well.\n        temp_agg_funcs.extend(\n            _degree_day_columns(\n                heating_balance_points=heating_balance_points,\n                cooling_balance_points=cooling_balance_points,\n                degree_day_method=degree_day_method,\n                percent_hourly_coverage_per_day=percent_hourly_coverage_per_day,\n                percent_hourly_coverage_per_billing_period=percent_hourly_coverage_per_billing_period,\n                use_mean_daily_values=use_mean_daily_values,\n            )\n        )\n        temp_agg_column_renames.update(\n            {(\"temp\", \"degree_day_columns\"): \"degree_day_columns\"}\n        )\n\n        if data_quality:\n            temp_agg_funcs.extend(\n                [(\"not_null\", \"count\"), (\"null\", lambda x: x.isnull().sum())]\n            )\n            temp_agg_column_renames.update(\n                {\n                    (\"temp\", \"not_null\"): \"temperature_not_null\",\n                    (\"temp\", \"null\"): \"temperature_null\",\n                }\n            )\n\n        if temperature_mean:\n            temp_agg_funcs.extend([(\"mean\", \"mean\")])\n            temp_agg_column_renames.update({(\"temp\", \"mean\"): \"temperature_mean\"})\n\n        # aggregate temperatures\n        temp_df = temperature_data.to_frame(\"temp\")\n        temp_groups = _matching_groups(meter_data_index, temp_df, tolerance)\n        temp_aggregations = temp_groups.agg({\"temp\": temp_agg_funcs})\n\n        # expand temp aggregations by faking and deleting the `meter_value` column.\n        # I haven't yet figured out a way to avoid this and get the desired\n        # structure and behavior. (philngo)\n        meter_value = pd.DataFrame({\"meter_value\": 0}, index=meter_data_index)\n        df = pd.concat([meter_value, temp_aggregations], axis=1).rename(\n            columns=temp_agg_column_renames\n        )\n        del df[\"meter_value\"]\n\n        if \"degree_day_columns\" in df:\n            if df[\"degree_day_columns\"].dropna().empty:\n                column_defaults = {\n                    column: np.full(df[\"degree_day_columns\"].shape, np.nan)\n                    for column in [\"n_days_dropped\", \"n_days_kept\"]\n                }\n                df = df.drop([\"degree_day_columns\"], axis=1).assign(**column_defaults)\n            else:\n                df = pd.concat(\n                    [\n                        df.drop([\"degree_day_columns\"], axis=1),\n                        df[\"degree_day_columns\"].dropna().apply(pd.Series),\n                    ],\n                    axis=1,\n                )\n\n    if not keep_partial_nan_rows:\n        df = overwrite_partial_rows_with_nan(df)\n\n    # nan last row\n    df = df.iloc[:-1].reindex(df.index)\n    return df",
        "completions": [
            "  ```\n    if freq_timedelta == pd.Timedelta(\"1H\"):\n        # special fast route for hourly data.\n        df = temperature_data.to_frame(\"temperature_mean\").reindex(meter_data_index)\n\n        if use_mean_daily_values:\n            n_days = 1\n        else:\n            n_days = 1.0 / 24.0\n\n        df = df.assign(\n            **{\n                \"cdd_{}\".format(bp): np.maximum(df.temperature_mean - bp, 0) * n_days\n                for bp in cooling_balance_points\n            }\n        )\n```"
        ],
        "line": 88,
        "token": 560,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 70,
        "name": "polyvalfromroots",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef polyvalfromroots(x, r, tensor=True):\n    r = np.array(r, ndmin=1, copy=False)\n    if r.dtype.char in '?bBhHiIlLqQpP':\n        r = r.astype(np.double)\n    if isinstance(x, (tuple, list)):\n        x = np.asarray(x)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    if isinstance(x, np.ndarray):\n        if tensor:\n            r = r.reshape(r.shape + (1,)*x.ndim)\n        elif x.ndim >= r.ndim:\n            raise ValueError(\"x.ndim must be < r.ndim when tensor == False\")\n    return np.prod(x - r, axis=0)",
        "completions": [
            "  ```\ndef polyvalfromroots(x, r, tensor=True):\n    r = np.array(r, ndmin=1, copy=False)\n    if r.dtype.char in '?bBhHiIlLqQpP':\n        r = r.astype(np.double)\n    if isinstance(x, (tuple, list)):\n        x = np.asarray(x)\n    if tensor:\n        x = x.reshape(-1, 1)\n    else:\n        x = x.reshape(-1)\n    return np.polyval(r, x)\n```"
        ],
        "line": 19,
        "token": 200,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 71,
        "name": "run_query_expansion_node",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef run_query_expansion_node(modules: List[Callable],\n                             module_params: List[Dict],\n                             previous_result: pd.DataFrame,\n                             node_line_dir: str,\n                             strategies: Dict,\n                             ) -> pd.DataFrame:\n    if not os.path.exists(node_line_dir):\n        os.makedirs(node_line_dir)\n    node_dir = os.path.join(node_line_dir, \"query_expansion\")\n    if not os.path.exists(node_dir):\n        os.makedirs(node_dir)\n    project_dir = pathlib.PurePath(node_line_dir).parent.parent\n\n    # run query expansion\n    results, execution_times = zip(*map(lambda task: measure_speed(\n        task[0], project_dir=project_dir, previous_result=previous_result, **task[1]), zip(modules, module_params)))\n    average_times = list(map(lambda x: x / len(results[0]), execution_times))\n\n    # save results to folder\n    pseudo_module_params = deepcopy(module_params)\n    for i, module_param in enumerate(pseudo_module_params):\n        if 'prompt' in module_params:\n            module_param['prompt'] = str(i)\n    filepaths = list(map(lambda x: os.path.join(node_dir, f'{x}.parquet'), range(len(modules))))\n    list(map(lambda x: x[0].to_parquet(x[1], index=False), zip(results, filepaths)))  # execute save to parquet\n    filenames = list(map(lambda x: os.path.basename(x), filepaths))\n\n    # make summary file\n    summary_df = pd.DataFrame({\n        'filename': filenames,\n        'module_name': list(map(lambda module: module.__name__, modules)),\n        'module_params': module_params,\n        'execution_time': average_times,\n    })\n\n    # Run evaluation when there are more than one module.\n    if len(modules) > 1:\n        # pop general keys from strategies (e.g. metrics, speed_threshold)\n        general_key = ['metrics', 'speed_threshold']\n        general_strategy = dict(filter(lambda x: x[0] in general_key, strategies.items()))\n        extra_strategy = dict(filter(lambda x: x[0] not in general_key, strategies.items()))\n\n        # first, filter by threshold if it is enabled.\n        if general_strategy.get('speed_threshold') is not None:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            results, filenames = filter_by_threshold(results, average_times, general_strategy['speed_threshold'],\n                                                     filenames)\n\n        # check metrics in strategy\n        if general_strategy.get('metrics') is None:\n            raise ValueError(\"You must at least one metrics for query expansion evaluation.\")\n\n        if extra_strategy.get('top_k') is None:\n            extra_strategy['top_k'] = 10  # default value\n\n        # get retrieval modules from strategy\n        retrieval_callables, retrieval_params = make_retrieval_callable_params(extra_strategy)\n\n        # get retrieval_gt\n        retrieval_gt = pd.read_parquet(os.path.join(project_dir, \"data\", \"qa.parquet\"))['retrieval_gt'].tolist()\n\n        # run evaluation\n        evaluation_results = list(map(lambda result: evaluate_one_query_expansion_node(\n            retrieval_callables, retrieval_params, result['queries'].tolist(), retrieval_gt,\n            general_strategy['metrics'], project_dir, previous_result), results))\n\n        evaluation_df = pd.DataFrame({\n            'filename': filenames,\n            **{f'query_expansion_{metric_name}': list(map(lambda x: x[metric_name].mean(), evaluation_results))\n               for metric_name in general_strategy['metrics']}\n        })\n        summary_df = pd.merge(on='filename', left=summary_df, right=evaluation_df, how='left')\n\n        best_result, best_filename = select_best_average(evaluation_results, general_strategy['metrics'], filenames)\n        # change metric name columns to query_expansion_metric_name\n        best_result = best_result.rename(columns={\n            metric_name: f'query_expansion_{metric_name}' for metric_name in strategies['metrics']})\n        best_result = best_result.drop(columns=['retrieved_contents', 'retrieved_ids', 'retrieve_scores'])\n    else:\n        best_result, best_filename = results[0], filenames[0]\n        best_result = pd.concat([previous_result, best_result], axis=1)\n\n    # add 'is_best' column at summary file\n    summary_df['is_best'] = summary_df['filename'] == best_filename\n\n    # save files\n    summary_df.to_csv(os.path.join(node_dir, \"summary.csv\"), index=False)\n    best_result.to_parquet(os.path.join(node_dir, f\"best_{os.path.splitext(best_filename)[0]}.parquet\"), index=False)\n\n    return best_result",
        "completions": [
            "  ```\ndef run_query_expansion_node(modules: List[Callable],\n                             module_params: List[Dict],\n                             previous_result: pd.DataFrame,\n                             node_line_dir: str,\n                             strategies: Dict,\n                             ) -> pd.DataFrame:\n    # Run evaluation when there are more than one module.\n    if len(modules) > 1:\n        # pop general keys from strategies (e.g. metrics, speed_threshold)\n        general_key = ['metrics','speed_threshold']\n        general_strategy = dict(filter(lambda x: x[0] in general_key, strategies.items()))\n        extra_strategy = dict(filter(lambda x: x[0] not in general_key, strategies.items()))\n\n        # first, filter by threshold if it is enabled.\n        if general_strategy.get('speed_threshold') is not None:\n            results = list(filter(lambda x: x[1] <= general_strategy['speed_threshold'], zip(modules, module_params)))\n            modules = list(map(lambda x: x[0], results))\n            module_params = list(map(lambda x: x[1], results))\n\n        # then, filter by metrics if it is enabled.\n        if general_strategy.get('metrics') is not None:\n            results = list(filter(lambda x: x[1] in general_strategy['metrics'], zip(modules, module_params)))\n            modules = list(map(lambda x: x[0], results))\n            module_params = list(map(lambda x: x[1], results))\n\n        # run query expansion\n        results, execution_times = zip(*map(lambda task: measure_speed(\n            task[0], project_dir=project_dir, previous_result=previous_result, **task[1]), zip(modules, module_params)))\n        average_times = list(map(lambda x: x / len(results[0]), execution_times))\n\n        # save results to folder\n        pseudo_module_params = deepcopy(module_params)\n        for i, module_param in enumerate(pseudo_module_params):\n            if 'prompt' in module_params:\n                module_param['prompt'] = str(i)\n        filepaths = list(map(lambda x: os.path.join(node_dir, f'{x}.parquet'), range(len(modules))))\n        list(map(lambda x: x[0].to_parquet(x[1], index=False), zip(results, filepaths)))  # execute save to parquet\n        filenames = list(map(lambda x: os.path.basename(x), filepaths))\n\n        # make summary file\n        summary_df = pd.DataFrame({\n            'filename': filenames,\n           'module_name': list(map(lambda module: module.__name__, modules)),\n           'module_params': module_params,\n            'execution_time': average_times,\n        })\n\n        return summary_df\n```"
        ],
        "line": 52,
        "token": 552,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 72,
        "name": "notify_webhook_subscribers",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef notify_webhook_subscribers(\n    event: str,\n    data: dict,\n    event_at: datetime,\n    ctx: dict,\n    **kwargs,\n):\n    logger.info(f\"> starting {event} subscribers notification\")\n    context = retrieve_context(ctx)\n    query = (\n        (Q(enabled_events__icontains=event) | Q(enabled_events__icontains=\"all\")),\n        (Q(disabled__isnull=True) | Q(disabled=False)),\n    )\n\n    webhooks = models.Webhook.access_by(context).filter(*query)\n    serializers.EventSerializer.map(\n        data=dict(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            type=event,\n            data=data,\n            test_mode=context.test_mode,\n            pending_webhooks=webhooks.count(),\n        ),\n        context=context,\n    ).save()\n\n    if any(webhooks):\n        payload = dict(event=event, data=data)\n        responses: typing.List[NotificationResponse] = notify_subscribers(\n            webhooks, payload\n        )\n        update_notified_webhooks(webhooks, responses, event_at)\n    else:\n        logger.info(\"no subscribers found\")\n\n    logger.info(f\"> ending {event} subscribers notification\")",
        "completions": [
            "  \n[PYTHON]\n```\ndef notify_webhook_subscribers(event, data, event_at, ctx, **kwargs):\n    logger.info(f\"> starting {event} subscribers notification\")\n    context = retrieve_context(ctx)\n    query = (\n        (Q(enabled_events__icontains=event) | Q(enabled_events__icontains=\"all\")),\n        (Q(disabled__isnull=True) | Q(disabled=False)),\n    )\n\n    webhooks = models.Webhook.access_by(context).filter(*query)\n    serializers.EventSerializer.map(\n        data=dict(\n            event=event,\n            data=data,\n            event_at=event_at,\n            **kwargs,\n        ),\n        webhooks=webhooks,\n    )\n```\n[/PYTHON]\n"
        ],
        "line": 29,
        "token": 238,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 73,
        "name": "sample",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\n  def sample(self):\n    node = self.root\n    while isinstance(node, Node):\n      uprobs = np.array([x.uprob for x in node.children])\n      total = uprobs.sum()\n      if not np.isfinite(total):\n        finite = np.isinf(uprobs)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        probs = finite / finite.sum()\n      elif total == 0:\n        probs = np.ones(len(uprobs)) / len(uprobs)\n      else:\n        probs = uprobs / total\n      choice = self.rng.choice(np.arange(len(uprobs)), p=probs)\n      node = node.children[choice.item()]\n    return node.key",
        "completions": [
            "  ```\n  def sample(self):\n    node = self.root\n    while isinstance(node, Node):\n      uprobs = np.array([x.uprob for x in node.children])\n      total = uprobs.sum()\n      if not np.isfinite(total):\n        finite = np.isinf(uprobs)\n        node = node.children[finite]\n      else:\n        node = node.children[np.random.choice(len(node.children), p=uprobs/total)]\n  ```"
        ],
        "line": 20,
        "token": 195,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 74,
        "name": "download",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def download(cls, client: GiskardClient, project_key, model_id, *_args, **_kwargs):\n        local_dir = settings.home_dir / settings.cache_dir / \"models\" / model_id\n        client.load_artifact(local_dir, posixpath.join(\"models\", model_id))\n        meta_response: ModelMetaInfo = client.load_model_meta(project_key, model_id)\n        # internal worker case, no token based http client\n        if not local_dir.exists():\n            raise RuntimeError(f\"Cannot find existing model {project_key}.{model_id} in {local_dir}\")\n        with (Path(local_dir) / META_FILENAME).open(encoding=\"utf-8\") as f:\n            file_meta = yaml.load(f, Loader=yaml.Loader)\n            classification_labels = cls.cast_labels(meta_response)\n            meta = ModelMeta(\n                name=meta_response.name,\n                description=meta_response.description,\n                model_type=SupportedModelTypes[meta_response.modelType],\n                feature_names=meta_response.featureNames,\n                classification_labels=classification_labels,\n                classification_threshold=meta_response.threshold,\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "                loader_module=file_meta[\"loader_module\"],\n                loader_class=file_meta[\"loader_class\"],\n            )\n\n        model_py_ver = (\n            tuple(meta_response.languageVersion.split(\".\")) if \"PYTHON\" == meta_response.language.upper() else None\n        )\n\n        clazz = cls.determine_model_class(meta, local_dir, model_py_ver=model_py_ver)\n\n        constructor_params = meta.__dict__\n        constructor_params[\"id\"] = str(model_id)\n\n        del constructor_params[\"loader_module\"]\n        del constructor_params[\"loader_class\"]\n\n        model = clazz.load(local_dir, model_py_ver=model_py_ver, **constructor_params)\n        return model",
        "completions": [
            "  ```\ndef download(cls, client: GiskardClient, project_key, model_id, *_args, **_kwargs):\n    local_dir = settings.home_dir / settings.cache_dir / \"models\" / model_id\n    client.load_artifact(local_dir, posixpath.join(\"models\", model_id))\n    meta_response: ModelMetaInfo = client.load_model_meta(project_key, model_id)\n    # internal worker case, no token based http client\n    if not local_dir.exists():\n        raise RuntimeError(f\"Cannot find existing model {project_key}.{model_id} in {local_dir}\")\n    with (Path(local_dir) / META_FILENAME).open(encoding=\"utf-8\") as f:\n        file_meta = yaml.load(f, Loader=yaml.Loader)\n        classification_labels = cls.cast_labels(meta_response)\n        meta = ModelMeta(\n            name=meta_response.name,\n            description=meta_response.description,\n            model_type=SupportedModelTypes[meta_response.modelType],\n            feature_names=meta_response.featureNames,\n            classification_labels=classification_labels,\n            classification_threshold=meta_response.threshold,\n        )\n        return meta\n```"
        ],
        "line": 30,
        "token": 294,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 75,
        "name": "maximal_marginal_relevance",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef maximal_marginal_relevance(\n    query_embedding: np.ndarray,\n    embedding_list: list,\n    lambda_mult: float = 0.5,\n    k: int = 4,\n) -> List[int]:\n    if min(k, len(embedding_list)) <= 0:\n        return []\n    if query_embedding.ndim == 1:\n        query_embedding = np.expand_dims(query_embedding, axis=0)\n    similarity_to_query = cosine_similarity(query_embedding, embedding_list)[0]\n    most_similar = int(np.argmax(similarity_to_query))\n    idxs = [most_similar]\n    selected = np.array([embedding_list[most_similar]])\n    while len(idxs) < min(k, len(embedding_list)):\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        best_score = -np.inf\n        idx_to_add = -1\n        similarity_to_selected = cosine_similarity(embedding_list, selected)\n        for i, query_score in enumerate(similarity_to_query):\n            if i in idxs:\n                continue\n            redundant_score = max(similarity_to_selected[i])\n            equation_score = (\n                lambda_mult * query_score - (1 - lambda_mult) * redundant_score\n            )\n            if equation_score > best_score:\n                best_score = equation_score\n                idx_to_add = i\n        idxs.append(idx_to_add)\n        selected = np.append(selected, [embedding_list[idx_to_add]], axis=0)\n    return idxs",
        "completions": [
            "  ```\nwhile len(idxs) < min(k, len(embedding_list)):\n    similarity_to_selected = cosine_similarity(selected, embedding_list)[0]\n    most_similar_to_selected = int(np.argmax(similarity_to_selected))\n    if similarity_to_selected[most_similar_to_selected] < lambda_mult:\n        break\n    idxs.append(most_similar_to_selected)\n    selected = np.concatenate((selected, np.array([embedding_list[most_similar_to_selected]])))\nreturn idxs\n```"
        ],
        "line": 28,
        "token": 261,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 76,
        "name": "cov_nearest",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef cov_nearest(cov: np.ndarray, higham: bool = False, higham_max_iteration: int = 100):\n    assert_is_square(cov)\n    assert_is_symmetric(cov)\n\n    # Around 100 times faster than checking eigenvalues with np.linalg.eigh\n    if is_cholesky_dec(cov) and is_positive_definite(cov):\n        return cov\n\n    corr, std = cov_to_corr(cov)\n\n    if higham:\n        eps = np.finfo(np.float64).eps * 5\n        diff = np.zeros(corr.shape)\n        x = corr.copy()\n        for _ in range(higham_max_iteration):\n            x_adj = x - diff\n            eig_vals, eig_vecs = np.linalg.eigh(x_adj)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            x = eig_vecs * np.maximum(eig_vals, eps) @ eig_vecs.T\n            diff = x - x_adj\n            np.fill_diagonal(x, 1)\n            cov = corr_to_cov(x, std)\n            if is_cholesky_dec(cov) and is_positive_definite(cov):\n                break\n        else:\n            raise ValueError(\"Unable to find the nearest positive definite matrix\")\n    else:\n        eig_vals, eig_vecs = np.linalg.eigh(corr)\n        # Clipping the eigenvalues with a value smaller than 1e-13 can cause scipy to\n        # consider the matrix non-psd is some corner cases (see test/test_stats.py)\n        x = eig_vecs * np.maximum(eig_vals, _CLIPPING_VALUE) @ eig_vecs.T\n        x, _ = cov_to_corr(x)\n        cov = corr_to_cov(x, std)\n\n    return cov",
        "completions": [
            "  ```\ndef cov_nearest(cov: np.ndarray, higham: bool = False, higham_max_iteration: int = 100):\n    assert_is_square(cov)\n    assert_is_symmetric(cov)\n\n    # Around 100 times faster than checking eigenvalues with np.linalg.eigh\n    if is_cholesky_dec(cov) and is_positive_definite(cov):\n        return cov\n\n    corr, std = cov_to_corr(cov)\n\n    if higham:\n        eps = np.finfo(np.float64).eps * 5\n        diff = np.zeros(corr.shape)\n        x = corr.copy()\n        for _ in range(higham_max_iteration):\n            x_adj = x - diff\n            eig_vals, eig_vecs = np.linalg.eigh(x_adj)\n            diff = np.zeros(corr.shape)\n            for i in range(len(eig_vals)):\n                if eig_vals[i] < 0:\n                    diff[i, i] = -eig_vals[i]\n            x = x_adj + diff\n        return x\n    else:\n        return corr\n```"
        ],
        "line": 27,
        "token": 251,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 77,
        "name": "transform",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def transform(self, effects):\n        fontname = self.prop['FontName']\n        italicangle = self.prop['ItalicAngle']\n\n        array = [\n            float(x) for x in (self.prop['FontMatrix']\n                               .lstrip('[').rstrip(']').split())\n        ]\n        oldmatrix = np.eye(3, 3)\n        oldmatrix[0:3, 0] = array[::2]\n        oldmatrix[0:3, 1] = array[1::2]\n        modifier = np.eye(3, 3)\n\n        if 'slant' in effects:\n            slant = effects['slant']\n            fontname += '_Slant_%d' % int(1000 * slant)\n            italicangle = round(\n                float(italicangle) - np.arctan(slant) / np.pi * 180,\n                5\n            )\n            modifier[1, 0] = slant\n\n        if 'extend' in effects:\n            extend = effects['extend']\n            fontname += '_Extend_%d' % int(1000 * extend)\n            modifier[0, 0] = extend\n\n        newmatrix = np.dot(modifier, oldmatrix)\n        array[::2] = newmatrix[0:3, 0]\n        array[1::2] = newmatrix[0:3, 1]\n        fontmatrix = (\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            '[%s]' % ' '.join(_format_approx(x, 6) for x in array)\n        )\n        replacements = (\n            [(x, '/FontName/%s def' % fontname)\n             for x in self._pos['FontName']]\n            + [(x, '/ItalicAngle %a def' % italicangle)\n               for x in self._pos['ItalicAngle']]\n            + [(x, '/FontMatrix %s readonly def' % fontmatrix)\n               for x in self._pos['FontMatrix']]\n            + [(x, '') for x in self._pos.get('UniqueID', [])]\n        )\n\n        data = bytearray(self.parts[0])\n        data.extend(self.decrypted)\n        len0 = len(self.parts[0])\n        for (pos0, pos1), value in sorted(replacements, reverse=True):\n            data[pos0:pos1] = value.encode('ascii', 'replace')\n            if pos0 < len(self.parts[0]):\n                if pos1 >= len(self.parts[0]):\n                    raise RuntimeError(\n                        f\"text to be replaced with {value} spans \"\n                        \"the eexec boundary\"\n                    )\n                len0 += len(value) - pos1 + pos0\n\n        data = bytes(data)\n        return Type1Font((\n            data[:len0],\n            self._encrypt(data[len0:], 'eexec'),\n            self.parts[2]\n        ))",
        "completions": [
            "  ```\n    fontmatrix = '[' +''.join(map(str, array)) + ']'\n```"
        ],
        "line": 40,
        "token": 368,
        "line_diff": 2,
        "token_diff": 2
    },
    {
        "id": 78,
        "name": "register_image_at_roboflow",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n\ndef register_image_at_roboflow(\n    api_key: str,\n    dataset_id: DatasetID,\n    local_image_id: str,\n    image_bytes: bytes,\n    batch_name: str,\n    tags: Optional[List[str]] = None,\n    inference_id=None,\n) -> dict:\n    url = f\"{API_BASE_URL}/dataset/{dataset_id}/upload\"\n    params = [\n        (\"api_key\", api_key),\n        (\"batch\", batch_name),\n    ]\n    if inference_id is not None:\n        params.append((\"inference_id\", inference_id))\n    tags = tags if tags is not None else []\n    for tag in tags:\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        params.append((\"tag\", tag))\n    wrapped_url = wrap_url(_add_params_to_url(url=url, params=params))\n    m = MultipartEncoder(\n        fields={\n            \"name\": f\"{local_image_id}.jpg\",\n            \"file\": (\"imageToUpload\", image_bytes, \"image/jpeg\"),\n        }\n    )\n    response = requests.post(\n        url=wrapped_url,\n        data=m,\n        headers={\"Content-Type\": m.content_type},\n    )\n    api_key_safe_raise_for_status(response=response)\n    parsed_response = response.json()\n    if not parsed_response.get(\"duplicate\") and not parsed_response.get(\"success\"):\n        raise RoboflowAPIImageUploadRejectionError(\n            f\"Server rejected image: {parsed_response}\"\n        )\n    return parsed_response",
        "completions": [
            "  ```\n    for tag in tags:\n        params.append((\"tag\", tag))\n    files = {\"image\": image_bytes}\n    response = requests.post(url, params=params, files=files)\n    return response.json()\n```"
        ],
        "line": 31,
        "token": 255,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 79,
        "name": "allocate",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def allocate(self, num_blocks: int) -> torch.Tensor:\n        if num_blocks > self._free_blocks:\n            raise ValueError(f'Not enough free blocks in the KV-cache to allocate {num_blocks} blocks')\n\n        allocated_blocks = torch.zeros(num_blocks, dtype=torch.int32)\n        for i in range(num_blocks):\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            allocated_blocks[i] = self._head\n            self._head = self._blocks[self._head].item()\n            self._blocks[allocated_blocks[i]] = -1  # Mark as used\n            self._free_blocks -= 1\n\n        return allocated_blocks",
        "completions": [
            "  \n[PYTHON]\nallocated_blocks[i] = self._free_blocks[i]\n[/PYTHON]"
        ],
        "line": 18,
        "token": 200,
        "line_diff": 0,
        "token_diff": 0
    },
    {
        "id": 80,
        "name": "get_dtype",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef get_dtype(\n    dtype: Optional[str] = None, device: Optional[torch.device] = None\n) -> torch.dtype:\n\n    # None defaults to float32\n    if dtype is None:\n        return torch.float32\n\n    # Convert to torch.dtype\n    torch_dtype = PRECISION_STR_TO_DTYPE.get(dtype, dtype)\n\n    # dtype must be one of the supported precisions\n    if torch_dtype not in PRECISION_STR_TO_DTYPE.values():\n        raise ValueError(\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "            f\"Dtype {torch_dtype} must be one of {', '.join(list_dtypes())} for finetuning.\"\n        )\n\n    # TODO (rohan-varma): prefer to use get_default_device() here to figure out whether user is training on\n    # CPU or GPU, but it is not supported in versions of torch we test.\n    if (\n        torch_dtype == torch.bfloat16\n        and device != torch.device(\"cpu\")\n        and not verify_bf16_support()\n    ):\n        raise RuntimeError(\n            \"bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.\"\n        )\n\n    return torch_dtype",
        "completions": [
            "  \n[PYTHON]\nreturn torch_dtype\n[/PYTHON]\n"
        ],
        "line": 24,
        "token": 217,
        "line_diff": 1,
        "token_diff": 0
    },
    {
        "id": 81,
        "name": "get_top_k_embeddings",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef get_top_k_embeddings(\n    query_embedding: List[float],\n    embeddings: List[List[float]],\n    similarity_fn: Optional[Callable[..., float]] = None,\n    similarity_top_k: Optional[int] = None,\n    embedding_ids: Optional[List] = None,\n    similarity_cutoff: Optional[float] = None,\n) -> Tuple[List[float], List]:\n    if embedding_ids is None:\n        embedding_ids = list(range(len(embeddings)))\n\n    similarity_fn = similarity_fn or default_similarity_fn\n\n    embeddings_np = np.array(embeddings)\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "    query_embedding_np = np.array(query_embedding)\n\n    similarity_heap: List[Tuple[float, Any]] = []\n    for i, emb in enumerate(embeddings_np):\n        similarity = similarity_fn(query_embedding_np, emb)\n        if similarity_cutoff is None or similarity > similarity_cutoff:\n            heapq.heappush(similarity_heap, (similarity, embedding_ids[i]))\n            if similarity_top_k and len(similarity_heap) > similarity_top_k:\n                heapq.heappop(similarity_heap)\n    result_tups = sorted(similarity_heap, key=lambda x: x[0], reverse=True)\n\n    result_similarities = [s for s, _ in result_tups]\n    result_ids = [n for _, n in result_tups]\n\n    return result_similarities, result_ids",
        "completions": [
            "  ```\ndef get_top_k_embeddings(\n    query_embedding: List[float],\n    embeddings: List[List[float]],\n    similarity_fn: Optional[Callable[..., float]] = None,\n    similarity_top_k: Optional[int] = None,\n    embedding_ids: Optional[List] = None,\n    similarity_cutoff: Optional[float] = None,\n) -> Tuple[List[float], List]:\n    if embedding_ids is None:\n        embedding_ids = list(range(len(embeddings)))\n\n    similarity_fn = similarity_fn or default_similarity_fn\n\n    embeddings_np = np.array(embeddings)\n\n    # Calculate similarities between query embedding and all embeddings\n    similarities = similarity_fn(query_embedding, embeddings_np)\n\n    # Get top k similarities\n    top_k_similarities = np.argpartition(similarities, -similarity_top_k)[-similarity_top_k:]\n\n    # Get top k embeddings\n    top_k_embeddings = embeddings_np[top_k_similarities]\n\n    # Get top k embedding ids\n    top_k_embedding_ids = embedding_ids[top_k_similarities]\n\n    return top_k_embeddings, top_k_embedding_ids\n```"
        ],
        "line": 25,
        "token": 250,
        "line_diff": 1,
        "token_diff": 1
    },
    {
        "id": 82,
        "name": "as_freq",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\ndef as_freq(\n    data_series,\n    freq,\n    atomic_freq=\"1 Min\",\n    series_type=\"cumulative\",\n    include_coverage=False,\n):\n    # TODO(philngo): make sure this complies with CalTRACK 2.2.2.1\n    if not isinstance(data_series, pd.Series):\n        raise ValueError(\n            \"expected series, got object with class {}\".format(data_series.__class__)\n        )\n    if data_series.empty:\n        return data_series\n    series = remove_duplicates(data_series)\n    target_freq = pd.Timedelta(atomic_freq)\n    timedeltas = (series.index[1:] - series.index[:-1]).append(\n        pd.TimedeltaIndex([pd.NaT])\n    )\n\n    if series_type == \"cumulative\":\n        spread_factor = target_freq.total_seconds() / timedeltas.total_seconds()\n        series_spread = series * spread_factor\n        atomic_series = series_spread.asfreq(atomic_freq, method=\"ffill\")\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        resampled = atomic_series.resample(freq).sum()\n        resampled_with_nans = atomic_series.resample(freq).first()\n        n_coverage = atomic_series.resample(freq).count()\n        resampled = resampled[resampled_with_nans.notnull()].reindex(resampled.index)\n\n    elif series_type == \"instantaneous\":\n        atomic_series = series.asfreq(atomic_freq, method=\"ffill\")\n        resampled = atomic_series.resample(freq).mean()\n\n    if resampled.index[-1] < series.index[-1]:\n        # this adds a null at the end using the target frequency\n        last_index = pd.date_range(resampled.index[-1], freq=freq, periods=2)[1:]\n        resampled = (\n            pd.concat([resampled, pd.Series(np.nan, index=last_index)])\n            .resample(freq)\n            .mean()\n        )\n    if include_coverage:\n        n_total = resampled.resample(atomic_freq).count().resample(freq).count()\n        resampled = resampled.to_frame(\"value\")\n        resampled[\"coverage\"] = n_coverage / n_total\n        return resampled\n    else:\n        return resampled",
        "completions": [
            "  ```\ndef as_freq(\n    data_series,\n    freq,\n    atomic_freq=\"1 Min\",\n    series_type=\"cumulative\",\n    include_coverage=False,\n):\n    # TODO(philngo): make sure this complies with CalTRACK 2.2.2.1\n    if not isinstance(data_series, pd.Series):\n        raise ValueError(\n            \"expected series, got object with class {}\".format(data_series.__class__)\n        )\n    if data_series.empty:\n        return data_series\n    series = remove_duplicates(data_series)\n    target_freq = pd.Timedelta(atomic_freq)\n    timedeltas = (series.index[1:] - series.index[:-1]).append(\n        pd.TimedeltaIndex([pd.NaT])\n    )\n\n    if series_type == \"cumulative\":\n        spread_factor = target_freq.total_seconds() / timedeltas.total_seconds()\n        series_spread = series * spread_factor\n        atomic_series = series_spread.asfreq(atomic_freq, method=\"ffill\")\n\n    if include_coverage:\n        coverage = (atomic_series.index.get_level_values(0) - atomic_series.index.get_level_values(1)).astype(int)\n        atomic_series = atomic_series.assign(coverage=coverage)\n\n    return atomic_series\n```"
        ],
        "line": 36,
        "token": 284,
        "line_diff": 2,
        "token_diff": 1
    },
    {
        "id": 83,
        "name": "generate_openapi",
        "prompt": "### Role:\nYou are an experienced Python programmer assistant responsible for code completion tasks. Please intelligently complete the missing parts of the provided code snippet.\n\n### Task:\nBased on the provided code snippet, complete the missing function implementation. Ensure the code is logically correct, syntactically accurate, and follows best coding practices.\n\n### Code snippet:\n```Python\n    def generate_openapi(self, routes: list[BaseRoute] | None = None):\n        openapi_base = get_openapi(\n            title=self.name,\n            version=self.version,\n            routes=(routes if routes is not None else self.app.routes),\n        )\n\n        #\n        # Exception injection\n        #\n\n        # Loop over the registered controllers and get the action exceptions\n        exceptions_by_url: dict[str, list[ExceptionSchema]] = {}\n        for controller_definition in self.controllers:\n            for (\n                _,\n                _,\n                metadata,\n            ) in controller_definition.controller._get_client_functions():\n                url = controller_definition.get_url_for_metadata(metadata)\n                # Not included in the specified routes, we should ignore this controller\n                if url not in openapi_base[\"paths\"]:\n                    continue\n\n                exceptions_models = metadata.get_exception_models()\n                if not exceptions_models:\n                    continue\n\n                exceptions_by_url[url] = [\n                    self._format_exception_model(exception_model)\n                    for exception_model in exceptions_models\n                ]\n\n        # Users are allowed to reference the same schema name multiple times so long\n        # as they have the same value. If they use conflicting values we'll have\n        # to use the long name instead of the short module name to avoid conflicting\n        # schema definitions.\n        schema_names_to_long: defaultdict[str, set[str]] = defaultdict(set)\n        for exception_payloads in exceptions_by_url.values():\n            for payload in exception_payloads:\n                schema_names_to_long[payload.schema_name].add(payload.schema_name_long)\n\n        duplicate_schema_names = {\n            schema_name\n            for schema_name, schema_name_longs in schema_names_to_long.items()\n            if len(schema_name_longs) > 1\n```\n\n### Requirements:\n1. Complete only the missing part of the function.\n2. Only answer the code of the completed part. Do not include any of the provided code snippet.\n3. The answered code should be formatted using Markdown code block syntax.\n4. Do not include any additional text.\n\n### Answer: (use the provided format with backticks)\n",
        "reference": "        }\n\n        for url, exception_payloads in exceptions_by_url.items():\n            existing_status_codes: set[int] = set()\n\n            for payload in exception_payloads:\n                # Validate the exception state doesn't override existing values\n                # Status codes are local to this particular endpoint but schema names\n                # are global because they're placed in the global components section\n                if payload.status_code in existing_status_codes:\n                    raise ValueError(\n                        f\"Duplicate status code {payload.status_code} for {url}\"\n                    )\n\n                schema_name = (\n                    payload.schema_name\n                    if payload.schema_name not in duplicate_schema_names\n                    else payload.schema_name_long\n                )\n\n                other_definitions = {\n                    definition_name: self._update_ref_path(definition)\n                    for definition_name, definition in payload.schema_value.pop(\n                        \"$defs\", {}\n                    ).items()\n                }\n                openapi_base[\"components\"][\"schemas\"].update(other_definitions)\n                openapi_base[\"components\"][\"schemas\"][\n                    schema_name\n                ] = self._update_ref_path(payload.schema_value)\n\n                # All actions are \"posts\" by definition\n                openapi_base[\"paths\"][url][\"post\"][\"responses\"][\n                    str(payload.status_code)\n                ] = {\n                    \"description\": f\"Custom Error: {payload.schema_name}\",\n                    \"content\": {\n                        \"application/json\": {\n                            \"schema\": {\"$ref\": f\"#/components/schemas/{schema_name}\"}\n                        }\n                    },\n                }\n\n                existing_status_codes.add(payload.status_code)\n\n        return openapi_base",
        "completions": [
            "  ```\ndef _format_exception_model(exception_model):\n    return ExceptionSchema(\n        schema_name=exception_model.schema_name,\n        schema_name_long=exception_model.schema_name_long,\n        description=exception_model.description,\n        status_code=exception_model.status_code,\n        headers=exception_model.headers,\n        payload=exception_model.payload,\n    )\n```"
        ],
        "line": 53,
        "token": 395,
        "line_diff": 2,
        "token_diff": 2
    }
]