{"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef parse_files(self, fls):\n\n for f in fls:\n # Make sure paths exists\n if os.path.exists(f):\n self._parser(f)\n else:\n logger.warning(\"File {} does not exist\".format(f))", "response": "Private method for parsing the output files."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef iter_filter(self, filters, databases=None, fields=None,\n filter_behavior=\"and\"):\n \"\"\"General purpose filter iterator.\n\n This general filter iterator allows the filtering of entries based\n on one or more custom filters. These filters must contain\n an entry of the `storage` attribute, a comparison operator, and the\n test value. For example, to filter out entries with coverage below 80::\n\n my_filter = [\"coverage\", \">=\", 80]\n\n Filters should always be provide as a list of lists::\n\n iter_filter([[\"coverage\", \">=\", 80]])\n # or\n my_filters = [[\"coverage\", \">=\", 80],\n [\"identity\", \">=\", 50]]\n\n iter_filter(my_filters)\n\n As a convenience, a list of the desired databases can be directly\n specified using the `database` argument, which will only report\n entries for the specified databases::\n\n iter_filter(my_filters, databases=[\"plasmidfinder\"])\n\n By default, this method will yield the complete entry record. However,\n the returned filters can be specified using the `fields` option::\n\n iter_filter(my_filters, fields=[\"reference\", \"coverage\"])\n\n Parameters\n ----------\n filters : list\n List of lists with the custom filter. Each list should have three\n elements. (1) the key from the entry to be compared; (2) the\n comparison operator; (3) the test value. Example:\n ``[[\"identity\", \">\", 80]]``.\n databases : list\n List of databases that should be reported.\n fields : list\n List of fields from each individual entry that are yielded.\n filter_behavior : str\n options: ``'and'`` ``'or'``\n Sets the behaviour of the filters, if multiple filters have been\n provided. By default it is set to ``'and'``, which means that an\n entry has to pass all filters. 
It can be set to ``'or'``, in which\n case only one of the filters has to pass.\n\n Yields\n ------\n dic : dict\n Dictionary object containing a :py:attr:`Abricate.storage` entry\n that passed the filters.\n\n \"\"\"\n\n if filter_behavior not in [\"and\", \"or\"]:\n raise ValueError(\"Filter behavior must be either 'and' or 'or'\")\n\n for dic in self.storage.values():\n\n # This attribute will determine whether an entry will be yielded\n # or not\n _pass = False\n\n # Stores the flags with the test results for each filter\n # The results will be either True or False\n flag = []\n\n # Filter for databases\n if databases:\n # Skip entry if not in specified database\n if dic[\"database\"] not in databases:\n continue\n\n # Apply filters\n for f in filters:\n # Get value for the current filter key\n val = dic[f[0]]\n if not self._test_truth(val, f[1], f[2]):\n flag.append(False)\n else:\n flag.append(True)\n\n # Test whether the entry will pass based on the test results\n # and the filter behavior\n if filter_behavior == \"and\":\n if all(flag):\n _pass = True\n elif filter_behavior == \"or\":\n if any(flag):\n _pass = True\n\n if _pass:\n if fields:\n yield dict((x, y) for x, y in dic.items() if x in fields)\n else:\n yield dic", "response": "Generator that yields the storage entries passing the provided filters, optionally restricted to the specified databases and fields."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _get_contig_id(contig_str):\n\n contig_id = contig_str\n\n try:\n contig_id = re.search(\".*NODE_([0-9]*)_.*\", contig_str).group(1)\n except AttributeError:\n pass\n\n try:\n contig_id = re.search(\".*Contig_([0-9]*)_.*\", contig_str).group(1)\n except AttributeError:\n pass\n\n return contig_id", "response": "Tries to retrieve the contig id. 
Returns the original string if it is unable to retrieve the id."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef get_plot_data(self):\n\n json_dic = {\"plotData\": []}\n sample_dic = {}\n sample_assembly_map = {}\n\n for entry in self.storage.values():\n\n sample_id = re.match(\"(.*)_abr\", entry[\"log_file\"]).groups()[0]\n if sample_id not in sample_dic:\n sample_dic[sample_id] = {}\n\n # Get contig ID using the same regex as in `assembly_report.py`\n # template\n contig_id = self._get_contig_id(entry[\"reference\"])\n # Get database\n database = entry[\"database\"]\n if database not in sample_dic[sample_id]:\n sample_dic[sample_id][database] = []\n\n # Update the sample-assembly correspondence dict\n if sample_id not in sample_assembly_map:\n sample_assembly_map[sample_id] = entry[\"infile\"]\n\n sample_dic[sample_id][database].append(\n {\"contig\": contig_id,\n \"seqRange\": entry[\"seq_range\"],\n \"gene\": entry[\"gene\"].replace(\"'\", \"\"),\n \"accession\": entry[\"accession\"],\n \"coverage\": entry[\"coverage\"],\n \"identity\": entry[\"identity\"],\n },\n )\n\n for sample, data in sample_dic.items():\n json_dic[\"plotData\"].append(\n {\n \"sample\": sample,\n \"data\": {\"abricateXrange\": data},\n \"assemblyFile\": sample_assembly_map[sample]\n }\n )\n\n return json_dic", "response": "Generates the JSON report data used to plot the gene boxes of the current assembly."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef write_report_data(self):\n\n json_plot = self.get_plot_data()\n json_table = self.get_table_data()\n\n json_dic = {**json_plot, **json_table}\n\n with open(\".report.json\", \"w\") as json_report:\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))", "response": "Writes the combined plot and table JSON report to the .report.json file."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef main(sample_id, assembly_file, coverage_bp_file=None):\n\n logger.info(\"Starting assembly report\")\n assembly_obj = Assembly(assembly_file, sample_id)\n\n logger.info(\"Retrieving summary statistics for assembly\")\n assembly_obj.get_summary_stats(\"{}_assembly_report.csv\".format(sample_id))\n\n size_dist = [len(x) for x in assembly_obj.contigs.values()]\n json_dic = {\n \"tableRow\": [{\n \"sample\": sample_id,\n \"data\": [\n {\"header\": \"Contigs\",\n \"value\": assembly_obj.summary_info[\"ncontigs\"],\n \"table\": \"assembly\",\n \"columnBar\": True},\n {\"header\": \"Assembled BP\",\n \"value\": assembly_obj.summary_info[\"total_len\"],\n \"table\": \"assembly\",\n \"columnBar\": True},\n ]\n }],\n \"plotData\": [{\n \"sample\": sample_id,\n \"data\": {\n \"size_dist\": size_dist\n }\n }]\n }\n\n if coverage_bp_file:\n try:\n window = 2000\n gc_sliding_data = assembly_obj.get_gc_sliding(window=window)\n cov_sliding_data = \\\n assembly_obj.get_coverage_sliding(coverage_bp_file,\n window=window)\n\n # Get total basepairs based on the individual coverage of each\n # contig\n total_bp = sum(\n [sum(x) for x in assembly_obj.contig_coverage.values()]\n )\n\n # Add data to json report\n json_dic[\"plotData\"][0][\"data\"][\"genomeSliding\"] = {\n \"gcData\": gc_sliding_data,\n \"covData\": cov_sliding_data,\n \"window\": window,\n \"xbars\": assembly_obj._get_window_labels(window),\n \"assemblyFile\": os.path.basename(assembly_file)\n }\n json_dic[\"plotData\"][0][\"data\"][\"sparkline\"] = total_bp\n\n except Exception:\n logger.error(\"Unexpected error 
creating sliding window data:\\\\n\"\n \"{}\".format(traceback.format_exc()))\n\n # Write json report\n with open(\".report.json\", \"w\") as json_report:\n\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n with open(\".status\", \"w\") as status_fh:\n status_fh.write(\"pass\")", "response": "Main function of the assembly_report template."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef _parse_assembly(self, assembly_file):\n\n with open(assembly_file) as fh:\n\n header = None\n logger.debug(\"Starting iteration of assembly file: {}\".format(\n assembly_file))\n\n for line in fh:\n\n # Skip empty lines\n if not line.strip():\n continue\n\n if line.startswith(\">\"):\n # Add contig header to contig dictionary\n header = line[1:].strip()\n self.contigs[header] = []\n\n else:\n # Add sequence string for the current contig\n self.contigs[header].append(line.strip())\n\n # After populating the contigs dictionary, convert the values\n # list into a string sequence\n self.contigs = OrderedDict(\n (header, \"\".join(seq)) for header, seq in self.contigs.items())", "response": "This method parses an assembly file in FASTA format and populates the self.contigs attribute with the sequence data for each contig in the assembly."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef get_summary_stats(self, output_csv=None):\n\n contig_size_list = []\n\n self.summary_info[\"ncontigs\"] = len(self.contigs)\n\n for contig_id, sequence in self.contigs.items():\n\n logger.debug(\"Processing contig: {}\".format(contig_id))\n\n # Get contig sequence size\n contig_len = len(sequence)\n\n # Add size for average contig size\n contig_size_list.append(contig_len)\n\n # Add to total assembly length\n self.summary_info[\"total_len\"] += contig_len\n\n # Add to average gc\n self.summary_info[\"avg_gc\"].append(\n sum(map(sequence.count, [\"G\", \"C\"])) / contig_len\n )\n\n # Add to missing data\n self.summary_info[\"missing_data\"] += sequence.count(\"N\")\n\n # Get average contig size\n logger.debug(\"Getting average contig size\")\n self.summary_info[\"avg_contig_size\"] = \\\n sum(contig_size_list) / len(contig_size_list)\n\n # Get average gc content\n logger.debug(\"Getting average GC content\")\n self.summary_info[\"avg_gc\"] = \\\n sum(self.summary_info[\"avg_gc\"]) / len(self.summary_info[\"avg_gc\"])\n\n # Get N50\n logger.debug(\"Getting N50\")\n cum_size = 0\n for l in sorted(contig_size_list, reverse=True):\n cum_size += l\n if cum_size >= self.summary_info[\"total_len\"] / 2:\n self.summary_info[\"n50\"] = l\n break\n\n if output_csv:\n logger.debug(\"Writing report to csv\")\n # Write summary info to CSV\n with open(output_csv, \"w\") as fh:\n summary_line = \"{}, {}\\\\n\".format(\n self.sample, \",\".join(\n [str(x) for x in self.summary_info.values()]))\n fh.write(summary_line)", "response": "Generates summary statistics for the assembly and its contigs, optionally writing them to a CSV report."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nreturns the mapping between sliding window points and their contigs, and their x-axis positions and labels.", "response": "def _get_window_labels(self, window):\n \"\"\"Returns the mapping between sliding window points and their contigs,\n and the x-axis positions of the contig boundaries\n\n Parameters\n ----------\n window : int\n Size of the window.\n\n Returns\n -------\n xbars : list\n The x-axis 
position of the ending for each contig, as a list of\n ``(contig_id, end_position, header)`` tuples.\n\n \"\"\"\n\n # Get summary stats, if they have not yet been triggered\n if not self.summary_info:\n self.get_summary_stats()\n\n # Get contig boundary position\n c = 0\n xbars = []\n for contig, seq in self.contigs.items():\n contig_id = self._get_contig_id(contig)\n self.contig_boundaries[contig_id] = [c, c + len(seq)]\n c += len(seq)\n xbars.append((contig_id, c, contig))\n\n return xbars"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef _gc_prop(s, length):\n\n gc = sum(map(s.count, [\"c\", \"g\"]))\n\n return gc / length", "response": "Get the proportion of G and C characters in a string, given its length."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef get_gc_sliding(self, window=2000):\n\n gc_res = []\n\n # Get complete sequence to calculate sliding window values\n complete_seq = \"\".join(self.contigs.values()).lower()\n\n for i in range(0, len(complete_seq), window):\n\n seq_window = complete_seq[i:i + window]\n\n # Get GC proportion\n gc_res.append(round(self._gc_prop(seq_window, len(seq_window)), 2))\n\n return gc_res", "response": "Calculates the GC content in sliding windows over the complete assembly sequence."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nwrite the report to a JSON file.", "response": "def write_json_report(sample_id, data1, data2):\n \"\"\"Builds the JSON report data from two FastQC data files.\n\n Parameters\n ----------\n sample_id : str\n Sample identifier.\n data1\n FastQC data file for the first set of reads.\n data2\n FastQC data file for the second set of reads.\n\n Returns\n -------\n json_dic : dict\n JSON report data for the sample.\n \"\"\"\n\n parser_map = {\n \"base_sequence_quality\": \">>Per base sequence quality\",\n \"sequence_quality\": \">>Per sequence quality scores\",\n \"base_gc_content\": \">>Per sequence GC content\",\n \"base_n_content\": \">>Per base N content\",\n \"sequence_length_dist\": \">>Sequence Length Distribution\",\n \"per_base_sequence_content\": \">>Per base sequence content\"\n }\n\n json_dic = {\n \"plotData\": [{\n \"sample\": sample_id,\n \"data\": {\n \"base_sequence_quality\": {\"status\": None, \"data\": []},\n \"sequence_quality\": {\"status\": None, \"data\": []},\n \"base_gc_content\": {\"status\": None, \"data\": []},\n \"base_n_content\": {\"status\": None, \"data\": []},\n \"sequence_length_dist\": {\"status\": None, \"data\": []},\n \"per_base_sequence_content\": {\"status\": None, \"data\": []}\n }\n }]\n }\n\n for cat, start_str in parser_map.items():\n\n if cat == \"per_base_sequence_content\":\n fs = 1\n fe = 5\n else:\n fs = 1\n fe = 2\n\n report1, status1 = _get_quality_stats(data1, start_str,\n field_start=fs, field_end=fe)\n report2, status2 = _get_quality_stats(data2, start_str,\n field_start=fs, field_end=fe)\n\n status = None\n for i in [\"fail\", \"warn\", \"pass\"]:\n if i in [status1, status2]:\n status = i\n\n json_dic[\"plotData\"][0][\"data\"][cat][\"data\"] = [report1, report2]\n json_dic[\"plotData\"][0][\"data\"][cat][\"status\"] = status\n\n return json_dic"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef get_trim_index(biased_list):\n\n # Return index 0 if there are no biased positions\n if set(biased_list) == {False}:\n return 0\n\n # Return index 0 if the first five positions are not biased\n if set(biased_list[:5]) == {False}:\n return 0\n\n # Iterate over the biased_list array. 
Keep the iteration going until\n # we find a biased position with the two following positions unbiased\n # (e.g.: True, False, False).\n # When this condition is verified, return the last biased position\n # index for subsequent trimming.\n for i, val in enumerate(biased_list):\n if val and set(biased_list[i+1:i+3]) == {False}:\n return i + 1\n\n # If the previous iteration could not find an index to trim, it means\n # that the whole list is basically biased. Return the length of the\n # biased_list\n return len(biased_list)", "response": "Returns the optimal trim index from a boolean list of biased positions."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nassess the optimal trim range for a given FastQC data file.", "response": "def trim_range(data_file):\n \"\"\"Assess the optimal trim range for a given FastQC data file.\n\n This function will parse a single FastQC data file, namely the\n *'Per base sequence content'* category. It will retrieve the A/T and G/C\n content for each nucleotide position in the reads, and check whether the\n G/C and A/T proportions are between 80% and 120%. If they are not, that\n nucleotide position is marked as biased for future removal.\n\n Parameters\n ----------\n data_file: str\n Path to FastQC data file.\n\n Returns\n -------\n trim_nt: list\n List containing the range with the best trimming positions for the\n corresponding FastQ file. The first element is the 5' end trim index\n and the second element is the 3' end trim index.\n \"\"\"\n\n logger.debug(\"Starting trim range assessment\")\n\n # Target string for nucleotide bias assessment\n target_nuc_bias = \">>Per base sequence content\"\n logger.debug(\"Target string to start nucleotide bias assessment set to \"\n \"{}\".format(target_nuc_bias))\n # This flag will become True when gathering base proportion data\n # from file.\n gather = False\n\n # This variable will store a boolean array on the biased/unbiased\n # positions. 
Biased positions will be True, while unbiased positions\n will be False\n biased = []\n\n with open(data_file) as fh:\n\n for line in fh:\n # Start assessment of nucleotide bias\n if line.startswith(target_nuc_bias):\n # Skip comment line\n logger.debug(\"Found target string at line: {}\".format(line))\n next(fh)\n gather = True\n # Stop assessment when reaching end of target module\n elif line.startswith(\">>END_MODULE\") and gather:\n logger.debug(\"Stopping parsing at line: {}\".format(line))\n break\n elif gather:\n # Get proportions of each nucleotide\n g, a, t, c = [float(x) for x in line.strip().split()[1:]]\n # Get 'GC' and 'AT' content\n gc = (g + 0.1) / (c + 0.1)\n at = (a + 0.1) / (t + 0.1)\n # Assess bias\n if 0.8 <= gc <= 1.2 and 0.8 <= at <= 1.2:\n biased.append(False)\n else:\n biased.append(True)\n\n logger.debug(\"Finished bias assessment with result: {}\".format(biased))\n\n # Split biased list in half to get the 5' and 3' ends\n biased_5end, biased_3end = biased[:int(len(biased)/2)],\\\n biased[int(len(biased)/2):][::-1]\n\n logger.debug(\"Getting optimal trim range from biased list\")\n trim_nt = [0, 0]\n # Assess number of nucleotides to clip at 5' end\n trim_nt[0] = get_trim_index(biased_5end)\n logger.debug(\"Optimal trim range at 5' end set to: {}\".format(trim_nt[0]))\n # Assess number of nucleotides to clip at 3' end\n trim_nt[1] = len(biased) - get_trim_index(biased_3end)\n logger.debug(\"Optimal trim range at 3' end set to: {}\".format(trim_nt[1]))\n\n return trim_nt"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef get_sample_trim(p1_data, p2_data):\n\n sample_ranges = [trim_range(x) for x in [p1_data, p2_data]]\n\n # Get the optimal trim position for 5' end\n optimal_5trim = max([x[0] for x in sample_ranges])\n # Get optimal trim position for 3' end\n optimal_3trim = min([x[1] for x in sample_ranges])\n\n return optimal_5trim, optimal_3trim", "response": "This function returns the optimal read trim range from the data files of paired-end FastQ reads."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nparsing a FastQC summary report file and returning it as a dictionary.", "response": "def get_summary(summary_file):\n \"\"\"Parses a FastQC summary report file and returns it as a dictionary.\n\n This function parses a typical FastQC summary report file, retrieving\n only the information on the first two columns. For instance, a line could\n be::\n\n 'PASS\tBasic Statistics\tSH10762A_1.fastq.gz'\n\n This parser will build a dictionary with the string in the second column\n as a key and the QC result as the value. 
In this case, the returned\n ``dict`` would be something like::\n\n {\"Basic Statistics\": \"PASS\"}\n\n Parameters\n ----------\n summary_file: str\n Path to FastQC summary report.\n\n Returns\n -------\n summary_info: :py:data:`OrderedDict`\n Returns the information of the FastQC summary report as an ordered\n dictionary, with the categories as strings and the QC result as values.\n\n \"\"\"\n\n summary_info = OrderedDict()\n logger.debug(\"Retrieving summary information from file: {}\".format(\n summary_file))\n\n with open(summary_file) as fh:\n for line in fh:\n # Skip empty lines\n if not line.strip():\n continue\n # Populate summary info\n fields = [x.strip() for x in line.split(\"\\t\")]\n summary_info[fields[1]] = fields[0]\n\n logger.debug(\"Retrieved summary information from file: {}\".format(\n summary_info))\n\n return summary_info"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef check_summary_health(summary_file, **kwargs):\n\n # Store the summary categories that cannot fail. If they fail, do not\n # proceed with this sample\n fail_sensitive = kwargs.get(\"fail_sensitive\", [\n \"Per base sequence quality\",\n \"Overrepresented sequences\",\n \"Sequence Length Distribution\",\n \"Per sequence GC content\"\n ])\n logger.debug(\"Fail sensitive categories: {}\".format(fail_sensitive))\n\n # Store summary categories that must pass. If they do not, do not proceed\n # with that sample\n must_pass = kwargs.get(\"must_pass\", [\n \"Per base N content\",\n \"Adapter Content\"\n ])\n logger.debug(\"Must pass categories: {}\".format(must_pass))\n\n warning_fail_sensitive = kwargs.get(\"warning_fail_sensitive\", [\n \"Per base sequence quality\",\n \"Overrepresented sequences\",\n ])\n\n warning_must_pass = kwargs.get(\"warning_must_pass\", [\n \"Per base sequence content\"\n ])\n\n # Get summary dictionary\n summary_info = get_summary(summary_file)\n\n # This flag will change to False if one of the tests fails\n health = True\n # List of failing categories\n failed = []\n # List of warning categories\n warning = []\n\n for cat, test in summary_info.items():\n\n logger.debug(\"Assessing category {} with result {}\".format(cat, test))\n\n # FAILURES\n # Check for fail sensitive\n if cat in fail_sensitive and test == \"FAIL\":\n health = False\n failed.append(\"{}:{}\".format(cat, test))\n logger.error(\"Category {} failed a fail sensitive \"\n \"category\".format(cat))\n\n # Check for must pass\n if cat in must_pass and test != \"PASS\":\n health = False\n failed.append(\"{}:{}\".format(cat, test))\n logger.error(\"Category {} failed a must pass category\".format(\n cat))\n\n # WARNINGS\n # Check for fail sensitive\n if cat in warning_fail_sensitive and test == \"FAIL\":\n warning.append(\"Failed category: {}\".format(cat))\n logger.warning(\"Category {} flagged at a fail sensitive \"\n \"category\".format(cat))\n\n if cat in warning_must_pass and test != \"PASS\":\n warning.append(\"Did not pass category: {}\".format(cat))\n logger.warning(\"Category {} flagged at a must pass \"\n \"category\".format(cat))\n\n # Return the health status, along with the failed and warning categories\n return health, failed, warning", "response": "Checks the health of a sample from the FastQC summary file."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nparsing a bowtie log file and populating the self.n_reads, self.align_0x, self.align_1x, and self.
overall_rate attributes with the data from the log file.", "response": "def parse_log(self, bowtie_log):\n \"\"\"Parse a bowtie log file.\n\n This is a bowtie log parsing method that populates the\n :py:attr:`self.n_reads, self.align_0x, self.align_1x, self.align_mt1x and self.overall_rate` attributes with\n data from the log file.\n\n Disclaimer: THIS METHOD IS HORRIBLE BECAUSE THE BOWTIE LOG IS HORRIBLE.\n\n The insertion of data on the attributes is done by the\n :py:meth:`set_attribute` method.\n\n Parameters\n ----------\n bowtie_log : str\n Path to the bowtie log file.\n\n \"\"\"\n\n # Regexes - thanks to https://github.com/ewels/MultiQC/blob/master/multiqc/modules/bowtie2/bowtie2.py\n regexes = {\n 'unpaired': {\n 'unpaired_aligned_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned 0 times\",\n 'unpaired_aligned_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned exactly 1 time\",\n 'unpaired_aligned_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned >1 times\"\n },\n 'paired': {\n 'paired_aligned_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned concordantly 0 times\",\n 'paired_aligned_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned concordantly exactly 1 time\",\n 'paired_aligned_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned concordantly >1 times\",\n 'paired_aligned_discord_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned discordantly 1 time\",\n 'paired_aligned_discord_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned discordantly >1 times\",\n 'paired_aligned_mate_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned exactly 1 time\",\n 'paired_aligned_mate_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned >1 times\",\n 'paired_aligned_mate_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned 0 times\"\n }\n }\n\n # Missing parser for unpaired (not implemented in flowcraft yet)\n\n with open(bowtie_log, \"r\") as f:\n # Go through log file line by line\n for l in f:\n\n # Total reads\n total = re.search(r\"(\\\\d+) reads; of these:\", l)\n if total:\n self.set_n_reads(total.group(1))\n\n # Paired end reads aka the pain\n paired = re.search(r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) were paired; of these:\", l)\n if paired:\n paired_total = int(paired.group(1))\n\n paired_numbers = {}\n\n # Do nested loop whilst we have this level of indentation\n l = f.readline()\n while l.startswith(' '):\n for k, r in regexes['paired'].items():\n match = re.search(r, l)\n if match:\n paired_numbers[k] = int(match.group(1))\n l = f.readline()\n\n align_zero_times = paired_numbers['paired_aligned_none'] + paired_numbers['paired_aligned_mate_none']\n if align_zero_times:\n self.set_align_0x(align_zero_times)\n\n align_one_time = paired_numbers['paired_aligned_one'] + paired_numbers['paired_aligned_mate_one']\n if align_one_time:\n self.set_align_1x(align_one_time)\n\n align_more_than_one_time = paired_numbers['paired_aligned_multi'] + paired_numbers['paired_aligned_mate_multi']\n if align_more_than_one_time:\n self.set_align_mt1x(align_more_than_one_time)\n\n # Overall alignment rate\n overall = re.search(r\"([\\\\d\\\\.]+)% overall alignment rate\", l)\n if overall:\n self.overall_rate = float(overall.group(1))"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _parse_process_name(name_str):\n\n directives = None\n\n fields = name_str.split(\"=\")\n process_name = fields[0]\n\n if len(fields) == 2:\n _directives = fields[1].replace(\"'\", '\"')\n 
try:\n directives = json.loads(_directives)\n except json.decoder.JSONDecodeError:\n raise eh.ProcessError(\n \"Could not parse directives for process '{}'. The raw\"\n \" string is: {}\\n\"\n \"Possible causes include:\\n\"\n \"\\t1. Spaces inside directives\\n\"\n \"\\t2. Missing '=' symbol before directives\\n\"\n \"\\t3. Missing quotes (' or \\\") around directives\\n\"\n \"A valid example: process_name={{'cpus':'2'}}\".format(\n process_name, name_str))\n\n return process_name, directives", "response": "Parses a process name string and returns the process name and its inline directives as a dictionary."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nbuilding the list of process connections for the NextflowGenerator.", "response": "def _build_connections(self, process_list, ignore_dependencies,\n auto_dependency):\n \"\"\"Parses the process connections dictionaries into a process list\n\n This method is called upon instantiation of the NextflowGenerator\n class. Essentially, it sets the main input/output channel names of the\n processes so that they can be linked correctly.\n\n If a connection between two consecutive processes is not possible due\n to a mismatch in the input/output types, it exits with an error.\n\n Returns\n -------\n\n \"\"\"\n\n logger.debug(\"=============================\")\n logger.debug(\"Building pipeline connections\")\n logger.debug(\"=============================\")\n\n logger.debug(\"Processing connections: {}\".format(process_list))\n\n for p, con in enumerate(process_list):\n\n logger.debug(\"Processing connection '{}': {}\".format(p, con))\n\n # Get lanes\n in_lane = con[\"input\"][\"lane\"]\n out_lane = con[\"output\"][\"lane\"]\n logger.debug(\"[{}] Input lane: {}\".format(p, in_lane))\n logger.debug(\"[{}] Output lane: {}\".format(p, out_lane))\n\n # Update the total number of lanes of the pipeline\n if out_lane > self.lanes:\n self.lanes = out_lane\n\n # Get process names and directives for the output process\n p_in_name, p_out_name, out_directives = self._get_process_names(\n con, p)\n\n # Check if process is available or correctly named\n if p_out_name not in self.process_map:\n logger.error(colored_print(\n \"\\nThe process '{}' is not available.\"\n .format(p_out_name), \"red_bold\"))\n guess_process(p_out_name, self.process_map)\n sys.exit(1)\n\n # Instance output process\n out_process = self.process_map[p_out_name](template=p_out_name)\n\n # Update directives, if provided\n if out_directives:\n out_process.update_attributes(out_directives)\n\n # Set suffix strings for main input/output channels. Suffixes are\n # based on the lane and the arbitrary and unique process id\n # e.g.: 'process_1_1'\n input_suf = \"{}_{}\".format(in_lane, p)\n output_suf = \"{}_{}\".format(out_lane, p)\n logger.debug(\"[{}] Setting main channels with input suffix '{}'\"\n \" and output suffix '{}'\".format(\n p, input_suf, output_suf))\n out_process.set_main_channel_names(input_suf, output_suf, out_lane)\n\n # Instance input process, if it exists. 
In case of init, the\n # output process forks from the raw input user data\n if p_in_name != \"__init__\":\n # Create instance of input process\n in_process = self.process_map[p_in_name](template=p_in_name)\n # Test if two processes can be connected by input/output types\n logger.debug(\"[{}] Testing connection between input and \"\n \"output processes\".format(p))\n self._test_connection(in_process, out_process)\n out_process.parent_lane = in_lane\n else:\n # When the input process is __init__, set the parent_lane\n # to None. This will tell the engine that this process\n # will receive the main input from the raw user input.\n out_process.parent_lane = None\n logger.debug(\"[{}] Parent lane: {}\".format(\n p, out_process.parent_lane))\n\n # If the current connection is a fork, add it to the fork tree\n if in_lane != out_lane:\n logger.debug(\"[{}] Connection is a fork. Adding lanes to \"\n \"fork list\".format(p))\n self._fork_tree[in_lane].append(out_lane)\n # Update main output fork of parent process\n try:\n parent_process = [\n x for x in self.processes if x.lane == in_lane and\n x.template == p_in_name\n ][0]\n logger.debug(\n \"[{}] Updating main forks of parent fork '{}' with\"\n \" '{}'\".format(p, parent_process,\n out_process.input_channel))\n parent_process.update_main_forks(out_process.input_channel)\n except IndexError:\n pass\n else:\n # Get parent process, naive version\n parent_process = self.processes[-1]\n\n # Check if the last process' lane matches the lane of the\n # current output process. If not, get the last process\n # in the same lane\n if parent_process.lane and parent_process.lane != out_lane:\n parent_process = [x for x in self.processes[::-1]\n if x.lane == out_lane][0]\n\n if parent_process.output_channel:\n logger.debug(\n \"[{}] Updating input channel of output process\"\n \" with '{}'\".format(\n p, parent_process.output_channel))\n out_process.input_channel = parent_process.output_channel\n\n # Check for process dependencies\n if out_process.dependencies and not ignore_dependencies:\n logger.debug(\"[{}] Dependencies found for process '{}': \"\n \"{}\".format(p, p_out_name,\n out_process.dependencies))\n parent_lanes = self._get_fork_tree(out_lane)\n for dep in out_process.dependencies:\n if not self._search_tree_backwards(dep, parent_lanes):\n if auto_dependency:\n self._add_dependency(\n out_process, dep, in_lane, out_lane, p)\n elif not self.export_parameters:\n logger.error(colored_print(\n \"\\nThe following dependency of the process\"\n \" '{}' is missing: {}\".format(p_out_name, dep),\n \"red_bold\"))\n sys.exit(1)\n\n self.processes.append(out_process)\n\n logger.debug(\"Completed connections: {}\".format(self.processes))\n logger.debug(\"Fork tree: {}\".format(self._fork_tree))"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef _get_process_names(self, con, pid):\n\n try:\n _p_in_name = con[\"input\"][\"process\"]\n p_in_name, _ = self._parse_process_name(_p_in_name)\n logger.debug(\"[{}] Input channel: {}\".format(pid, p_in_name))\n _p_out_name = con[\"output\"][\"process\"]\n p_out_name, out_directives = self._parse_process_name(\n _p_out_name)\n logger.debug(\"[{}] Output channel: {}\".format(pid, p_out_name))\n # Exception is triggered when the process name/directives cannot\n # be parsed.\n except eh.ProcessError as ex:\n logger.error(colored_print(ex.value, \"red_bold\"))\n sys.exit(1)\n\n return p_in_name, p_out_name, out_directives", "response": "Returns the input and output process names and the 
output process directives, parsed from a connection dictionary."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _add_dependency(self, p, template, inlane, outlane, pid):\n\n dependency_proc = self.process_map[template](template=template)\n\n if dependency_proc.input_type != p.input_type:\n logger.error(\"Cannot automatically add dependency with different\"\n \" input type. Input type of process '{}' is '{}'.\"\n \" Input type of dependency '{}' is '{}'\".format(\n p.template, p.input_type, template,\n dependency_proc.input_type))\n\n input_suf = \"{}_{}_dep\".format(inlane, pid)\n output_suf = \"{}_{}_dep\".format(outlane, pid)\n dependency_proc.set_main_channel_names(input_suf, output_suf, outlane)\n\n # To insert the dependency process before the current process, we'll\n # need to move the input channel name of the latter to the former, and\n # set a new connection between the dependency and the process.\n dependency_proc.input_channel = p.input_channel\n p.input_channel = dependency_proc.output_channel\n\n # If the current process was the first in the pipeline, change the\n # lanes so that the dependency becomes the first process\n if not p.parent_lane:\n p.parent_lane = outlane\n dependency_proc.parent_lane = None\n else:\n dependency_proc.parent_lane = inlane\n p.parent_lane = outlane\n\n self.processes.append(dependency_proc)", "response": "Automatically inserts a dependency process before the given process."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nsearch the process tree backwards for a provided process template within the given parent lanes.", "response": "def _search_tree_backwards(self, template, parent_lanes):\n \"\"\"Searches the process tree backwards in search of a provided process\n\n The search takes into consideration the provided parent lanes and\n searches only those.\n\n Parameters\n ----------\n template : str\n Name of the process template attribute being searched\n parent_lanes : list\n List of integers with the parent lanes to be searched\n\n Returns\n -------\n bool\n Returns True when the template is found. Otherwise returns False.\n \"\"\"\n\n for p in self.processes[::-1]:\n\n # Ignore processes in different lanes\n if p.lane not in parent_lanes:\n continue\n\n # Template found\n if p.template == template:\n return True\n\n return False"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _build_header(self):\n\n logger.debug(\"===============\")\n logger.debug(\"Building header\")\n logger.debug(\"===============\")\n self.template += hs.header", "response": "Builds the master header string"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nadds the footer template to the master template string", "response": "def _build_footer(self):\n \"\"\"Adds the footer template to the master template string\"\"\"\n\n logger.debug(\"===============\")\n logger.debug(\"Building footer\")\n logger.debug(\"===============\")\n self.template += fs.footer"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\ngiven a process, updates the :attr:`~Process.main_raw_inputs` attribute with the corresponding raw input channel of that process. The input channel and input type can be overridden if the `sink_channel` and `input_type` arguments are provided. 
Parameters ---------- p : flowcraft.Process.Process Process instance whose raw input will be modified sink_channel: str Sets the channel into which the raw input will fork. It overrides the process's `input_channel` attribute. input_type: str Sets the type of the raw input. It overrides the process's `input_type` attribute.", "response": "def _update_raw_input(self, p, sink_channel=None, input_type=None):\n \"\"\"Given a process, this method updates the\n :attr:`~Process.main_raw_inputs` attribute with the corresponding\n raw input channel of that process. The input channel and input type\n can be overridden if the `sink_channel` and `input_type` arguments\n are provided.\n\n Parameters\n ----------\n p : flowcraft.Process.Process\n Process instance whose raw input will be modified\n sink_channel: str\n Sets the channel into which the raw input will fork. It overrides\n the process's `input_channel` attribute.\n input_type: str\n Sets the type of the raw input. It overrides the process's\n `input_type` attribute.\n \"\"\"\n\n process_input = input_type if input_type else p.input_type\n process_channel = sink_channel if sink_channel else p.input_channel\n\n logger.debug(\"[{}] Setting raw input channel \"\n \"with input type '{}'\".format(p.template, process_input))\n # Get the dictionary with the raw forking information for the\n # provided input\n raw_in = p.get_user_channel(process_channel, process_input)\n logger.debug(\"[{}] Fetched process raw user: {}\".format(p.template,\n raw_in))\n\n if process_input in self.main_raw_inputs:\n self.main_raw_inputs[process_input][\"raw_forks\"].append(\n raw_in[\"input_channel\"])\n else:\n self.main_raw_inputs[process_input] = {\n \"channel\": raw_in[\"channel\"],\n \"channel_str\": \"{}\\n{} = {}\".format(\n raw_in[\"checks\"].format(raw_in[\"params\"]),\n raw_in[\"channel\"],\n raw_in[\"channel_str\"].format(raw_in[\"params\"])),\n \"raw_forks\": [raw_in[\"input_channel\"]]\n }\n logger.debug(\"[{}] Updated main raw inputs: {}\".format(\n p.template, self.main_raw_inputs))"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _update_extra_inputs(self, p):\n\n if p.extra_input:\n logger.debug(\"[{}] Found extra input: {}\".format(\n p.template, p.extra_input))\n\n if p.extra_input == \"default\":\n # Check if the default type is now present in the main raw\n # inputs. If so, issue an error. The default param can only\n # be used when not present in the main raw inputs\n if p.input_type in self.main_raw_inputs:\n logger.error(colored_print(\n \"\\nThe default input param '{}' of the process '{}'\"\n \" is already specified as a main input parameter of\"\n \" the pipeline. Please choose a different extra_input\"\n \" name.\".format(p.input_type, p.template), \"red_bold\"))\n sys.exit(1)\n param = p.input_type\n else:\n param = p.extra_input\n\n dest_channel = \"EXTRA_{}_{}\".format(p.template, p.pid)\n\n if param not in self.extra_inputs:\n self.extra_inputs[param] = {\n \"input_type\": p.input_type,\n \"channels\": [dest_channel]\n }\n else:\n if self.extra_inputs[param][\"input_type\"] != p.input_type:\n logger.error(colored_print(\n \"\\nThe extra_input parameter '{}' for process\"\n \" '{}' was already defined with a different \"\n \"input type '{}'. 
Please choose a different \"\n \"extra_input name.\".format(\n p.input_type, p.template,\n self.extra_inputs[param][\"input_type\"]),\n \"red_bold\"))\n sys.exit(1)\n self.extra_inputs[param][\"channels\"].append(dest_channel)\n\n logger.debug(\"[{}] Added extra channel '{}' linked to param: '{}' \"\n \"\".format(p.template, param,\n self.extra_inputs[param]))\n p.update_main_input(\n \"{}.mix({})\".format(p.input_channel, dest_channel)\n )", "response": "This method updates the pipeline's extra_inputs attribute with the extra input channels of the given process."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\ngiven a process, this method updates the :attr:`~Process.secondary_channels` attribute with the corresponding secondary inputs of that channel. The rationale of the secondary channels is the following: - Start storing any secondary emitting channels, by checking the `link_start` list attribute of each process. If there are channel names in the link start, it adds to the secondary channels dictionary. - Check for secondary receiving channels, by checking the `link_end` list attribute. If the link name starts with a `__` signature, it will create an implicit link with the last process with an output type after the signature. Otherwise, it will check if a corresponding link start already exists in at least one process upstream of the pipeline and, if so, it will update the ``secondary_channels`` attribute with the new link. Parameters ---------- p : flowcraft.Process.Process", "response": "def _update_secondary_channels(self, p):\n \"\"\"Given a process, this method updates the\n :attr:`~Process.secondary_channels` attribute with the corresponding\n secondary inputs of that channel.\n\n The rationale of the secondary channels is the following:\n\n - Start storing any secondary emitting channels, by checking the\n `link_start` list attribute of each process. If there are\n channel names in the link start, it adds to the secondary\n channels dictionary.\n - Check for secondary receiving channels, by checking the\n `link_end` list attribute. If the link name starts with a\n `__` signature, it will create an implicit link with the last\n process with an output type after the signature. 
Otherwise,\n it will check if a corresponding link start already exists in\n at least one process upstream of the pipeline and, if so,\n it will update the ``secondary_channels`` attribute with the\n new link.\n\n Parameters\n ----------\n p : flowcraft.Process.Process\n \"\"\"\n\n # Check if the current process has a start of a secondary\n # side channel\n if p.link_start:\n logger.debug(\"[{}] Found secondary link start: {}\".format(\n p.template, p.link_start))\n for l in p.link_start:\n # If there are multiple link starts in the same lane, the\n # last one is the only one saved.\n if l in self.secondary_channels:\n self.secondary_channels[l][p.lane] = {\"p\": p, \"end\": []}\n else:\n self.secondary_channels[l] = {p.lane: {\"p\": p, \"end\": []}}\n\n # Check if the current process receives a secondary side channel.\n # If so, add to the links list of that side channel\n if p.link_end:\n logger.debug(\"[{}] Found secondary link end: {}\".format(\n p.template, p.link_end))\n for l in p.link_end:\n\n # Get list of lanes from the parent forks.\n parent_forks = self._get_fork_tree(p.lane)\n\n # Parse special case where the secondary channel links with\n # the main output of the specified type\n if l[\"link\"].startswith(\"__\"):\n self._set_implicit_link(p, l)\n continue\n\n # Skip if there is no match for the current link in the\n # secondary channels\n if l[\"link\"] not in self.secondary_channels:\n continue\n\n for lane in parent_forks:\n if lane in self.secondary_channels[l[\"link\"]]:\n self.secondary_channels[\n l[\"link\"]][lane][\"end\"].append(\"{}\".format(\n \"{}_{}\".format(l[\"alias\"], p.pid)))\n\n logger.debug(\"[{}] Secondary links updated: {}\".format(\n p.template, self.secondary_channels))"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nsetting the main channels for the pipeline and updating the raw input and output forks.", "response": "def _set_channels(self):\n \"\"\"Sets the main channels for the pipeline\n\n This method will parse the :attr:`~Process.processes` attribute\n and perform the following tasks for each process:\n\n - Sets the input/output channels and main input forks and adds\n them to the process's\n :attr:`flowcraft.process.Process._context`\n attribute (See\n :func:`~NextflowGenerator.set_channels`).\n - Automatically updates the main input channel of the first\n process of each lane so that they fork from the user-provided\n parameters (See\n :func:`~NextflowGenerator._update_raw_input`).\n - Checks for the presence of secondary channels and adds them to the\n :attr:`~NextflowGenerator.secondary_channels` attribute.\n\n Notes\n -----\n **On the secondary channel setup**: With this approach, there can only\n be one secondary link start for each type of secondary link. For\n instance, if there are two processes that start a secondary channel\n for the ``SIDE_max_len`` channel, only the last one will be recorded,\n and all receiving processes will get the channel from the latest\n process. 
Secondary channels can only be linked if the source process is\n upstream of the sink process in its \"forking\" path.\n \"\"\"\n\n logger.debug(\"=====================\")\n logger.debug(\"Setting main channels\")\n logger.debug(\"=====================\")\n\n for i, p in enumerate(self.processes):\n\n # Set main channels for the process\n logger.debug(\"[{}] Setting main channels with pid: {}\".format(\n p.template, i))\n p.set_channels(pid=i)\n\n # If there is no parent lane, set the raw input channel from user\n logger.debug(\"{} {} {}\".format(p.parent_lane, p.input_type, p.template))\n if not p.parent_lane and p.input_type:\n self._update_raw_input(p)\n\n self._update_extra_inputs(p)\n\n self._update_secondary_channels(p)\n\n logger.info(colored_print(\n \"\\tChannels set for {} \\u2713\".format(p.template)))"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _set_init_process(self):\n\n logger.debug(\"========================\")\n logger.debug(\"Setting secondary inputs\")\n logger.debug(\"========================\")\n\n # Get init process\n init_process = self.processes[0]\n logger.debug(\"Setting main raw inputs: \"\n \"{}\".format(self.main_raw_inputs))\n init_process.set_raw_inputs(self.main_raw_inputs)\n logger.debug(\"Setting extra inputs: {}\".format(self.extra_inputs))\n init_process.set_extra_inputs(self.extra_inputs)", "response": "Sets the main raw inputs and extra inputs on the init process of the pipeline."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nset the secondary channels for the pipeline.", "response": "def _set_secondary_channels(self):\n \"\"\"Sets the secondary channels for the pipeline\n\n This will iterate over the\n :py:attr:`NextflowGenerator.secondary_channels` dictionary that is\n populated when executing the\n :func:`~NextflowGenerator._update_secondary_channels` method.\n \"\"\"\n\n logger.debug(\"==========================\")\n logger.debug(\"Setting secondary channels\")\n logger.debug(\"==========================\")\n\n logger.debug(\"Setting secondary channels: {}\".format(\n self.secondary_channels))\n\n for source, lanes in self.secondary_channels.items():\n\n for vals in lanes.values():\n\n if not vals[\"end\"]:\n logger.debug(\"[{}] No secondary links to setup\".format(\n vals[\"p\"].template))\n continue\n\n logger.debug(\"[{}] Setting secondary links for \"\n \"source {}: {}\".format(vals[\"p\"].template,\n source,\n vals[\"end\"]))\n\n vals[\"p\"].set_secondary_channel(source, vals[\"end\"])"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nadds general compilers to the pipeline.", "response": "def _set_general_compilers(self):\n \"\"\"Adds compiler channels to the :attr:`processes` attribute.\n\n This method will iterate over the pipeline's processes and check\n if any process is feeding channels to a compiler process. 
If so, that\n compiler process is added to the pipeline and those channels are\n linked to the compiler via some operator.\n \"\"\"\n\n for c, c_info in self.compilers.items():\n\n # Instantiate compiler class object and set empty channel list\n compiler_cls = c_info[\"cls\"](template=c_info[\"template\"])\n c_info[\"channels\"] = []\n\n for p in self.processes:\n if not any([isinstance(p, x) for x in self.skip_class]):\n # Check if process has channels to feed to a compiler\n if c in p.compiler:\n # Correct channel names according to the pid of the\n # process\n channels = [\"{}_{}\".format(i, p.pid) for i in\n p.compiler[c]]\n c_info[\"channels\"].extend(channels)\n\n # If one or more channels were detected, establish connections\n # and append compiler to the process list.\n if c_info[\"channels\"]:\n compiler_cls.set_compiler_channels(c_info[\"channels\"],\n operator=\"join\")\n self.processes.append(compiler_cls)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef _set_status_channels(self):\n\n status_inst = pc.StatusCompiler(template=\"status_compiler\")\n report_inst = pc.ReportCompiler(template=\"report_compiler\")\n\n # Compile status channels from pipeline processes\n status_channels = []\n for p in [p for p in self.processes]:\n if not any([isinstance(p, x) for x in self.skip_class]):\n\n status_channels.extend(p.status_strs)\n\n if not status_channels:\n logger.debug(\"No status channels found. Skipping status compiler \"\n \"process\")\n return\n\n logger.debug(\"Setting status channels: {}\".format(status_channels))\n\n # Check for duplicate channels. Raise exception if found.\n if len(status_channels) != len(set(status_channels)):\n raise eh.ProcessError(\n \"Duplicate status channels detected. Please ensure that \"\n \"the 'status_channels' attributes of each process are \"\n \"unique. 
Here are the status channels:\\n\\n{}\".format(\n \", \".join(status_channels)\n ))\n\n status_inst.set_compiler_channels(status_channels)\n\n report_channels = [\"REPORT_{}\".format(x.lstrip(\"STATUS_\")) for x in\n status_channels]\n\n report_inst.set_compiler_channels(report_channels)\n\n self.processes.extend([status_inst, report_inst])", "response": "Sets the status channels for the status compiler and report compiler processes."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef _get_resources_string(res_dict, pid):\n\n config_str = \"\"\n ignore_directives = [\"container\", \"version\"]\n\n for p, directives in res_dict.items():\n\n for d, val in directives.items():\n\n if d in ignore_directives:\n continue\n\n config_str += '\\n\\t${}_{}.{} = {}'.format(p, pid, d, val)\n\n return config_str", "response": "Returns the nextflow resources string from a dictionary with the directives of each process and the process ID."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef _get_container_string(cont_dict, pid):\n\n config_str = \"\"\n\n for p, directives in cont_dict.items():\n\n container = \"\"\n\n if \"container\" in directives:\n container += directives[\"container\"]\n\n if \"version\" in directives:\n container += \":{}\".format(directives[\"version\"])\n else:\n container += \":latest\"\n\n if container:\n config_str += '\\n\\t${}_{}.container = \"{}\"'.format(p, pid, container)\n\n return config_str", "response": "Returns the nextflow containers string from a dictionary object containing the process names and the container directives."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nreturn the nextflow params string from a dictionary object.", "response": "def _get_params_string(self):\n \"\"\"Returns the nextflow params string from a dictionary object.\n\n The params dict should be a set of key:value pairs with the\n parameter name, and the default parameter value::\n\n self.params = {\n \"genomeSize\": 2.1,\n \"minCoverage\": 15\n }\n\n The values are then added to the string as they are. 
For instance,\n a ``2.1`` float will appear as ``param = 2.1`` and a\n ``'teste'`` string will appear as ``param = 'teste'`` (note the\n quotes).\n\n Returns\n -------\n str\n Nextflow params configuration string\n \"\"\"\n\n params_str = \"\"\n\n for p in self.processes:\n\n logger.debug(\"[{}] Adding parameters: {}\\n\".format(\n p.template, p.params)\n )\n\n # Add a header with the template name to structure the params\n # configuration\n if p.params and p.template != \"init\":\n\n p.set_param_id(\"_{}\".format(p.pid))\n params_str += \"\\n\\t/*\"\n params_str += \"\\n\\tComponent '{}_{}'\\n\".format(p.template,\n p.pid)\n params_str += \"\\t{}\\n\".format(\"-\" * (len(p.template) + len(p.pid) + 12))\n params_str += \"\\t*/\\n\"\n\n for param, val in p.params.items():\n\n if p.template == \"init\":\n param_id = param\n else:\n param_id = \"{}_{}\".format(param, p.pid)\n\n params_str += \"\\t{} = {}\\n\".format(param_id, val[\"default\"])\n\n return params_str"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nreturns the merged nextflow params string from a dictionary object.", "response": "def _get_merged_params_string(self):\n \"\"\"Returns the merged nextflow params string from a dictionary object.\n\n The params dict should be a set of key:value pairs with the\n parameter name, and the default parameter value::\n\n self.params = {\n \"genomeSize\": 2.1,\n \"minCoverage\": 15\n }\n\n The values are then added to the string as they are. For instance,\n a ``2.1`` float will appear as ``param = 2.1`` and a\n ``'teste'`` string will appear as ``param = 'teste'`` (note the\n quotes).\n\n Identical parameters in multiple processes will be merged into the same\n param.\n\n Returns\n -------\n str\n Nextflow params configuration string\n \"\"\"\n\n params_temp = {}\n\n for p in self.processes:\n\n logger.debug(\"[{}] Adding parameters: {}\".format(p.template,\n p.params))\n for param, val in p.params.items():\n\n params_temp[param] = val[\"default\"]\n\n config_str = \"\\n\\t\" + \"\\n\\t\".join([\n \"{} = {}\".format(param, val) for param, val in params_temp.items()\n ])\n\n return config_str"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _get_manifest_string(self):\n\n config_str = \"\"\n\n config_str += '\\n\\tname = \"{}\"'.format(self.pipeline_name)\n config_str += '\\n\\tmainScript = \"{}\"'.format(self.nf_file)\n\n return config_str", "response": "Returns the nextflow manifest config string, built from the pipeline name and the main script file."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef _set_configurations(self):\n\n logger.debug(\"======================\")\n logger.debug(\"Setting configurations\")\n logger.debug(\"======================\")\n\n resources = \"\"\n containers = \"\"\n params = \"\"\n manifest = \"\"\n\n if self.merge_params:\n params += self._get_merged_params_string()\n help_list = self._get_merged_params_help()\n else:\n params += self._get_params_string()\n help_list = self._get_params_help()\n\n for p in self.processes:\n\n # Skip processes without populated directives\n if not p.directives:\n continue\n\n logger.debug(\"[{}] Adding directives: {}\".format(\n p.template, p.directives))\n resources += self._get_resources_string(p.directives, p.pid)\n containers += self._get_container_string(p.directives, p.pid)\n\n manifest = 
self._get_manifest_string()\n\n self.resources = self._render_config(\"resources.config\", {\n \"process_info\": resources\n })\n self.containers = self._render_config(\"containers.config\", {\n \"container_info\": containers\n })\n self.params = self._render_config(\"params.config\", {\n \"params_info\": params\n })\n self.manifest = self._render_config(\"manifest.config\", {\n \"manifest_info\": manifest\n })\n self.help = self._render_config(\"Helper.groovy\", {\n \"nf_file\": basename(self.nf_file),\n \"help_list\": help_list,\n \"version\": __version__,\n \"pipeline_name\": \" \".join([x.upper() for x in self.pipeline_name])\n })\n self.user_config = self._render_config(\"user.config\", {})", "response": "This method iterates over all processes, collects their directives and parameters, and renders the nextflow configuration files (resources, containers, params and manifest) along with the help template."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nwrite the DAG to a file", "response": "def dag_to_file(self, dict_viz, output_file=\".treeDag.json\"):\n \"\"\"Writes the DAG to an output file\n\n Parameters\n ----------\n dict_viz: dict\n Tree-like dictionary with the process tree data, exported both to\n the html file and, here, to the .treeDag.json dotfile\n\n \"\"\"\n\n with open(os.path.join(dirname(self.nf_file), output_file),\n \"w\") as outfile_dag:\n outfile_dag.write(json.dumps(dict_viz))", "response_end"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nwriting the pipeline attributes to a json file and a graphical output showing the DAG.", "response": "def render_pipeline(self):\n \"\"\"Write pipeline attributes to json\n\n This function writes the pipeline and its attributes to a json file\n that is intended to be read by resources/pipeline_graph.html to render\n a graphical output showing the DAG.\n\n \"\"\"\n\n dict_viz = {\n \"name\": \"root\",\n \"children\": []\n }\n last_of_us = {}\n\n f_tree = self._fork_tree if self._fork_tree else {1: [1]}\n\n for x, (k, v) in enumerate(f_tree.items()):\n for p in self.processes[1:]:\n\n if x == 0 and p.lane not in [k] + v:\n continue\n\n if x > 0 and p.lane not in v:\n continue\n\n if not p.parent_lane:\n lst = dict_viz[\"children\"]\n else:\n lst = last_of_us[p.parent_lane]\n\n tooltip = {\n \"name\": \"{}_{}\".format(p.template, p.pid),\n \"process\": {\n \"pid\": p.pid,\n \"input\": p.input_type,\n \"output\": p.output_type if p.output_type else \"None\",\n \"lane\": p.lane,\n },\n \"children\": []\n }\n\n dir_var = \"\"\n for k2, v2 in p.directives.items():\n dir_var += k2\n for d in v2:\n try:\n # Remove quotes from string directives\n directive = v2[d].replace(\"'\", \"\").replace('\"', '') \\\n if isinstance(v2[d], str) else v2[d]\n dir_var += \"{}: {}\".format(d, directive)\n except KeyError:\n pass\n\n if dir_var:\n tooltip[\"process\"][\"directives\"] = dir_var\n else:\n tooltip[\"process\"][\"directives\"] = \"N/A\"\n\n lst.append(tooltip)\n\n last_of_us[p.lane] = lst[-1][\"children\"]\n\n # Write dict_viz to file\n self.dag_to_file(dict_viz)\n\n # Write tree forking information for dotfile\n with open(os.path.join(dirname(self.nf_file),\n \".forkTree.json\"), \"w\") as fh:\n fh.write(json.dumps(self._fork_tree))\n\n # Render the html resource with jinja\n return self._render_config(\"pipeline_graph.html\", {\"data\": dict_viz})"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef write_configs(self, 
project_root):\n\n # Write resources config\n with open(join(project_root, \"resources.config\"), \"w\") as fh:\n fh.write(self.resources)\n\n # Write containers config\n with open(join(project_root, \"containers.config\"), \"w\") as fh:\n fh.write(self.containers)\n\n # Write params config\n with open(join(project_root, \"params.config\"), \"w\") as fh:\n fh.write(self.params)\n\n # Write manifest config\n with open(join(project_root, \"manifest.config\"), \"w\") as fh:\n fh.write(self.manifest)\n\n # Write user config if not present in the project directory\n if not exists(join(project_root, \"user.config\")):\n with open(join(project_root, \"user.config\"), \"w\") as fh:\n fh.write(self.user_config)\n\n lib_dir = join(project_root, \"lib\")\n if not exists(lib_dir):\n os.makedirs(lib_dir)\n with open(join(lib_dir, \"Helper.groovy\"), \"w\") as fh:\n fh.write(self.help)\n\n # Generate the pipeline DAG\n pipeline_to_json = self.render_pipeline()\n with open(splitext(self.nf_file)[0] + \".html\", \"w\") as fh:\n fh.write(pipeline_to_json)", "response": "Wrapper method that writes all configuration files to the pipeline\n directory"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef export_params(self):\n\n params_json = {}\n\n # Skip first init process\n for p in self.processes[1:]:\n params_json[p.template] = p.params\n\n # Flush params json to stdout\n sys.stdout.write(json.dumps(params_json))", "response": "Export the pipeline params as a JSON to stdout"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef export_directives(self):\n\n directives_json = {}\n\n # Skip first init process\n for p in self.processes[1:]:\n directives_json[p.template] = p.directives\n\n # Flush directives json to stdout\n sys.stdout.write(json.dumps(directives_json))", "response": "Export pipeline directives as a JSON to stdout\n "} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef fetch_docker_tags(self):\n\n # dict to store the already parsed components (useful when forks are\n # given to the pipeline string via -t flag)\n dict_of_parsed = {}\n\n # fetches terminal width and subtracts 3 because we always add a\n # new line character and we want a space at the beginning and at the end\n # of each line\n terminal_width = shutil.get_terminal_size().columns - 3\n\n # first header\n center_string = \" Selected container tags \"\n\n # starts a list with the headers\n tags_list = [\n [\n \"=\" * int(terminal_width / 4),\n \"{0}{1}{0}\".format(\n \"=\" * int(((terminal_width/2 - len(center_string)) / 2)),\n center_string)\n ,\n \"{}\\n\".format(\"=\" * int(terminal_width / 4))\n ],\n [\"component\", \"container\", \"tags\"],\n [\n \"=\" * int(terminal_width / 4),\n \"=\" * int(terminal_width / 2),\n \"=\" * int(terminal_width / 4)\n ]\n ]\n\n # Skip first init process and iterate through the others\n for p in self.processes[1:]:\n template = p.template\n # if component has already been printed then skip and don't print\n # again\n if template in dict_of_parsed:\n continue\n\n # starts a list of containers for the current process in\n # dict_of_parsed, in which each container will be added to this\n # list once it gets parsed\n dict_of_parsed[template] = {\n \"container\": []\n }\n\n # fetch repo name from directives of each component.\n for directives in p.directives.values():\n try:\n repo = directives[\"container\"]\n default_version 
= directives[\"version\"]\n except KeyError:\n # adds the default container if container key isn't present\n # this happens for instance in integrity_coverage\n repo = \"flowcraft/flowcraft_base\"\n default_version = \"1.0.0-1\"\n # checks if repo_version already exists in list of the\n # containers for the current component being queried\n repo_version = repo + default_version\n if repo_version not in dict_of_parsed[template][\"container\"]:\n # make the request to docker hub\n r = requests.get(\n \"https://hub.docker.com/v2/repositories/{}/tags/\"\n .format(repo)\n )\n # checks the status code of the request, if it is 200 then\n # parses docker hub entry, otherwise retrieve no tags but\n # alerts the user\n if r.status_code != 404:\n # parse response content to dict and fetch results key\n r_content = json.loads(r.content)[\"results\"]\n for version in r_content:\n printed_version = (version[\"name\"] + \"*\") \\\n if version[\"name\"] == default_version \\\n else version[\"name\"]\n tags_list.append([template, repo, printed_version])\n else:\n tags_list.append([template, repo, \"No DockerHub tags\"])\n\n dict_of_parsed[template][\"container\"].append(repo_version)\n\n # iterate through each entry in tags_list and print the list of tags\n # for each component. Each entry (excluding the headers) contains\n # 3 elements (component name, container and tag version)\n for x, entry in enumerate(tags_list):\n # adds different color to the header in the first list and\n # if row is pair add one color and if is even add another (different\n # background)\n color = \"blue_bold\" if x < 3 else \\\n (\"white\" if x % 2 != 0 else \"0;37;40m\")\n # generates a small list with the terminal width for each column,\n # this will be given to string formatting as the 3, 4 and 5 element\n final_width = [\n int(terminal_width/4),\n int(terminal_width/2),\n int(terminal_width/4)\n ]\n # writes the string to the stdout\n sys.stdout.write(\n colored_print(\"\\n {0: <{3}} {1: ^{4}} {2: >{5}}\".format(\n *entry, *final_width), color)\n )\n # assures that the entire line gets the same color\n sys.stdout.write(\"\\n{0: >{1}}\\n\".format(\"(* = default)\",\n terminal_width + 3))", "response": "Returns a dict of all dockerhub tags associated with each component given by the - t flag."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef build(self):\n\n logger.info(colored_print(\n \"\\tSuccessfully connected {} process(es) with {} \"\n \"fork(s) across {} lane(s) \\u2713\".format(\n len(self.processes[1:]), len(self._fork_tree), self.lanes)))\n\n # Generate regular nextflow header that sets up the shebang, imports\n # and all possible initial channels\n self._build_header()\n\n self._set_channels()\n\n self._set_init_process()\n\n self._set_secondary_channels()\n\n logger.info(colored_print(\n \"\\tSuccessfully set {} secondary channel(s) \\u2713\".format(\n len(self.secondary_channels))))\n\n self._set_compiler_channels()\n\n self._set_configurations()\n\n logger.info(colored_print(\n \"\\tFinished configurations \\u2713\"))\n\n for p in self.processes:\n self.template += \"\\n{}\".format(p.template_str)\n\n self._build_footer()\n\n project_root = dirname(self.nf_file)\n\n # Write configs\n self.write_configs(project_root)\n\n # Write pipeline file\n with open(self.nf_file, \"w\") as fh:\n fh.write(self.template)\n\n logger.info(colored_print(\n \"\\tPipeline written into {} \\u2713\".format(self.nf_file)))", "response": "This method builds the main 
pipeline and writes the code to the nextflow file."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nsetting the k-mer range of the current sample.", "response": "def set_kmers(kmer_opt, max_read_len):\n \"\"\"Returns a kmer list based on the provided kmer option and max read len.\n\n Parameters\n ----------\n kmer_opt : str\n The k-mer option. Can be either ``'auto'``, ``'default'`` or a\n sequence of space separated integers, ``'23, 45, 67'``.\n max_read_len : int\n The maximum read length of the current sample.\n\n Returns\n -------\n kmers : list\n List of k-mer values that will be provided to Spades.\n\n \"\"\"\n\n logger.debug(\"Kmer option set to: {}\".format(kmer_opt))\n\n # Check if kmer option is set to auto\n if kmer_opt == \"auto\":\n\n if max_read_len >= 175:\n kmers = [55, 77, 99, 113, 127]\n else:\n kmers = [21, 33, 55, 67, 77]\n\n logger.debug(\"Kmer range automatically selected based on max read \"\n \"length of {}: {}\".format(max_read_len, kmers))\n\n # Check if manual kmers were specified\n elif len(kmer_opt.split()) > 1:\n\n kmers = kmer_opt.split()\n logger.debug(\"Kmer range manually set to: {}\".format(kmers))\n\n else:\n\n kmers = []\n logger.debug(\"Kmer range set to empty (will be automatically \"\n \"determined by SPAdes)\")\n\n return kmers"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef main(sample_id, fastq_pair, max_len, kmer, clear):\n\n logger.info(\"Starting spades\")\n\n logger.info(\"Setting SPAdes kmers\")\n kmers = set_kmers(kmer, max_len)\n logger.info(\"SPAdes kmers set to: {}\".format(kmers))\n\n cli = [\n \"metaspades.py\",\n \"--only-assembler\",\n \"--threads\",\n \"$task.cpus\",\n \"-o\",\n \".\"\n ]\n\n # Add kmers, if any were specified\n if kmers:\n cli += [\"-k {}\".format(\",\".join([str(x) for x in kmers]))]\n\n # Add FastQ files\n cli += [\n \"-1\",\n fastq_pair[0],\n \"-2\",\n fastq_pair[1]\n ]\n\n logger.debug(\"Running metaSPAdes subprocess with command: {}\".format(cli))\n\n p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n stdout, stderr = p.communicate()\n\n # Attempt to decode STDERR output from bytes. 
If unsuccessful, coerce to\n # string\n try:\n stderr = stderr.decode(\"utf8\")\n stdout = stdout.decode(\"utf8\")\n except (UnicodeDecodeError, AttributeError):\n stderr = str(stderr)\n stdout = str(stdout)\n\n logger.info(\"Finished metaSPAdes subprocess with STDOUT:\\\\n\"\n \"======================================\\\\n{}\".format(stdout))\n logger.info(\"Finished metaSPAdes subprocess with STDERR:\\\\n\"\n \"======================================\\\\n{}\".format(stderr))\n logger.info(\"Finished metaSPAdes with return code: {}\".format(\n p.returncode))\n\n with open(\".status\", \"w\") as fh:\n if p.returncode != 0:\n fh.write(\"error\")\n return\n else:\n fh.write(\"pass\")\n\n # Change the default contigs.fasta assembly name to a more informative one\n if \"_trim.\" in fastq_pair[0]:\n sample_id += \"_trim\"\n\n assembly_file = \"{}_metaspades.fasta\".format(\n sample_id)\n os.rename(\"contigs.fasta\", assembly_file)\n logger.info(\"Setting main assembly file to: {}\".format(assembly_file))\n\n # Remove input fastq files when clear option is specified.\n # Only remove temporary input when the expected output exists.\n if clear == \"true\" and os.path.exists(assembly_file):\n clean_up(fastq_pair)", "response": "Main function of the metaSPAdes script."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef _get_report_id(self):\n\n if self.watch:\n\n # Searches for the first occurrence of the nextflow pipeline\n # file name in the .nextflow.log file\n pipeline_path = get_nextflow_filepath(self.log_file)\n\n # Get hash from the entire pipeline file\n pipeline_hash = hashlib.md5()\n with open(pipeline_path, \"rb\") as fh:\n for chunk in iter(lambda: fh.read(4096), b\"\"):\n pipeline_hash.update(chunk)\n # Get hash from the current working dir and hostname\n workdir = os.getcwd().encode(\"utf8\")\n hostname = socket.gethostname().encode(\"utf8\")\n hardware_addr = str(uuid.getnode()).encode(\"utf8\")\n dir_hash = hashlib.md5(workdir + hostname + hardware_addr)\n\n return pipeline_hash.hexdigest() + dir_hash.hexdigest()\n\n else:\n with open(self.report_file) as fh:\n report_json = json.loads(fh.read())\n\n metadata = report_json[\"data\"][\"results\"][0][\"nfMetadata\"]\n\n try:\n report_id = metadata[\"scriptId\"] + metadata[\"sessionId\"]\n except KeyError:\n raise eh.ReportError(\"Incomplete or corrupt report JSON file \"\n \"missing the 'scriptId' and/or 'sessionId' \"\n \"metadata information\")\n\n return report_id", "response": "Returns a unique report id, built either from hashes of the pipeline file and machine (in watch mode) or from the scriptId and sessionId in the report JSON file."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nupdates the status of the pipeline.", "response": "def _update_pipeline_status(self):\n \"\"\"\n Parses the .nextflow.log file for signatures of pipeline status and sets\n the :attr:`status_info` attribute.\n \"\"\"\n\n prev_status = self.status_info\n\n with open(self.log_file) as fh:\n\n for line in fh:\n\n if \"Session aborted\" in line:\n self.status_info = \"aborted\"\n self.send = True if prev_status != self.status_info \\\n else self.send\n return\n\n if \"Execution complete -- Goodbye\" in line:\n self.status_info = \"complete\"\n self.send = True if prev_status != self.status_info \\\n else self.send\n return\n\n self.status_info = \"running\"\n self.send = True if prev_status != self.status_info \\\n else self.send"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the 
documentation\ndef update_trace_watch(self):\n\n # Check the size stamp of the trace file. Only proceed with the parsing\n # if it changed from the previous size.\n size_stamp = os.path.getsize(self.trace_file)\n self.trace_retry = 0\n if size_stamp and size_stamp == self.trace_sizestamp:\n return\n else:\n logger.debug(\"Updating trace size stamp to: {}\".format(size_stamp))\n self.trace_sizestamp = size_stamp\n\n with open(self.trace_file) as fh:\n\n # Skip potential empty lines at the start of file\n header = next(fh).strip()\n while not header:\n header = next(fh).strip()\n\n # Get header mappings before parsing the file\n hm = self._header_mapping(header)\n\n for line in fh:\n # Skip empty lines\n if line.strip() == \"\":\n continue\n\n fields = line.strip().split(\"\\t\")\n\n # Skip if task ID was already processed\n if fields[hm[\"task_id\"]] in self.stored_ids:\n continue\n\n if fields[hm[\"process\"]] == \"report\":\n self.report_queue.append(\n self._expand_path(fields[hm[\"hash\"]])\n )\n self.send = True\n\n # Add the processed trace line to the stored ids. It will be\n # skipped in future parsing rounds\n self.stored_ids.append(fields[hm[\"task_id\"]])", "response": "Parses the nextflow trace file and retrieves the path of report JSON\n files that have not been sent to the service yet."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef update_log_watch(self):\n\n # Check the size stamp of the log file. Only proceed with the parsing\n # if it changed from the previous size.\n size_stamp = os.path.getsize(self.log_file)\n self.trace_retry = 0\n if size_stamp and size_stamp == self.log_sizestamp:\n return\n else:\n logger.debug(\"Updating log size stamp to: {}\".format(size_stamp))\n self.log_sizestamp = size_stamp\n\n self._update_pipeline_status()", "response": "Parses the nextflow log file and updates the pipeline status"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _send_live_report(self, report_id):\n\n # Determines the maximum number of reports sent at the same time in\n # the same payload\n buffer_size = 100\n logger.debug(\"Report buffer size set to: {}\".format(buffer_size))\n\n for i in range(0, len(self.report_queue), buffer_size):\n\n # Reset the report compilation batch\n reports_compilation = []\n\n # Iterate over report JSON batches determined by buffer_size\n for report in self.report_queue[i: i + buffer_size]:\n try:\n report_file = [x for x in os.listdir(report)\n if x.endswith(\".json\")][0]\n except IndexError:\n continue\n with open(join(report, report_file)) as fh:\n reports_compilation.append(json.loads(fh.read()))\n\n logger.debug(\"Payload sent with size: {}\".format(\n asizeof(json.dumps(reports_compilation))\n ))\n logger.debug(\"status: {}\".format(self.status_info))\n\n try:\n requests.put(\n self.broadcast_address,\n json={\"run_id\": report_id,\n \"report_json\": reports_compilation,\n \"status\": self.status_info}\n )\n except requests.exceptions.ConnectionError:\n logger.error(colored_print(\n \"ERROR: Could not establish connection with server. 
The server\"\n \" may be down or there is a problem with your internet \"\n \"connection.\", \"red_bold\"))\n sys.exit(1)\n\n # When there is no change in the report queue, but there is a change\n # in the run status of the pipeline\n if not self.report_queue:\n\n logger.debug(\"status: {}\".format(self.status_info))\n\n try:\n requests.put(\n self.broadcast_address,\n json={\"run_id\": report_id,\n \"report_json\": [],\n \"status\": self.status_info}\n )\n except requests.exceptions.ConnectionError:\n logger.error(colored_print(\n \"ERROR: Could not establish connection with server. The\"\n \" server may be down or there is a problem with your \"\n \"internet connection.\", \"red_bold\"))\n sys.exit(1)\n\n # Reset the report queue after sending the request\n self.report_queue = []", "response": "Sends a PUT request to the report queue attribute."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef _init_live_reports(self, report_id):\n\n logger.debug(\"Sending initial POST request to {} to start report live\"\n \" update\".format(self.broadcast_address))\n\n try:\n with open(\".metadata.json\") as fh:\n metadata = [json.load(fh)]\n except:\n metadata = []\n\n start_json = {\n \"data\": {\"results\": metadata}\n }\n\n try:\n requests.post(\n self.broadcast_address,\n json={\"run_id\": report_id, \"report_json\": start_json,\n \"status\": self.status_info}\n )\n except requests.exceptions.ConnectionError:\n logger.error(colored_print(\n \"ERROR: Could not establish connection with server. The server\"\n \" may be down or there is a problem with your internet \"\n \"connection.\", \"red_bold\"))\n sys.exit(1)", "response": "Sends a POST request to initialize the live reports"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _close_connection(self, report_id):\n\n logger.debug(\n \"Closing connection and sending DELETE request to {}\".format(\n self.broadcast_address))\n\n try:\n r = requests.delete(self.broadcast_address,\n json={\"run_id\": report_id})\n if r.status_code != 202:\n logger.error(colored_print(\n \"ERROR: There was a problem sending data to the server\"\n \"with reason: {}\".format(r.reason)))\n except requests.exceptions.ConnectionError:\n logger.error(colored_print(\n \"ERROR: Could not establish connection with server. 
The server\"\n \" may be down or there is a problem with your internet \"\n \"connection.\", \"red_bold\"))\n sys.exit(1)", "response": "Sends a DELETE request to the server and closes the connection."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef convert_adatpers(adapter_fasta):\n\n adapter_out = \"fastqc_adapters.tab\"\n logger.debug(\"Setting output adapters file to: {}\".format(adapter_out))\n\n try:\n\n with open(adapter_fasta) as fh, \\\n open(adapter_out, \"w\") as adap_fh:\n\n for line in fh:\n if line.startswith(\">\"):\n\n head = line[1:].strip()\n # Get the next line with the sequence string\n sequence = next(fh).strip()\n\n adap_fh.write(\"{}\\\\t{}\\\\n\".format(head, sequence))\n\n logger.info(\"Converted adapters file\")\n\n return adapter_out\n\n # If an invalid adapters file is provided, return None.\n except FileNotFoundError:\n logger.warning(\"Could not find the provided adapters file: {}\".format(\n adapter_fasta))\n return", "response": "Converts a single adapter file from a FASTA file."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nsending dictionary to output json file This function sends master_dict dictionary to a json file if master_dict is populated with entries, otherwise it won't create the file Parameters ---------- master_dict: dict dictionary that stores all entries for a specific query sequence in multi-fasta given to mash dist as input against patlas database last_seq: str string that stores the last sequence that was parsed before writing to file and therefore after the change of query sequence between different rows on the input file mash_output: str the name/path of input file to main function, i.e., the name/path of the mash dist output txt file. 
sample_id: str The name of the sample being parsed to .report.json file Returns -------", "response": "def send_to_output(master_dict, mash_output, sample_id, assembly_file):\n \"\"\"Send dictionary to output json file\n This function sends master_dict dictionary to a json file if master_dict is\n populated with entries, otherwise it won't create the file\n\n Parameters\n ----------\n master_dict: dict\n dictionary that stores all entries for a specific query sequence\n in multi-fasta given to mash dist as input against patlas database\n last_seq: str\n string that stores the last sequence that was parsed before writing to\n file and therefore after the change of query sequence between different\n rows on the input file\n mash_output: str\n the name/path of input file to main function, i.e., the name/path of\n the mash dist output txt file.\n sample_id: str\n The name of the sample being parsed to .report.json file\n\n Returns\n -------\n\n \"\"\"\n\n plot_dict = {}\n\n # create a new file only if master_dict is populated\n if master_dict:\n out_file = open(\"{}.json\".format(\n \"\".join(mash_output.split(\".\")[0])), \"w\")\n out_file.write(json.dumps(master_dict))\n out_file.close()\n\n # iterate through master_dict in order to make contigs the keys\n for k,v in master_dict.items():\n if not v[2] in plot_dict:\n plot_dict[v[2]] = [k]\n else:\n plot_dict[v[2]].append(k)\n\n number_hits = len(master_dict)\n else:\n number_hits = 0\n\n json_dic = {\n \"tableRow\": [{\n \"sample\": sample_id,\n \"data\": [{\n \"header\": \"Mash Dist\",\n \"table\": \"plasmids\",\n \"patlas_mashdist\": master_dict,\n \"value\": number_hits\n }]\n }],\n \"plotData\": [{\n \"sample\": sample_id,\n \"data\": {\n \"patlasMashDistXrange\": plot_dict\n },\n \"assemblyFile\": assembly_file\n }]\n }\n\n with open(\".report.json\", \"w\") as json_report:\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nwrite the versions JSON for a template file.", "response": "def build_versions(self):\n \"\"\"Writes versions JSON for a template file\n\n This method creates the JSON file ``.versions`` based on the metadata\n and specific functions that are present in a given template script.\n\n It starts by fetching the template metadata, which can be specified\n via the ``__version__``, ``__template__`` and ``__build__``\n attributes. 
If all of these attributes exist, it starts to populate\n a JSON/dict array (Note that the absence of any one of them will\n prevent the version from being written).\n\n Then, it will search the\n template scope for functions that start with the substring\n ``__get_version`` (for example ``def __get_version_fastqc()``).\n These functions should gather the version of\n an arbitrary program and return a JSON/dict object with the following\n information::\n\n {\n \"program\": <program_name>,\n \"version\": <version>,\n \"build\": <build>\n }\n\n This JSON/dict object is then written in the ``.versions`` file.\n \"\"\"\n\n version_storage = []\n\n template_version = self.context.get(\"__version__\", None)\n template_program = self.context.get(\"__template__\", None)\n template_build = self.context.get(\"__build__\", None)\n\n if template_version and template_program and template_build:\n if self.logger:\n self.logger.debug(\"Adding template version: {}; {}; \"\n \"{}\".format(template_program,\n template_version,\n template_build))\n version_storage.append({\n \"program\": template_program,\n \"version\": template_version,\n \"build\": template_build\n })\n\n for var, obj in self.context.items():\n if var.startswith(\"__get_version\"):\n ver = obj()\n version_storage.append(ver)\n if self.logger:\n self.logger.debug(\"Found additional software version: \"\n \"{}\".format(ver))\n\n with open(\".versions\", \"w\") as fh:\n fh.write(json.dumps(version_storage, separators=(\",\", \":\")))"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nconverting top results from mash screen txt output to json format Parameters ---------- mash_output: str this is a string that stores the path to this file, i.e., the name of the file sample_id: str sample name", "response": "def main(mash_output, sample_id):\n '''\n converts top results from mash screen txt output to json format\n\n Parameters\n ----------\n mash_output: str\n this is a string that stores the path to this file, i.e., the name of\n the file\n sample_id: str\n sample name\n\n '''\n logger.info(\"Reading file : {}\".format(mash_output))\n read_mash_output = open(mash_output)\n\n dic = {}\n median_list = []\n filtered_dic = {}\n\n logger.info(\"Generating dictionary and list to pre-process the final json\")\n for line in read_mash_output:\n tab_split = line.split(\"\\t\")\n identity = tab_split[0]\n # shared_hashes = tab_split[1]\n median_multiplicity = tab_split[2]\n # p_value = tab_split[3]\n query_id = tab_split[4]\n # query-comment should not exist here and it is irrelevant\n\n # here identity is what in fact interests to report to json but\n # median_multiplicity also is important since it gives a rough\n # estimation of the coverage depth for each plasmid.\n # Plasmids should have higher coverage depth due to their increased\n # copy number in relation to the chromosome.\n dic[query_id] = [identity, median_multiplicity]\n median_list.append(float(median_multiplicity))\n\n output_json = open(\" \".join(mash_output.split(\".\")[:-1]) + \".json\", \"w\")\n\n # the median cutoff is the median of all median_multiplicity values\n # reported by mash screen. 
In the case of plasmids, since the database\n # has 9k entries and reads shouldn't have that many sequences it seems ok...\n if len(median_list) > 0:\n # this statement assures that median_list indeed has entries\n median_cutoff = median(median_list)\n logger.info(\"Generating final json to dump to a file\")\n for k, v in dic.items():\n # estimated copy number\n copy_number = int(float(v[1]) / median_cutoff)\n # assure that the plasmid has more than the median coverage depth\n if float(v[1]) > median_cutoff:\n filtered_dic[\"_\".join(k.split(\"_\")[0:3])] = [\n round(float(v[0]),2),\n copy_number\n ]\n logger.info(\n \"Exported dictionary has {} entries\".format(len(filtered_dic)))\n else:\n # if no entries were found raise an error\n logger.error(\"No matches were found using mash screen for the queried reads\")\n\n output_json.write(json.dumps(filtered_dic))\n output_json.close()\n\n json_dic = {\n \"tableRow\": [{\n \"sample\": sample_id,\n \"data\": [{\n \"header\": \"Mash Screen\",\n \"table\": \"plasmids\",\n \"patlas_mashscreen\": filtered_dic,\n \"value\": len(filtered_dic)\n }]\n }],\n }\n\n with open(\".report.json\", \"w\") as json_report:\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef procs_dict_parser(procs_dict):\n\n logger.info(colored_print(\n \"\\n===== L I S T O F P R O C E S S E S =====\\n\", \"green_bold\"))\n\n # Sort to print alphabetically ordered list of processes to ease reading\n procs_dict_ordered = {k: procs_dict[k] for k in sorted(procs_dict)}\n\n for template, dict_proc_info in procs_dict_ordered.items():\n template_str = \"=> {}\".format(template)\n logger.info(colored_print(template_str, \"blue_bold\"))\n\n for info in dict_proc_info:\n info_str = \"{}:\".format(info)\n\n if isinstance(dict_proc_info[info], list):\n if not dict_proc_info[info]:\n arg_msg = \"None\"\n else:\n arg_msg = \", \".join(dict_proc_info[info])\n elif info == \"directives\":\n # this is used for the \"directives\", which is a dict\n if not dict_proc_info[info]:\n # if dict is empty then add None to the message\n arg_msg = \"None\"\n else:\n # otherwise fetch all template names within a component\n # and all the directives for each template to a list\n list_msg = [\"\\n {}: {}\".format(\n templt,\n \" , \".join([\"{}: {}\".format(dr, val)\n for dr, val in drs.items()]))\n for templt, drs in dict_proc_info[info].items()\n ]\n # write list to a str\n arg_msg = \"\".join(list_msg)\n else:\n arg_msg = dict_proc_info[info]\n\n logger.info(\" {} {}\".format(\n colored_print(info_str, \"white_underline\"), arg_msg\n ))", "response": "This function handles the dictionary of attributes of each Process class and prints to stdout the list of all the components requested by the -t flag."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ncollects all processes currently available and stores a dictionary of process class names and their arguments.", "response": "def proc_collector(process_map, args, pipeline_string):\n \"\"\"\n Function that collects all processes available and stores a dictionary of\n the required arguments of each process class to be passed to\n procs_dict_parser\n\n Parameters\n ----------\n process_map: dict\n The dictionary with the Processes currently available in flowcraft\n and their corresponding classes as values\n args: argparse.Namespace\n The arguments passed through argparser that will 
be accessed to check the\n type of list to be printed\n pipeline_string: str\n the pipeline string\n\n \"\"\"\n\n arguments_list = []\n\n # prints a detailed list of the process class arguments\n if args.detailed_list:\n # list of attributes to be passed to proc_collector\n arguments_list += [\n \"input_type\",\n \"output_type\",\n \"description\",\n \"dependencies\",\n \"conflicts\",\n \"directives\"\n ]\n\n # prints a short list with each process and the corresponding description\n if args.short_list:\n arguments_list += [\n \"description\"\n ]\n\n if arguments_list:\n # dict to store only the required entries\n procs_dict = {}\n # loops over all process_map Processes\n for name, cls in process_map.items():\n\n # instantiates each Process class\n cls_inst = cls(template=name)\n\n # checks if recipe is provided\n if pipeline_string:\n if name not in pipeline_string:\n continue\n\n d = {arg_key: vars(cls_inst)[arg_key] for arg_key in\n vars(cls_inst) if arg_key in arguments_list}\n procs_dict[name] = d\n\n procs_dict_parser(procs_dict)\n\n sys.exit(0)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef guess_file_compression(file_path, magic_dict=None):\n\n if not magic_dict:\n magic_dict = MAGIC_DICT\n\n max_len = max(len(x) for x in magic_dict)\n\n with open(file_path, \"rb\") as f:\n file_start = f.read(max_len)\n\n logger.debug(\"Binary signature start: {}\".format(file_start))\n\n for magic, file_type in magic_dict.items():\n if file_start.startswith(magic):\n return file_type\n\n return None", "response": "Guesses the compression of a given file."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef get_qual_range(qual_str):\n\n vals = [ord(c) for c in qual_str]\n\n return min(vals), max(vals)", "response": "Returns the minimum and maximum Unicode code points of a given quality string."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nreturns the list of all possible encodings for a given encoding range.", "response": "def get_encodings_in_range(rmin, rmax):\n \"\"\" Returns the valid encodings for a given encoding range.\n\n The encoding ranges are stored in the :py:data:`RANGES` dictionary, with\n the encoding name as a string and a list as a value containing the\n phred score and a tuple with the encoding range. For a given encoding\n range provided via the two first arguments, this function will return\n all possible encodings and phred scores.\n\n Parameters\n ----------\n rmin : int\n Minimum Unicode code in range.\n rmax : int\n Maximum Unicode code in range.\n\n Returns\n -------\n valid_encodings : list\n List of all possible encodings for the provided range.\n valid_phred : list\n List of all possible phred scores.\n\n \"\"\"\n\n valid_encodings = []\n valid_phred = []\n\n for encoding, (phred, (emin, emax)) in RANGES.items():\n if rmin >= emin and rmax <= emax:\n valid_encodings.append(encoding)\n valid_phred.append(phred)\n\n return valid_encodings, valid_phred"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nparsing a TSV file containing coverage information into objects.", "response": "def parse_coverage_table(coverage_file):\n \"\"\"Parses a file with coverage information into objects.\n\n This function parses a TSV file containing coverage results for\n all contigs in a given assembly and will build an ``OrderedDict``\n with the information about their coverage and length. 
The length\n information is actually gathered from the contig header using a\n regular expression that assumes the usual header produced by Spades::\n\n contig_len = int(re.search(\"length_(.+?)_\", line).group(1))\n\n Parameters\n ----------\n coverage_file : str\n Path to TSV file containing the coverage results.\n\n Returns\n -------\n coverage_dict : OrderedDict\n Contains the coverage and length information for each contig.\n total_cov : int\n Sum of coverage values across all contigs.\n \"\"\"\n\n # Stores the correspondence between a contig and the corresponding coverage\n # e.g.: {\"contig_1\": {\"cov\": 424} }\n coverage_dict = OrderedDict()\n # Stores the total coverage\n total_cov = 0\n\n with open(coverage_file) as fh:\n for line in fh:\n # Get contig and coverage\n contig, cov = line.strip().split()\n coverage_dict[contig] = {\"cov\": int(cov)}\n # Add total coverage\n total_cov += int(cov)\n logger.debug(\"Processing contig '{}' with coverage '{}'\"\n \"\".format(contig, cov))\n\n return coverage_dict, total_cov"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\ngenerate a filtered assembly file. This function generates a filtered assembly file based on an original assembly and a minimum coverage threshold. Parameters ---------- assembly_file : str Path to original assembly file. minimum_coverage : int or float Minimum coverage required for a contig to pass the filter. coverage_info : OrderedDict or dict Dictionary containing the coverage information for each contig. output_file : str Path where the filtered assembly file will be generated.", "response": "def filter_assembly(assembly_file, minimum_coverage, coverage_info,\n output_file):\n \"\"\"Generates a filtered assembly file.\n\n This function generates a filtered assembly file based on an original\n assembly and a minimum coverage threshold.\n\n Parameters\n ----------\n assembly_file : str\n Path to original assembly file.\n minimum_coverage : int or float\n Minimum coverage required for a contig to pass the filter.\n coverage_info : OrderedDict or dict\n Dictionary containing the coverage information for each contig.\n output_file : str\n Path where the filtered assembly file will be generated.\n\n \"\"\"\n\n # This flag will determine whether sequence data should be written or\n # ignored because the current contig did not pass the minimum\n # coverage threshold\n write_flag = False\n\n with open(assembly_file) as fh, open(output_file, \"w\") as out_fh:\n\n for line in fh:\n if line.startswith(\">\"):\n # Reset write_flag\n write_flag = False\n # Get header of contig\n header = line.strip()[1:]\n # Check coverage for current contig\n contig_cov = coverage_info[header][\"cov\"]\n # If the contig coverage is above the threshold, write to\n # output filtered assembly\n if contig_cov >= minimum_coverage:\n write_flag = True\n out_fh.write(line)\n\n elif write_flag:\n out_fh.write(line)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef filter_bam(coverage_info, bam_file, min_coverage, output_bam):\n\n # Get list of contigs that will be kept\n contig_list = [x for x, vals in coverage_info.items()\n if vals[\"cov\"] >= min_coverage]\n\n cli = [\n \"samtools\",\n \"view\",\n \"-bh\",\n \"-F\",\n \"4\",\n \"-o\",\n output_bam,\n \"-@\",\n \"1\",\n bam_file,\n ]\n\n cli += contig_list\n\n logger.debug(\"Running samtools view subprocess with command: {}\".format(\n 
cli))\n\n p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n stdout, stderr = p.communicate()\n\n # Attempt to decode STDERR output from bytes. If unsuccessful, coerce to\n # string\n try:\n stderr = stderr.decode(\"utf8\")\n stdout = stdout.decode(\"utf8\")\n except (UnicodeDecodeError, AttributeError):\n stderr = str(stderr)\n stdout = str(stdout)\n\n logger.info(\"Finished samtools view subprocess with STDOUT:\\\\n\"\n \"======================================\\\\n{}\".format(stdout))\n logger.info(\"Finished samtools view subprocess with STDERR:\\\\n\"\n \"======================================\\\\n{}\".format(stderr))\n logger.info(\"Finished samtools view with return code: {}\".format(\n p.returncode))\n\n if not p.returncode:\n # Create index\n cli = [\n \"samtools\",\n \"index\",\n output_bam\n ]\n\n logger.debug(\"Running samtools index subprocess with command: \"\n \"{}\".format(cli))\n\n p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n stdout, stderr = p.communicate()\n\n try:\n stderr = stderr.decode(\"utf8\")\n stdout = stdout.decode(\"utf8\")\n except (UnicodeDecodeError, AttributeError):\n stderr = str(stderr)\n stdout = str(stdout)\n\n logger.info(\"Finished samtools index subprocess with STDOUT:\\\\n\"\n \"======================================\\\\n{}\".format(\n stdout))\n logger.info(\"Finished samtools index subprocess with STDERR:\\\\n\"\n \"======================================\\\\n{}\".format(\n stderr))\n logger.info(\"Finished samtools index with return code: {}\".format(\n p.returncode))", "response": "This function uses Samtools to filter a BAM file according to a minimum coverage threshold."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef check_filtered_assembly(coverage_info, coverage_bp, minimum_coverage,\n genome_size, contig_size, max_contigs,\n sample_id):\n \"\"\"Checks whether a filtered assembly passes a size threshold\n\n Given a minimum coverage threshold, this function evaluates whether an\n assembly will pass the minimum threshold of ``genome_size * 1e6 * 0.8``,\n which means 80% of the expected genome size or the maximum threshold\n of ``genome_size * 1e6 * 1.5``, which means 150% of the expected genome\n size. It will issue a warning if any of these thresholds is crossed.\n In the case of an assembly size below 80% of the expected genome size it\n will return False.\n\n Parameters\n ----------\n coverage_info : OrderedDict or dict\n Dictionary containing the coverage information for each contig.\n coverage_bp : dict\n Dictionary containing the per base coverage information for each\n contig. Used to determine the total number of base pairs in the\n final assembly.\n minimum_coverage : int\n Minimum coverage required for a contig to pass the filter.\n genome_size : int\n Expected genome size.\n contig_size : dict\n Dictionary with the len of each contig. Contig headers as keys and\n the corresponding length as values.\n max_contigs : int\n Maximum threshold for contig number. 
A warning is issued if this\n threshold is crossed.\n sample_id : str\n Id or name of the current sample\n\n Returns\n -------\n x : bool\n True if the filtered assembly size is higher than 80% of the\n expected genome size.\n\n \"\"\"\n\n # Get size of assembly after filtering contigs below minimum_coverage\n assembly_len = sum([v for k, v in contig_size.items()\n if coverage_info[k][\"cov\"] >= minimum_coverage])\n logger.debug(\"Assembly length after filtering for minimum coverage of\"\n \" {}: {}\".format(minimum_coverage, assembly_len))\n # Get number of contigs after filtering\n ncontigs = len([x for x in coverage_info.values()\n if x[\"cov\"] >= minimum_coverage])\n logger.debug(\"Number of contigs: {}\".format(ncontigs))\n # Get number of bp after filtering\n filtered_contigs = [k for k, v in coverage_info.items()\n if v[\"cov\"] >= minimum_coverage]\n logger.debug(\"Filtered contigs for minimum coverage of \"\n \"{}: {}\".format(minimum_coverage, filtered_contigs))\n total_assembled_bp = sum([sum(coverage_bp[x]) for x in filtered_contigs\n if x in coverage_bp])\n logger.debug(\"Total number of assembled base pairs:\"\n \"{}\".format(total_assembled_bp))\n\n warnings = []\n fails = []\n health = True\n\n with open(\".warnings\", \"w\") as warn_fh, \\\n open(\".report.json\", \"w\") as json_report:\n\n logger.debug(\"Checking assembly size after filtering : {}\".format(\n assembly_len))\n\n # If the filtered assembly size is above the 150% genome size\n # threshold, issue a warning\n if assembly_len > genome_size * 1e6 * 1.5:\n warn_msg = \"Assembly size ({}) larger than the maximum\" \\\n \" threshold of 150% of expected genome size.\".format(\n assembly_len)\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n fails.append(\"Large_genome_size_({})\".format(assembly_len))\n\n # If the number of contigs in the filtered assembly size crosses the\n # max_contigs threshold, issue a warning\n logger.debug(\"Checking number of contigs: {}\".format(\n len(coverage_info)))\n contig_threshold = max_contigs * genome_size / 1.5\n if ncontigs > contig_threshold:\n warn_msg = \"The number of contigs ({}) exceeds the threshold of \" \\\n \"{} contigs per 1.5Mb ({})\".format(\n ncontigs, max_contigs, round(contig_threshold, 1))\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n warnings.append(warn_msg)\n\n # If the filtered assembly size falls below the 80% genome size\n # threshold, fail this check and return False\n if assembly_len < genome_size * 1e6 * 0.8:\n warn_msg = \"Assembly size smaller than the minimum\" \\\n \" threshold of 80% of expected genome size: {}\".format(\n assembly_len)\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n fails.append(\"Small_genome_size_({})\".format(assembly_len))\n assembly_len = sum([v for v in contig_size.values()])\n total_assembled_bp = sum(\n [sum(coverage_bp[x]) for x in coverage_info if x in\n coverage_bp])\n logger.debug(\"Assembly length without coverage filtering: \"\n \"{}\".format(assembly_len))\n logger.debug(\"Total number of assembled base pairs without\"\n \" filtering: {}\".format(total_assembled_bp))\n\n health = False\n\n json_dic = {\n \"plotData\": [{\n \"sample\": sample_id,\n \"data\": {\n \"sparkline\": total_assembled_bp\n }\n }]\n }\n\n if warnings:\n json_dic[\"warnings\"] = [{\n \"sample\": sample_id,\n \"table\": \"assembly\",\n \"value\": warnings\n }]\n if fails:\n json_dic[\"fail\"] = [{\n \"sample\": sample_id,\n \"table\": \"assembly\",\n \"value\": [fails]\n }]\n\n json_report.write(json.dumps(json_dic, separators=(\",\", 
\":\")))\n\n return health", "response": "Checks whether a filtered assembly passes a minimum coverage threshold Given a minimum coverage threshold this function evaluates whether the filtered assembly passes the minimum coverage threshold and returns True if the filtered assembly passes False otherwise."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef evaluate_min_coverage(coverage_opt, assembly_coverage, assembly_size):\n\n if coverage_opt == \"auto\":\n # Get the 1/3 value of the current assembly coverage\n min_coverage = (assembly_coverage / assembly_size) * .3\n logger.info(\"Minimum assembly coverage automatically set to: \"\n \"{}\".format(min_coverage))\n # If the 1/3 coverage is lower than 10, change it to the minimum of\n # 10\n if min_coverage < 10:\n logger.info(\"Minimum assembly coverage cannot be set to lower\"\n \" that 10. Setting to 10\")\n min_coverage = 10\n else:\n min_coverage = int(coverage_opt)\n logger.info(\"Minimum assembly coverage manually set to: {}\".format(\n min_coverage))\n\n return min_coverage", "response": "Evaluates the minimum coverage threshold from the value provided in\n ."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nreturn the number of nucleotides and the size per contig for the entire file.", "response": "def get_assembly_size(assembly_file):\n \"\"\"Returns the number of nucleotides and the size per contig for the\n provided assembly file path\n\n Parameters\n ----------\n assembly_file : str\n Path to assembly file.\n\n Returns\n -------\n assembly_size : int\n Size of the assembly in nucleotides\n contig_size : dict\n Length of each contig (contig name as key and length as value)\n\n \"\"\"\n\n assembly_size = 0\n contig_size = {}\n header = \"\"\n\n with open(assembly_file) as fh:\n for line in fh:\n\n # Skip empty lines\n if line.strip() == \"\":\n continue\n\n if line.startswith(\">\"):\n header = line.strip()[1:]\n contig_size[header] = 0\n\n else:\n line_len = len(line.strip())\n assembly_size += line_len\n contig_size[header] += line_len\n\n return assembly_size, contig_size"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef main(sample_id, assembly_file, gsize, opts, assembler):\n\n logger.info(\"Starting assembly file processing\")\n warnings = []\n fails = \"\"\n\n min_contig_len, min_kmer_cov, max_contigs = [int(x) for x in opts]\n logger.debug(\"Setting minimum conting length to: {}\".format(\n min_contig_len))\n logger.debug(\"Setting minimum kmer coverage: {}\".format(min_kmer_cov))\n\n # Parse the spades assembly file and perform the first filtering.\n logger.info(\"Starting assembly parsing\")\n assembly_obj = Assembly(assembly_file, min_contig_len, min_kmer_cov,\n sample_id)\n\n with open(\".warnings\", \"w\") as warn_fh:\n t_80 = gsize * 1000000 * 0.8\n t_150 = gsize * 1000000 * 1.5\n # Check if assembly size of the first assembly is lower than 80% of the\n # estimated genome size. If True, redo the filtering without the\n # k-mer coverage filter\n assembly_len = assembly_obj.get_assembly_length()\n logger.debug(\"Checking assembly length: {}\".format(assembly_len))\n\n if assembly_len < t_80:\n\n logger.warning(\"Assembly size ({}) smaller than the minimum \"\n \"threshold of 80% of expected genome size. 
\"\n \"Applying contig filters without the k-mer \"\n \"coverage filter\".format(assembly_len))\n assembly_obj.filter_contigs(*[\n [\"length\", \">=\", min_contig_len]\n ])\n\n assembly_len = assembly_obj.get_assembly_length()\n logger.debug(\"Checking updated assembly length: \"\n \"{}\".format(assembly_len))\n if assembly_len < t_80:\n\n warn_msg = \"Assembly size smaller than the minimum\" \\\n \" threshold of 80% of expected genome size: {}\".format(\n assembly_len)\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n fails = warn_msg\n\n if assembly_len > t_150:\n\n warn_msg = \"Assembly size ({}) larger than the maximum\" \\\n \" threshold of 150% of expected genome size.\".format(\n assembly_len)\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n fails = warn_msg\n\n logger.debug(\"Checking number of contigs: {}\".format(\n len(assembly_obj.contigs)))\n contig_threshold = (max_contigs * gsize) / 1.5\n if len(assembly_obj.contigs) > contig_threshold:\n\n warn_msg = \"The number of contigs ({}) exceeds the threshold of \" \\\n \"{} contigs per 1.5Mb ({})\".format(\n len(assembly_obj.contigs),\n max_contigs,\n round(contig_threshold, 1))\n\n logger.warning(warn_msg)\n warn_fh.write(warn_msg)\n warnings.append(warn_msg)\n\n # Write filtered assembly\n logger.debug(\"Renaming old assembly file to: {}\".format(\n \"{}.old\".format(assembly_file)))\n assembly_obj.write_assembly(\"{}_proc.fasta\".format(\n os.path.splitext(assembly_file)[0]))\n # Write report\n output_report = \"{}.report.csv\".format(sample_id)\n assembly_obj.write_report(output_report)\n # Write json report\n with open(\".report.json\", \"w\") as json_report:\n json_dic = {\n \"tableRow\": [{\n \"sample\": sample_id,\n \"data\": [\n {\"header\": \"Contigs ({})\".format(assembler),\n \"value\": len(assembly_obj.contigs),\n \"table\": \"assembly\",\n \"columnBar\": True},\n {\"header\": \"Assembled BP ({})\".format(assembler),\n \"value\": assembly_len,\n \"table\": \"assembly\",\n \"columnBar\": True}\n ]\n }],\n }\n\n if warnings:\n json_dic[\"warnings\"] = [{\n \"sample\": sample_id,\n \"table\": \"assembly\",\n \"value\": warnings\n }]\n\n if fails:\n json_dic[\"fail\"] = [{\n \"sample\": sample_id,\n \"table\": \"assembly\",\n \"value\": [fails]\n }]\n\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n with open(\".status\", \"w\") as status_fh:\n status_fh.write(\"pass\")", "response": "Main function for the process_spades template."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef convert_camel_case(name):\n s1 = re.sub('(.)([A-Z][a-z]+)', r'\\1_\\2', name)\n return re.sub('([a-z0-9])([A-Z])', r'\\1_\\2', s1).lower()", "response": "Convers a CamelCase string into a snake_case one"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\ncollects Process classes and return dict mapping templates to classes This function crawls through the components module and retrieves all classes that inherit from the Process class. Then, it converts the name of the classes (which should be CamelCase) to snake_case, which is used as the template name. Returns ------- dict Dictionary mapping the template name (snake_case) to the corresponding process class.", "response": "def collect_process_map():\n \"\"\"Collects Process classes and return dict mapping templates to classes\n\n This function crawls through the components module and retrieves all\n classes that inherit from the Process class. 
Then, it converts the name\n of the classes (which should be CamelCase) to snake_case, which is used\n as the template name.\n\n Returns\n -------\n dict\n Dictionary mapping the template name (snake_case) to the corresponding\n process class.\n \"\"\"\n\n process_map = {}\n\n prefix = \"{}.\".format(components.__name__)\n for importer, modname, _ in pkgutil.iter_modules(components.__path__,\n prefix):\n\n _module = importer.find_module(modname).load_module(modname)\n\n _component_classes = [\n cls for cls in _module.__dict__.values() if\n isinstance(cls, type) and cls.__name__ != \"Process\"\n ]\n\n for cls in _component_classes:\n process_map[convert_camel_case(cls.__name__)] = cls\n\n return process_map"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef main(newick):\n\n logger.info(\"Starting newick file processing\")\n\n print(newick)\n\n tree = dendropy.Tree.get(file=open(newick, 'r'), schema=\"newick\")\n\n tree.reroot_at_midpoint()\n\n to_write = tree.as_string(\"newick\").strip().replace(\"[&R] \", '').replace(' ', '_').replace(\"'\", \"\")\n\n with open(\".report.json\", \"w\") as json_report:\n json_dic = {\n \"treeData\": [{\n \"trees\": [\n to_write\n ]\n }],\n }\n\n json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n with open(\".status\", \"w\") as status_fh:\n status_fh.write(\"pass\")", "response": "Main function of the process_newick template: midpoint-roots the input newick tree and writes it to the report JSON."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nfactorizes s.t. CUR", "response": "def factorize(self):\n \"\"\" Factorize s.t. CUR = data\n\n Updated Values\n --------------\n .C : updated values for C.\n .U : updated values for U.\n .R : updated values for R.\n \"\"\"\n\n [prow, pcol] = self.sample_probability()\n\n self._rid = self.sample(self._rrank, prow)\n self._cid = self.sample(self._crank, pcol)\n\n self._cmdinit()\n\n self.computeUCR()"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef factorize(self):\n [prow, pcol] = self.sample_probability()\n self._rid = self.sample(self._rrank, prow)\n self._cid = self.sample(self._crank, pcol)\n\n self._rcnt = np.ones(len(self._rid))\n self._ccnt = np.ones(len(self._cid))\n\n self.computeUCR()", "response": "Factorize s.t. 
CUR"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef quickhull(sample):\n\n link = lambda a, b: np.concatenate((a, b[1:]))\n edge = lambda a, b: np.concatenate(([a], [b]))\n\n def dome(sample, base):\n h, t = base\n dists = np.dot(sample - h, np.dot(((0, -1), (1, 0)), (t - h)))\n outer = np.repeat(sample, dists > 0, axis=0)\n\n if len(outer):\n pivot = sample[np.argmax(dists)]\n return link(dome(outer, edge(h, pivot)),\n dome(outer, edge(pivot, t)))\n else:\n return base\n\n if len(sample) > 2:\n axis = sample[:, 0]\n base = np.take(sample, [np.argmin(axis), np.argmax(axis)], axis=0)\n return link(dome(sample, base),\n dome(sample, base[::-1]))\n else:\n return sample", "response": "Find data points on the convex hull of a supplied data set."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _map_w_to_data(self):\n\n # assign W to the next best data sample\n self._Wmapped_index = vq(self.data, self.W)\n self.Wmapped = np.zeros(self.W.shape)\n\n # do not directly assign, i.e. Wdist = self.data[:,sel]\n # as self might be unsorted (in non ascending order)\n # -> sorting sel would screw the matching to W if\n # self.data is stored as a hdf5 table (see h5py)\n for i, s in enumerate(self._Wmapped_index):\n self.Wmapped[:,i] = self.data[:,s]", "response": "Map W to data points that are most similar to basis vectors W\n "} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef factorize(self, show_progress=False, compute_w=True, compute_h=True,\n compute_err=True, niter=1):\n \"\"\" Factorize s.t. WH = data\n\n Parameters\n ----------\n show_progress : bool\n print some extra information to stdout.\n compute_h : bool\n iteratively update values for H.\n compute_w : bool\n iteratively update values for W.\n compute_err : bool\n compute Frobenius norm |data-WH| after each update and store\n it to .ferr[k].\n\n Updated Values\n --------------\n .W : updated values for W.\n .H : updated values for H.\n .ferr : Frobenius norm |data-WH|.\n \"\"\"\n\n AA.factorize(self, niter=1, show_progress=show_progress,\n compute_w=compute_w, compute_h=compute_h,\n compute_err=compute_err)", "response": "Factorize s. t. WH = data\n "} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nupdating the W attribute of the object with new values", "response": "def update_w(self):\n \"\"\" compute new W \"\"\"\n\n def select_next(iterval):\n \"\"\" select the next best data sample using robust map\n or simply the max iterval ... 
\"\"\"\n\n if self._robust_map:\n k = np.argsort(iterval)[::-1]\n d_sub = self.data[:,k[:self._robust_nselect]]\n self.sub.extend(k[:self._robust_nselect])\n\n # cluster d_sub\n kmeans_mdl = Kmeans(d_sub, num_bases=self._robust_cluster)\n kmeans_mdl.factorize(niter=10)\n\n # get largest cluster\n h = np.histogram(kmeans_mdl.assigned, range(self._robust_cluster+1))[0]\n largest_cluster = np.argmax(h)\n sel = pdist(kmeans_mdl.W[:, largest_cluster:largest_cluster+1], d_sub)\n sel = k[np.argmin(sel)]\n else:\n sel = np.argmax(iterval)\n\n return sel\n\n EPS = 10**-8\n\n if scipy.sparse.issparse(self.data):\n norm_data = np.sqrt(self.data.multiply(self.data).sum(axis=0))\n norm_data = np.array(norm_data).reshape((-1))\n else:\n norm_data = np.sqrt(np.sum(self.data**2, axis=0))\n\n\n self.select = []\n\n if self._method == 'pca' or self._method == 'aa':\n iterval = norm_data.copy()\n\n if self._method == 'nmf':\n iterval = np.sum(self.data, axis=0)/(np.sqrt(self.data.shape[0])*norm_data)\n iterval = 1.0 - iterval\n\n self.select.append(select_next(iterval))\n\n\n for l in range(1, self._num_bases):\n\n if scipy.sparse.issparse(self.data):\n c = self.data[:, self.select[-1]:self.select[-1]+1].T * self.data\n c = np.array(c.todense())\n else:\n c = np.dot(self.data[:,self.select[-1]], self.data)\n\n c = c/(norm_data * norm_data[self.select[-1]])\n\n if self._method == 'pca':\n c = 1.0 - np.abs(c)\n c = c * norm_data\n\n elif self._method == 'aa':\n c = (c*-1.0 + 1.0)/2.0\n c = c * norm_data\n\n elif self._method == 'nmf':\n c = 1.0 - np.abs(c)\n\n ### update the estimated volume\n iterval = c * iterval\n\n # detect the next best data point\n self.select.append(select_next(iterval))\n\n self._logger.info('cur_nodes: ' + str(self.select))\n\n # sort indices, otherwise h5py won't work\n self.W = self.data[:, np.sort(self.select)]\n\n # \"unsort\" it again to keep the correct order\n self.W = self.W[:, np.argsort(np.argsort(self.select))]"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nfactorize s. t. WH = data", "response": "def factorize(self, show_progress=False, compute_w=True, compute_h=True,\n compute_err=True, robust_cluster=3, niter=1, robust_nselect=-1):\n \"\"\" Factorize s.t. 
WH = data\n\n Parameters\n ----------\n show_progress : bool\n print some extra information to stdout.\n False, default\n compute_h : bool\n iteratively update values for H.\n True, default\n compute_w : bool\n iteratively update values for W.\n True, default\n compute_err : bool\n compute Frobenius norm |data-WH| after each update and store\n it to .ferr[k].\n robust_cluster : int, optional\n set the number of clusters for robust map selection.\n 3, default\n robust_nselect : int, optional\n set the number of samples to consider for robust map\n selection.\n -1, default (automatically determine suitable number)\n\n Updated Values\n --------------\n .W : updated values for W.\n .H : updated values for H.\n .ferr : Frobenius norm |data-WH|.\n \"\"\"\n self._robust_cluster = robust_cluster\n self._robust_nselect = robust_nselect\n\n if self._robust_nselect == -1:\n self._robust_nselect = np.round(np.log(self.data.shape[1])*2)\n\n AA.factorize(self, niter=1, show_progress=show_progress,\n compute_w=compute_w, compute_h=compute_h,\n compute_err=compute_err)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef processFlat(self):\n self.config[\"hier\"] = False\n est_idxs, est_labels, F = self.process()\n assert est_idxs[0] == 0 and est_idxs[-1] == F.shape[1] - 1\n return self._postprocess(est_idxs, est_labels)", "response": "Main process for flat segmentation."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef processHierarchical(self):\n self.config[\"hier\"] = True\n est_idxs, est_labels, F = self.process()\n for layer in range(len(est_idxs)):\n assert est_idxs[layer][0] == 0 and \\\n est_idxs[layer][-1] == F.shape[1] - 1\n est_idxs[layer], est_labels[layer] = \\\n self._postprocess(est_idxs[layer], est_labels[layer])\n return est_idxs, est_labels", "response": "Main process 
for hierarchical segmentation."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef compute_gaussian_krnl(M):\n g = signal.gaussian(M, M // 3., sym=True)\n G = np.dot(g.reshape(-1, 1), g.reshape(1, -1))\n G[M // 2:, :M // 2] = -G[M // 2:, :M // 2]\n G[:M // 2, M // 2:] = -G[:M // 2, M // 2:]\n return G", "response": "Creates a Gaussian kernel following Foote's paper."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef compute_ssm(X, metric=\"seuclidean\"):\n D = distance.pdist(X, metric=metric)\n D = distance.squareform(D)\n D /= D.max()\n return 1 - D", "response": "Computes the self-similarity matrix of X."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef compute_nc(X, G):\n N = X.shape[0]\n M = G.shape[0]\n nc = np.zeros(N)\n\n for i in range(M // 2, N - M // 2 + 1):\n nc[i] = np.sum(X[i - M // 2:i + M // 2, i - M // 2:i + M // 2] * G)\n\n # Normalize\n nc += nc.min()\n nc /= nc.max()\n return nc", "response": "Computes the novelty curve from the self-similarity matrix X and the Gaussian kernel G."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nobtain peaks from a novelty curve using an adaptive threshold.", "response": "def pick_peaks(nc, L=16):\n \"\"\"Obtain peaks from a novelty curve using an adaptive threshold.\"\"\"\n offset = nc.mean() / 20.\n\n nc = filters.gaussian_filter1d(nc, sigma=4) # Smooth out nc\n\n th = filters.median_filter(nc, size=L) + offset\n #th = filters.gaussian_filter(nc, sigma=L/2., mode=\"nearest\") + offset\n\n peaks = []\n for i in range(1, nc.shape[0] - 1):\n # is it a peak?\n if nc[i - 1] < nc[i] and nc[i] > nc[i + 1]:\n # is it above the threshold?\n if nc[i] > th[i]:\n peaks.append(i)\n #plt.plot(nc)\n #plt.plot(th)\n #for peak in peaks:\n #plt.axvline(peak)\n #plt.show()\n\n return peaks"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef processFlat(self):\n # Preprocess to obtain features\n F = self._preprocess()\n\n # Normalize\n F = msaf.utils.normalize(F, norm_type=self.config[\"bound_norm_feats\"])\n\n # Make sure that the M_gaussian is even\n if self.config[\"M_gaussian\"] % 2 == 1:\n self.config[\"M_gaussian\"] += 1\n\n # Median filter\n F = median_filter(F, M=self.config[\"m_median\"])\n #plt.imshow(F.T, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Self similarity matrix\n S = compute_ssm(F)\n\n # Compute gaussian kernel\n G = compute_gaussian_krnl(self.config[\"M_gaussian\"])\n #plt.imshow(S, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Compute the novelty curve\n nc = compute_nc(S, G)\n\n # Find peaks in the novelty curve\n est_idxs = pick_peaks(nc, L=self.config[\"L_peaks\"])\n\n # Add first and last frames\n est_idxs = np.concatenate(([0], est_idxs, [F.shape[0] - 1]))\n\n # Empty labels\n est_labels = np.ones(len(est_idxs) - 1) * -1\n\n # Post process estimations\n est_idxs, est_labels = self._postprocess(est_idxs, est_labels)\n\n return est_idxs, est_labels", "response": "Main function for processing the flat data."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nfactorizing s.t. WH = data", "response": "def factorize(self, show_progress=False, compute_w=True, compute_h=True,\n compute_err=True, niter=1):\n \"\"\" Factorize s.t. 
WH = data\n\n Parameters\n ----------\n show_progress : bool\n print some extra information to stdout.\n niter : int\n number of iterations.\n compute_h : bool\n iteratively update values for H.\n compute_w : bool\n iteratively update values for W.\n compute_err : bool\n compute Frobenius norm |data-WH| after each update and store\n it to .ferr[k].\n\n Updated Values\n --------------\n .W : updated values for W.\n .H : updated values for H.\n .ferr : Frobenius norm |data-WH|.\n \"\"\"\n if show_progress:\n self._logger.setLevel(logging.INFO)\n else:\n self._logger.setLevel(logging.ERROR)\n\n # create W and H if they don't already exist\n # -> any custom initialization to W,H should be done before\n if not hasattr(self,'W'):\n self.init_w()\n\n if not hasattr(self,'H'):\n self.init_h()\n\n if compute_err:\n self.ferr = np.zeros(niter)\n\n for i in range(niter):\n if compute_w:\n self.update_w()\n\n if compute_h:\n self.update_h()\n\n if compute_err:\n self.ferr[i] = self.frobenius_norm()\n self._logger.info('Iteration ' + str(i+1) + '/' + str(niter) +\n ' FN:' + str(self.ferr[i]))\n else:\n self._logger.info('Iteration ' + str(i+1) + '/' + str(niter))"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\ncompute the novelty curve from the structural features X.", "response": "def compute_nc(X):\n \"\"\"Computes the novelty curve from the structural features.\"\"\"\n N = X.shape[0]\n # nc = np.sum(np.diff(X, axis=0), axis=1) # Difference between SF's\n\n nc = np.zeros(N)\n for i in range(N - 1):\n nc[i] = distance.euclidean(X[i, :], X[i + 1, :])\n\n # Normalize\n nc += np.abs(nc.min())\n nc /= float(nc.max())\n return nc"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef pick_peaks(nc, L=16, offset_denom=0.1):\n offset = nc.mean() * float(offset_denom)\n th = filters.median_filter(nc, size=L) + offset\n #th = filters.gaussian_filter(nc, sigma=L/2., mode=\"nearest\") + offset\n #import pylab as plt\n #plt.plot(nc)\n #plt.plot(th)\n #plt.show()\n # th = np.ones(nc.shape[0]) * nc.mean() - 0.08\n peaks = []\n for i in range(1, nc.shape[0] - 1):\n # is it a peak?\n if nc[i - 1] < nc[i] and nc[i] > nc[i + 1]:\n # is it above the threshold?\n if nc[i] > th[i]:\n peaks.append(i)\n return peaks", "response": "Obtain peaks from a novelty curve using an adaptive threshold."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nshifts circularly the X square matrix in order to get a time-lag matrix.", "response": "def circular_shift(X):\n \"\"\"Shifts circularly the X square matrix in order to get a\n time-lag matrix.\"\"\"\n N = X.shape[0]\n L = np.zeros(X.shape)\n for i in range(N):\n L[i, :] = np.asarray([X[(i + j) % N, j] for j in range(N)])\n return L"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nperform time-delay embedding with m dimensions and tau delays.", "response": "def embedded_space(X, m, tau=1):\n \"\"\"Time-delay embedding with m dimensions and tau delays.\"\"\"\n N = X.shape[0] - int(np.ceil(m))\n Y = np.zeros((N, int(np.ceil(X.shape[1] * m))))\n for i in range(N):\n # print X[i:i+m,:].flatten().shape, w, X.shape\n # print Y[i,:].shape\n rem = int((m % 1) * X.shape[1]) # Remainder for float m\n Y[i, :] = np.concatenate((X[i:i + int(m), :].flatten(),\n X[i + int(m), :rem]))\n return Y"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nprocesses the flat data for the segment indices and returns a 
dictionary of the number of times and labels.", "response": "def processFlat(self):\n \"\"\"Main process.\n Returns\n -------\n est_idxs : np.array(N)\n Estimated times for the segment boundaries in frame indeces.\n est_labels : np.array(N-1)\n Estimated labels for the segments.\n \"\"\"\n # Structural Features params\n Mp = self.config[\"Mp_adaptive\"] # Size of the adaptive threshold for\n # peak picking\n od = self.config[\"offset_thres\"] # Offset coefficient for adaptive\n # thresholding\n\n M = self.config[\"M_gaussian\"] # Size of gaussian kernel in beats\n m = self.config[\"m_embedded\"] # Number of embedded dimensions\n k = self.config[\"k_nearest\"] # k*N-nearest neighbors for the\n # recurrence plot\n\n # Preprocess to obtain features, times, and input boundary indeces\n F = self._preprocess()\n\n # Normalize\n F = U.normalize(F, norm_type=self.config[\"bound_norm_feats\"])\n\n # Check size in case the track is too short\n if F.shape[0] > 20:\n\n if self.framesync:\n red = 0.1\n F_copy = np.copy(F)\n F = librosa.util.utils.sync(\n F.T, np.linspace(0, F.shape[0], num=F.shape[0] * red),\n pad=False).T\n\n # Emedding the feature space (i.e. shingle)\n E = embedded_space(F, m)\n # plt.imshow(E.T, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Recurrence matrix\n R = librosa.segment.recurrence_matrix(\n E.T,\n k=k * int(F.shape[0]),\n width=1, # zeros from the diagonal\n metric=\"euclidean\",\n sym=True).astype(np.float32)\n\n # Circular shift\n L = circular_shift(R)\n #plt.imshow(L, interpolation=\"nearest\", cmap=plt.get_cmap(\"binary\"))\n #plt.show()\n\n # Obtain structural features by filtering the lag matrix\n SF = gaussian_filter(L.T, M=M, axis=1)\n SF = gaussian_filter(L.T, M=1, axis=0)\n # plt.imshow(SF.T, interpolation=\"nearest\", aspect=\"auto\")\n #plt.show()\n\n # Compute the novelty curve\n nc = compute_nc(SF)\n\n # Find peaks in the novelty curve\n est_bounds = pick_peaks(nc, L=Mp, offset_denom=od)\n\n # Re-align embedded space\n est_bounds = np.asarray(est_bounds) + int(np.ceil(m / 2.))\n\n if self.framesync:\n est_bounds /= red\n F = F_copy\n else:\n est_bounds = []\n\n # Add first and last frames\n est_idxs = np.concatenate(([0], est_bounds, [F.shape[0] - 1]))\n est_idxs = np.unique(est_idxs)\n\n assert est_idxs[0] == 0 and est_idxs[-1] == F.shape[0] - 1\n\n # Empty labels\n est_labels = np.ones(len(est_idxs) - 1) * - 1\n\n # Post process estimations\n est_idxs, est_labels = self._postprocess(est_idxs, est_labels)\n\n # plt.figure(1)\n # plt.plot(nc);\n # [plt.axvline(p, color=\"m\", ymin=.6) for p in est_bounds]\n # [plt.axvline(b, color=\"b\", ymax=.6, ymin=.3) for b in brian_bounds]\n # [plt.axvline(b, color=\"g\", ymax=.3) for b in ann_bounds]\n # plt.show()\n\n return est_idxs, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _plot_formatting(title, est_file, algo_ids, last_bound, N, output_file):\n import matplotlib.pyplot as plt\n if title is None:\n title = os.path.basename(est_file).split(\".\")[0]\n plt.title(title)\n plt.yticks(np.arange(0, 1, 1 / float(N)) + 1 / (float(N) * 2))\n plt.gcf().subplots_adjust(bottom=0.22)\n plt.gca().set_yticklabels(algo_ids)\n plt.xlabel(\"Time (seconds)\")\n plt.xlim((0, last_bound))\n plt.tight_layout()\n if output_file is not None:\n plt.savefig(output_file)\n plt.show()", "response": "Formats the plot with the correct axis labels title ticks and\n so on."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 
3 code does\ndef plot_boundaries(all_boundaries, est_file, algo_ids=None, title=None,\n output_file=None):\n \"\"\"Plots all the boundaries.\n\n Parameters\n ----------\n all_boundaries: list\n A list of np.arrays containing the times of the boundaries, one array\n for each algorithm.\n est_file: str\n Path to the estimated file (JSON file)\n algo_ids : list\n List of algorithm ids to read boundaries from.\n If None, all algorithm ids are read.\n title : str\n Title of the plot. If None, the name of the file is printed instead.\n \"\"\"\n import matplotlib.pyplot as plt\n N = len(all_boundaries) # Number of lists of boundaries\n if algo_ids is None:\n algo_ids = io.get_algo_ids(est_file)\n\n # Translate ids\n for i, algo_id in enumerate(algo_ids):\n algo_ids[i] = translate_ids[algo_id]\n algo_ids = [\"GT\"] + algo_ids\n\n figsize = (6, 4)\n plt.figure(1, figsize=figsize, dpi=120, facecolor='w', edgecolor='k')\n for i, boundaries in enumerate(all_boundaries):\n color = \"b\"\n if i == 0:\n color = \"g\"\n for b in boundaries:\n plt.axvline(b, i / float(N), (i + 1) / float(N), color=color)\n plt.axhline(i / float(N), color=\"k\", linewidth=1)\n\n # Format plot\n _plot_formatting(title, est_file, algo_ids, all_boundaries[0][-1], N,\n output_file)", "response": "Plots all the boundaries."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nplot all the labels of the boundaries.", "response": "def plot_labels(all_labels, gt_times, est_file, algo_ids=None, title=None,\n output_file=None):\n \"\"\"Plots all the labels.\n\n Parameters\n ----------\n all_labels: list\n A list of np.arrays containing the labels of the boundaries, one array\n for each algorithm.\n gt_times: np.array\n Array with the ground truth boundaries.\n est_file: str\n Path to the estimated file (JSON file)\n algo_ids : list\n List of algorithm ids to read boundaries from.\n If None, all algorithm ids are read.\n title : str\n Title of the plot. 
If None, the name of the file is printed instead.\n \"\"\"\n import matplotlib.pyplot as plt\n N = len(all_labels) # Number of lists of labels\n if algo_ids is None:\n algo_ids = io.get_algo_ids(est_file)\n\n # Translate ids\n for i, algo_id in enumerate(algo_ids):\n algo_ids[i] = translate_ids[algo_id]\n algo_ids = [\"GT\"] + algo_ids\n\n # Index the labels to normalize them\n for i, labels in enumerate(all_labels):\n all_labels[i] = mir_eval.util.index_labels(labels)[0]\n\n # Get color map\n cm = plt.get_cmap('gist_rainbow')\n max_label = max(max(labels) for labels in all_labels)\n\n # To intervals\n gt_inters = utils.times_to_intervals(gt_times)\n\n # Plot labels\n figsize = (6, 4)\n plt.figure(1, figsize=figsize, dpi=120, facecolor='w', edgecolor='k')\n for i, labels in enumerate(all_labels):\n for label, inter in zip(labels, gt_inters):\n plt.axvspan(inter[0], inter[1], ymin=i / float(N),\n ymax=(i + 1) / float(N), alpha=0.6,\n color=cm(label / float(max_label)))\n plt.axhline(i / float(N), color=\"k\", linewidth=1)\n\n # Draw the boundary lines\n for bound in gt_times:\n plt.axvline(bound, color=\"g\")\n\n # Format plot\n _plot_formatting(title, est_file, algo_ids, gt_times[-1], N,\n output_file)"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nplots the results of one track with ground truth if it exists.", "response": "def plot_one_track(file_struct, est_times, est_labels, boundaries_id, labels_id,\n title=None):\n \"\"\"Plots the results of one track, with ground truth if it exists.\"\"\"\n import matplotlib.pyplot as plt\n # Set up the boundaries id\n bid_lid = boundaries_id\n if labels_id is not None:\n bid_lid += \" + \" + labels_id\n try:\n # Read file\n jam = jams.load(file_struct.ref_file)\n ann = jam.search(namespace='segment_.*')[0]\n ref_inters, ref_labels = ann.to_interval_values()\n\n # To times\n ref_times = utils.intervals_to_times(ref_inters)\n all_boundaries = [ref_times, est_times]\n all_labels = [ref_labels, est_labels]\n algo_ids = [\"GT\", bid_lid]\n except:\n logging.warning(\"No references found in %s. 
Not plotting groundtruth\"\n % file_struct.ref_file)\n all_boundaries = [est_times]\n all_labels = [est_labels]\n algo_ids = [bid_lid]\n\n N = len(all_boundaries)\n\n # Index the labels to normalize them\n for i, labels in enumerate(all_labels):\n all_labels[i] = mir_eval.util.index_labels(labels)[0]\n\n # Get color map\n cm = plt.get_cmap('gist_rainbow')\n max_label = max(max(labels) for labels in all_labels)\n\n figsize = (8, 4)\n plt.figure(1, figsize=figsize, dpi=120, facecolor='w', edgecolor='k')\n for i, boundaries in enumerate(all_boundaries):\n color = \"b\"\n if i == 0:\n color = \"g\"\n for b in boundaries:\n plt.axvline(b, i / float(N), (i + 1) / float(N), color=color)\n if labels_id is not None:\n labels = all_labels[i]\n inters = utils.times_to_intervals(boundaries)\n for label, inter in zip(labels, inters):\n plt.axvspan(inter[0], inter[1], ymin=i / float(N),\n ymax=(i + 1) / float(N), alpha=0.6,\n color=cm(label / float(max_label)))\n plt.axhline(i / float(N), color=\"k\", linewidth=1)\n\n # Format plot\n _plot_formatting(title, os.path.basename(file_struct.audio_file), algo_ids,\n all_boundaries[0][-1], N, None)"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nplotting a given tree containing hierarchical segmentation.", "response": "def plot_tree(T, res=None, title=None, cmap_id=\"Pastel2\"):\n \"\"\"Plots a given tree, containing hierarchical segmentation.\n\n Parameters\n ----------\n T: mir_eval.segment.tree\n A tree object containing the hierarchical segmentation.\n res: float\n Frame-rate resolution of the tree (None to use seconds).\n title: str\n Title for the plot. `None` for no title.\n cmap_id: str\n Color Map ID\n \"\"\"\n import matplotlib.pyplot as plt\n def round_time(t, res=0.1):\n v = int(t / float(res)) * res\n return v\n\n # Get color map\n cmap = plt.get_cmap(cmap_id)\n\n # Get segments by level\n level_bounds = []\n for level in T.levels:\n if level == \"root\":\n continue\n segments = T.get_segments_in_level(level)\n level_bounds.append(segments)\n\n # Plot axvspans for each segment\n B = float(len(level_bounds))\n #plt.figure(figsize=figsize)\n for i, segments in enumerate(level_bounds):\n labels = utils.segment_labels_to_floats(segments)\n for segment, label in zip(segments, labels):\n #print i, label, cmap(label)\n if res is None:\n start = segment.start\n end = segment.end\n xlabel = \"Time (seconds)\"\n else:\n start = int(round_time(segment.start, res=res) / res)\n end = int(round_time(segment.end, res=res) / res)\n xlabel = \"Time (frames)\"\n plt.axvspan(start, end,\n ymax=(len(level_bounds) - i) / B,\n ymin=(len(level_bounds) - i - 1) / B,\n facecolor=cmap(label))\n\n # Plot labels\n L = float(len(T.levels) - 1)\n plt.yticks(np.linspace(0, (L - 1) / L, num=L) + 1 / L / 2.,\n T.levels[1:][::-1])\n plt.xlabel(xlabel)\n if title is not None:\n plt.title(title)\n plt.gca().set_xlim([0, end])"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef get_feat_segments(F, bound_idxs):\n # Make sure bound_idxs are not empty\n assert len(bound_idxs) > 0, \"Boundaries can't be empty\"\n\n # Make sure that boundaries are sorted\n bound_idxs = np.sort(bound_idxs)\n\n # Make sure we're not out of bounds\n assert bound_idxs[0] >= 0 and bound_idxs[-1] < F.shape[0], \\\n \"Boundaries are not correct for the given feature dimensions.\"\n\n # Obtain the segments\n feat_segments = []\n for i in range(len(bound_idxs) - 1):\n feat_segments.append(F[bound_idxs[i]:bound_idxs[i + 1], :])\n 
return feat_segments", "response": "Returns the list of feature segments defined by the bound_idxs."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script that,\ngiven a list of feature segments, returns a list of 2D-Fourier Magnitude Coefs using the maximum segment size as the main size and zero-padding the rest.", "response": "def feat_segments_to_2dfmc_max(feat_segments, offset=4):\n \"\"\"From a list of feature segments, return a list of 2D-Fourier Magnitude\n Coefs using the maximum segment size as main size and zero pad the rest.\n\n Parameters\n ----------\n feat_segments: list\n List of segments, one for each boundary interval.\n offset: int >= 0\n Number of frames to ignore from beginning and end of each segment.\n\n Returns\n -------\n fmcs: np.ndarray\n Tensor containing the 2D-FMC matrices, one matrix per segment.\n \"\"\"\n if len(feat_segments) == 0:\n return []\n\n # Get maximum segment size\n max_len = max([feat_segment.shape[0] for feat_segment in feat_segments])\n\n fmcs = []\n for feat_segment in feat_segments:\n # Zero pad if needed\n X = np.zeros((max_len, feat_segment.shape[1]))\n\n # Remove a set of frames in the beginning and end of the segment\n if feat_segment.shape[0] <= offset or offset == 0:\n X[:feat_segment.shape[0], :] = feat_segment\n else:\n X[:feat_segment.shape[0] - offset, :] = \\\n feat_segment[offset // 2:-offset // 2, :]\n\n # Compute the 2D-FMC\n try:\n fmcs.append(utils2d.compute_ffmc2d(X))\n except:\n logging.warning(\"Couldn't compute the 2D Fourier Transform\")\n fmcs.append(np.zeros((X.shape[0] * X.shape[1]) // 2 + 1))\n\n # Normalize\n # fmcs[-1] = fmcs[-1] / float(fmcs[-1].max())\n\n return np.asarray(fmcs)"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef compute_similarity(F, bound_idxs, dirichlet=False, xmeans=False, k=5,\n offset=4):\n \"\"\"Main function to compute the segment similarity.\n\n Parameters\n ----------\n F: np.ndarray\n Matrix containing one feature vector per row.\n bound_idxs: np.ndarray\n Array with the indices of the segment boundaries.\n dirichlet: boolean\n Whether to use the dirichlet estimator of the number of unique labels.\n xmeans: boolean\n Whether to use the xmeans estimator of the number of unique labels.\n k: int > 0\n If the other two predictors are `False`, use fixed number of labels.\n offset: int >= 0\n Number of frames to ignore from beginning and end of each segment.\n\n Returns\n -------\n labels_est: np.ndarray\n Estimated labels, containing integer identifiers.\n \"\"\"\n # Get the feature segments\n feat_segments = get_feat_segments(F, bound_idxs)\n\n # Get the 2D-FMCs segments\n fmcs = feat_segments_to_2dfmc_max(feat_segments, offset)\n if len(fmcs) == 0:\n return np.arange(len(bound_idxs) - 1)\n\n # Compute the labels using kmeans\n if dirichlet:\n k_init = np.min([fmcs.shape[0], k])\n # Only compute the dirichlet method if the fmc shape is small enough\n if fmcs.shape[1] > 500:\n labels_est = compute_labels_kmeans(fmcs, k=k)\n else:\n dpgmm = mixture.DPGMM(n_components=k_init, covariance_type='full')\n # dpgmm = mixture.VBGMM(n_components=k_init, covariance_type='full')\n dpgmm.fit(fmcs)\n k = len(dpgmm.means_)\n labels_est = dpgmm.predict(fmcs)\n # print(\"Estimated with Dirichlet Process:\", k)\n if xmeans:\n xm = XMeans(fmcs, plot=False)\n k = xm.estimate_K_knee(th=0.01, maxK=8)\n labels_est = compute_labels_kmeans(fmcs, k=k)\n # print(\"Estimated with Xmeans:\", k)\n else:\n labels_est = compute_labels_kmeans(fmcs, k=k)\n\n 
return labels_est", "response": "Compute the similarity of a file with a set of features."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef processFlat(self):\n # Preprocess to obtain features, times, and input boundary indeces\n F = self._preprocess()\n\n # Normalize\n F = U.normalize(F, norm_type=self.config[\"label_norm_feats\"],\n floor=self.config[\"label_norm_floor\"],\n min_db=self.config[\"label_norm_min_db\"])\n\n # Find the labels using 2D-FMCs\n est_labels = compute_similarity(F, self.in_bound_idxs,\n dirichlet=self.config[\"dirichlet\"],\n xmeans=self.config[\"xmeans\"],\n k=self.config[\"k\"],\n offset=self.config[\"2dfmc_offset\"])\n\n # Post process estimations\n self.in_bound_idxs, est_labels = self._postprocess(self.in_bound_idxs,\n est_labels)\n\n return self.in_bound_idxs, est_labels", "response": "Main function for processing the flat indeces and labels."} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nfits the OLDA model to the given data and labels.", "response": "def fit(self, X, Y):\n '''Fit the OLDA model\n\n Parameters\n ----------\n X : array-like, shape [n_samples]\n Training data: each example is an n_features-by-* data array\n\n Y : array-like, shape [n_samples]\n Training labels: each label is an array of change-points\n (eg, a list of segment boundaries)\n\n Returns\n -------\n self : object\n '''\n \n # Re-initialize the scatter matrices\n self.scatter_ordinal_ = None\n self.scatter_within_ = None\n \n # Reduce to partial-fit\n self.partial_fit(X, Y)\n \n return self"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef partial_fit(self, X, Y):\n '''Partial-fit the OLDA model\n\n Parameters\n ----------\n X : array-like, shape [n_samples]\n Training data: each example is an n_features-by-* data array\n\n Y : array-like, shape [n_samples]\n Training labels: each label is an array of change-points\n (eg, a list of segment boundaries)\n\n Returns\n -------\n self : object\n '''\n \n for (xi, yi) in itertools.izip(X, Y):\n \n prev_mean = None\n prev_length = None\n \n if self.scatter_within_ is None:\n # First round: initialize\n d, n = xi.shape\n \n if yi[0] > 0:\n yi = np.concatenate([np.array([0]), yi])\n if yi[-1] < n:\n yi = np.concatenate([yi, np.array([n])])\n \n self.scatter_within_ = self.sigma * np.eye(d)\n self.scatter_ordinal_ = np.zeros(d)\n \n \n # iterate over segments\n for (seg_start, seg_end) in zip(yi[:-1], yi[1:]):\n \n seg_length = seg_end - seg_start\n \n if seg_length < 2:\n continue\n\n seg_mean = np.mean(xi[:, seg_start:seg_end], axis=1, keepdims=True)\n seg_cov = np.cov(xi[:, seg_start:seg_end]) \n self.scatter_within_ = self.scatter_within_ + seg_length * seg_cov\n \n \n if prev_mean is not None:\n diff_ord = seg_mean - (prev_length * prev_mean + seg_length * seg_mean) / (prev_length + seg_length)\n self.scatter_ordinal_ = self.scatter_ordinal_ + seg_length * np.dot(diff_ord, diff_ord.T)\n \n diff_ord = prev_mean - (prev_length * prev_mean + seg_length * seg_mean) / (prev_length + seg_length)\n self.scatter_ordinal_ = self.scatter_ordinal_ + prev_length * np.dot(diff_ord, diff_ord.T)\n\n prev_mean = seg_mean\n prev_length = seg_length\n \n e_vals, e_vecs = scipy.linalg.eig(self.scatter_ordinal_, self.scatter_within_)\n self.e_vals_ = e_vals\n self.e_vecs_ = e_vecs\n self.components_ = e_vecs.T\n return self", "response": "Partial fit the OLDA model."} {"SOURCE": "codesearchnet", 
"instruction": "Given the following Python 3 function, write the documentation\ndef compute_features(self):\n linear_cqt = np.abs(librosa.cqt(\n self._audio, sr=self.sr, hop_length=self.hop_length,\n n_bins=self.n_bins, norm=self.norm, filter_scale=self.filter_scale)\n ) ** 2\n cqt = librosa.amplitude_to_db(linear_cqt, ref=self.ref_power).T\n return cqt", "response": "Compute the features for a give\n time frame and beat."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ncomputing the features for a give time frame and beat.", "response": "def compute_features(self):\n \"\"\"Actual implementation of the features.\n\n Returns\n -------\n mfcc: np.array(N, F)\n The features, each row representing a feature vector for a give\n time frame/beat.\n \"\"\"\n S = librosa.feature.melspectrogram(self._audio,\n sr=self.sr,\n n_fft=self.n_fft,\n hop_length=self.hop_length,\n n_mels=self.n_mels)\n log_S = librosa.amplitude_to_db(S, ref=self.ref_power)\n mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=self.n_mfcc).T\n return mfcc"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncompute the features for a give time frame and beat.", "response": "def compute_features(self):\n \"\"\"Actual implementation of the features.\n\n Returns\n -------\n pcp: np.array(N, F)\n The features, each row representing a feature vector for a give\n time frame/beat.\n \"\"\"\n audio_harmonic, _ = self.compute_HPSS()\n pcp_cqt = np.abs(librosa.hybrid_cqt(audio_harmonic,\n sr=self.sr,\n hop_length=self.hop_length,\n n_bins=self.n_bins,\n norm=self.norm,\n fmin=self.f_min)) ** 2\n pcp = librosa.feature.chroma_cqt(C=pcp_cqt,\n sr=self.sr,\n hop_length=self.hop_length,\n n_octaves=self.n_octaves,\n fmin=self.f_min).T\n return pcp"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncompute the features for a give time frame and beat.", "response": "def compute_features(self):\n \"\"\"Actual implementation of the features.\n\n Returns\n -------\n tonnetz: np.array(N, F)\n The features, each row representing a feature vector for a give\n time frame/beat.\n \"\"\"\n pcp = PCP(self.file_struct, self.feat_type, self.sr, self.hop_length,\n self.n_bins, self.norm, self.f_min, self.n_octaves).features\n tonnetz = librosa.feature.tonnetz(chroma=pcp.T).T\n return tonnetz"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\ncompute the features for a give time frame.", "response": "def compute_features(self):\n \"\"\"Actual implementation of the features.\n\n Returns\n -------\n tempogram: np.array(N, F)\n The features, each row representing a feature vector for a give\n time frame/beat.\n \"\"\"\n return librosa.feature.tempogram(self._audio, sr=self.sr,\n hop_length=self.hop_length,\n win_length=self.win_length).T"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef read_estimations(est_file, boundaries_id, labels_id=None, **params):\n # Open file and read jams\n jam = jams.load(est_file)\n\n # Find correct estimation\n est = find_estimation(jam, boundaries_id, labels_id, params)\n if est is None:\n raise NoEstimationsError(\"No estimations for file: %s\" % est_file)\n\n # Get data values\n all_boundaries, all_labels = est.to_interval_values()\n\n if params[\"hier\"]:\n hier_bounds = defaultdict(list)\n hier_labels = defaultdict(list)\n for bounds, labels in zip(all_boundaries, all_labels):\n level = labels[\"level\"]\n hier_bounds[level].append(bounds)\n 
hier_labels[level].append(labels[\"label\"])\n # Order\n all_boundaries = []\n all_labels = []\n for key in sorted(list(hier_bounds.keys())):\n all_boundaries.append(np.asarray(hier_bounds[key]))\n all_labels.append(np.asarray(hier_labels[key]))\n\n return all_boundaries, all_labels", "response": "Reads the estimations of an algorithm from a JAMS file and returns the boundaries and labels."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nreading the boundary times and labels of a single audio file.", "response": "def read_references(audio_path, annotator_id=0):\n \"\"\"Reads the boundary times and the labels.\n\n Parameters\n ----------\n audio_path : str\n Path to the audio file\n\n Returns\n -------\n ref_times : list\n List of boundary times\n ref_labels : list\n List of labels\n\n Raises\n ------\n IOError: if `audio_path` doesn't exist.\n \"\"\"\n # Dataset path\n ds_path = os.path.dirname(os.path.dirname(audio_path))\n\n # Read references\n jam_path = os.path.join(ds_path, ds_config.references_dir,\n os.path.basename(audio_path)[:-4] +\n ds_config.references_ext)\n\n jam = jams.load(jam_path, validate=False)\n ann = jam.search(namespace='segment_.*')[annotator_id]\n ref_inters, ref_labels = ann.to_interval_values()\n\n # Intervals to times\n ref_times = utils.intervals_to_times(ref_inters)\n\n return ref_times, ref_labels"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef align_times(times, frames):\n dist = np.minimum.outer(times, frames)\n bound_frames = np.argmax(np.maximum(0, dist), axis=1)\n aligned_times = np.unique(bound_frames)\n return aligned_times", "response": "Aligns the times to the closest frame times."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nfinding the correct estimation from all the estimations contained in a JAMS file given the specified parameters.", "response": "def find_estimation(jam, boundaries_id, labels_id, params):\n \"\"\"Finds the correct estimation from all the estimations contained in a\n JAMS file given the specified arguments.\n\n Parameters\n ----------\n jam : jams.JAMS\n JAMS object.\n boundaries_id : str\n Identifier of the algorithm used to compute the boundaries.\n labels_id : str\n Identifier of the algorithm used to compute the labels.\n params : dict\n Additional search parameters. E.g. {\"feature\" : \"pcp\"}.\n\n Returns\n -------\n ann : jams.Annotation\n Found estimation.\n `None` if it couldn't be found.\n \"\"\"\n # Use handy JAMS search interface\n namespace = \"multi_segment\" if params[\"hier\"] else \"segment_open\"\n # TODO: This is a workaround to issue in JAMS. 
Should be\n # resolved in JAMS 0.2.3, but for now, this works too.\n ann = jam.search(namespace=namespace).\\\n search(**{\"Sandbox.boundaries_id\": boundaries_id}).\\\n search(**{\"Sandbox.labels_id\": lambda x:\n (isinstance(x, six.string_types) and\n re.match(labels_id, x) is not None) or x is None})\n for key, val in zip(params.keys(), params.values()):\n if isinstance(val, six.string_types):\n ann = ann.search(**{\"Sandbox.%s\" % key: val})\n else:\n ann = ann.search(**{\"Sandbox.%s\" % key: lambda x: x == val})\n\n # Check estimations found\n if len(ann) > 1:\n logging.warning(\"More than one estimation with same parameters.\")\n\n if len(ann) > 0:\n ann = ann[0]\n\n # If we couldn't find anything, let's return None\n if not ann:\n ann = None\n\n return ann"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nsave the segment estimations in a JAMS file.", "response": "def save_estimations(file_struct, times, labels, boundaries_id, labels_id,\n **params):\n \"\"\"Saves the segment estimations in a JAMS file.\n\n Parameters\n ----------\n file_struct : FileStruct\n Object with the different file paths of the current file.\n times : np.array or list\n Estimated boundary times.\n If `list`, estimated hierarchical boundaries.\n labels : np.array(N, 2)\n Estimated labels (None in case we are only storing boundary\n evaluations).\n boundaries_id : str\n Boundary algorithm identifier.\n labels_id : str\n Labels algorithm identifier.\n params : dict\n Dictionary with additional parameters for both algorithms.\n \"\"\"\n # Remove features if they exist\n params.pop(\"features\", None)\n\n # Get duration\n dur = get_duration(file_struct.features_file)\n\n # Convert to intervals and sanity check\n if 'numpy' in str(type(times)):\n # Flat check\n inters = utils.times_to_intervals(times)\n assert len(inters) == len(labels), \"Number of boundary intervals \" \\\n \"(%d) and labels (%d) do not match\" % (len(inters), len(labels))\n # Put into lists to simplify the writing process later\n inters = [inters]\n labels = [labels]\n else:\n # Hierarchical check\n inters = []\n for level in range(len(times)):\n est_inters = utils.times_to_intervals(times[level])\n inters.append(est_inters)\n assert len(inters[level]) == len(labels[level]), \\\n \"Number of boundary intervals (%d) and labels (%d) do not \" \\\n \"match in level %d\" % (len(inters[level]), len(labels[level]),\n level)\n\n # Create new estimation\n namespace = \"multi_segment\" if params[\"hier\"] else \"segment_open\"\n ann = jams.Annotation(namespace=namespace)\n\n # Find estimation in file\n if os.path.isfile(file_struct.est_file):\n jam = jams.load(file_struct.est_file, validate=False)\n curr_ann = find_estimation(jam, boundaries_id, labels_id, params)\n if curr_ann is not None:\n curr_ann.data = ann.data # cleanup all data\n ann = curr_ann # This will overwrite the existing estimation\n else:\n jam.annotations.append(ann)\n else:\n # Create new JAMS if it doesn't exist\n jam = jams.JAMS()\n jam.file_metadata.duration = dur\n jam.annotations.append(ann)\n\n # Save metadata and parameters\n ann.annotation_metadata.version = msaf.__version__\n ann.annotation_metadata.data_source = \"MSAF\"\n sandbox = {}\n sandbox[\"boundaries_id\"] = boundaries_id\n sandbox[\"labels_id\"] = labels_id\n sandbox[\"timestamp\"] = \\\n datetime.datetime.today().strftime(\"%Y/%m/%d %H:%M:%S\")\n for key in params:\n sandbox[key] = params[key]\n ann.sandbox = sandbox\n\n # Save actual data\n for i, (level_inters, 
level_labels) in enumerate(zip(inters, labels)):\n for bound_inter, label in zip(level_inters, level_labels):\n dur = float(bound_inter[1]) - float(bound_inter[0])\n label = chr(int(label) + 65)\n if params[\"hier\"]:\n value = {\"label\": label, \"level\": i}\n else:\n value = label\n ann.append(time=bound_inter[0], duration=dur,\n value=value)\n\n # Write results\n jam.save(file_struct.est_file)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef get_all_boundary_algorithms():\n algo_ids = []\n for name in msaf.algorithms.__all__:\n module = eval(msaf.algorithms.__name__ + \".\" + name)\n if module.is_boundary_type:\n algo_ids.append(module.algo_id)\n return algo_ids", "response": "Gets all the possible boundary algorithms in MSAF."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef get_all_label_algorithms():\n algo_ids = []\n for name in msaf.algorithms.__all__:\n module = eval(msaf.algorithms.__name__ + \".\" + name)\n if module.is_label_type:\n algo_ids.append(module.algo_id)\n return algo_ids", "response": "Gets all the possible label algorithms in MSAF."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef get_configuration(feature, annot_beats, framesync, boundaries_id,\n labels_id):\n \"\"\"Gets the configuration dictionary from the current parameters of the\n algorithms to be evaluated.\"\"\"\n config = {}\n config[\"annot_beats\"] = annot_beats\n config[\"feature\"] = feature\n config[\"framesync\"] = framesync\n bound_config = {}\n if boundaries_id != \"gt\":\n bound_config = \\\n eval(msaf.algorithms.__name__ + \".\" + boundaries_id).config\n config.update(bound_config)\n if labels_id is not None:\n label_config = \\\n eval(msaf.algorithms.__name__ + \".\" + labels_id).config\n\n # Make sure we don't have parameter name duplicates\n if labels_id != boundaries_id:\n overlap = set(bound_config.keys()). 
\\\n intersection(set(label_config.keys()))\n assert len(overlap) == 0, \\\n \"Parameter %s must not exist both in %s and %s algorithms\" % \\\n (overlap, boundaries_id, labels_id)\n config.update(label_config)\n return config", "response": "Gets the configuration dictionary from the current parameters of the the\n algorithms to be evaluated."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nget the files of the given dataset.", "response": "def get_dataset_files(in_path):\n \"\"\"Gets the files of the given dataset.\"\"\"\n # Get audio files\n audio_files = []\n for ext in ds_config.audio_exts:\n audio_files += glob.glob(\n os.path.join(in_path, ds_config.audio_dir, \"*\" + ext))\n\n # Make sure directories exist\n utils.ensure_dir(os.path.join(in_path, ds_config.features_dir))\n utils.ensure_dir(os.path.join(in_path, ds_config.estimations_dir))\n utils.ensure_dir(os.path.join(in_path, ds_config.references_dir))\n\n # Get the file structs\n file_structs = []\n for audio_file in audio_files:\n file_structs.append(FileStruct(audio_file))\n\n # Sort by audio file name\n file_structs = sorted(file_structs,\n key=lambda file_struct: file_struct.audio_file)\n\n return file_structs"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nreading hierarchical references from a jams file.", "response": "def read_hier_references(jams_file, annotation_id=0, exclude_levels=[]):\n \"\"\"Reads hierarchical references from a jams file.\n\n Parameters\n ----------\n jams_file : str\n Path to the jams file.\n annotation_id : int > 0\n Identifier of the annotator to read from.\n exclude_levels: list\n List of levels to exclude. Empty list to include all levels.\n\n Returns\n -------\n hier_bounds : list\n List of the segment boundary times in seconds for each level.\n hier_labels : list\n List of the segment labels for each level.\n hier_levels : list\n List of strings for the level identifiers.\n \"\"\"\n hier_bounds = []\n hier_labels = []\n hier_levels = []\n jam = jams.load(jams_file)\n namespaces = [\"segment_salami_upper\", \"segment_salami_function\",\n \"segment_open\", \"segment_tut\", \"segment_salami_lower\"]\n\n # Remove levels if needed\n for exclude in exclude_levels:\n if exclude in namespaces:\n namespaces.remove(exclude)\n\n # Build hierarchy references\n for ns in namespaces:\n ann = jam.search(namespace=ns)\n if not ann:\n continue\n ref_inters, ref_labels = ann[annotation_id].to_interval_values()\n hier_bounds.append(utils.intervals_to_times(ref_inters))\n hier_labels.append(ref_labels)\n hier_levels.append(ns)\n\n return hier_bounds, hier_labels, hier_levels"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef get_duration(features_file):\n with open(features_file) as f:\n feats = json.load(f)\n return float(feats[\"globals\"][\"dur\"])", "response": "Reads the duration of a given features file."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef write_mirex(times, labels, out_file):\n inters = msaf.utils.times_to_intervals(times)\n assert len(inters) == len(labels)\n out_str = \"\"\n for inter, label in zip(inters, labels):\n out_str += \"%.3f\\t%.3f\\t%s\\n\" % (inter[0], inter[1], label)\n with open(out_file, \"w\") as f:\n f.write(out_str[:-1])", "response": "Writes the results to a MIREX file."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef _get_dataset_file(self, dir, ext):\n audio_file_ext = \".\" + 
self.audio_file.split(\".\")[-1]\n base_file = os.path.basename(self.audio_file).replace(\n audio_file_ext, ext)\n return os.path.join(self.ds_path, dir, base_file)", "response": "Gets the desired dataset file."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef processFlat(self):\n # Preprocess to obtain features (array(n_frames, n_features))\n F = self._preprocess()\n\n # Do something with the default parameters\n # (these are defined in the in the config.py file).\n assert self.config[\"my_param1\"] == 1.0\n\n # Identify boundaries in frame indeces with the new algorithm\n my_bounds = np.array([0, F.shape[0] - 1])\n\n # Label the segments (use -1 to have empty segments)\n my_labels = np.ones(len(my_bounds) - 1) * -1\n\n # Post process estimations\n est_idxs, est_labels = self._postprocess(my_bounds, my_labels)\n\n # We're done!\n return est_idxs, est_labels", "response": "Main function for processing the flat data."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nload a ground - truth segmentation and align times to the nearest detected beats.", "response": "def align_segmentation(beat_times, song):\n '''Load a ground-truth segmentation, and align times to the nearest\n detected beats.\n\n Arguments:\n beat_times -- array\n song -- path to the audio file\n\n Returns:\n segment_beats -- array\n beat-aligned segment boundaries\n\n segment_times -- array\n true segment times\n\n segment_labels -- array\n list of segment labels\n '''\n try:\n segment_times, segment_labels = msaf.io.read_references(song)\n except:\n return None, None, None\n segment_times = np.asarray(segment_times)\n\n # Map to intervals\n segment_intervals = msaf.utils.times_to_intervals(segment_times)\n\n # Map beats to intervals\n beat_intervals = np.asarray(zip(beat_times[:-1], beat_times[1:]))\n\n # Map beats to segments\n beat_segment_ids = librosa.util.match_intervals(beat_intervals,\n segment_intervals)\n\n segment_beats = []\n segment_times_out = []\n segment_labels_out = []\n\n # print segment_times, beat_segment_ids, len(beat_times),\n # len(beat_segment_ids)\n for i in range(segment_times.shape[0]):\n hits = np.argwhere(beat_segment_ids == i)\n if len(hits) > 0 and i < len(segment_intervals) and \\\n i < len(segment_labels):\n segment_beats.extend(hits[0])\n segment_times_out.append(segment_intervals[i, :])\n segment_labels_out.append(segment_labels[i])\n\n # Pull out the segment start times\n segment_beats = list(segment_beats)\n # segment_times_out = np.asarray(\n # segment_times_out)[:, 0].squeeze().reshape((-1, 1))\n\n # if segment_times_out.ndim == 0:\n # segment_times_out = segment_times_out[np.newaxis]\n segment_times_out = segment_times\n\n return segment_beats, segment_times_out, segment_labels_out"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nestimate the beats using librosa.", "response": "def estimate_beats(self):\n \"\"\"Estimates the beats using librosa.\n\n Returns\n -------\n times: np.array\n Times of estimated beats in seconds.\n frames: np.array\n Frame indeces of estimated beats.\n \"\"\"\n # Compute harmonic-percussive source separation if needed\n if self._audio_percussive is None:\n self._audio_harmonic, self._audio_percussive = self.compute_HPSS()\n\n # Compute beats\n tempo, frames = librosa.beat.beat_track(\n y=self._audio_percussive, sr=self.sr,\n hop_length=self.hop_length)\n\n # To times\n times = librosa.frames_to_time(frames, sr=self.sr,\n hop_length=self.hop_length)\n\n # 
TODO: Is this really necessary?\n if len(times) > 0 and times[0] == 0:\n times = times[1:]\n frames = frames[1:]\n\n return times, frames"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef read_ann_beats(self):\n times, frames = (None, None)\n\n # Read annotations if they exist in correct folder\n if os.path.isfile(self.file_struct.ref_file):\n try:\n jam = jams.load(self.file_struct.ref_file)\n except TypeError:\n logging.warning(\n \"Can't read JAMS file %s. Maybe it's not \"\n \"compatible with current JAMS version?\" %\n self.file_struct.ref_file)\n return times, frames\n beat_annot = jam.search(namespace=\"beat.*\")\n\n # If beat annotations exist, get times and frames\n if len(beat_annot) > 0:\n beats_inters, _ = beat_annot[0].to_interval_values()\n times = beats_inters[:, 0]\n frames = librosa.time_to_frames(times, sr=self.sr,\n hop_length=self.hop_length)\n return times, frames", "response": "Reads the annotated beats if available."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef compute_beat_sync_features(self, beat_frames, beat_times, pad):\n if beat_frames is None:\n return None, None\n\n # Make beat synchronous\n beatsync_feats = librosa.util.utils.sync(self._framesync_features.T,\n beat_frames, pad=pad).T\n\n # Assign times (and add last time if padded)\n beatsync_times = np.copy(beat_times)\n if beatsync_times.shape[0] != beatsync_feats.shape[0]:\n beatsync_times = np.concatenate((beatsync_times,\n [self._framesync_times[-1]]))\n return beatsync_feats, beatsync_times", "response": "Compute the features for the beat - synchronous."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef read_features(self, tol=1e-3):\n try:\n # Read JSON file\n with open(self.file_struct.features_file) as f:\n feats = json.load(f)\n\n # Store duration\n if self.dur is None:\n self.dur = float(feats[\"globals\"][\"dur\"])\n\n # Check that we have the correct global parameters\n assert(np.isclose(\n self.dur, float(feats[\"globals\"][\"dur\"]), rtol=tol))\n assert(self.sr == int(feats[\"globals\"][\"sample_rate\"]))\n assert(self.hop_length == int(feats[\"globals\"][\"hop_length\"]))\n assert(os.path.basename(self.file_struct.audio_file) ==\n os.path.basename(feats[\"globals\"][\"audio_file\"]))\n\n # Check for specific features params\n feat_params_err = FeatureParamsError(\n \"Couldn't find features for %s id in file %s\" %\n (self.get_id(), self.file_struct.features_file))\n if self.get_id() not in feats.keys():\n raise feat_params_err\n for param_name in self.get_param_names():\n value = getattr(self, param_name)\n if hasattr(value, '__call__'):\n # Special case of functions\n if value.__name__ != \\\n feats[self.get_id()][\"params\"][param_name]:\n raise feat_params_err\n else:\n if str(value) != \\\n feats[self.get_id()][\"params\"][param_name]:\n raise feat_params_err\n\n # Store actual features\n self._est_beats_times = np.array(feats[\"est_beats\"])\n self._est_beatsync_times = np.array(feats[\"est_beatsync_times\"])\n self._est_beats_frames = librosa.core.time_to_frames(\n self._est_beats_times, sr=self.sr, hop_length=self.hop_length)\n self._framesync_features = \\\n np.array(feats[self.get_id()][\"framesync\"])\n self._est_beatsync_features = \\\n np.array(feats[self.get_id()][\"est_beatsync\"])\n\n # Read annotated beats if available\n if \"ann_beats\" in feats.keys():\n self._ann_beats_times = 
np.array(feats[\"ann_beats\"])\n self._ann_beatsync_times = np.array(feats[\"ann_beatsync_times\"])\n self._ann_beats_frames = librosa.core.time_to_frames(\n self._ann_beats_times, sr=self.sr,\n hop_length=self.hop_length)\n self._ann_beatsync_features = \\\n np.array(feats[self.get_id()][\"ann_beatsync\"])\n except KeyError:\n raise WrongFeaturesFormatError(\n \"The features file %s is not correctly formatted\" %\n self.file_struct.features_file)\n except AssertionError:\n raise FeaturesNotFound(\n \"The features for the given parameters were not found in \"\n \"features file %s\" % self.file_struct.features_file)\n except IOError:\n raise NoFeaturesFileError(\"Could not find features file %s\",\n self.file_struct.features_file)", "response": "Reads the features from a file and stores them in the current\n object."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef write_features(self):\n out_json = collections.OrderedDict()\n try:\n # Only save the necessary information\n self.read_features()\n except (WrongFeaturesFormatError, FeaturesNotFound,\n NoFeaturesFileError):\n # We need to create the file or overwite it\n # Metadata\n out_json = collections.OrderedDict({\"metadata\": {\n \"versions\": {\"librosa\": librosa.__version__,\n \"msaf\": msaf.__version__,\n \"numpy\": np.__version__},\n \"timestamp\": datetime.datetime.today().strftime(\n \"%Y/%m/%d %H:%M:%S\")}})\n\n # Global parameters\n out_json[\"globals\"] = {\n \"dur\": self.dur,\n \"sample_rate\": self.sr,\n \"hop_length\": self.hop_length,\n \"audio_file\": self.file_struct.audio_file\n }\n\n # Beats\n out_json[\"est_beats\"] = self._est_beats_times.tolist()\n out_json[\"est_beatsync_times\"] = self._est_beatsync_times.tolist()\n if self._ann_beats_times is not None:\n out_json[\"ann_beats\"] = self._ann_beats_times.tolist()\n out_json[\"ann_beatsync_times\"] = self._ann_beatsync_times.tolist()\n except FeatureParamsError:\n # We have other features in the file, simply add these ones\n with open(self.file_struct.features_file) as f:\n out_json = json.load(f)\n finally:\n # Specific parameters of the current features\n out_json[self.get_id()] = {}\n out_json[self.get_id()][\"params\"] = {}\n for param_name in self.get_param_names():\n value = getattr(self, param_name)\n # Check for special case of functions\n if hasattr(value, '__call__'):\n value = value.__name__\n else:\n value = str(value)\n out_json[self.get_id()][\"params\"][param_name] = value\n\n # Actual features\n out_json[self.get_id()][\"framesync\"] = \\\n self._framesync_features.tolist()\n out_json[self.get_id()][\"est_beatsync\"] = \\\n self._est_beatsync_features.tolist()\n if self._ann_beatsync_features is not None:\n out_json[self.get_id()][\"ann_beatsync\"] = \\\n self._ann_beatsync_features.tolist()\n\n # Save it\n with open(self.file_struct.features_file, \"w\") as f:\n json.dump(out_json, f, indent=2)", "response": "Saves the current features to a JSON file."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nreturns the parameter names for these features avoiding the global parameters.", "response": "def get_param_names(self):\n \"\"\"Returns the parameter names for these features, avoiding\n the global parameters.\"\"\"\n return [name for name in vars(self) if not name.startswith('_') and\n name not in self._global_param_names]"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef 
_compute_framesync_times(self):\n self._framesync_times = librosa.core.frames_to_time(\n np.arange(self._framesync_features.shape[0]), self.sr,\n self.hop_length)", "response": "Computes the framesync times based on the framesync features."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\ncomputes all the features for all the beatsync and framesync files from the audio file.", "response": "def _compute_all_features(self):\n \"\"\"Computes all the features (beatsync, framesync) from the audio.\"\"\"\n # Read actual audio waveform\n self._audio, _ = librosa.load(self.file_struct.audio_file,\n sr=self.sr)\n\n # Get duration of audio file\n self.dur = len(self._audio) / float(self.sr)\n\n # Compute actual features\n self._framesync_features = self.compute_features()\n\n # Compute framesync times\n self._compute_framesync_times()\n\n # Compute/Read beats\n self._est_beats_times, self._est_beats_frames = self.estimate_beats()\n self._ann_beats_times, self._ann_beats_frames = self.read_ann_beats()\n\n # Beat-Synchronize\n pad = True # Always append to the end of the features\n self._est_beatsync_features, self._est_beatsync_times = \\\n self.compute_beat_sync_features(self._est_beats_frames,\n self._est_beats_times, pad)\n self._ann_beatsync_features, self._ann_beatsync_times = \\\n self.compute_beat_sync_features(self._ann_beats_frames,\n self._ann_beats_times, pad)"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef frame_times(self):\n frame_times = None\n # Make sure we have already computed the features\n self.features\n if self.feat_type is FeatureTypes.framesync:\n self._compute_framesync_times()\n frame_times = self._framesync_times\n elif self.feat_type is FeatureTypes.est_beatsync:\n frame_times = self._est_beatsync_times\n elif self.feat_type is FeatureTypes.ann_beatsync:\n frame_times = self._ann_beatsync_times\n\n return frame_times", "response": "This getter returns the frame times for the corresponding type of\n features."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nselecting the features from the given parameters.", "response": "def select_features(cls, features_id, file_struct, annot_beats, framesync):\n \"\"\"Selects the features from the given parameters.\n\n Parameters\n ----------\n features_id: str\n The identifier of the features (it must be a key inside the\n `features_registry`)\n file_struct: msaf.io.FileStruct\n The file struct containing the files to extract the features from\n annot_beats: boolean\n Whether to use annotated (`True`) or estimated (`False`) beats\n framesync: boolean\n Whether to use framesync (`True`) or beatsync (`False`) features\n\n Returns\n -------\n features: obj\n The actual features object that inherits from `msaf.Features`\n \"\"\"\n if not annot_beats and framesync:\n feat_type = FeatureTypes.framesync\n elif annot_beats and not framesync:\n feat_type = FeatureTypes.ann_beatsync\n elif not annot_beats and not framesync:\n feat_type = FeatureTypes.est_beatsync\n else:\n raise FeatureTypeNotFound(\"Type of features not valid.\")\n\n # Select features with default parameters\n if features_id not in features_registry.keys():\n raise FeaturesNotFound(\n \"The features '%s' are invalid (valid features are %s)\"\n % (features_id, features_registry.keys()))\n\n return features_registry[features_id](file_struct, feat_type)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 
function\ndef _preprocess(self, valid_features=[\"pcp\", \"tonnetz\", \"mfcc\",\n \"cqt\", \"tempogram\"]):\n \"\"\"This method obtains the actual features.\"\"\"\n # Use specific feature\n if self.feature_str not in valid_features:\n raise RuntimeError(\"Feature %s in not valid for algorithm: %s \"\n \"(valid features are %s).\" %\n (self.feature_str, __name__, valid_features))\n else:\n try:\n F = self.features.features\n except KeyError:\n raise RuntimeError(\"Feature %s in not supported by MSAF\" %\n (self.feature_str))\n\n return F", "response": "This method obtains the actual features."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef _postprocess(self, est_idxs, est_labels):\n # Make sure we are using the previously input bounds, if any\n if self.in_bound_idxs is not None:\n F = self._preprocess()\n est_labels = U.synchronize_labels(self.in_bound_idxs, est_idxs,\n est_labels, F.shape[0])\n est_idxs = self.in_bound_idxs\n\n # Remove empty segments if needed\n est_idxs, est_labels = U.remove_empty_segments(est_idxs, est_labels)\n\n assert len(est_idxs) - 1 == len(est_labels), \"Number of boundaries \" \\\n \"(%d) and number of labels(%d) don't match\" % (len(est_idxs),\n len(est_labels))\n\n # Make sure the indeces are integers\n est_idxs = np.asarray(est_idxs, dtype=int)\n\n return est_idxs, est_labels", "response": "Post processes the estimations from the algorithm removing empty segments and making sure the lenghts of the boundaries and labels match."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef process(in_path, annot_beats=False, feature=\"mfcc\", framesync=False,\n boundaries_id=\"gt\", labels_id=None, n_jobs=4, config=None):\n \"\"\"Sweeps parameters across the specified algorithm.\"\"\"\n\n results_file = \"results_sweep_boundsE%s_labelsE%s.csv\" % (boundaries_id,\n labels_id)\n\n if labels_id == \"cnmf3\" or boundaries_id == \"cnmf3\":\n config = io.get_configuration(feature, annot_beats, framesync,\n boundaries_id, labels_id)\n\n hh = range(15, 33)\n RR = range(15, 40)\n ranks = range(3, 6)\n RR_labels = range(11, 12)\n ranks_labels = range(6, 7)\n all_results = pd.DataFrame()\n for rank in ranks:\n for h in hh:\n for R in RR:\n for rank_labels in ranks_labels:\n for R_labels in RR_labels:\n config[\"h\"] = h\n config[\"R\"] = R\n config[\"rank\"] = rank\n config[\"rank_labels\"] = rank_labels\n config[\"R_labels\"] = R_labels\n config[\"features\"] = None\n\n # Run process\n msaf.run.process(\n in_path, n_jobs=n_jobs,\n boundaries_id=boundaries_id,\n labels_id=labels_id, config=config)\n\n # Compute evaluations\n results = msaf.eval.process(\n in_path, boundaries_id, labels_id,\n save=True, n_jobs=n_jobs, config=config)\n\n # Save avg results\n new_columns = {\"config_h\": h, \"config_R\": R,\n \"config_rank\": rank,\n \"config_R_labels\": R_labels,\n \"config_rank_labels\": rank_labels}\n results = results.append([new_columns],\n ignore_index=True)\n all_results = all_results.append(results.mean(),\n ignore_index=True)\n all_results.to_csv(results_file)\n\n elif labels_id is None and boundaries_id == \"sf\":\n config = io.get_configuration(feature, annot_beats, framesync,\n boundaries_id, labels_id)\n\n MM = range(20, 32)\n mm = range(3, 4)\n kk = np.arange(0.03, 0.1, 0.01)\n Mpp = range(16, 32)\n ott = np.arange(0.02, 0.1, 0.01)\n all_results = pd.DataFrame()\n for k in kk:\n for ot in ott:\n for m in mm:\n for M in MM:\n for Mp in Mpp:\n 
config[\"M_gaussian\"] = M\n config[\"m_embedded\"] = m\n config[\"k_nearest\"] = k\n config[\"Mp_adaptive\"] = Mp\n config[\"offset_thres\"] = ot\n config[\"features\"] = None\n\n # Run process\n msaf.run.process(\n in_path, n_jobs=n_jobs,\n boundaries_id=boundaries_id,\n labels_id=labels_id, config=config)\n\n # Compute evaluations\n results = msaf.eval.process(\n in_path, boundaries_id, labels_id,\n save=True, n_jobs=n_jobs, config=config)\n\n # Save avg results\n new_columns = {\"config_M\": M, \"config_m\": m,\n \"config_k\": k, \"config_Mp\": Mp,\n \"config_ot\": ot}\n results = results.append([new_columns],\n ignore_index=True)\n all_results = all_results.append(results.mean(),\n ignore_index=True)\n all_results.to_csv(results_file)\n\n else:\n logging.error(\"Can't sweep parameters for %s algorithm. \"\n \"Implement me! :D\")", "response": "Process the input file and return a pandas DataFrame of the results."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef main():\n parser = argparse.ArgumentParser(\n description=\"Runs the speficied algorithm(s) on the MSAF \"\n \"formatted dataset.\",\n formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n parser.add_argument(\"in_path\",\n action=\"store\",\n help=\"Input dataset\")\n parser.add_argument(\"-f\",\n action=\"store\",\n dest=\"feature\",\n default=\"pcp\",\n type=str,\n help=\"Type of features\",\n choices=[\"pcp\", \"tonnetz\", \"mfcc\", \"cqt\", \"tempogram\"])\n parser.add_argument(\"-b\",\n action=\"store_true\",\n dest=\"annot_beats\",\n help=\"Use annotated beats\",\n default=False)\n parser.add_argument(\"-fs\",\n action=\"store_true\",\n dest=\"framesync\",\n help=\"Use frame-synchronous features\",\n default=False)\n parser.add_argument(\"-bid\",\n action=\"store\",\n help=\"Boundary algorithm identifier\",\n dest=\"boundaries_id\",\n default=\"gt\",\n choices=[\"gt\"] +\n io.get_all_boundary_algorithms())\n parser.add_argument(\"-lid\",\n action=\"store\",\n help=\"Label algorithm identifier\",\n dest=\"labels_id\",\n default=None,\n choices=io.get_all_label_algorithms())\n parser.add_argument(\"-j\",\n action=\"store\",\n dest=\"n_jobs\",\n default=4,\n type=int,\n help=\"The number of threads to use\")\n args = parser.parse_args()\n start_time = time.time()\n\n # Run the algorithm(s)\n process(args.in_path, annot_beats=args.annot_beats, feature=args.feature,\n framesync=args.framesync, boundaries_id=args.boundaries_id,\n labels_id=args.labels_id, n_jobs=args.n_jobs)\n\n # Done!\n logging.info(\"Done! 
Took %.2f seconds.\" % (time.time() - start_time))", "response": "Main function to sweep parameters of a certain algorithm."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef main():\n parser = argparse.ArgumentParser(\n description=\"Runs the speficied algorithm(s) on the input file and \"\n \"the results using the MIREX format.\",\n formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n parser.add_argument(\"-bid\",\n action=\"store\",\n help=\"Boundary algorithm identifier\",\n dest=\"boundaries_id\",\n default=msaf.config.default_bound_id,\n choices=[\"gt\"] +\n msaf.io.get_all_boundary_algorithms())\n parser.add_argument(\"-lid\",\n action=\"store\",\n help=\"Label algorithm identifier\",\n dest=\"labels_id\",\n default=msaf.config.default_label_id,\n choices=msaf.io.get_all_label_algorithms())\n parser.add_argument(\"-i\",\n action=\"store\",\n dest=\"in_file\",\n help=\"Input audio file\")\n parser.add_argument(\"-o\",\n action=\"store\",\n dest=\"out_file\",\n help=\"Output file with the results\",\n default=\"out.txt\")\n\n args = parser.parse_args()\n start_time = time.time()\n\n # Setup the logger\n logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s',\n level=logging.INFO)\n\n # Run MSAF\n params = {\n \"annot_beats\": False,\n \"feature\": \"cqt\",\n \"framesync\": False,\n \"boundaries_id\": args.boundaries_id,\n \"labels_id\": args.labels_id,\n \"n_jobs\": 1,\n \"hier\": False,\n \"sonify_bounds\": False,\n \"plot\": False\n }\n res = msaf.run.process(args.in_file, **params)\n msaf.io.write_mirex(res[0], res[1], args.out_file)\n\n # Done!\n logging.info(\"Done! Took %.2f seconds.\" % (time.time() - start_time))", "response": "Main function to parse the arguments and call the main process."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef print_results(results):\n if len(results) == 0:\n logging.warning(\"No results to print!\")\n return\n res = results.mean()\n logging.info(\"Results:\\n%s\" % res)", "response": "Print all the results."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef compute_results(ann_inter, est_inter, ann_labels, est_labels, bins,\n est_file, weight=0.58):\n \"\"\"Compute the results using all the available evaluations.\n\n Parameters\n ----------\n ann_inter : np.array\n Annotated intervals in seconds.\n est_inter : np.array\n Estimated intervals in seconds.\n ann_labels : np.array\n Annotated labels.\n est_labels : np.array\n Estimated labels\n bins : int\n Number of bins for the information gain.\n est_file : str\n Path to the output file to store results.\n weight: float\n Weight the Precision and Recall values of the hit rate boundaries\n differently (<1 will weight Precision higher, >1 will weight Recall\n higher).\n The default parameter (0.58) is the one proposed in (Nieto et al. 
2014)\n\n Return\n ------\n results : dict\n Contains the results of all the evaluations for the given file.\n Keys are the following:\n track_id: Name of the track\n HitRate_3F: F-measure of hit rate at 3 seconds\n HitRate_3P: Precision of hit rate at 3 seconds\n HitRate_3R: Recall of hit rate at 3 seconds\n HitRate_0.5F: F-measure of hit rate at 0.5 seconds\n HitRate_0.5P: Precision of hit rate at 0.5 seconds\n HitRate_0.5R: Recall of hit rate at 0.5 seconds\n HitRate_w3F: F-measure of hit rate at 3 seconds weighted\n HitRate_w0.5F: F-measure of hit rate at 0.5 seconds weighted\n HitRate_wt3F: F-measure of hit rate at 3 seconds weighted and\n trimmed\n HitRate_wt0.5F: F-measure of hit rate at 0.5 seconds weighted\n and trimmed\n HitRate_t3F: F-measure of hit rate at 3 seconds (trimmed)\n HitRate_t3P: Precision of hit rate at 3 seconds (trimmed)\n HitRate_t3F: Recall of hit rate at 3 seconds (trimmed)\n HitRate_t0.5F: F-measure of hit rate at 0.5 seconds (trimmed)\n HitRate_t0.5P: Precision of hit rate at 0.5 seconds (trimmed)\n HitRate_t0.5R: Recall of hit rate at 0.5 seconds (trimmed)\n DevA2E: Median deviation of annotation to estimation\n DevE2A: Median deviation of estimation to annotation\n D: Information gain\n PWF: F-measure of pair-wise frame clustering\n PWP: Precision of pair-wise frame clustering\n PWR: Recall of pair-wise frame clustering\n Sf: F-measure normalized entropy score\n So: Oversegmentation normalized entropy score\n Su: Undersegmentation normalized entropy score\n \"\"\"\n res = {}\n\n # --Boundaries-- #\n # Hit Rate standard\n res[\"HitRate_3P\"], res[\"HitRate_3R\"], res[\"HitRate_3F\"] = \\\n mir_eval.segment.detection(ann_inter, est_inter, window=3, trim=False)\n res[\"HitRate_0.5P\"], res[\"HitRate_0.5R\"], res[\"HitRate_0.5F\"] = \\\n mir_eval.segment.detection(ann_inter, est_inter, window=.5, trim=False)\n\n # Hit rate trimmed\n res[\"HitRate_t3P\"], res[\"HitRate_t3R\"], res[\"HitRate_t3F\"] = \\\n mir_eval.segment.detection(ann_inter, est_inter, window=3, trim=True)\n res[\"HitRate_t0.5P\"], res[\"HitRate_t0.5R\"], res[\"HitRate_t0.5F\"] = \\\n mir_eval.segment.detection(ann_inter, est_inter, window=.5, trim=True)\n\n # Hit rate weighted\n _, _, res[\"HitRate_w3F\"] = mir_eval.segment.detection(\n ann_inter, est_inter, window=3, trim=False, beta=weight)\n _, _, res[\"HitRate_w0.5F\"] = mir_eval.segment.detection(\n ann_inter, est_inter, window=.5, trim=False, beta=weight)\n\n # Hit rate weighted and trimmed\n _, _, res[\"HitRate_wt3F\"] = mir_eval.segment.detection(\n ann_inter, est_inter, window=3, trim=True, beta=weight)\n _, _, res[\"HitRate_wt0.5F\"] = mir_eval.segment.detection(\n ann_inter, est_inter, window=.5, trim=True, beta=weight)\n\n # Information gain\n res[\"D\"] = compute_information_gain(ann_inter, est_inter, est_file,\n bins=bins)\n\n # Median Deviations\n res[\"DevR2E\"], res[\"DevE2R\"] = mir_eval.segment.deviation(\n ann_inter, est_inter, trim=False)\n res[\"DevtR2E\"], res[\"DevtE2R\"] = mir_eval.segment.deviation(\n ann_inter, est_inter, trim=True)\n\n # --Labels-- #\n if est_labels is not None and (\"-1\" in est_labels or \"@\" in est_labels):\n est_labels = None\n if est_labels is not None and len(est_labels) != 0:\n # Align labels with intervals\n ann_labels = list(ann_labels)\n est_labels = list(est_labels)\n ann_inter, ann_labels = mir_eval.util.adjust_intervals(ann_inter,\n ann_labels)\n est_inter, est_labels = mir_eval.util.adjust_intervals(\n est_inter, est_labels, t_min=0.0, t_max=ann_inter.max())\n\n # Pair-wise frame 
clustering\n res[\"PWP\"], res[\"PWR\"], res[\"PWF\"] = mir_eval.segment.pairwise(\n ann_inter, ann_labels, est_inter, est_labels)\n\n # Normalized Conditional Entropies\n res[\"So\"], res[\"Su\"], res[\"Sf\"] = mir_eval.segment.nce(\n ann_inter, ann_labels, est_inter, est_labels)\n\n # Names\n base = os.path.basename(est_file)\n res[\"track_id\"] = base[:-5]\n res[\"ds_name\"] = base.split(\"_\")[0]\n\n return res", "response": "Compute the results of all the available evaluations for the given file."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncompute the results by using the ground truth dataset identified by by the annotator parameter.", "response": "def compute_gt_results(est_file, ref_file, boundaries_id, labels_id, config,\n bins=251, annotator_id=0):\n \"\"\"Computes the results by using the ground truth dataset identified by\n the annotator parameter.\n\n Return\n ------\n results : dict\n Dictionary of the results (see function compute_results).\n \"\"\"\n if config[\"hier\"]:\n ref_times, ref_labels, ref_levels = \\\n msaf.io.read_hier_references(\n ref_file, annotation_id=annotator_id,\n exclude_levels=[\"segment_salami_function\"])\n else:\n jam = jams.load(ref_file, validate=False)\n ann = jam.search(namespace='segment_.*')[annotator_id]\n ref_inter, ref_labels = ann.to_interval_values()\n\n # Read estimations with correct configuration\n est_inter, est_labels = io.read_estimations(est_file, boundaries_id,\n labels_id, **config)\n\n # Compute the results and return\n logging.info(\"Evaluating %s\" % os.path.basename(est_file))\n if config[\"hier\"]:\n # Hierarchical\n assert len(est_inter) == len(est_labels), \"Same number of levels \" \\\n \"are required in the boundaries and labels for the hierarchical \" \\\n \"evaluation.\"\n est_times = []\n est_labels = []\n\n # Sort based on how many segments per level\n est_inter = sorted(est_inter, key=lambda level: len(level))\n\n for inter in est_inter:\n est_times.append(msaf.utils.intervals_to_times(inter))\n # Add fake labels (hierarchical eval does not use labels --yet--)\n est_labels.append(np.ones(len(est_times[-1]) - 1) * -1)\n\n # Align the times\n utils.align_end_hierarchies(est_times, ref_times, thres=1)\n\n # To intervals\n est_hier = [utils.times_to_intervals(times) for times in est_times]\n ref_hier = [utils.times_to_intervals(times) for times in ref_times]\n\n # Compute evaluations\n res = {}\n res[\"t_recall10\"], res[\"t_precision10\"], res[\"t_measure10\"] = \\\n mir_eval.hierarchy.tmeasure(ref_hier, est_hier, window=10)\n res[\"t_recall15\"], res[\"t_precision15\"], res[\"t_measure15\"] = \\\n mir_eval.hierarchy.tmeasure(ref_hier, est_hier, window=15)\n\n res[\"track_id\"] = os.path.basename(est_file)[:-5]\n return res\n else:\n # Flat\n return compute_results(ref_inter, est_inter, ref_labels, est_labels,\n bins, est_file)"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\ncomputes the information gain of the est_file from the annotated intervals and the estimated intervals.", "response": "def compute_information_gain(ann_inter, est_inter, est_file, bins):\n \"\"\"Computes the information gain of the est_file from the annotated\n intervals and the estimated intervals.\"\"\"\n ann_times = utils.intervals_to_times(ann_inter)\n est_times = utils.intervals_to_times(est_inter)\n return mir_eval.beat.information_gain(ann_times, est_times, bins=bins)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, 
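All of the hit-rate variants in `compute_results` come from `mir_eval.segment.detection`, which returns a (precision, recall, F-measure) triple. A self-contained toy call on made-up intervals:

```python
import numpy as np
import mir_eval

# Annotated and estimated segment intervals, in seconds (toy values).
ref_inter = np.array([[0.0, 10.0], [10.0, 25.0], [25.0, 40.0]])
est_inter = np.array([[0.0, 11.0], [11.0, 28.0], [28.0, 40.0]])

# Hit rate with a 3-second window, as used for HitRate_3P/3R/3F above.
P, R, F = mir_eval.segment.detection(ref_inter, est_inter, window=3,
                                     trim=False)
print("P=%.2f R=%.2f F=%.2f" % (P, R, F))
```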
write the documentation\ndef process_track(file_struct, boundaries_id, labels_id, config,\n annotator_id=0):\n \"\"\"Processes a single track.\n\n Parameters\n ----------\n file_struct : object (FileStruct) or str\n File struct or full path of the audio file to be evaluated.\n boundaries_id : str\n Identifier of the boundaries algorithm.\n labels_id : str\n Identifier of the labels algorithm.\n config : dict\n Configuration of the algorithms to be evaluated.\n annotator_id : int\n Number identifiying the annotator.\n\n Returns\n -------\n one_res : dict\n Dictionary of the results (see function compute_results).\n \"\"\"\n # Convert to file_struct if string is passed\n if isinstance(file_struct, six.string_types):\n file_struct = io.FileStruct(file_struct)\n\n est_file = file_struct.est_file\n ref_file = file_struct.ref_file\n\n # Sanity check\n assert os.path.basename(est_file)[:-4] == \\\n os.path.basename(ref_file)[:-4], \"File names are different %s --- %s\" \\\n % (os.path.basename(est_file)[:-4], os.path.basename(ref_file)[:-4])\n\n if not os.path.isfile(ref_file):\n raise NoReferencesError(\"Reference file %s does not exist. You must \"\n \"have annotated references to run \"\n \"evaluations.\" % ref_file)\n\n one_res = compute_gt_results(est_file, ref_file, boundaries_id, labels_id,\n config, annotator_id=annotator_id)\n\n return one_res", "response": "Processes a single audio file and returns a dictionary of the results."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\ngetting the file name to store the results.", "response": "def get_results_file_name(boundaries_id, labels_id, config,\n annotator_id):\n \"\"\"Based on the config and the dataset, get the file name to store the\n results.\"\"\"\n utils.ensure_dir(msaf.config.results_dir)\n file_name = os.path.join(msaf.config.results_dir, \"results\")\n file_name += \"_boundsE%s_labelsE%s\" % (boundaries_id, labels_id)\n file_name += \"_annotatorE%d\" % (annotator_id)\n sorted_keys = sorted(config.keys(), key=str.lower)\n for key in sorted_keys:\n file_name += \"_%sE%s\" % (key, str(config[key]).replace(\"/\", \"_\"))\n\n # Check for max file length\n if len(file_name) > 255 - len(msaf.config.results_ext):\n file_name = file_name[:255 - len(msaf.config.results_ext)]\n\n return file_name + msaf.config.results_ext"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef process(in_path, boundaries_id=msaf.config.default_bound_id,\n labels_id=msaf.config.default_label_id, annot_beats=False,\n framesync=False, feature=\"pcp\", hier=False, save=False,\n out_file=None, n_jobs=4, annotator_id=0, config=None):\n \"\"\"Main process to evaluate algorithms' results.\n\n Parameters\n ----------\n in_path : str\n Path to the dataset root folder.\n boundaries_id : str\n Boundaries algorithm identifier (e.g. siplca, cnmf)\n labels_id : str\n Labels algorithm identifier (e.g. siplca, cnmf)\n ds_name : str\n Name of the dataset to be evaluated (e.g. SALAMI). * stands for all.\n annot_beats : boolean\n Whether to use the annotated beats or not.\n framesync: str\n Whether to use framesync features or not (default: False -> beatsync)\n feature: str\n String representing the feature to be used (e.g. 
pcp, mfcc, tonnetz)\n hier : bool\n Whether to compute a hierarchical or flat segmentation.\n save: boolean\n Whether to save the results into the `out_file` csv file.\n out_file: str\n Path to the csv file to save the results (if `None` and `save = True`\n it will save the results in the default file name obtained by\n calling `get_results_file_name`).\n n_jobs: int\n Number of processes to run in parallel. Only available in collection\n mode.\n annotator_id : int\n Number identifiying the annotator.\n config: dict\n Dictionary containing custom configuration parameters for the\n algorithms. If None, the default parameters are used.\n\n Return\n ------\n results : pd.DataFrame\n DataFrame containing the evaluations for each file.\n \"\"\"\n\n # Set up configuration based on algorithms parameters\n if config is None:\n config = io.get_configuration(feature, annot_beats, framesync,\n boundaries_id, labels_id)\n\n # Hierarchical segmentation\n config[\"hier\"] = hier\n\n # Remove actual features\n config.pop(\"features\", None)\n\n # Get out file in case we want to save results\n if out_file is None:\n out_file = get_results_file_name(boundaries_id, labels_id, config,\n annotator_id)\n\n # If out_file already exists, read and return them\n if os.path.exists(out_file):\n logging.warning(\"Results already exists, reading from file %s\" %\n out_file)\n results = pd.read_csv(out_file)\n print_results(results)\n return results\n\n # Perform actual evaluations\n if os.path.isfile(in_path):\n # Single File mode\n evals = [process_track(in_path, boundaries_id, labels_id, config,\n annotator_id=annotator_id)]\n else:\n # Collection mode\n # Get files\n file_structs = io.get_dataset_files(in_path)\n\n # Evaluate in parallel\n logging.info(\"Evaluating %d tracks...\" % len(file_structs))\n evals = Parallel(n_jobs=n_jobs)(delayed(process_track)(\n file_struct, boundaries_id, labels_id, config,\n annotator_id=annotator_id) for file_struct in file_structs[:])\n\n # Aggregate evaluations in pandas format\n results = pd.DataFrame()\n for e in evals:\n if e != []:\n results = results.append(e, ignore_index=True)\n logging.info(\"%d tracks analyzed\" % len(results))\n\n # Print results\n print_results(results)\n\n # Save all results\n if save:\n logging.info(\"Writing results in %s\" % out_file)\n results.to_csv(out_file)\n\n return results", "response": "This function processes the algorithms in the dataset root folder and returns the results as a pandas DataFrame."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef parse_config_string(config_string, issue_warnings=True):\n config_dict = {}\n my_splitter = shlex.shlex(config_string, posix=True)\n my_splitter.whitespace = ','\n my_splitter.whitespace_split = True\n for kv_pair in my_splitter:\n kv_pair = kv_pair.strip()\n if not kv_pair:\n continue\n kv_tuple = kv_pair.split('=', 1)\n if len(kv_tuple) == 1:\n if issue_warnings:\n MsafConfigWarning.warn(\n (\"Config key '%s' has no value, ignoring it\" %\n kv_tuple[0]), stacklevel=1)\n else:\n k, v = kv_tuple\n # subsequent values for k will override earlier ones\n config_dict[k] = v\n return config_dict", "response": "Parses a config string into a dict."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nfetch the overriding config value for a key.", "response": "def fetch_val_for_key(key, delete_key=False):\n \"\"\"Return the overriding config value for a key.\n A successful search returns a string value.\n An 
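Both the sweeps and the evaluation aggregation above rely on `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. A hedged sketch of the equivalent row accumulation with `pd.concat`, on a toy `evals` list:

```python
import pandas as pd

# Toy per-track evaluation dicts, as produced by process_track above.
evals = [{"HitRate_3F": 0.8, "track_id": "a"},
         {"HitRate_3F": 0.6, "track_id": "b"}]

# Equivalent of repeatedly calling results.append(e, ignore_index=True):
results = pd.concat([pd.DataFrame([e]) for e in evals], ignore_index=True)
print(results)
```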
unsuccessful search raises a KeyError\n The (decreasing) priority order is:\n - MSAF_FLAGS\n - ~./msafrc\n \"\"\"\n\n # first try to find it in the FLAGS\n try:\n if delete_key:\n return MSAF_FLAGS_DICT.pop(key)\n return MSAF_FLAGS_DICT[key]\n except KeyError:\n pass\n\n # next try to find it in the config file\n\n # config file keys can be of form option, or section.option\n key_tokens = key.rsplit('.', 1)\n if len(key_tokens) == 2:\n section, option = key_tokens\n else:\n section, option = 'global', key\n try:\n try:\n return msaf_cfg.get(section, option)\n except InterpolationError:\n return msaf_raw_cfg.get(section, option)\n except (NoOptionError, NoSectionError):\n raise KeyError(key)"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef AddConfigVar(name, doc, configparam, root=config):\n\n # This method also performs some of the work of initializing ConfigParam\n # instances\n\n if root is config:\n # only set the name in the first call, not the recursive ones\n configparam.fullname = name\n sections = name.split('.')\n if len(sections) > 1:\n # set up a subobject\n if not hasattr(root, sections[0]):\n # every internal node in the config tree is an instance of its own\n # unique class\n class SubObj(object):\n _i_am_a_config_class = True\n setattr(root.__class__, sections[0], SubObj())\n newroot = getattr(root, sections[0])\n if (not getattr(newroot, '_i_am_a_config_class', False) or\n isinstance(newroot, type)):\n raise TypeError(\n 'Internal config nodes must be config class instances',\n newroot)\n return AddConfigVar('.'.join(sections[1:]), doc, configparam,\n root=newroot)\n else:\n if hasattr(root, name):\n raise AttributeError('This name is already taken',\n configparam.fullname)\n configparam.doc = doc\n # Trigger a read of the value from config files and env vars\n # This allow to filter wrong value from the user.\n if not callable(configparam.default):\n configparam.__get__(root, type(root), delete_key=True)\n else:\n # We do not want to evaluate now the default value\n # when it is a callable.\n try:\n fetch_val_for_key(configparam.fullname)\n # The user provided a value, filter it now.\n configparam.__get__(root, type(root), delete_key=True)\n except KeyError:\n pass\n setattr(root.__class__, sections[0], configparam)\n _config_var_list.append(configparam)", "response": "Add a new variable to the msaf. 
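`fetch_val_for_key` implements a two-level priority lookup: environment flags first, then the rc config file. A minimal self-contained sketch of that lookup order, with stand-in data for the real `MSAF_FLAGS_DICT` and rc file:

```python
import configparser

MSAF_FLAGS_DICT = {"sample_rate": "22050"}    # parsed from MSAF_FLAGS
cfg = configparser.ConfigParser()
cfg.read_string("[global]\ncpu_count = 4\n")  # stand-in for the msafrc file

def fetch(key):
    # 1) Environment flags take precedence.
    if key in MSAF_FLAGS_DICT:
        return MSAF_FLAGS_DICT[key]
    # 2) Fall back to the config file; bare keys live in [global].
    section, _, option = key.rpartition(".")
    section = section or "global"
    try:
        return cfg.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError):
        raise KeyError(key)

print(fetch("sample_rate"), fetch("cpu_count"))
```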
config object."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef processFlat(self):\n # Preprocess to obtain features (array(n_frames, n_features))\n\n F = self._preprocess()\n F = librosa.util.normalize(F, axis=0)\n F = librosa.feature.stack_memory(F.T).T\n\n self.config[\"hier\"] = False\n my_bounds, my_labels, _ = main.scluster_segment(F, self.config, self.in_bound_idxs)\n\n # Post process estimations\n est_idxs, est_labels = self._postprocess(my_bounds, my_labels)\n\n assert est_idxs[0] == 0 and est_idxs[-1] == F.shape[0] - 1\n # We're done!\n return est_idxs, est_labels", "response": "Main process.\n Returns\n -------\n est_idxs : np.array(N)\n Estimated indeces the segment boundaries in frame indeces.\n est_labels : np.array(N-1)\n Estimated labels for the segments."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\ncompute all features for the given file.", "response": "def compute_all_features(file_struct, framesync):\n \"\"\"Computes all features for the given file.\"\"\"\n for feature_id in msaf.features_registry:\n logging.info(\"Computing %s for file %s\" % (feature_id,\n file_struct.audio_file))\n feats = Features.select_features(feature_id, file_struct, False, framesync)\n feats.features"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\ncomputes the features for the selected dataset or file.", "response": "def process(in_path, out_file, n_jobs, framesync):\n \"\"\"Computes the features for the selected dataset or file.\"\"\"\n if os.path.isfile(in_path):\n # Single file mode\n # Get (if they exitst) or compute features\n file_struct = msaf.io.FileStruct(in_path)\n file_struct.features_file = out_file\n compute_all_features(file_struct, framesync)\n else:\n # Collection mode\n file_structs = msaf.io.get_dataset_files(in_path)\n\n # Call in parallel\n return Parallel(n_jobs=n_jobs)(delayed(compute_all_features)(\n file_struct, framesync) for file_struct in file_structs)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef features(file_struct, annot_beats=False, framesync=False):\n '''Feature-extraction for audio segmentation\n Arguments:\n file_struct -- msaf.io.FileStruct\n paths to the input files in the Segmentation dataset\n\n Returns:\n - X -- ndarray\n\n beat-synchronous feature matrix:\n MFCC (mean-aggregated)\n Chroma (median-aggregated)\n Latent timbre repetition\n Latent chroma repetition\n Time index\n Beat index\n\n - dur -- float\n duration of the track in seconds\n\n '''\n def compress_data(X, k):\n Xtemp = X.dot(X.T)\n if len(Xtemp) == 0:\n return None\n e_vals, e_vecs = np.linalg.eig(Xtemp)\n\n e_vals = np.maximum(0.0, np.real(e_vals))\n e_vecs = np.real(e_vecs)\n\n idx = np.argsort(e_vals)[::-1]\n\n e_vals = e_vals[idx]\n e_vecs = e_vecs[:, idx]\n\n # Truncate to k dimensions\n if k < len(e_vals):\n e_vals = e_vals[:k]\n e_vecs = e_vecs[:, :k]\n\n # Normalize by the leading singular value of X\n Z = np.sqrt(e_vals.max())\n\n if Z > 0:\n e_vecs = e_vecs / Z\n\n return e_vecs.T.dot(X)\n\n # Latent factor repetition features\n def repetition(X, metric='euclidean'):\n R = librosa.segment.recurrence_matrix(\n X, k=2 * int(np.ceil(np.sqrt(X.shape[1]))),\n width=REP_WIDTH, metric=metric, sym=False).astype(np.float32)\n\n P = scipy.signal.medfilt2d(librosa.segment.recurrence_to_lag(R),\n [1, REP_FILTER])\n\n # Discard empty rows.\n # This should give an equivalent SVD, 
but resolves some numerical\n # instabilities.\n P = P[P.any(axis=1)]\n\n return compress_data(P, N_REP)\n\n #########\n # '\\tloading annotations and features of ', audio_path\n pcp_obj = Features.select_features(\"pcp\", file_struct, annot_beats,\n framesync)\n mfcc_obj = Features.select_features(\"mfcc\", file_struct, annot_beats,\n framesync)\n chroma = pcp_obj.features\n mfcc = mfcc_obj.features\n beats = pcp_obj.frame_times\n dur = pcp_obj.dur\n\n # Sampling Rate\n sr = msaf.config.sample_rate\n\n ##########\n # print '\\treading beats'\n B = beats[:chroma.shape[0]]\n # beat_frames = librosa.time_to_frames(B, sr=sr,\n #hop_length=msaf.config.hop_size)\n #print beat_frames, len(beat_frames), uidx\n\n #########\n M = mfcc.T\n #plt.imshow(M, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n #########\n # Get the beat-sync chroma\n C = chroma.T\n C += C.min() + 0.1\n C = C / C.max(axis=0)\n C = 80 * np.log10(C) # Normalize from -80 to 0\n #plt.imshow(C, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Time-stamp features\n N = np.arange(float(chroma.shape[0]))\n\n #########\n #print '\\tgenerating structure features'\n\n # TODO: This might fail if audio file (or number of beats) is too small\n R_timbre = repetition(librosa.feature.stack_memory(M))\n R_chroma = repetition(librosa.feature.stack_memory(C))\n if R_timbre is None or R_chroma is None:\n return None, dur\n\n R_timbre += R_timbre.min()\n R_timbre /= R_timbre.max()\n R_chroma += R_chroma.min()\n R_chroma /= R_chroma.max()\n #plt.imshow(R_chroma, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Stack it all up\n #print M.shape, C.shape, R_timbre.shape, R_chroma.shape, len(B), len(N)\n X = np.vstack([M, C, R_timbre, R_chroma, B, B / dur, N,\n N / float(chroma.shape[0])])\n\n #plt.imshow(X, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n return X, dur", "response": "Feature - extraction for audio segmentation\n "} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef gaussian_cost(X):\n '''Return the average log-likelihood of data under a standard normal\n '''\n\n d, n = X.shape\n\n if n < 2:\n return 0\n\n sigma = np.var(X, axis=1, ddof=1)\n\n cost = -0.5 * d * n * np.log(2. * np.pi) - 0.5 * (n - 1.) 
* np.sum(sigma)\n return cost", "response": "Return the average log-likelihood of the data under a standard normal distribution."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nlog-normalizes features such that each vector is between min_db and 0.", "response": "def lognormalize(F, floor=0.1, min_db=-80):\n \"\"\"Log-normalizes features such that each vector is between min_db and 0.\"\"\"\n assert min_db < 0\n F = min_max_normalize(F, floor=floor)\n F = np.abs(min_db) * np.log10(F) # Normalize from min_db to 0\n return F"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef min_max_normalize(F, floor=0.001):\n F += -F.min() + floor\n F = F / F.max(axis=0)\n return F", "response": "Normalizes features such that each vector is between floor and 1."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef normalize(X, norm_type, floor=0.0, min_db=-80):\n if isinstance(norm_type, six.string_types):\n if norm_type == \"min_max\":\n return min_max_normalize(X, floor=floor)\n if norm_type == \"log\":\n return lognormalize(X, floor=floor, min_db=min_db)\n return librosa.util.normalize(X, norm=norm_type, axis=1)", "response": "Normalizes the given matrix of features."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef get_time_frames(dur, anal):\n n_frames = get_num_frames(dur, anal)\n return np.linspace(0, dur, num=n_frames)", "response": "Gets the time frames and puts them in a numpy array."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef remove_empty_segments(times, labels):\n assert len(times) - 1 == len(labels)\n inters = times_to_intervals(times)\n new_inters = []\n new_labels = []\n for inter, label in zip(inters, labels):\n if inter[0] < inter[1]:\n new_inters.append(inter)\n new_labels.append(label)\n return intervals_to_times(np.asarray(new_inters)), new_labels", "response": "Removes empty segments if needed."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nsynchronizing labels from the old boundary indices to the new boundary indices.", "response": "def synchronize_labels(new_bound_idxs, old_bound_idxs, old_labels, N):\n \"\"\"Synchronizes the labels from the old_bound_idxs to the new_bound_idxs.\n\n Parameters\n ----------\n new_bound_idxs: np.array\n New indices to synchronize with.\n old_bound_idxs: np.array\n Old indices, same shape as labels + 1.\n old_labels: np.array\n Labels associated to the old_bound_idxs.\n N: int\n Total number of frames.\n\n Returns\n -------\n new_labels: np.array\n New labels, synchronized to the new boundary indices.\n \"\"\"\n assert len(old_bound_idxs) - 1 == len(old_labels)\n\n # Construct unfolded labels array\n unfold_labels = np.zeros(N)\n for i, (bound_idx, label) in enumerate(\n zip(old_bound_idxs[:-1], old_labels)):\n unfold_labels[bound_idx:old_bound_idxs[i + 1]] = label\n\n # Construct new labels\n new_labels = np.zeros(len(new_bound_idxs) - 1)\n for i, bound_idx in enumerate(new_bound_idxs[:-1]):\n new_labels[i] = np.median(\n unfold_labels[bound_idx:new_bound_idxs[i + 1]])\n\n return new_labels"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nprocesses a level of segmentation and converts it into times.", "response": "def process_segmentation_level(est_idxs, est_labels, N, frame_times, dur):\n \"\"\"Processes a level of segmentation, and converts 
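The normalizers above compose: `lognormalize` first calls `min_max_normalize` and then rescales to log magnitudes so each column peaks at 0 dB. A short numeric walk-through of both steps on random features:

```python
import numpy as np

F = np.random.RandomState(0).rand(8, 4)   # (n_frames, n_features)

# min_max_normalize: shift so the global min sits at `floor`,
# then scale each column so its maximum is 1.
floor = 0.001
G = F - F.min() + floor
G = G / G.max(axis=0)

# lognormalize: 80 * log10 of the min-max-normalized values,
# so each column's maximum maps to 0 dB.
min_db = -80
L = np.abs(min_db) * np.log10(G)

print(G.max(axis=0), L.max())   # columns peak at 1 -> 0 dB
```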
it into times.\n\n Parameters\n ----------\n est_idxs: np.array\n Estimated boundaries in frame indeces.\n est_labels: np.array\n Estimated labels.\n N: int\n Number of frames in the whole track.\n frame_times: np.array\n Time stamp for each frame.\n dur: float\n Duration of the audio track.\n\n Returns\n -------\n est_times: np.array\n Estimated segment boundaries in seconds.\n est_labels: np.array\n Estimated labels for each segment.\n \"\"\"\n assert est_idxs[0] == 0 and est_idxs[-1] == N - 1\n assert len(est_idxs) - 1 == len(est_labels)\n\n # Add silences, if needed\n est_times = np.concatenate(([0], frame_times[est_idxs], [dur]))\n silence_label = np.max(est_labels) + 1\n est_labels = np.concatenate(([silence_label], est_labels, [silence_label]))\n\n # Remove empty segments if needed\n est_times, est_labels = remove_empty_segments(est_times, est_labels)\n\n # Make sure that the first and last times are 0 and duration, respectively\n assert np.allclose([est_times[0]], [0]) and \\\n np.allclose([est_times[-1]], [dur])\n\n return est_times, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\naligns the end of the hierarchies such that they end at the same exact value and they have the same duration within a certain threshold.", "response": "def align_end_hierarchies(hier1, hier2, thres=0.5):\n \"\"\"Align the end of the hierarchies such that they end at the same exact\n second as long they have the same duration within a certain threshold.\n\n Parameters\n ----------\n hier1: list\n List containing hierarchical segment boundaries.\n hier2: list\n List containing hierarchical segment boundaries.\n thres: float > 0\n Threshold to decide whether two values are the same.\n \"\"\"\n # Make sure we have correctly formatted hierarchies\n dur_h1 = hier1[0][-1]\n for hier in hier1:\n assert hier[-1] == dur_h1, \"hier1 is not correctly \" \\\n \"formatted {} {}\".format(hier[-1], dur_h1)\n dur_h2 = hier2[0][-1]\n for hier in hier2:\n assert hier[-1] == dur_h2, \"hier2 is not correctly formatted\"\n\n # If durations are different, do nothing\n if abs(dur_h1 - dur_h2) > thres:\n return\n\n # Align h1 with h2\n for hier in hier1:\n hier[-1] = dur_h2"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef _distance(self, idx):\n\n if scipy.sparse.issparse(self.data):\n step = self.data.shape[1]\n else:\n step = 50000\n\n d = np.zeros((self.data.shape[1]))\n if idx == -1:\n # set vec to origin if idx=-1\n vec = np.zeros((self.data.shape[0], 1))\n if scipy.sparse.issparse(self.data):\n vec = scipy.sparse.csc_matrix(vec)\n else:\n vec = self.data[:, idx:idx+1]\n\n self._logger.info('compute distance to node ' + str(idx))\n\n # slice data into smaller chunks\n for idx_start in range(0, self.data.shape[1], step):\n if idx_start + step > self.data.shape[1]:\n idx_end = self.data.shape[1]\n else:\n idx_end = idx_start + step\n\n d[idx_start:idx_end] = self._distfunc(\n self.data[:,idx_start:idx_end], vec)\n self._logger.info('completed:' +\n str(idx_end/(self.data.shape[1]/100.0)) + \"%\")\n return d", "response": "compute distances of a specific data point to all other samples"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef estimate_K_xmeans(self, th=0.2, maxK = 10):\n\n # Run initial K-means\n means, labels = self.run_kmeans(self.X, self.init_K)\n\n # Run X-means algorithm\n stop = False\n curr_K = self.init_K\n while not stop:\n stop = True\n final_means = []\n for 
k in range(curr_K):\n # Find the data that corresponds to the k-th cluster\n D = self.get_clustered_data(self.X, labels, k)\n if len(D) == 0 or D.shape[0] == 1:\n continue\n\n # Whiten and find whitened mean\n stdD = np.std(D, axis=0)\n #D = vq.whiten(D)\n D /= float(stdD) # Same as line above\n mean = D.mean(axis=0)\n\n # Cluster this subspace by half (K=2)\n half_means, half_labels = self.run_kmeans(D, K=2)\n\n # Compute BICs\n bic1 = self.compute_bic(D, [mean], K=1,\n labels=np.zeros(D.shape[0]),\n R=D.shape[0])\n bic2 = self.compute_bic(D, half_means, K=2,\n labels=half_labels, R=D.shape[0])\n\n # Split or not\n max_bic = np.max([np.abs(bic1), np.abs(bic2)])\n norm_bic1 = bic1 / float(max_bic)\n norm_bic2 = bic2 / float(max_bic)\n diff_bic = np.abs(norm_bic1 - norm_bic2)\n\n # Split!\n #print \"diff_bic\", diff_bic\n if diff_bic > th:\n final_means.append(half_means[0] * stdD)\n final_means.append(half_means[1] * stdD)\n curr_K += 1\n stop = False\n # Don't split\n else:\n final_means.append(mean * stdD)\n\n final_means = np.asarray(final_means)\n\n #print \"Estimated K: \", curr_K\n if self.plot:\n plt.scatter(self.X[:, 0], self.X[:, 1])\n plt.scatter(final_means[:, 0], final_means[:, 1], color=\"y\")\n plt.show()\n\n if curr_K >= maxK or self.X.shape[-1] != final_means.shape[-1]:\n stop = True\n else:\n labels, dist = vq.vq(self.X, final_means)\n\n return curr_K", "response": "Estimates K running X - means algorithm."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nestimating the K using K - means and BIC by sweeping various K and choosing the optimal BIC.", "response": "def estimate_K_knee(self, th=.015, maxK=12):\n \"\"\"Estimates the K using K-means and BIC, by sweeping various K and\n choosing the optimal BIC.\"\"\"\n # Sweep K-means\n if self.X.shape[0] < maxK:\n maxK = self.X.shape[0]\n if maxK < 2:\n maxK = 2\n K = np.arange(1, maxK)\n bics = []\n for k in K:\n means, labels = self.run_kmeans(self.X, k)\n bic = self.compute_bic(self.X, means, labels, K=k,\n R=self.X.shape[0])\n bics.append(bic)\n diff_bics = np.diff(bics)\n finalK = K[-1]\n\n if len(bics) == 1:\n finalK = 2\n else:\n # Normalize\n bics = np.asarray(bics)\n bics -= bics.min()\n #bics /= bics.max()\n diff_bics -= diff_bics.min()\n #diff_bics /= diff_bics.max()\n\n #print bics, diff_bics\n\n # Find optimum K\n for i in range(len(K[:-1])):\n #if bics[i] > diff_bics[i]:\n if diff_bics[i] < th and K[i] != 1:\n finalK = K[i]\n break\n\n #print \"Estimated K: \", finalK\n if self.plot:\n plt.subplot(2, 1, 1)\n plt.plot(K, bics, label=\"BIC\")\n plt.plot(K[:-1], diff_bics, label=\"BIC diff\")\n plt.legend(loc=2)\n plt.subplot(2, 1, 2)\n plt.scatter(self.X[:, 0], self.X[:, 1])\n plt.show()\n\n return finalK"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nreturns the data with a specific label_index using the previously splitted labels.", "response": "def get_clustered_data(self, X, labels, label_index):\n \"\"\"Returns the data with a specific label_index, using the previously\n learned labels.\"\"\"\n D = X[np.argwhere(labels == label_index)]\n return D.reshape((D.shape[0], D.shape[-1]))"} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nruns k - means and returns the labels assigned to the data.", "response": "def run_kmeans(self, X, K):\n \"\"\"Runs k-means and returns the labels assigned to the data.\"\"\"\n wX = vq.whiten(X)\n means, dist = vq.kmeans(wX, K, iter=100)\n labels, dist = vq.vq(wX, 
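`estimate_K_knee` picks the number of clusters where the BIC curve stops improving. A self-contained sketch of the same knee idea built only on the distortion returned by `scipy.cluster.vq.kmeans` (a simplification of, not the exact, BIC above):

```python
import numpy as np
from scipy.cluster import vq

# Three well-separated 2D blobs, 50 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + mu for mu in ([0, 0], [8, 8], [0, 8])])

# Sweep K and record the k-means distortion on whitened data.
wX = vq.whiten(X)
distortions = [vq.kmeans(wX, k, iter=20)[1] for k in range(1, 8)]

# Knee: stop at the first K whose next split barely improves the fit
# (analogous to thresholding np.diff(bics) in estimate_K_knee).
thresh = 0.05 * distortions[0]
K = len(distortions)
for i, gain in enumerate(-np.diff(distortions)):
    if gain < thresh:
        K = i + 1
        break
print("Estimated K:", K)
```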
means)\n return means, labels"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef compute_bic(self, D, means, labels, K, R):\n D = vq.whiten(D)\n Rn = D.shape[0]\n M = D.shape[1]\n\n if R == K:\n return 1\n\n # Maximum likelihood estimate (MLE)\n mle_var = 0\n for k in range(len(means)):\n X = D[np.argwhere(labels == k)]\n X = X.reshape((X.shape[0], X.shape[-1]))\n for x in X:\n mle_var += distance.euclidean(x, means[k])\n #print x, means[k], mle_var\n mle_var /= float(R - K)\n\n # Log-likelihood of the data\n l_D = - Rn/2. * np.log(2*np.pi) - (Rn * M)/2. * np.log(mle_var) - \\\n (Rn - K) / 2. + Rn * np.log(Rn) - Rn * np.log(R)\n\n # Params of BIC\n p = (K-1) + M * K + mle_var\n\n #print \"BIC:\", l_D, p, R, K\n\n # Return the bic\n return l_D - p / 2. * np.log(R)", "response": "Computes the Bayesian Information Criterion."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\ngenerate N * K 2D data points with K means and N data points for each mean and N data points for each mean.", "response": "def generate_2d_data(self, N=100, K=5):\n \"\"\"Generates N*K 2D data points with K means and N data points\n for each mean.\"\"\"\n # Seed the random\n np.random.seed(seed=int(time.time()))\n\n # Amount of spread of the centroids\n spread = 30\n\n # Generate random data\n X = np.empty((0, 2))\n for i in range(K):\n mean = np.array([np.random.random()*spread,\n np.random.random()*spread])\n x = np.random.normal(0.0, scale=1.0, size=(N, 2)) + mean\n X = np.append(X, x, axis=0)\n\n return X"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef factorize(self):\n # compute new coefficients for reconstructing data points\n self.update_w()\n\n # for CHNMF it is sometimes useful to only compute\n # the basis vectors\n if self._compute_h:\n self.update_h()\n\n self.W = self.mdl.W\n self.H = self.mdl.H\n\n self.ferr = np.zeros(1)\n self.ferr[0] = self.mdl.frobenius_norm()\n self._print_cur_status(' Fro:' + str(self.ferr[0]))", "response": "Do factorization s. t. data = dot product of data beta and H"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef resample_mx(X, incolpos, outcolpos):\n noutcols = len(outcolpos)\n Y = np.zeros((X.shape[0], noutcols))\n # assign 'end times' to final columns\n if outcolpos.max() > incolpos.max():\n incolpos = np.concatenate([incolpos,[outcolpos.max()]])\n X = np.concatenate([X, X[:,-1].reshape(X.shape[0],1)], axis=1)\n outcolpos = np.concatenate([outcolpos, [outcolpos[-1]]])\n # durations (default weights) of input columns)\n incoldurs = np.concatenate([np.diff(incolpos), [1]])\n\n for c in range(noutcols):\n firstincol = np.where(incolpos <= outcolpos[c])[0][-1]\n firstincolnext = np.where(incolpos < outcolpos[c+1])[0][-1]\n lastincol = max(firstincol,firstincolnext)\n # default weights\n wts = copy.deepcopy(incoldurs[firstincol:lastincol+1])\n # now fix up by partial overlap at ends\n if len(wts) > 1:\n wts[0] = wts[0] - (outcolpos[c] - incolpos[firstincol])\n wts[-1] = wts[-1] - (incolpos[lastincol+1] - outcolpos[c+1])\n wts = wts * 1. 
/ float(sum(wts))\n Y[:,c] = np.dot(X[:,firstincol:lastincol+1], wts)\n # done\n return Y", "response": "Resample X to the given time boundaries incolpos and outcolpos."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef magnitude(X):\n r = np.real(X)\n i = np.imag(X)\n return np.sqrt(r * r + i * i);", "response": "Magnitude of a complex matrix."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef json_to_bounds(segments_json):\n f = open(segments_json)\n segments = json.load(f)[\"segments\"]\n bounds = []\n for segment in segments:\n bounds.append(segment[\"start\"])\n bounds.append(bounds[-1] + segments[-1][\"duration\"]) # Add last boundary\n f.close()\n return np.asarray(bounds)", "response": "Extracts the boundaries from a json file and puts them into\n an np array."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nextracting the boundaries from a json file and puts them into an np array.", "response": "def json_bounds_to_bounds(bounds_json):\n \"\"\"Extracts the boundaries from a bounds json file and puts them into\n an np array.\"\"\"\n f = open(bounds_json)\n segments = json.load(f)[\"bounds\"]\n bounds = []\n for segment in segments:\n bounds.append(segment[\"start\"])\n f.close()\n return np.asarray(bounds)"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef json_to_labels(segments_json):\n f = open(segments_json)\n segments = json.load(f)[\"segments\"]\n labels = []\n str_labels = []\n for segment in segments:\n if not segment[\"label\"] in str_labels:\n str_labels.append(segment[\"label\"])\n labels.append(len(str_labels)-1)\n else:\n label_idx = np.where(np.asarray(str_labels) == segment[\"label\"])[0][0]\n labels.append(label_idx)\n f.close()\n return np.asarray(labels)", "response": "Extracts the labels from a json file and puts them into a np array."} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nextracts the beats from the beats_json_file and puts them into an np array.", "response": "def json_to_beats(beats_json_file):\n \"\"\"Extracts the beats from the beats_json_file and puts them into\n an np array.\"\"\"\n f = open(beats_json_file, \"r\")\n beats_json = json.load(f)\n beats = []\n for beat in beats_json[\"beats\"]:\n beats.append(beat[\"start\"])\n f.close()\n return np.asarray(beats)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef compute_ffmc2d(X):\n # 2d-fft\n fft2 = scipy.fftpack.fft2(X)\n\n # Magnitude\n fft2m = magnitude(fft2)\n\n # FFTshift and flatten\n fftshift = scipy.fftpack.fftshift(fft2m).flatten()\n\n #cmap = plt.cm.get_cmap('hot')\n #plt.imshow(np.log1p(scipy.fftpack.fftshift(fft2m)).T, interpolation=\"nearest\",\n # aspect=\"auto\", cmap=cmap)\n #plt.show()\n\n # Take out redundant components\n return fftshift[:fftshift.shape[0] // 2 + 1]", "response": "Computes the 2D - Fourier Magnitude Coefficients."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nfactorize s. t. WH.", "response": "def factorize(self, niter=10, compute_w=True, compute_h=True,\n compute_err=True, show_progress=False):\n \"\"\" Factorize s.t. 
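`compute_ffmc2d` reduces a feature patch to the non-redundant half of its shifted 2D-FFT magnitude spectrum. A toy run on a random 12x16 chroma-like patch:

```python
import numpy as np
import scipy.fftpack

# Toy "patch" of beat-synchronous chroma (12 bins x 16 beats).
X = np.random.RandomState(0).rand(12, 16)

fft2 = scipy.fftpack.fft2(X)
mag = np.abs(fft2)                        # magnitude spectrum
shifted = scipy.fftpack.fftshift(mag).flatten()

# Keep only the non-redundant half (+1), as in compute_ffmc2d above.
ffmc2d = shifted[:shifted.shape[0] // 2 + 1]
print(ffmc2d.shape)   # (97,) for a 12x16 patch
```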
WH = data\n\n Parameters\n ----------\n niter : int\n number of iterations.\n show_progress : bool\n print some extra information to stdout.\n compute_h : bool\n iteratively update values for H.\n compute_w : bool\n iteratively update values for W.\n compute_err : bool\n compute Frobenius norm |data-WH| after each update and store\n it to .ferr[k].\n\n Updated Values\n --------------\n .W : updated values for W.\n .H : updated values for H.\n .ferr : Frobenius norm |data-WH| for each iteration.\n \"\"\"\n\n if not hasattr(self,'W'):\n self.init_w()\n\n if not hasattr(self,'H'):\n self.init_h()\n\n def separate_positive(m):\n return (np.abs(m) + m)/2.0\n\n def separate_negative(m):\n return (np.abs(m) - m)/2.0\n\n if show_progress:\n self._logger.setLevel(logging.INFO)\n else:\n self._logger.setLevel(logging.ERROR)\n\n XtX = np.dot(self.data[:,:].T, self.data[:,:])\n XtX_pos = separate_positive(XtX)\n XtX_neg = separate_negative(XtX)\n\n self.ferr = np.zeros(niter)\n # iterate over W and H\n\n for i in range(niter):\n # update H\n XtX_neg_x_W = np.dot(XtX_neg, self.G)\n XtX_pos_x_W = np.dot(XtX_pos, self.G)\n\n if compute_h:\n H_x_WT = np.dot(self.H.T, self.G.T)\n ha = XtX_pos_x_W + np.dot(H_x_WT, XtX_neg_x_W)\n hb = XtX_neg_x_W + np.dot(H_x_WT, XtX_pos_x_W) + 10**-9\n self.H = (self.H.T*np.sqrt(ha/hb)).T\n\n # update W\n if compute_w:\n HT_x_H = np.dot(self.H, self.H.T)\n wa = np.dot(XtX_pos, self.H.T) + np.dot(XtX_neg_x_W, HT_x_H)\n wb = np.dot(XtX_neg, self.H.T) + np.dot(XtX_pos_x_W, HT_x_H) + 10**-9\n\n self.G *= np.sqrt(wa/wb)\n self.W = np.dot(self.data[:,:], self.G)\n\n if compute_err:\n self.ferr[i] = self.frobenius_norm()\n self._logger.info('Iteration ' + str(i+1) + '/' + str(niter) +\n ' FN:' + str(self.ferr[i]))\n else:\n self._logger.info('Iteration ' + str(i+1) + '/' + str(niter))\n\n if i > 1 and compute_err:\n if self.converged(i):\n self.ferr = self.ferr[:i]\n break"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef cnmf(S, rank, niter=500, hull=False):\n if hull:\n nmf_mdl = pymf.CHNMF(S, num_bases=rank)\n else:\n nmf_mdl = pymf.CNMF(S, num_bases=rank)\n nmf_mdl.factorize(niter=niter)\n F = np.asarray(nmf_mdl.W)\n G = np.asarray(nmf_mdl.H)\n return F, G", "response": "Convex Non - Negative Matrix Factorization."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef compute_labels(X, rank, R, bound_idxs, niter=300):\n\n try:\n F, G = cnmf(X, rank, niter=niter, hull=False)\n except:\n return [1]\n\n label_frames = filter_activation_matrix(G.T, R)\n label_frames = np.asarray(label_frames, dtype=int)\n\n #labels = [label_frames[0]]\n labels = []\n bound_inters = zip(bound_idxs[:-1], bound_idxs[1:])\n for bound_inter in bound_inters:\n if bound_inter[1] - bound_inter[0] <= 0:\n labels.append(np.max(label_frames) + 1)\n else:\n labels.append(most_frequent(\n label_frames[bound_inter[0]: bound_inter[1]]))\n #print bound_inter, labels[-1]\n #labels.append(label_frames[-1])\n\n return labels", "response": "Computes the labels using the bounds."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nfiltering the activation matrix G and returns a flattened copy.", "response": "def filter_activation_matrix(G, R):\n \"\"\"Filters the activation matrix G, and returns a flattened copy.\"\"\"\n\n #import pylab as plt\n #plt.imshow(G, interpolation=\"nearest\", aspect=\"auto\")\n #plt.show()\n\n idx = np.argmax(G, axis=1)\n max_idx = np.arange(G.shape[0])\n 
max_idx = (max_idx, idx.flatten())\n G[:, :] = 0\n G[max_idx] = idx + 1\n\n # TODO: Order matters?\n G = np.sum(G, axis=1)\n G = median_filter(G[:, np.newaxis], R)\n\n return G.flatten()"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef get_segmentation(X, rank, R, rank_labels, R_labels, niter=300,\n bound_idxs=None, in_labels=None):\n \"\"\"\n Gets the segmentation (boundaries and labels) from the factorization\n matrices.\n\n Parameters\n ----------\n X: np.array()\n Features matrix (e.g. chromagram)\n rank: int\n Rank of decomposition\n R: int\n Size of the median filter for activation matrix\n niter: int\n Number of iterations for k-means\n bound_idxs : list\n Use previously found boundaries (None to detect them)\n in_labels : np.array()\n List of input labels (None to compute them)\n\n Returns\n -------\n bounds_idx: np.array\n Bound indeces found\n labels: np.array\n Indeces of the labels representing the similarity between segments.\n \"\"\"\n\n #import pylab as plt\n #plt.imshow(X, interpolation=\"nearest\", aspect=\"auto\")\n #plt.show()\n\n # Find non filtered boundaries\n compute_bounds = True if bound_idxs is None else False\n while True:\n if bound_idxs is None:\n try:\n F, G = cnmf(X, rank, niter=niter, hull=False)\n except:\n return np.empty(0), [1]\n\n # Filter G\n G = filter_activation_matrix(G.T, R)\n if bound_idxs is None:\n bound_idxs = np.where(np.diff(G) != 0)[0] + 1\n\n # Increase rank if we found too few boundaries\n if compute_bounds and len(np.unique(bound_idxs)) <= 2:\n rank += 1\n bound_idxs = None\n else:\n break\n\n # Add first and last boundary\n bound_idxs = np.concatenate(([0], bound_idxs, [X.shape[1] - 1]))\n bound_idxs = np.asarray(bound_idxs, dtype=int)\n if in_labels is None:\n labels = compute_labels(X, rank_labels, R_labels, bound_idxs,\n niter=niter)\n else:\n labels = np.ones(len(bound_idxs) - 1)\n\n #plt.imshow(G[:, np.newaxis], interpolation=\"nearest\", aspect=\"auto\")\n #for b in bound_idxs:\n #plt.axvline(b, linewidth=2.0, color=\"k\")\n #plt.show()\n\n return bound_idxs, labels", "response": "Returns the segmentation of a factorization matrix X with rank rank_labels R_labels and labels in in_labels."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nprocess the flat data.", "response": "def processFlat(self):\n \"\"\"Main process.\n Returns\n -------\n est_idxs : np.array(N)\n Estimated indeces for the segment boundaries in frames.\n est_labels : np.array(N-1)\n Estimated labels for the segments.\n \"\"\"\n # C-NMF params\n niter = self.config[\"niters\"] # Iterations for the MF and clustering\n\n # Preprocess to obtain features, times, and input boundary indeces\n F = self._preprocess()\n\n # Normalize\n F = U.normalize(F, norm_type=self.config[\"norm_feats\"])\n\n if F.shape[0] >= self.config[\"h\"]:\n # Median filter\n F = median_filter(F, M=self.config[\"h\"])\n #plt.imshow(F.T, interpolation=\"nearest\", aspect=\"auto\"); plt.show()\n\n # Find the boundary indices and labels using matrix factorization\n est_idxs, est_labels = get_segmentation(\n F.T, self.config[\"rank\"], self.config[\"R\"],\n self.config[\"rank_labels\"], self.config[\"R_labels\"],\n niter=niter, bound_idxs=self.in_bound_idxs, in_labels=None)\n\n # Remove empty segments if needed\n est_idxs, est_labels = U.remove_empty_segments(est_idxs, est_labels)\n else:\n # The track is too short. 
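`filter_activation_matrix` hard-assigns each frame to its strongest basis and median-filters the resulting label sequence; `get_segmentation` then reads boundaries off the label changes. A compact sketch of that pipeline, assuming `scipy.ndimage.median_filter` as a stand-in for MSAF's own median filter (which may differ at the edges):

```python
import numpy as np
from scipy.ndimage import median_filter

# Toy activation matrix: 12 frames x 3 bases (rank 3).
rng = np.random.RandomState(0)
G = rng.rand(12, 3)

# Hard-assign each frame to its strongest basis (1-indexed labels).
labels = np.argmax(G, axis=1) + 1

# Median filtering removes spurious one-frame label flips (R = filter size).
labels = median_filter(labels, size=3)

# Boundaries fall wherever the filtered label sequence changes.
bound_idxs = np.where(np.diff(labels) != 0)[0] + 1
print(labels, bound_idxs)
```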
We will only output the first and last\n # time stamps\n if self.in_bound_idxs is None:\n est_idxs = np.array([0, F.shape[0] - 1])\n est_labels = [1]\n else:\n est_idxs = self.in_bound_idxs\n est_labels = [1] * (len(est_idxs) + 1)\n\n # Make sure that the first and last boundaries are included\n assert est_idxs[0] == 0 and est_idxs[-1] == F.shape[0] - 1\n\n # Post process estimations\n est_idxs, est_labels = self._postprocess(est_idxs, est_labels)\n\n return est_idxs, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef get_boundaries_module(boundaries_id):\n if boundaries_id == \"gt\":\n return None\n try:\n module = eval(algorithms.__name__ + \".\" + boundaries_id)\n except AttributeError:\n raise RuntimeError(\"Algorithm %s can not be found in msaf!\" %\n boundaries_id)\n if not module.is_boundary_type:\n raise RuntimeError(\"Algorithm %s can not identify boundaries!\" %\n boundaries_id)\n return module", "response": "Returns the object containing the selected boundary module."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nreturning the label module given a label algorithm identificator.", "response": "def get_labels_module(labels_id):\n \"\"\"Obtains the label module given a label algorithm identificator.\n\n Parameters\n ----------\n labels_id: str\n Label algorithm identificator (e.g., fmc2d, cnmf).\n\n Returns\n -------\n module: object\n Object containing the selected label module.\n None for not computing the labeling part of music segmentation.\n \"\"\"\n if labels_id is None:\n return None\n try:\n module = eval(algorithms.__name__ + \".\" + labels_id)\n except AttributeError:\n raise RuntimeError(\"Algorithm %s can not be found in msaf!\" %\n labels_id)\n if not module.is_label_type:\n raise RuntimeError(\"Algorithm %s can not label segments!\" %\n labels_id)\n return module"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nrun hierarchical algorithms with the specified identifiers on the audio_file.", "response": "def run_hierarchical(audio_file, bounds_module, labels_module, frame_times,\n config, annotator_id=0):\n \"\"\"Runs hierarchical algorithms with the specified identifiers on the\n audio_file. 
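`get_boundaries_module` and `get_labels_module` resolve algorithm identifiers with `eval()`. A hedged alternative sketch using `getattr`, which performs the same attribute lookup without evaluating a string (`algorithms_pkg` is a stand-in for msaf's `algorithms` package):

```python
# Alternative to eval(algorithms.__name__ + "." + algo_id):
# getattr resolves the submodule attribute directly.
def get_module(algorithms_pkg, algo_id):
    try:
        module = getattr(algorithms_pkg, algo_id)
    except AttributeError:
        raise RuntimeError("Algorithm %s can not be found in msaf!" % algo_id)
    return module
```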
See run_algorithm for more information.\n \"\"\"\n # Sanity check\n if bounds_module is None:\n raise NoHierBoundaryError(\"A boundary algorithm is needed when using \"\n \"hierarchical segmentation.\")\n\n # Get features to make code nicer\n features = config[\"features\"].features\n\n # Compute boundaries\n S = bounds_module.Segmenter(audio_file, **config)\n est_idxs, est_labels = S.processHierarchical()\n\n # Compute labels if needed\n if labels_module is not None and \\\n bounds_module.__name__ != labels_module.__name__:\n # Compute labels for each level in the hierarchy\n flat_config = deepcopy(config)\n flat_config[\"hier\"] = False\n for i, level_idxs in enumerate(est_idxs):\n S = labels_module.Segmenter(audio_file,\n in_bound_idxs=level_idxs,\n **flat_config)\n est_labels[i] = S.processFlat()[1]\n\n # Make sure the first and last boundaries are included for each\n # level in the hierarchy\n est_times = []\n cleaned_est_labels = []\n for level in range(len(est_idxs)):\n est_level_times, est_level_labels = \\\n utils.process_segmentation_level(\n est_idxs[level], est_labels[level], features.shape[0],\n frame_times, config[\"features\"].dur)\n est_times.append(est_level_times)\n cleaned_est_labels.append(est_level_labels)\n est_labels = cleaned_est_labels\n\n return est_times, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nruns the flat algorithm with the specified identifiers on the the audio_file.", "response": "def run_flat(file_struct, bounds_module, labels_module, frame_times, config,\n annotator_id):\n \"\"\"Runs the flat algorithms with the specified identifiers on the\n audio_file. See run_algorithm for more information.\n \"\"\"\n # Get features to make code nicer\n features = config[\"features\"].features\n\n # Segment using the specified boundaries and labels\n # Case when boundaries and labels algorithms are the same\n if bounds_module is not None and labels_module is not None and \\\n bounds_module.__name__ == labels_module.__name__:\n S = bounds_module.Segmenter(file_struct, **config)\n est_idxs, est_labels = S.processFlat()\n # Different boundary and label algorithms\n else:\n # Identify segment boundaries\n if bounds_module is not None:\n S = bounds_module.Segmenter(file_struct, in_labels=[], **config)\n est_idxs, est_labels = S.processFlat()\n else:\n try:\n # Ground-truth boundaries\n est_times, est_labels = io.read_references(\n file_struct.audio_file, annotator_id=annotator_id)\n est_idxs = io.align_times(est_times, frame_times)\n if est_idxs[0] != 0:\n est_idxs = np.concatenate(([0], est_idxs))\n except IOError:\n logging.warning(\"No references found for file: %s\" %\n file_struct.audio_file)\n return [], []\n\n # Label segments\n if labels_module is not None:\n if len(est_idxs) == 2:\n est_labels = np.array([0])\n else:\n S = labels_module.Segmenter(file_struct,\n in_bound_idxs=est_idxs,\n **config)\n est_labels = S.processFlat()[1]\n\n # Make sure the first and last boundaries are included\n est_times, est_labels = utils.process_segmentation_level(\n est_idxs, est_labels, features.shape[0], frame_times,\n config[\"features\"].dur)\n\n return est_times, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef run_algorithms(file_struct, boundaries_id, labels_id, config,\n annotator_id=0):\n \"\"\"Runs the algorithms with the specified identifiers on the audio_file.\n\n Parameters\n ----------\n file_struct: `msaf.io.FileStruct`\n Object with the 
file paths.\n boundaries_id: str\n Identifier of the boundaries algorithm to use (\"gt\" for ground truth).\n labels_id: str\n Identifier of the labels algorithm to use (None for not labeling).\n config: dict\n Dictionary containing the custom parameters of the algorithms to use.\n annotator_id: int\n Annotator identifier in the ground truth.\n\n Returns\n -------\n est_times: np.array or list\n List of estimated times for the segment boundaries.\n If `list`, it will be a list of np.arrays, sorted by segmentation\n layer.\n est_labels: np.array or list\n List of all the labels associated with the segments.\n If `list`, it will be a list of np.arrays, sorted by segmentation\n layer.\n \"\"\"\n # Check that there are enough audio frames\n if config[\"features\"].features.shape[0] <= msaf.config.minimum_frames:\n logging.warning(\"Audio file too short, or too few beats \"\n \"estimated. Returning empty estimations.\")\n return np.asarray([0, config[\"features\"].dur]), \\\n np.asarray([0], dtype=int)\n\n # Get the corresponding modules\n bounds_module = get_boundaries_module(boundaries_id)\n labels_module = get_labels_module(labels_id)\n\n # Get the correct frame times\n frame_times = config[\"features\"].frame_times\n\n # Segment audio based on type of segmentation\n run_fun = run_hierarchical if config[\"hier\"] else run_flat\n est_times, est_labels = run_fun(file_struct, bounds_module, labels_module,\n frame_times, config, annotator_id)\n\n return est_times, est_labels", "response": "Runs the algorithms with the specified identifiers on the audio_file."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\npreparing the parameters, running the algorithms and saving the results in the estimation file.", "response": "def process_track(file_struct, boundaries_id, labels_id, config,\n annotator_id=0):\n \"\"\"Prepares the parameters, runs the algorithms, and saves results.\n\n Parameters\n ----------\n file_struct: `msaf.io.FileStruct`\n FileStruct containing the paths of the input files (audio file,\n features file, reference file, output estimation file).\n boundaries_id: str\n Identifier of the boundaries algorithm to use (\"gt\" for ground truth).\n labels_id: str\n Identifier of the labels algorithm to use (None for not labeling).\n config: dict\n Dictionary containing the custom parameters of the algorithms to use.\n annotator_id: int\n Annotator identifier in the ground truth.\n\n Returns\n -------\n est_times: np.array\n List of estimated times for the segment boundaries.\n est_labels: np.array\n List of all the labels associated with the segments.\n \"\"\"\n logging.info(\"Segmenting %s\" % file_struct.audio_file)\n\n # Get features\n config[\"features\"] = Features.select_features(\n config[\"feature\"], file_struct, config[\"annot_beats\"],\n config[\"framesync\"])\n\n # Get estimations\n est_times, est_labels = run_algorithms(file_struct,\n boundaries_id, labels_id, config,\n annotator_id=annotator_id)\n\n # Save\n logging.info(\"Writing results in: %s\" % file_struct.est_file)\n io.save_estimations(file_struct, est_times, est_labels,\n boundaries_id, labels_id, **config)\n\n return est_times, est_labels"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nupdate W under the convexity constraint", "response": "def update_w(self):\n \"\"\" alternating least squares step, update W under the convexity\n constraint \"\"\"\n def update_single_w(i):\n \"\"\" compute single W[:,i] \"\"\"\n # optimize beta using qp solver from cvxopt\n FB = 
base.matrix(np.float64(np.dot(-self.data.T, W_hat[:,i])))\n be = solvers.qp(HB, FB, INQa, INQb, EQa, EQb)\n self.beta[i,:] = np.array(be['x']).reshape((1, self._num_samples))\n\n # float64 required for cvxopt\n HB = base.matrix(np.float64(np.dot(self.data[:,:].T, self.data[:,:])))\n EQb = base.matrix(1.0, (1, 1))\n W_hat = np.dot(self.data, pinv(self.H))\n INQa = base.matrix(-np.eye(self._num_samples))\n INQb = base.matrix(0.0, (self._num_samples, 1))\n EQa = base.matrix(1.0, (1, self._num_samples))\n\n for i in range(self._num_bases):\n update_single_w(i)\n\n self.W = np.dot(self.beta, self.data.T).T"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef main():\n '''\n Main Entry point for translator and argument parser\n '''\n args = command_line()\n translate = partial(translator, args.source, args.dest,\n version=' '.join([__version__, __build__]))\n\n return source(spool(set_task(translate, translit=args.translit)), args.text)", "response": "Main entry point for translator and argument parser\n "} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\ninitializing coroutine essentially priming it to the yield statement. Used as a decorator over functions that generate coroutines. .. code-block:: python # Basic coroutine producer/consumer pattern from translate import coroutine @coroutine def coroutine_foo(bar): try: while True: baz = (yield) bar.send(baz) except GeneratorExit: bar.close() :param func: Unprimed Generator :type func: Function :return: Initialized Coroutine :rtype: Function", "response": "def coroutine(func):\n \"\"\"\n Initializes coroutine essentially priming it to the yield statement.\n Used as a decorator over functions that generate coroutines.\n\n .. code-block:: python\n\n # Basic coroutine producer/consumer pattern\n from translate import coroutine\n\n @coroutine\n def coroutine_foo(bar):\n try:\n while True:\n baz = (yield)\n bar.send(baz)\n\n except GeneratorExit:\n bar.close()\n\n :param func: Unprimed Generator\n :type func: Function\n\n :return: Initialized Coroutine\n :rtype: Function\n \"\"\"\n\n @wraps(func)\n def initialization(*args, **kwargs):\n\n start = func(*args, **kwargs)\n next(start)\n\n return start\n\n return initialization"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef accumulator(init, update):\n return (\n init + len(update)\n if isinstance(init, int) else\n init + update\n )", "response": "Generic accumulator function.\n\n .. 
code-block:: python\n\n # Simplest Form\n >>> a = 'this' + ' '\n >>> b = 'that'\n >>> c = functools.reduce(accumulator, [b], a)\n >>> c\n 'this that'\n\n # The type of the initial value determines output type.\n >>> a = 5\n >>> b = 'Hello'\n >>> c = functools.reduce(accumulator, [b], a)\n >>> c\n 10\n\n :param init: Initial Value\n :param update: Value to accumulate\n\n :return: Combined Values"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nwriting a stream of text to stdout.", "response": "def write_stream(script, output='trans'):\n \"\"\"\n :param script: Translated Text\n :type script: Iterable\n\n :param output: Output Type (either 'trans' or 'translit')\n :type output: String\n \"\"\"\n first = operator.itemgetter(0)\n sentence, _ = script\n printer = partial(print, file=sys.stdout, end='')\n\n for line in sentence:\n if isinstance(first(line), str):\n printer(first(line))\n else:\n printer(first(line).encode('UTF-8'))\n\n printer('\\n')\n\n return sys.stdout.flush()"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef set_task(translator, translit=False):\n # Initialize Task Queue\n task = str()\n queue = list()\n\n # Function Partial\n output = ('translit' if translit else 'trans')\n stream = partial(write_stream, output=output)\n workers = ThreadPoolExecutor(max_workers=8)\n\n try:\n while True:\n\n task = yield\n queue.append(task)\n\n except GeneratorExit:\n list(map(stream, workers.map(translator, queue)))", "response": "Coroutine that queues translation tasks and dispatches them to a pool of worker threads once the stream is closed."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nreading the entire sequence of words from the given iterable and spooling them together for more I/O-efficient processing.", "response": "def spool(iterable, maxlen=1250):\n \"\"\"\n Consumes text streams and spools them together for more\n I/O-efficient processing.\n\n :param iterable: Sends text stream for further processing\n :type iterable: Coroutine\n\n :param maxlen: Maximum query string size\n :type maxlen: Integer\n \"\"\"\n words = int()\n text = str()\n\n try:\n while True:\n\n while words < maxlen:\n stream = yield\n text = reduce(accumulator, stream, text)\n words = reduce(accumulator, stream, words)\n\n iterable.send(text)\n words = int()\n text = str()\n\n except GeneratorExit:\n iterable.send(text)\n iterable.close()"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef source(target, inputstream=sys.stdin):\n for line in inputstream:\n\n while len(line) > 600:\n init, sep, line = line.partition(' ')\n assert len(init) <= 600\n target.send(''.join([init, sep]))\n\n target.send(line)\n\n inputstream.close()\n\n return target.close()", "response": "Coroutine starting point. 
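The `source`/`spool`/`set_task` functions above form a primed-coroutine producer/consumer pipeline. The following is a minimal, self-contained sketch of that same pattern; the stage names (`printer`, `batcher`) are illustrative, not taken from the library:

.. code-block:: python

    import functools

    def coroutine(func):
        """Prime a generator-based coroutine to its first yield."""
        @functools.wraps(func)
        def start(*args, **kwargs):
            gen = func(*args, **kwargs)
            next(gen)  # advance to the first `yield`
            return gen
        return start

    @coroutine
    def printer():
        """Terminal consumer: prints whatever it is sent."""
        try:
            while True:
                print((yield), end='')
        except GeneratorExit:
            pass

    @coroutine
    def batcher(target, maxlen=10):
        """Spool-like stage: buffer text and flush it in batches."""
        buf = ''
        try:
            while True:
                buf += (yield)
                if len(buf) >= maxlen:
                    target.send(buf)
                    buf = ''
        except GeneratorExit:
            target.send(buf)  # flush the remainder on close
            target.close()

    pipe = batcher(printer(), maxlen=5)
    for word in ['a ', 'b ', 'c ', 'd ']:
        pipe.send(word)
    pipe.close()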
Produces text stream and forwards to consumers\n "} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef push_url(interface):\n '''\n Decorates a function returning the url of translation API.\n Creates and maintains HTTP connection state\n\n Returns a dict response object from the server containing the translated\n text and metadata of the request body\n\n :param interface: Callable Request Interface\n :type interface: Function\n '''\n\n @functools.wraps(interface)\n def connection(*args, **kwargs):\n \"\"\"\n Extends and wraps an HTTP interface.\n\n :return: Response Content\n :rtype: Dictionary\n \"\"\"\n session = Session()\n session.mount('http://', HTTPAdapter(max_retries=2))\n session.mount('https://', HTTPAdapter(max_retries=2))\n\n request = Request(**interface(*args, **kwargs))\n prepare = session.prepare_request(request)\n response = session.send(prepare, verify=True)\n\n if response.status_code != requests.codes.ok:\n response.raise_for_status()\n\n cleanup = re.subn(r',(?=,)', '', response.content.decode('utf-8'))[0]\n\n return json.loads(cleanup.replace(r'\\xA0', r' ').replace('[,', '[1,'), encoding='UTF-8')\n\n return connection", "response": "Decorator that wraps a request-building function, manages the HTTP connection state, and returns the parsed JSON response from the translation API."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nbuild the request that translates a phrase from a source language into a target language.", "response": "def translator(source, target, phrase, version='0.0 test', charset='utf-8'):\n \"\"\"\n Returns the url encoded string that will be pushed to the translation\n server for parsing.\n\n List of acceptable language codes for source and target languages\n can be found as a JSON file in the etc directory.\n\n Some source languages are limited in scope of the possible target languages\n that are available.\n\n .. 
code-block:: python\n\n >>> from translate import translator\n >>> translator('en', 'zh-TW', 'Hello World!')\n '\u4f60\u597d\u4e16\u754c\uff01'\n\n :param source: Language code for translation source\n :type source: String\n\n :param target: Language code that source will be translated into\n :type target: String\n\n :param phrase: Text body string that will be url encoded and translated\n :type phrase: String\n\n :return: Request Interface\n :rtype: Dictionary\n \"\"\"\n\n url = 'https://translate.google.com/translate_a/single'\n agent = 'User-Agent', 'py-translate v{}'.format(version)\n content = 'Content-Type', 'application/json; charset={}'.format(charset)\n\n params = {'client': 'a', 'ie': charset, 'oe': charset,\n 'dt': 't', 'sl': source, 'tl': target, 'q': phrase}\n\n request = {'method': 'GET',\n 'url': url,\n 'params': params,\n 'headers': dict([agent, content])}\n\n return request"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nreads in the language codes and returns them as a dictionary.", "response": "def translation_table(language, filepath='supported_translations.json'):\n '''\n Opens the file located under the etc directory containing language\n codes and returns them as a dictionary.\n\n :param filepath: Path to location of json file\n :type filepath: str\n\n :return: language codes\n :rtype: dict\n '''\n fullpath = abspath(join(dirname(__file__), 'etc', filepath))\n\n if not isfile(fullpath):\n raise IOError('File does not exist at {0}'.format(fullpath))\n\n with open(fullpath, 'rt') as fp:\n raw_data = json.load(fp).get(language, None)\n assert(raw_data is not None)\n\n return dict((code['language'], code['name']) for code in raw_data)"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\ngenerating a formatted table of language codes", "response": "def print_table(language):\n '''\n Generates a formatted table of language codes\n '''\n table = translation_table(language)\n\n for code, name in sorted(table.items(), key=operator.itemgetter(0)):\n print(u'{language:<8} {name:\\u3000<20}'.format(\n name=name, language=code\n ))\n\n return None"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef remove_nodes(network, rm_nodes):\n rm_nodes = set(rm_nodes)\n ndf = network.nodes_df\n edf = network.edges_df\n\n nodes_to_keep = ~ndf.index.isin(rm_nodes)\n edges_to_keep = ~(edf['from'].isin(rm_nodes) | edf['to'].isin(rm_nodes))\n\n return ndf.loc[nodes_to_keep], edf.loc[edges_to_keep]", "response": "Create DataFrames of nodes and edges that do not include specified nodes."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef network_to_pandas_hdf5(network, filename, rm_nodes=None):\n if rm_nodes is not None:\n nodes, edges = remove_nodes(network, rm_nodes)\n else:\n nodes, edges = network.nodes_df, network.edges_df\n\n with pd.HDFStore(filename, mode='w') as store:\n store['nodes'] = nodes\n store['edges'] = edges\n\n store['two_way'] = pd.Series([network._twoway])\n store['impedance_names'] = pd.Series(network.impedance_names)", "response": "Save a Network to a Pandas HDFStore."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nbuilding a Network from data in a Pandas HDF5 file.", "response": "def network_from_pandas_hdf5(cls, filename):\n \"\"\"\n Build a Network from data in a Pandas HDFStore.\n\n Parameters\n ----------\n cls : class\n Class to instantiate, usually pandana.Network.\n filename : str\n\n 
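The two HDF5 helpers above are natural round-trip partners. A minimal sketch of saving and reloading a tiny network follows; the two-node dataset is made up, and the helpers' import path is an assumption (in some pandana versions they also surface as ``Network.save_hdf5`` / ``Network.from_hdf5``):

.. code-block:: python

    import pandas as pd
    from pandana import Network

    # Toy two-node, one-edge network (illustrative coordinates).
    nodes = pd.DataFrame({'x': [0.0, 1.0], 'y': [0.0, 0.0]}, index=[1, 2])
    edges = pd.DataFrame({'from': [1], 'to': [2], 'distance': [1.0]})

    net = Network(nodes.x, nodes.y, edges['from'], edges['to'],
                  edges[['distance']])

    # Round-trip through HDF5 using the helpers documented above.
    network_to_pandas_hdf5(net, 'network.h5')
    net2 = network_from_pandas_hdf5(Network, 'network.h5')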
Returns\n -------\n network : pandana.Network\n\n \"\"\"\n with pd.HDFStore(filename) as store:\n nodes = store['nodes']\n edges = store['edges']\n two_way = store['two_way'][0]\n imp_names = store['impedance_names'].tolist()\n\n return cls(\n nodes['x'], nodes['y'], edges['from'], edges['to'], edges[imp_names],\n twoway=two_way)"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nreturn the bounding box for the nodes in this network", "response": "def bbox(self):\n \"\"\"\n The bounding box for nodes in this network [xmin, ymin, xmax, ymax]\n \"\"\"\n return [self.nodes_df.x.min(), self.nodes_df.y.min(),\n self.nodes_df.x.max(), self.nodes_df.y.max()]"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\ncharacterizing urban space with a variable that is related to nodes in the network. Parameters ---------- node_ids : Pandas Series, int A series of node_ids which are usually computed using get_node_ids on this object. variable : Pandas Series, numeric, optional A series which represents some variable defined in urban space. It could be the location of buildings, or the income of all households - just about anything can be aggregated using the network queries provided here and this provides the api to set the variable at its disaggregate locations. Note that node_id and variable should have the same index (although the index is not actually used). If variable is not set, then it is assumed that the variable is all \"ones\" at the location specified by node_ids. This could be, for instance, the location of all coffee shops which don't really have a variable to aggregate. The variable is connected to the closest node in the Pandana network which assumes no impedance between the location of the variable and the location of the closest network node. name : string, optional Name the variable. This is optional in the sense that if you don't specify it, the default name will be used. Since the same default name is used by aggregate on this object, you can alternate between characterize and aggregate calls without setting names. Returns ------- Nothing", "response": "def set(self, node_ids, variable=None, name=\"tmp\"):\n \"\"\"\n Characterize urban space with a variable that is related to nodes in\n the network.\n\n Parameters\n ----------\n node_ids : Pandas Series, int\n A series of node_ids which are usually computed using\n get_node_ids on this object.\n variable : Pandas Series, numeric, optional\n A series which represents some variable defined in urban space.\n It could be the location of buildings, or the income of all\n households - just about anything can be aggregated using the\n network queries provided here and this provides the api to set\n the variable at its disaggregate locations. Note that node_id\n and variable should have the same index (although the index is\n not actually used). If variable is not set, then it is assumed\n that the variable is all \"ones\" at the location specified by\n node_ids. This could be, for instance, the location of all\n coffee shops which don't really have a variable to aggregate. The\n variable is connected to the closest node in the Pandana network\n which assumes no impedance between the location of the variable\n and the location of the closest network node.\n name : string, optional\n Name the variable. This is optional in the sense that if you don't\n specify it, the default name will be used. 
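A short usage sketch of ``set`` with ``get_node_ids``, grounded in the signatures shown above; ``net`` is assumed to be an existing ``pandana.Network`` (e.g. built as in the earlier sketch) and the household data is made up:

.. code-block:: python

    import pandas as pd

    # Illustrative data: household locations and incomes.
    households = pd.DataFrame({
        'lng': [-122.41, -122.42],
        'lat': [37.77, 37.78],
        'income': [55000, 72000],
    })

    # Snap each household to its nearest network node, then attach
    # the income variable at those nodes under the name 'income'.
    node_ids = net.get_node_ids(households.lng, households.lat)
    net.set(node_ids, variable=households.income, name='income')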
Since the same\n default name is used by aggregate on this object, you can\n alternate between characterize and aggregate calls without\n setting names.\n\n Returns\n -------\n Nothing\n \"\"\"\n\n if variable is None:\n variable = pd.Series(np.ones(len(node_ids)), index=node_ids.index)\n\n df = pd.DataFrame({name: variable,\n \"node_idx\": self._node_indexes(node_ids)})\n\n length = len(df)\n df = df.dropna(how=\"any\")\n newl = len(df)\n if length-newl > 0:\n print(\n \"Removed %d rows because they contain missing values\" %\n (length-newl))\n\n self.variable_names.add(name)\n\n self.net.initialize_access_var(name.encode('utf-8'),\n df.node_idx.values.astype('int'),\n df[name].values.astype('double'))"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef aggregate(self, distance, type=\"sum\", decay=\"linear\", imp_name=None,\n name=\"tmp\"):\n \"\"\"\n Aggregate information for every source node in the network - this is\n really the main purpose of this library. This allows you to touch\n the data specified by calling set and perform some aggregation on it\n within the specified distance. For instance, summing the population\n within 1000 meters.\n\n Parameters\n ----------\n distance : float\n The maximum distance to aggregate data within. 'distance' can\n represent any impedance unit that you have set as your edge\n weight. This will usually be a distance unit in meters however\n if you have customized the impedance this could be in other\n units such as utility or time etc.\n type : string\n The type of aggregation, can be one of \"ave\", \"sum\", \"std\",\n \"count\", and now \"min\", \"25pct\", \"median\", \"75pct\", and \"max\" will\n compute the associated quantiles. (Quantiles are computed by\n sorting so might be slower than the others.)\n decay : string\n The type of decay to apply, which makes things that are further\n away count less in the aggregation - must be one of \"linear\",\n \"exponential\" or \"flat\" (which means no decay). Linear is the\n fastest computation to perform. When performing an \"ave\",\n the decay is typically \"flat\"\n imp_name : string, optional\n The impedance name to use for the aggregation on this network.\n Must be one of the impedance names passed in the constructor of\n this object. If not specified, there must be only one impedance\n passed in the constructor, which will be used.\n name : string, optional\n The variable to aggregate. This variable will have been created\n and named by a call to set. 
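Continuing the sketch above, a typical ``aggregate`` call pairs a distance, an aggregation type, and a decay, all per the parameter list documented here (the variable name ``income`` comes from the earlier illustrative ``set`` call):

.. code-block:: python

    # Sum of income within 500 impedance units (typically meters) of
    # every network node, with linear distance decay.
    income_500m = net.aggregate(500, type='sum', decay='linear',
                                name='income')

    # Average income in the same radius; averages usually use no decay.
    avg_income = net.aggregate(500, type='ave', decay='flat',
                               name='income')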
If not specified, the default\n variable name will be used so that the most recent call to set\n without giving a name will be the variable used.\n\n Returns\n -------\n agg : Pandas Series\n Returns a Pandas Series for every origin node in the network,\n with the index which is the same as the node_ids passed to the\n init method and the values are the aggregations for each source\n node in the network.\n \"\"\"\n\n imp_num = self._imp_name_to_num(imp_name)\n type = type.lower()\n if type == \"ave\":\n type = \"mean\" # changed generic ave to mean\n\n assert name in self.variable_names, \"A variable with that name \" \\\n \"has not yet been initialized\"\n\n res = self.net.get_all_aggregate_accessibility_variables(distance,\n name.encode('utf-8'),\n type.encode('utf-8'),\n decay.encode('utf-8'),\n imp_num)\n\n return pd.Series(res, index=self.node_ids)", "response": "Aggregate the data for every source node in the network within the specified distance."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nassigning node_ids to data specified by x_col and y_col Parameters ---------- x_col : Pandas series (float) A Pandas Series where values specify the x (e.g. longitude) location of dataset. y_col : Pandas series (float) A Pandas Series where values specify the y (e.g. latitude) location of dataset. x_col and y_col should use the same index. mapping_distance : float, optional The maximum distance that will be considered a match between the x, y data and the nearest node in the network. This will usually be a distance unit in meters however if you have customized the impedance this could be in other units such as utility or time etc. If not specified, every x, y coordinate will be mapped to the nearest node. Returns ------- node_ids : Pandas series (int) Returns a Pandas Series of node_ids for each x, y in the input data. The index is the same as the indexes of the x, y input data, and the values are the mapped node_ids. If mapping distance is not passed and if there are no nans in the x, y data, this will be the same length as the x, y data. If the mapping is imperfect, this function returns all the input x, y's that were successfully mapped to node_ids.", "response": "def get_node_ids(self, x_col, y_col, mapping_distance=None):\n \"\"\"\n Assign node_ids to data specified by x_col and y_col\n\n Parameters\n ----------\n x_col : Pandas series (float)\n A Pandas Series where values specify the x (e.g. longitude)\n location of dataset.\n y_col : Pandas series (float)\n A Pandas Series where values specify the y (e.g. latitude)\n location of dataset. x_col and y_col should use the same index.\n mapping_distance : float, optional\n The maximum distance that will be considered a match between the\n x, y data and the nearest node in the network. This will usually\n be a distance unit in meters however if you have customized the\n impedance this could be in other units such as utility or time\n etc. If not specified, every x, y coordinate will be mapped to\n the nearest node.\n\n Returns\n -------\n node_ids : Pandas series (int)\n Returns a Pandas Series of node_ids for each x, y in the\n input data. 
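A brief sketch of ``get_node_ids`` with the optional ``mapping_distance`` filter, per the docstring above; the point data is made up and ``net`` is the illustrative network from earlier:

.. code-block:: python

    import pandas as pd

    stops = pd.DataFrame({'lng': [-122.41, -122.45],
                          'lat': [37.77, 37.80]})

    # Matches farther than 100 impedance units from any node are
    # dropped, so the result may be shorter than the input.
    stop_nodes = net.get_node_ids(stops.lng, stops.lat,
                                  mapping_distance=100)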
The index is the same as the indexes of the\n x, y input data, and the values are the mapped node_ids.\n If mapping distance is not passed and if there are no nans in the\n x, y data, this will be the same length as the x, y data.\n If the mapping is imperfect, this function returns all the\n input x, y's that were successfully mapped to node_ids.\n \"\"\"\n xys = pd.DataFrame({'x': x_col, 'y': y_col})\n\n distances, indexes = self.kdtree.query(xys.as_matrix())\n indexes = np.transpose(indexes)[0]\n distances = np.transpose(distances)[0]\n\n node_ids = self.nodes_df.iloc[indexes].index\n\n df = pd.DataFrame({\"node_id\": node_ids, \"distance\": distances},\n index=xys.index)\n\n if mapping_distance is not None:\n df = df[df.distance <= mapping_distance]\n\n return df.node_id"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef plot(\n self, data, bbox=None, plot_type='scatter',\n fig_kwargs=None, bmap_kwargs=None, plot_kwargs=None,\n cbar_kwargs=None):\n \"\"\"\n Plot an array of data on a map using matplotlib and Basemap,\n automatically matching the data to the Pandana network node positions.\n\n Keyword arguments are passed to the plotting routine.\n\n Parameters\n ----------\n data : pandas.Series\n Numeric data with the same length and index as the nodes\n in the network.\n bbox : tuple, optional\n (lat_min, lng_min, lat_max, lng_max)\n plot_type : {'hexbin', 'scatter'}, optional\n fig_kwargs : dict, optional\n Keyword arguments that will be passed to\n matplotlib.pyplot.subplots. Use this to specify things like\n figure size or background color.\n bmap_kwargs : dict, optional\n Keyword arguments that will be passed to the Basemap constructor.\n This can be used to specify a projection or coastline resolution.\n plot_kwargs : dict, optional\n Keyword arguments that will be passed to the matplotlib plotting\n command used. 
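A hedged sketch of the ``plot`` method documented above, assuming Basemap is installed and reusing the ``income_500m`` aggregation from the earlier example:

.. code-block:: python

    bmap, fig, ax = net.plot(
        income_500m,
        plot_type='scatter',
        fig_kwargs={'figsize': (10, 8)},
        plot_kwargs={'cmap': 'viridis', 's': 4},
        cbar_kwargs={'label': 'income within 500m'},
    )
    fig.savefig('access.png', dpi=150)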
Use this to control plot styles and color maps used.\n cbar_kwargs : dict, optional\n Keyword arguments passed to the Basemap.colorbar method.\n Use this to control color bar location and label.\n\n Returns\n -------\n bmap : Basemap\n fig : matplotlib.Figure\n ax : matplotlib.Axes\n\n \"\"\"\n from mpl_toolkits.basemap import Basemap\n\n fig_kwargs = fig_kwargs or {}\n bmap_kwargs = bmap_kwargs or {}\n plot_kwargs = plot_kwargs or {}\n cbar_kwargs = cbar_kwargs or {}\n\n if not bbox:\n bbox = (\n self.nodes_df.y.min(),\n self.nodes_df.x.min(),\n self.nodes_df.y.max(),\n self.nodes_df.x.max())\n\n fig, ax = plt.subplots(**fig_kwargs)\n\n bmap = Basemap(\n bbox[1], bbox[0], bbox[3], bbox[2], ax=ax, **bmap_kwargs)\n bmap.drawcoastlines()\n bmap.drawmapboundary()\n\n x, y = bmap(self.nodes_df.x.values, self.nodes_df.y.values)\n\n if plot_type == 'scatter':\n plot = bmap.scatter(\n x, y, c=data.values, **plot_kwargs)\n elif plot_type == 'hexbin':\n plot = bmap.hexbin(\n x, y, C=data.values, **plot_kwargs)\n\n bmap.colorbar(plot, **cbar_kwargs)\n\n return bmap, fig, ax", "response": "Plots an array of data on a map using matplotlib and Basemap."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef set_pois(self, category, maxdist, maxitems, x_col, y_col):\n if category not in self.poi_category_names:\n self.poi_category_names.append(category)\n\n self.max_pois = maxitems\n\n node_ids = self.get_node_ids(x_col, y_col)\n\n self.poi_category_indexes[category] = node_ids.index\n\n node_idx = self._node_indexes(node_ids)\n\n self.net.initialize_category(maxdist, maxitems, category.encode('utf-8'), node_idx.values)", "response": "This method sets the location of all the pois of this category."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef nearest_pois(self, distance, category, num_pois=1, max_distance=None,\n imp_name=None, include_poi_ids=False):\n \"\"\"\n Find the distance to the nearest pois from each source node. The\n bigger values in this case mean less accessibility.\n\n Parameters\n ----------\n distance : float\n The maximum distance to look for pois. This will usually be a\n distance unit in meters however if you have customized the\n impedance this could be in other units such as utility or time\n etc.\n category : string\n The name of the category of poi to look for\n num_pois : int\n The number of pois to look for, this also sets the number of\n columns in the DataFrame that gets returned\n max_distance : float, optional\n The value to set the distance to if there is NO poi within the\n specified distance - if not specified, gets set to distance. This\n will usually be a distance unit in meters however if you have\n customized the impedance this could be in other units such as\n utility or time etc.\n imp_name : string, optional\n The impedance name to use for the aggregation on this network.\n Must be one of the impedance names passed in the constructor of\n this object. If not specified, there must be only one impedance\n passed in the constructor, which will be used.\n include_poi_ids : bool, optional\n If this flag is set to true, the call will add columns to the\n return DataFrame - instead of just returning the distance for\n the nth POI, it will also return the id of that POI. 
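The ``set_pois``/``nearest_pois`` pair works much like ``set``/``aggregate``. A short sketch grounded in the signatures shown above, with made-up restaurant coordinates:

.. code-block:: python

    import pandas as pd

    restaurants = pd.DataFrame({'lng': [-122.40, -122.43],
                                'lat': [37.76, 37.79]})

    # Register the category: search radius 2000 units, at most 10
    # POIs tracked per node.
    net.set_pois('restaurants', 2000, 10,
                 restaurants.lng, restaurants.lat)

    # Distances (and ids) of the 3 nearest restaurants within 1000
    # units of every node.
    dists = net.nearest_pois(1000, 'restaurants', num_pois=3,
                             include_poi_ids=True)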
The names\n of the columns with the poi ids will be poi1, poi2, etc - it\n will take roughly twice as long to include these ids as to not\n include them\n\n Returns\n -------\n d : Pandas DataFrame\n Like aggregate, this DataFrame has an index of all the node ids for\n the network. Unlike aggregate, this method returns a dataframe\n with one column per requested poi, holding the distance to the Nth\n closest poi. For instance, if you ask for the 10 closest poi to\n each node, column d[1] will be the distance to the 1st closest poi\n of that category while column d[2] will be the distance to the 2nd\n closest poi, and so on.\n \"\"\"\n if max_distance is None:\n max_distance = distance\n\n if category not in self.poi_category_names:\n assert 0, \"Need to call set_pois for this category\"\n\n if num_pois > self.max_pois:\n assert 0, \"Asking for more pois than set in init_pois\"\n\n imp_num = self._imp_name_to_num(imp_name)\n\n dists, poi_ids = self.net.find_all_nearest_pois(\n distance,\n num_pois,\n category.encode('utf-8'),\n imp_num)\n dists[dists == -1] = max_distance\n\n df = pd.DataFrame(dists, index=self.node_ids)\n df.columns = list(range(1, num_pois+1))\n\n if include_poi_ids:\n df2 = pd.DataFrame(poi_ids, index=self.node_ids)\n df2.columns = [\"poi%d\" % i for i in range(1, num_pois+1)]\n for col in df2.columns:\n # if this is still all working according to plan at this point\n # the great magic trick is now to turn the integer position of\n # the poi, which is painstakingly returned from the c++ code,\n # and turn it into the actual index that was used when it was\n # initialized as a pandas series - this really is pandas-like\n # thinking. it's complicated on the inside, but quite\n # intuitive to the user I think\n s = df2[col].astype('int')\n df2[col] = self.poi_category_indexes[category].values[s]\n df2.loc[s == -1, col] = np.nan\n\n df = pd.concat([df, df2], axis=1)\n\n return df", "response": "This method returns a dataframe with the distances to the N nearest POIs of the given category from each source node."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nreturning a list of low connectivity nodes in the network that are connected to fewer than some threshold of other nodes within a given distance.", "response": "def low_connectivity_nodes(self, impedance, count, imp_name=None):\n \"\"\"\n Identify nodes that are connected to fewer than some threshold\n of other nodes within a given distance.\n\n Parameters\n ----------\n impedance : float\n Distance within which to search for other connected nodes. This\n will usually be a distance unit in meters however if you have\n customized the impedance this could be in other units such as\n utility or time etc.\n count : int\n Threshold for connectivity. If a node is connected to fewer\n than this many nodes within `impedance` it will be identified\n as \"low connectivity\".\n imp_name : string, optional\n The impedance name to use for the aggregation on this network.\n Must be one of the impedance names passed in the constructor of\n this object. 
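A common workflow is to prune low-connectivity nodes before running aggregations. The sketch below combines ``low_connectivity_nodes`` with the ``network_to_pandas_hdf5`` helper documented earlier; the thresholds are illustrative:

.. code-block:: python

    # Nodes reachable by fewer than 10 other nodes within 10000
    # impedance units are flagged as "low connectivity".
    lcn = net.low_connectivity_nodes(impedance=10000, count=10)

    # Persist a cleaned copy of the network without those nodes.
    network_to_pandas_hdf5(net, 'clean_network.h5', rm_nodes=lcn)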
If not specified, there must be only one impedance\n passed in the constructor, which will be used.\n\n Returns\n -------\n node_ids : array\n List of \"low connectivity\" node IDs.\n\n \"\"\"\n # set a counter variable on all nodes\n self.set(self.node_ids.to_series(), name='counter')\n\n # count nodes within impedance range\n agg = self.aggregate(\n impedance, type='count', imp_name=imp_name, name='counter')\n\n return np.array(agg[agg < count].index)"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\ncreate a Pandana network from a bounding box.", "response": "def pdna_network_from_bbox(\n lat_min=None, lng_min=None, lat_max=None, lng_max=None, bbox=None,\n network_type='walk', two_way=True,\n timeout=180, memory=None, max_query_area_size=50 * 1000 * 50 * 1000):\n \"\"\"\n Make a Pandana network from a bounding lat/lon box\n request to the Overpass API. Distance will be in the default units meters.\n\n Parameters\n ----------\n lat_min, lng_min, lat_max, lng_max : float\n bbox : tuple\n Bounding box formatted as a 4 element tuple:\n (lng_max, lat_min, lng_min, lat_max)\n network_type : {'walk', 'drive'}, optional\n Specify whether the network will be used for walking or driving.\n A value of 'walk' attempts to exclude things like freeways,\n while a value of 'drive' attempts to exclude things like\n bike and walking paths.\n two_way : bool, optional\n Whether the routes are two-way. If True, node pairs will only\n occur once.\n timeout : int, optional\n the timeout interval for requests and to pass to Overpass API\n memory : int, optional\n server memory allocation size for the query, in bytes.\n If none, server will use its default allocation size\n max_query_area_size : float, optional\n max area for any part of the geometry, in the units the geometry is in\n\n Returns\n -------\n network : pandana.Network\n\n \"\"\"\n\n nodes, edges = network_from_bbox(lat_min=lat_min, lng_min=lng_min,\n lat_max=lat_max, lng_max=lng_max,\n bbox=bbox, network_type=network_type,\n two_way=two_way, timeout=timeout,\n memory=memory,\n max_query_area_size=max_query_area_size)\n\n return Network(\n nodes['x'], nodes['y'],\n edges['from'], edges['to'], edges[['distance']])"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef process_node(e):\n\n uninteresting_tags = {\n 'source',\n 'source_ref',\n 'source:ref',\n 'history',\n 'attribution',\n 'created_by',\n 'tiger:tlid',\n 'tiger:upload_uuid',\n }\n\n node = {\n 'id': e['id'],\n 'lat': e['lat'],\n 'lon': e['lon']\n }\n\n if 'tags' in e:\n for t, v in list(e['tags'].items()):\n if t not in uninteresting_tags:\n node[t] = v\n\n return node", "response": "Process a node element entry into a dict suitable for going into a Pandas DataFrame."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef make_osm_query(query):\n osm_url = 'http://www.overpass-api.de/api/interpreter'\n req = requests.get(osm_url, params={'data': query})\n req.raise_for_status()\n\n return req.json()", "response": "Make a request to OSM and return the parsed JSON."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef build_node_query(lat_min, lng_min, lat_max, lng_max, tags=None):\n if tags is not None:\n if isinstance(tags, str):\n tags = [tags]\n tags = ''.join('[{}]'.format(t) for t in tags)\n else:\n tags = ''\n\n query_fmt = (\n '[out:json];'\n '('\n ' node'\n ' {tags}'\n ' 
({lat_min},{lng_min},{lat_max},{lng_max});'\n ');'\n 'out;')\n\n return query_fmt.format(\n lat_min=lat_min, lng_min=lng_min, lat_max=lat_max, lng_max=lng_max,\n tags=tags)", "response": "Builds a string for a node-based OSM query."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nsearches for OSM nodes within a bounding box that match given tags.", "response": "def node_query(lat_min, lng_min, lat_max, lng_max, tags=None):\n \"\"\"\n Search for OSM nodes within a bounding box that match given tags.\n\n Parameters\n ----------\n lat_min, lng_min, lat_max, lng_max : float\n tags : str or list of str, optional\n Node tags that will be used to filter the search.\n See http://wiki.openstreetmap.org/wiki/Overpass_API/Language_Guide\n for information about OSM Overpass queries\n and http://wiki.openstreetmap.org/wiki/Map_Features\n for a list of tags.\n\n Returns\n -------\n nodes : pandas.DataFrame\n Will have 'lat' and 'lon' columns, plus other columns for the\n tags associated with the node (these will vary based on the query).\n Index will be the OSM node IDs.\n\n \"\"\"\n node_data = make_osm_query(build_node_query(\n lat_min, lng_min, lat_max, lng_max, tags=tags))\n\n if len(node_data['elements']) == 0:\n raise RuntimeError('OSM query results contain no data.')\n\n nodes = [process_node(n) for n in node_data['elements']]\n return pd.DataFrame.from_records(nodes, index='id')"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef equal(x, y):\n if PY_3:\n return test_case().assertEqual(x, y) or True\n\n assert x == y", "response": "Shortcut function for ``unittest.TestCase.assertEqual``."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nreturning True if x matches y.", "response": "def matches(x, y, regex_expr=False):\n \"\"\"\n Tries to match a regular expression value ``x`` against ``y``.\n Alias of ``unittest.TestCase.assertRegex()``.\n\n Arguments:\n x (regex|str): regular expression to test.\n y (str): value to match.\n regex_expr (bool): enables regex string based expression matching.\n\n Raises:\n AssertionError: in case of mismatching.\n\n Returns:\n bool\n \"\"\"\n # Parse regex expression, if needed\n x = strip_regex(x) if regex_expr and isregex_expr(x) else x\n\n # Run regex assertion\n if PY_3:\n # Retrieve original regex pattern\n x = x.pattern if isregex(x) else x\n # Assert regular expression via unittest matchers\n return test_case().assertRegex(y, x) or True\n\n # Primitive regex matching for Python 2.7\n if isinstance(x, str):\n x = re.compile(x, re.IGNORECASE)\n\n assert x.match(y) is not None", "response": "Returns True if x matches y."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nreturn True if the given expression value is a regular expression-like string with prefix ``re/`` and suffix ``/``, otherwise False.", "response": "def isregex_expr(expr):\n \"\"\"\n Returns ``True`` is the given expression value is a regular expression\n like string with prefix ``re/`` and suffix ``/``, otherwise ``False``.\n\n Arguments:\n expr (mixed): expression value to test.\n\n Returns:\n bool\n \"\"\"\n if not isinstance(expr, str):\n return False\n\n return all([\n len(expr) > 3,\n expr.startswith('re/'),\n expr.endswith('/')\n ])"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nreturning True if the input value is a native regular expression object otherwise False.", "response": "def isregex(value):\n \"\"\"\n Returns ``True`` if the input 
argument object is a native\n regular expression object, otherwise ``False``.\n\n Arguments:\n value (mixed): input value to test.\n\n Returns:\n bool\n \"\"\"\n if not value:\n return False\n return any((isregex_expr(value), isinstance(value, retype)))"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncompare two values with regular expression matching support.", "response": "def compare(self, value, expectation, regex_expr=False):\n \"\"\"\n Compares two values with regular expression matching support.\n\n Arguments:\n value (mixed): value to compare.\n expectation (mixed): value to match.\n regex_expr (bool, optional): enables string based regex matching.\n\n Returns:\n bool\n \"\"\"\n return compare(value, expectation, regex_expr=regex_expr)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef fluent(fn):\n @functools.wraps(fn)\n def wrapper(self, *args, **kw):\n # Trigger method proxy\n result = fn(self, *args, **kw)\n # Return self instance or method result\n return self if result is None else result\n return wrapper", "response": "Decorator allowing easy method chaining."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef compare(expr, value, regex_expr=False):\n # Strict equality comparison\n if expr == value:\n return True\n\n # Infer negate expression to match, if needed\n negate = False\n if isinstance(expr, str):\n negate = expr.startswith(NEGATE)\n expr = strip_negate(expr) if negate else expr\n\n try:\n # RegExp or strict equality comparison\n test(expr, value, regex_expr=regex_expr)\n except Exception as err:\n if negate:\n return True\n else:\n raise err\n\n return True", "response": "Compares an expression against a given value."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef trigger_methods(instance, args):\n # Start the magic\n for name in sorted(args):\n value = args[name]\n target = instance\n\n # If response attributes\n if name.startswith('response_') or name.startswith('reply_'):\n name = name.replace('response_', '').replace('reply_', '')\n # If instance has response attribute, use it\n if hasattr(instance, '_response'):\n target = instance._response\n\n # Retrieve class member for inspection and future use\n member = getattr(target, name, None)\n\n # Is attribute\n isattr = name in dir(target)\n iscallable = ismethod(member) and not isfunction(member)\n\n if not iscallable and not isattr:\n raise PookInvalidArgument('Unsupported argument: {}'.format(name))\n\n # Set attribute or trigger method\n if iscallable:\n member(value)\n else:\n setattr(target, name, value)", "response": "Triggers specific class methods using a simple reflection mechanism based on the input dictionary params."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nmatch the given HTTP request instance against the registered matcher functions.", "response": "def match(self, request):\n \"\"\"\n Match the given HTTP request instance against the registered\n matcher functions in the current engine.\n\n Arguments:\n request (pook.Request): outgoing request to match.\n\n Returns:\n tuple(bool, list[Exception]): ``True`` if all matcher tests\n pass, otherwise ``False``. 
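The ``compare`` function above supports plain equality, negation via a ``NEGATE`` prefix, and ``re/.../`` regex strings. A hedged sketch of how calls might look, assuming ``compare`` is importable from pook's assertion helpers and that the ``NEGATE`` marker is the string ``'not!'`` (both worth verifying against your pook version):

.. code-block:: python

    # Strict equality short-circuits to True.
    assert compare('foo', 'foo')

    # A NEGATE-prefixed expression inverts the test; 'not!' is an
    # assumed value for the NEGATE constant.
    assert compare('not!bar', 'foo')

    # 're/.../' strings are treated as regular expressions when
    # regex_expr is enabled.
    assert compare('re/fo+/', 'foo', regex_expr=True)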
Also returns an optional list\n of error exceptions.\n \"\"\"\n errors = []\n\n def match(matcher):\n try:\n return matcher.match(request)\n except Exception as err:\n err = '{}: {}'.format(type(matcher).__name__, err)\n errors.append(err)\n return False\n\n return all([match(matcher) for matcher in self]), errors"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nreturn a matcher instance by class or alias name.", "response": "def get(name):\n \"\"\"\n Returns a matcher instance by class or alias name.\n\n Arguments:\n name (str): matcher class name or alias.\n\n Returns:\n matcher: found matcher instance, otherwise ``None``.\n \"\"\"\n for matcher in matchers:\n if matcher.__name__ == name or getattr(matcher, 'name', None) == name:\n return matcher"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\ninitializing a new matcher instance passing variadic arguments to its constructor. Acts as a delegator proxy.", "response": "def init(name, *args):\n \"\"\"\n Initializes a matcher instance passing variadic arguments to\n its constructor. Acts as a delegator proxy.\n\n Arguments:\n name (str): matcher class name or alias to execute.\n *args (mixed): variadic argument\n\n Returns:\n matcher: matcher instance.\n\n Raises:\n ValueError: if matcher was not found.\n \"\"\"\n matcher = get(name)\n if not matcher:\n raise ValueError('Cannot find matcher: {}'.format(name))\n return matcher(*args)"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef header(self, key, value):\n if type(key) is tuple:\n key, value = str(key[0]), key[1]\n\n headers = {key: value}\n self._headers.extend(headers)", "response": "Defines a new response header."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\ndefine the response body data.", "response": "def body(self, body):\n \"\"\"\n Defines response body data.\n\n Arguments:\n body (str|bytes): response body to use.\n\n Returns:\n self: ``pook.Response`` current instance.\n \"\"\"\n if isinstance(body, bytes):\n body = body.decode('utf-8')\n\n self._body = body"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef json(self, data):\n self._headers['Content-Type'] = 'application/json'\n if not isinstance(data, str):\n data = json.dumps(data, indent=4)\n self._body = data", "response": "Defines the mock response JSON body."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef set(self, key, val):\n key_lower = key.lower()\n new_vals = key, val\n # Keep the common case aka no item present as fast as possible\n vals = self._container.setdefault(key_lower, new_vals)\n if new_vals is not vals:\n self._container[key_lower] = [vals[0], vals[1], val]", "response": "Sets a header field with the given value removing previous values."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _append_funcs(target, items):\n [target.append(item) for item in items\n if isfunction(item) or ismethod(item)]", "response": "Helper function to append functions into a given list."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\ntriggering request mock definition methods dynamically based on input keyword arguments passed to pook. 
Mock constructor.", "response": "def _trigger_request(instance, request):\n \"\"\"\n Triggers request mock definition methods dynamically based on input\n keyword arguments passed to `pook.Mock` constructor.\n\n This is used to provide a more Pythonic interface vs chainable API\n approach.\n \"\"\"\n if not isinstance(request, Request):\n raise TypeError('request must be instance of pook.Request')\n\n # Register request matchers\n for key in request.keys:\n if hasattr(instance, key):\n getattr(instance, key)(getattr(request, key))"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nset the mock URL to match.", "response": "def url(self, url):\n \"\"\"\n Defines the mock URL to match.\n It can be a full URL with path and query params.\n\n Protocol schema is optional, defaults to ``http://``.\n\n Arguments:\n url (str): mock URL to match. E.g: ``server.com/api``.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n self._request.url = url\n self.add_matcher(matcher('URLMatcher', url))"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef method(self, method):\n self._request.method = method\n self.add_matcher(matcher('MethodMatcher', method))", "response": "Sets the HTTP method to match."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\ndefining a URL path to match.", "response": "def path(self, path):\n \"\"\"\n Defines a URL path to match.\n\n Only call this method if the URL has no path already defined.\n\n Arguments:\n path (str): URL path value to match. E.g: ``/api/users``.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n url = furl(self._request.rawurl)\n url.path = path\n self._request.url = url.url\n self.add_matcher(matcher('PathMatcher', path))"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef header(self, name, value):\n headers = {name: value}\n self._request.headers = headers\n self.add_matcher(matcher('HeadersMatcher', headers))", "response": "Defines a header name and value to match."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef headers(self, headers=None, **kw):\n headers = kw if kw else headers\n self._request.headers = headers\n self.add_matcher(matcher('HeadersMatcher', headers))", "response": "Sets the headers of the current request."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ndefine a new header matcher expectation that must be present in the current Mock instance.", "response": "def header_present(self, *names):\n \"\"\"\n Defines a new header matcher expectation that must be present in the\n outgoing request in order to be satisfied, no matter what value it\n hosts.\n\n Header keys are case insensitive.\n\n Arguments:\n *names (str): header or headers names to match.\n\n Returns:\n self: current Mock instance.\n\n Example::\n\n (pook.get('server.com/api')\n .header_present('content-type'))\n \"\"\"\n for name in names:\n headers = {name: re.compile('(.*)')}\n self.add_matcher(matcher('HeadersMatcher', headers))"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef headers_present(self, headers):\n headers = {name: re.compile('(.*)') for name in headers}\n self.add_matcher(matcher('HeadersMatcher', headers))", "response": "Sets the matcher for the set of headers that must be present in the current Mock instance."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 
3 script for\ndefining the Content-Type outgoing header value to match.", "response": "def content(self, value):\n \"\"\"\n Defines the ``Content-Type`` outgoing header value to match.\n\n You can pass one of the following type aliases instead of the full\n MIME type representation:\n\n - ``json`` = ``application/json``\n - ``xml`` = ``application/xml``\n - ``html`` = ``text/html``\n - ``text`` = ``text/plain``\n - ``urlencoded`` = ``application/x-www-form-urlencoded``\n - ``form`` = ``application/x-www-form-urlencoded``\n - ``form-data`` = ``application/x-www-form-urlencoded``\n\n Arguments:\n value (str): type alias or header value to match.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n header = {'Content-Type': TYPES.get(value, value)}\n self._request.headers = header\n self.add_matcher(matcher('HeadersMatcher', header))"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef params(self, params):\n url = furl(self._request.rawurl)\n url = url.add(params)\n self._request.url = url.url\n self.add_matcher(matcher('QueryMatcher', params))", "response": "Defines a set of URL query params to match."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nsetting the body data to match.", "response": "def body(self, body):\n \"\"\"\n Defines the body data to match.\n\n ``body`` argument can be a ``str``, ``binary`` or a regular expression.\n\n Arguments:\n body (str|binary|regex): body data to match.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n self._request.body = body\n self.add_matcher(matcher('BodyMatcher', body))"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef json(self, json):\n self._request.json = json\n self.add_matcher(matcher('JSONMatcher', json))", "response": "Sets the JSON body to match."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nsetting the current request's XML body value.", "response": "def xml(self, xml):\n \"\"\"\n Defines an XML body value to match.\n\n Arguments:\n xml (str|regex): body XML to match.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n self._request.xml = xml\n self.add_matcher(matcher('XMLMatcher', xml))"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nread the body from a disk file.", "response": "def file(self, path):\n \"\"\"\n Reads the body to match from a disk file.\n\n Arguments:\n path (str): relative or absolute path to file to read from.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n with open(path, 'r') as f:\n self.body(str(f.read()))"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nenables persistent mode for the current mock instance.", "response": "def persist(self, status=None):\n \"\"\"\n Enables persistent mode for the current mock.\n\n Returns:\n self: current Mock instance.\n \"\"\"\n self._persist = status if type(status) is bool else True"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef error(self, error):\n self._error = RuntimeError(error) if isinstance(error, str) else error", "response": "Defines a simulated exception that will be raised."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\ndefine the mock response for this instance.", "response": "def reply(self, status=200, new_response=False, **kw):\n \"\"\"\n Defines the mock response.\n\n Arguments:\n status (int, 
optional): response status code. Defaults to ``200``.\n **kw (dict): optional keyword arguments passed to ``pook.Response``\n constructor.\n\n Returns:\n pook.Response: mock response definition instance.\n \"\"\"\n # Use or create a Response mock instance\n res = Response(**kw) if new_response else self._response\n # Define HTTP mandatory response status\n res.status(status or res._status)\n # Expose current mock instance in response for self-reference\n res.mock = self\n # Define mock response\n self._response = res\n # Return response\n return res"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nmatches an outgoing HTTP request against the current mock matchers.", "response": "def match(self, request):\n \"\"\"\n Matches an outgoing HTTP request against the current mock matchers.\n\n This method acts like a delegator to `pook.MatcherEngine`.\n\n Arguments:\n request (pook.Request): request instance to match.\n\n Raises:\n Exception: if the mock has an exception defined.\n\n Returns:\n tuple(bool, list[Exception]): ``True`` if the mock matches\n the outgoing HTTP request, otherwise ``False``. Also returns\n an optional list of error exceptions.\n \"\"\"\n # If mock already expired, fail it\n if self._times <= 0:\n raise PookExpiredMock('Mock expired')\n\n # Trigger mock filters\n for test in self.filters:\n if not test(request, self):\n return False, []\n\n # Trigger mock mappers\n for mapper in self.mappers:\n request = mapper(request, self)\n if not request:\n raise ValueError('map function must return a request object')\n\n # Match incoming request against registered mock matchers\n matches, errors = self.matchers.match(request)\n\n # If not matched, return False\n if not matches:\n return False, errors\n\n # Register matched request for further inspection and reference\n self._calls.append(request)\n\n # Increase mock call counter\n self._matches += 1\n if not self._persist:\n self._times -= 1\n\n # Raise simulated error\n if self._error:\n raise self._error\n\n # Trigger callback when matched\n for callback in self.callbacks:\n callback(request, self)\n\n return True, []"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef activate_async(fn, _engine):\n @coroutine\n @functools.wraps(fn)\n def wrapper(*args, **kw):\n _engine.activate()\n try:\n if iscoroutinefunction(fn):\n yield from fn(*args, **kw) # noqa\n else:\n fn(*args, **kw)\n finally:\n _engine.disable()\n\n return wrapper", "response": "Decorator that activates the mock engine around the given (optionally asynchronous) function."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nset a custom mock engine that replaces the built-in HTTP traffic interceptor engine.", "response": "def set_mock_engine(self, engine):\n \"\"\"\n Sets a custom mock engine, replacing the built-in one.\n\n This is particularly useful if you want to replace the built-in\n HTTP traffic mock interceptor engine with your custom one.\n\n For mock engine implementation details, see `pook.MockEngine`.\n\n Arguments:\n engine (pook.MockEngine): custom mock engine to use.\n \"\"\"\n if not engine:\n raise TypeError('engine must be a valid object')\n\n # Instantiate mock engine\n mock_engine = engine(self)\n\n # Validate minimum viable interface\n methods = ('activate', 'disable')\n if not all([hasattr(mock_engine, method) for method in methods]):\n raise NotImplementedError('engine must implement the '\n 'required methods')\n\n # Use the custom 
mock engine\n self.mock_engine = mock_engine\n\n # Enable mock engine, if needed\n if self.active:\n self.mock_engine.activate()"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nenabling real networking mode for the specified list of hostnames.", "response": "def enable_network(self, *hostnames):\n \"\"\"\n Enables real networking mode, optionally passing one or multiple\n hostnames that would be used as filter.\n\n If at least one hostname matches with the outgoing traffic, the\n request will be executed via the real network.\n\n Arguments:\n *hostnames: optional list of host names to enable real network\n against them. hostname value can be a regular expression.\n \"\"\"\n def hostname_filter(hostname, req):\n if isregex(hostname):\n return hostname.match(req.url.hostname)\n return req.url.hostname == hostname\n\n for hostname in hostnames:\n self.use_network_filter(partial(hostname_filter, hostname))\n\n self.networking = True"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncreate and registers a new HTTP mock instance for the current engine.", "response": "def mock(self, url=None, **kw):\n \"\"\"\n Creates and registers a new HTTP mock in the current engine.\n\n Arguments:\n url (str): request URL to mock.\n activate (bool): force mock engine activation.\n Defaults to ``False``.\n **kw (mixed): variadic keyword arguments for ``Mock`` constructor.\n\n Returns:\n pook.Mock: new mock instance.\n \"\"\"\n # Activate mock engine, if explicitly requested\n if kw.get('activate'):\n kw.pop('activate')\n self.activate()\n\n # Create the new HTTP mock expectation\n mock = Mock(url=url, **kw)\n # Expose current engine instance via mock\n mock._engine = self\n # Register the mock in the current engine\n self.add_mock(mock)\n\n # Return it for consumer satisfaction\n return mock"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nremoves a specific mock instance by object reference.", "response": "def remove_mock(self, mock):\n \"\"\"\n Removes a specific mock instance by object reference.\n\n Arguments:\n mock (pook.Mock): mock instance to remove.\n \"\"\"\n self.mocks = [m for m in self.mocks if m is not mock]"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nactivates the registered interceptors in the mocking engine.", "response": "def activate(self):\n \"\"\"\n Activates the registered interceptors in the mocking engine.\n\n This means any HTTP traffic captures by those interceptors will\n trigger the HTTP mock matching engine in order to determine if a given\n HTTP transaction should be mocked out or not.\n \"\"\"\n if self.active:\n return None\n\n # Activate mock engine\n self.mock_engine.activate()\n # Enable engine state\n self.active = True"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\ndisables interceptors and stops intercepting any outgoing HTTP traffic.", "response": "def disable(self):\n \"\"\"\n Disables interceptors and stops intercepting any outgoing HTTP traffic.\n \"\"\"\n if not self.active:\n return None\n\n # Disable current mock engine\n self.mock_engine.disable()\n # Disable engine state\n self.active = False"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nverify if real networking mode should be used for the given HTTP request.", "response": "def should_use_network(self, request):\n \"\"\"\n Verifies if real networking mode should be used for the given\n 
{"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nverify if real networking mode should be used for the given HTTP request.", "response": "def should_use_network(self, request):\n \"\"\"\n Verifies if real networking mode should be used for the given\n request, passing it to the registered network filters.\n\n Arguments:\n request (pook.Request): outgoing HTTP request to test.\n\n Returns:\n bool\n \"\"\"\n return (self.networking and\n all((fn(request) for fn in self.network_filters)))"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nmatching a given request instance contract against the registered mock definitions.", "response": "def match(self, request):\n \"\"\"\n Matches a given Request instance contract against the registered mocks.\n\n If a mock passes all the matchers, its response will be returned.\n\n Arguments:\n request (pook.Request): Request contract to match.\n\n Raises:\n pook.PookNoMatches: if networking is disabled and no mock matches\n with the given request contract.\n\n Returns:\n pook.Response: the mock response to be used by the interceptor.\n \"\"\"\n # Trigger engine-level request filters\n for test in self.filters:\n if not test(request, self):\n return False\n\n # Trigger engine-level request mappers\n for mapper in self.mappers:\n request = mapper(request, self)\n if not request:\n raise ValueError('map function must return a request object')\n\n # Store list of mock matching errors for further debugging\n match_errors = []\n\n # Try to match the request against registered mock definitions\n for mock in self.mocks[:]:\n try:\n # Return the first matched HTTP request mock\n matches, errors = mock.match(request.copy())\n if len(errors):\n match_errors += errors\n if matches:\n return mock\n except PookExpiredMock:\n # Remove the mock if already expired\n self.mocks.remove(mock)\n\n # Validate that we have a mock\n if not self.should_use_network(request):\n msg = 'pook error!\\n\\n'\n\n msg += (\n '=> Cannot match any mock for the '\n 'following request:\\n{}'.format(request)\n )\n\n # Compose unmatch error details, if debug mode is enabled\n if self.debug:\n err = '\\n\\n'.join([str(err) for err in match_errors])\n if err:\n msg += '\\n\\n=> Detailed matching errors:\\n{}\\n'.format(err)\n\n # Raise no matches exception\n raise PookNoMatches(msg)\n\n # Register unmatched request\n self.unmatched_reqs.append(request)"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef copy(self):\n req = type(self)()\n req.__dict__ = self.__dict__.copy()\n req._headers = self.headers.copy()\n return req", "response": "Returns a copy of the current Request instance for side-effects purposes."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef activate(fn=None):\n # If not used as decorator, activate the engine and exit\n if not isfunction(fn):\n _engine.activate()\n return None\n\n # If used as decorator for an async coroutine, wrap it\n if iscoroutinefunction is not None and iscoroutinefunction(fn):\n return activate_async(fn, _engine)\n\n @functools.wraps(fn)\n def wrapper(*args, **kw):\n _engine.activate()\n try:\n fn(*args, **kw)\n finally:\n _engine.disable()\n\n return wrapper", "response": "Decorator (or direct call) that activates the global mock engine and, when used as a decorator, disables it again after the wrapped function returns."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\ncreating a new isolated mock engine to be used via context manager. 
Example::\n\n with pook.use() as engine:\n pook.mock('server.com/foo').reply(404)\n\n res = requests.get('server.com/foo')\n assert res.status_code == 404", "response": "def use(network=False):\n \"\"\"\n Creates a new isolated mock engine to be used via context manager.\n\n Example::\n\n with pook.use() as engine:\n pook.mock('server.com/foo').reply(404)\n\n res = requests.get('server.com/foo')\n assert res.status_code == 404\n \"\"\"\n global _engine\n\n # Create temporary engine\n __engine = _engine\n activated = __engine.active\n if activated:\n __engine.disable()\n\n _engine = Engine(network=network)\n _engine.activate()\n\n # Yield engine to be used by the context manager\n yield _engine\n\n # Restore engine state\n _engine.disable()\n if network:\n _engine.disable_network()\n\n # Restore previous engine\n _engine = __engine\n if activated:\n _engine.activate()"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef regex(expression, flags=re.IGNORECASE):\n return re.compile(expression, flags=flags)", "response": "Compiles the given expression string into a regular expression object, case-insensitive by default."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef add_interceptor(self, *interceptors):\n for interceptor in interceptors:\n self.interceptors.append(interceptor(self.engine))", "response": "Adds one or multiple HTTP traffic interceptors to the current mocking engine."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nremove an interceptor by name.", "response": "def remove_interceptor(self, name):\n \"\"\"\n Removes a specific interceptor by name.\n\n Arguments:\n name (str): interceptor name to disable.\n\n Returns:\n bool: `True` if the interceptor was disabled, otherwise `False`.\n \"\"\"\n for index, interceptor in enumerate(self.interceptors):\n matches = (\n type(interceptor).__name__ == name or\n getattr(interceptor, 'name') == name\n )\n if matches:\n self.interceptors.pop(index)\n return True\n return False"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef get_setting(connection, key):\n if key in connection.settings_dict:\n return connection.settings_dict[key]\n else:\n return getattr(settings, key)", "response": "Get key from connection or default to settings."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef as_sql(self, compiler, connection):\n sql, params = super(DecryptedCol, self).as_sql(compiler, connection)\n sql = self.target.get_decrypt_sql(connection) % (sql, self.target.get_cast_sql())\n return sql, params", "response": "Build SQL with decryption and casting."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef pre_save(self, model_instance, add):\n if self.original:\n original_value = getattr(model_instance, self.original)\n setattr(model_instance, self.attname, original_value)\n\n return super(HashMixin, self).pre_save(model_instance, add)", "response": "Save the original value."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef get_placeholder(self, value=None, compiler=None, connection=None):\n if value is None or value.startswith('\\\\x'):\n return '%s'\n\n return self.get_encrypt_sql(connection)", "response": "Return the placeholder for the current value."} {"SOURCE": "codesearchnet", 
"instruction": "How would you code a function in Python 3 to\nget the decryption for col.", "response": "def get_col(self, alias, output_field=None):\n \"\"\"Get the decryption for col.\"\"\"\n if output_field is None:\n output_field = self\n if alias != self.model._meta.db_table or output_field != self:\n return DecryptedCol(\n alias,\n self,\n output_field\n )\n else:\n return self.cached_col"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ntell postgres to encrypt this field using PGP.", "response": "def get_placeholder(self, value=None, compiler=None, connection=None):\n \"\"\"Tell postgres to encrypt this field using PGP.\"\"\"\n return self.encrypt_sql.format(get_setting(connection, 'PUBLIC_PGP_KEY'))"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef base_regression(Q, slope=None):\n if slope is None:\n slope = (Q[dtavgii] - Q[tavgii]*Q[davgii]/Q[sii]) \\\n /(Q[tsqii] - Q[tavgii]**2/Q[sii])\n only_intercept=False\n else:\n only_intercept=True\n\n intercept = (Q[davgii] - Q[tavgii]*slope)/Q[sii]\n\n if only_intercept:\n return {'slope':slope, 'intercept':intercept,\n 'chisq': 0.5*(Q[dsqii]/Q[sii] - Q[davgii]**2/Q[sii]**2)}\n\n chisq = 0.5*(Q[dsqii] - Q[davgii]**2/Q[sii]\n - (Q[dtavgii] - Q[davgii]*Q[tavgii]/Q[sii])**2/(Q[tsqii]\n - Q[tavgii]**2/Q[sii]))\n\n estimator_hessian = np.array([[Q[tsqii], Q[tavgii]], [Q[tavgii], Q[sii]]])\n\n return {'slope':slope, 'intercept':intercept,\n 'chisq':chisq, 'hessian':estimator_hessian,\n 'cov':np.linalg.inv(estimator_hessian)}", "response": "This function calculates the regression coefficients for a\n given vector containing averages of tip and branch\n quantities."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef Cov(self):\n # accumulate the covariance matrix by adding 'squares'\n M = np.zeros((self.N, self.N))\n for n in self.tree.find_clades():\n if n == self.tree.root:\n continue\n M[np.meshgrid(n._ii, n._ii)] += self.branch_variance(n)\n return M", "response": "Calculates the covariance matrix of the tips assuming variance\n has accumulated along branches of the tree accoriding to the provided\n ."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef CovInv(self):\n self.recurse(full_matrix=True)\n return self.tree.root.cinv", "response": "Returns the inverse of the covariance matrix."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef recurse(self, full_matrix=False):\n for n in self.tree.get_nonterminals(order='postorder'):\n n_leaves = len(n._ii)\n if full_matrix: M = np.zeros((n_leaves, n_leaves), dtype=float)\n r = np.zeros(n_leaves, dtype=float)\n c_count = 0\n for c in n:\n ssq = self.branch_variance(c)\n nc = len(c._ii)\n if c.is_terminal():\n if full_matrix:\n M[c_count, c_count] = 1.0/ssq\n r[c_count] = 1.0/ssq\n else:\n if full_matrix:\n M[c_count:c_count+nc, c_count:c_count+nc] = c.cinv - ssq*np.outer(c.r,c.r)/(1+ssq*c.s)\n r[c_count:c_count+nc] = c.r/(1+ssq*c.s)\n c_count += nc\n\n if full_matrix: n.cinv = M\n n.r = r #M.sum(axis=1)\n n.s = n.r.sum()", "response": "This function recursively calculates the inverse covariance matrix of the tree."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _calculate_averages(self):\n for n in self.tree.get_nonterminals(order='postorder'):\n Q = 
{"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _calculate_averages(self):\n for n in self.tree.get_nonterminals(order='postorder'):\n Q = np.zeros(6, dtype=float)\n for c in n:\n tv = self.tip_value(c)\n bv = self.branch_value(c)\n var = self.branch_variance(c)\n Q += self.propagate_averages(c, tv, bv, var)\n n.Q=Q\n\n for n in self.tree.find_clades(order='preorder'):\n O = np.zeros(6, dtype=float)\n if n==self.tree.root:\n n.Qtot = n.Q\n continue\n\n for c in n.up:\n if c==n:\n continue\n\n tv = self.tip_value(c)\n bv = self.branch_value(c)\n var = self.branch_variance(c)\n O += self.propagate_averages(c, tv, bv, var)\n\n if n.up!=self.tree.root:\n c = n.up\n tv = self.tip_value(c)\n bv = self.branch_value(c)\n var = self.branch_variance(c)\n O += self.propagate_averages(c, tv, bv, var, outgroup=True)\n n.O = O\n\n if not n.is_terminal():\n tv = self.tip_value(n)\n bv = self.branch_value(n)\n var = self.branch_variance(n)\n n.Qtot = n.Q + self.propagate_averages(n, tv, bv, var, outgroup=True)", "response": "calculate the weighted sums of the tip and branch values and their second moments."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef propagate_averages(self, n, tv, bv, var, outgroup=False):\n if n.is_terminal() and outgroup==False:\n if tv is None or np.isinf(tv) or np.isnan(tv):\n res = np.array([0, 0, 0, 0, 0, 0])\n elif var==0:\n res = np.array([np.inf, np.inf, np.inf, np.inf, np.inf, np.inf])\n else:\n res = np.array([\n tv/var,\n bv/var,\n tv**2/var,\n bv*tv/var,\n bv**2/var,\n 1.0/var], dtype=float)\n else:\n tmpQ = n.O if outgroup else n.Q\n denom = 1.0/(1+var*tmpQ[sii])\n res = np.array([\n tmpQ[tavgii]*denom,\n (tmpQ[davgii] + bv*tmpQ[sii])*denom,\n tmpQ[tsqii] - var*tmpQ[tavgii]**2*denom,\n tmpQ[dtavgii] + tmpQ[tavgii]*bv - var*tmpQ[tavgii]*(tmpQ[davgii] + bv*tmpQ[sii])*denom,\n tmpQ[dsqii] + 2*bv*tmpQ[davgii] + bv**2*tmpQ[sii] - var*(tmpQ[davgii]**2 + 2*bv*tmpQ[davgii]*tmpQ[sii] + bv**2*tmpQ[sii]**2)*denom,\n tmpQ[sii]*denom]\n )\n\n return res", "response": "This function propagates the means, variances, and covariances along a branch."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ncalculating standard explained variance of the root-to-tip distance and time", "response": "def explained_variance(self):\n \"\"\"calculate standard explained variance\n\n Returns\n -------\n float\n r-value of the root-to-tip distance and time.\n independent of regression model, but dependent on root choice\n \"\"\"\n self.tree.root._v=0\n for n in self.tree.get_nonterminals(order='preorder'):\n for c in n:\n c._v = n._v + self.branch_value(c)\n raw = np.array([(self.tip_value(n), n._v) for n in self.tree.get_terminals()\n if self.tip_value(n) is not None])\n return np.corrcoef(raw.T)[0,1]"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nregresses tip values against branch values", "response": "def regression(self, slope=None):\n \"\"\"regress tip values against branch values\n\n Parameters\n ----------\n slope : None, optional\n if given, the slope isn't optimized\n\n Returns\n -------\n dict\n regression parameters\n \"\"\"\n self._calculate_averages()\n\n clock_model = base_regression(self.tree.root.Q, slope)\n clock_model['r_val'] = self.explained_variance()\n\n return clock_model"}
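The r_val reported by regression comes from explained_variance, which is just the Pearson correlation of tip dates and root-to-tip distances; a standalone equivalent with hypothetical data:

    import numpy as np

    dates = np.array([2000., 2005., 2010., 2015.])
    rtt = np.array([0.010, 0.016, 0.021, 0.028])        # root-to-tip distances
    r_val = np.corrcoef(np.vstack([dates, rtt]))[0, 1]  # same as np.corrcoef(raw.T)[0,1]
    print(r_val)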
{"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef find_best_root(self, force_positive=True, slope=None):\n self._calculate_averages()\n best_root = {\"chisq\": np.inf}\n for n in self.tree.find_clades():\n if n==self.tree.root:\n continue\n\n tv = self.tip_value(n)\n bv = self.branch_value(n)\n var = self.branch_variance(n)\n x, chisq = self._optimal_root_along_branch(n, tv, bv, var, slope=slope)\n\n if (chisq<best_root[\"chisq\"]):\n tmpQ = self.propagate_averages(n, tv, bv*x, var*x) \\\n + self.propagate_averages(n, tv, bv*(1-x), var*(1-x), outgroup=True)\n reg = base_regression(tmpQ, slope=slope)\n if reg[\"slope\"]>=0 or (force_positive==False):\n best_root = {\"node\":n, \"split\":x}\n best_root.update(reg)\n\n if 'node' not in best_root:\n print(\"TreeRegression.find_best_root: No valid root found!\", force_positive)\n return None\n\n if 'hessian' in best_root:\n # calculate differentials with respect to x\n deriv = []\n n = best_root[\"node\"]\n tv = self.tip_value(n)\n bv = self.branch_value(n)\n var = self.branch_variance(n)\n for dx in [-0.001, 0.001]:\n y = min(1.0, max(0.0, best_root[\"split\"]+dx))\n tmpQ = self.propagate_averages(n, tv, bv*y, var*y) \\\n + self.propagate_averages(n, tv, bv*(1-y), var*(1-y), outgroup=True)\n reg = base_regression(tmpQ, slope=slope)\n deriv.append([y,reg['chisq'], tmpQ[tavgii], tmpQ[davgii]])\n\n estimator_hessian = np.zeros((3,3))\n estimator_hessian[:2,:2] = best_root['hessian']\n estimator_hessian[2,2] = (deriv[0][1] + deriv[1][1] - 2.0*best_root['chisq'])/(deriv[0][0] - deriv[1][0])**2\n # estimator_hessian[2,0] = (deriv[0][2] - deriv[1][2])/(deriv[0][0] - deriv[1][0])\n # estimator_hessian[2,1] = (deriv[0][3] - deriv[1][3])/(deriv[0][0] - deriv[1][0])\n estimator_hessian[0,2] = estimator_hessian[2,0]\n estimator_hessian[1,2] = estimator_hessian[2,1]\n best_root['hessian'] = estimator_hessian\n best_root['cov'] = np.linalg.inv(estimator_hessian)\n\n return best_root", "response": "Find the best root node for the tree."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\ndetermining the best root and reroot the tree to this value. Note that this can change the parent child relations of the tree and values associated with branches rather than nodes (e.g. confidence) might need to be re-evaluated afterwards Parameters ---------- force_positive : bool, optional if True, the search for a root will only consider positive rate estimates slope : float, optional if given, it will find the optimal root given a fixed rate. If slope==0, this corresponds to minimal root-to-tip variance rooting (min_dev) Returns ------- dict regression parameters", "response": "def optimal_reroot(self, force_positive=True, slope=None):\n \"\"\"\n determine the best root and reroot the tree to this value.\n Note that this can change the parent child relations of the tree\n and values associated with branches rather than nodes\n (e.g. confidence) might need to be re-evaluated afterwards\n\n Parameters\n ----------\n force_positive : bool, optional\n if True, the search for a root will only consider positive rate estimates\n\n slope : float, optional\n if given, it will find the optimal root given a fixed rate. 
If slope==0, this\n corresponds to minimal root-to-tip variance rooting (min_dev)\n\n Returns\n -------\n dict\n regression parameters\n \"\"\"\n best_root = self.find_best_root(force_positive=force_positive, slope=slope)\n best_node = best_root[\"node\"]\n\n x = best_root[\"split\"]\n if x<1e-5:\n new_node = best_node\n elif x>1.0-1e-5:\n new_node = best_node.up\n else:\n # create new node in the branch and root the tree to it\n new_node = Phylo.BaseTree.Clade()\n\n # insert the new node in the middle of the branch\n # by simply re-wiring the links on both sides of the branch\n # and fix the branch lengths\n new_node.branch_length = best_node.branch_length*(1-x)\n new_node.up = best_node.up\n new_node.clades = [best_node]\n new_node.up.clades = [k if k!=best_node else new_node\n for k in best_node.up.clades]\n\n best_node.branch_length *= x\n best_node.up = new_node\n\n new_node.rtt_regression = best_root\n self.tree.root_with_outgroup(new_node)\n\n self.tree.ladderize()\n for n in self.tree.get_nonterminals(order='postorder'):\n for c in n:\n c.up=n\n return best_root"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nplotting the root-to-tip distance vs time as a basic time-tree diagnostic.", "response": "def clock_plot(self, add_internal=False, ax=None, regression=None,\n confidence=True, n_sigma = 2, fs=14):\n \"\"\"Plot root-to-tip distance vs time as a basic time-tree diagnostic\n\n Parameters\n ----------\n add_internal : bool, optional\n add internal nodes. this will only work if the tree has been dated already\n ax : None, optional\n a matplotlib axis to plot into. if none provided, a new figure is opened\n regression : None, optional\n a dict containing parameters of a root-to-tip vs time regression as\n returned by the function base_regression\n confidence : bool, optional\n add confidence area to the regression line\n n_sigma : int, optional\n number of standard deviations for the confidence area.\n fs : int, optional\n fontsize\n\n \"\"\"\n import matplotlib.pyplot as plt\n if ax is None:\n plt.figure()\n ax=plt.subplot(111)\n\n self.tree.root._v=0\n for n in self.tree.get_nonterminals(order='preorder'):\n for c in n:\n c._v = n._v + self.branch_value(c)\n\n tips = self.tree.get_terminals()\n internal = self.tree.get_nonterminals()\n\n # get values of terminals\n xi = np.array([self.tip_value(n) for n in tips])\n yi = np.array([n._v for n in tips])\n ind = np.array([n.bad_branch if hasattr(n, 'bad_branch') else False for n in tips])\n if add_internal:\n xi_int = np.array([n.numdate for n in internal])\n yi_int = np.array([n._v for n in internal])\n ind_int = np.array([n.bad_branch if hasattr(n, 'bad_branch') else False for n in internal])\n\n if regression:\n # plot regression line\n t_mrca = -regression['intercept']/regression['slope']\n if add_internal:\n time_span = np.max(xi_int[~ind_int]) - np.min(xi_int[~ind_int])\n x_vals = np.array([max(np.min(xi_int[~ind_int]), t_mrca) - 0.1*time_span, np.max(xi[~ind])+0.05*time_span])\n else:\n time_span = np.max(xi[~ind]) - np.min(xi[~ind])\n x_vals = np.array([max(np.min(xi[~ind]), t_mrca) - 0.1*time_span, np.max(xi[~ind]+0.05*time_span)])\n\n # plot confidence interval\n if confidence and 'cov' in regression:\n x_vals = np.linspace(x_vals[0], x_vals[1], 100)\n y_vals = regression['slope']*x_vals + regression['intercept']\n dev = n_sigma*np.array([np.sqrt(regression['cov'][:2,:2].dot(np.array([x, 1])).dot(np.array([x,1]))) for x in x_vals])\n dev_slope = 
n_sigma*np.sqrt(regression['cov'][0,0])\n ax.fill_between(x_vals, y_vals-dev, y_vals+dev, alpha=0.2)\n dp = np.array([regression['intercept']/regression['slope']**2,-1./regression['slope']])\n dev_rtt = n_sigma*np.sqrt(regression['cov'][:2,:2].dot(dp).dot(dp))\n\n else:\n dev_rtt = None\n dev_slope = None\n\n ax.plot(x_vals, regression['slope']*x_vals + regression['intercept'],\n label = r\"$y=\\alpha + \\beta t$\"+\"\\n\"+\n r\"$\\beta=$%1.2e\"%(regression[\"slope\"])\n + (\"+/- %1.e\"%dev_slope if dev_slope else \"\") +\n \"\\nroot date: %1.1f\"%(-regression['intercept']/regression['slope']) +\n (\"+/- %1.2f\"%dev_rtt if dev_rtt else \"\"))\n\n\n ax.scatter(xi[~ind], yi[~ind], label=(\"tips\" if add_internal else None))\n if ind.sum():\n try:\n # note: this is treetime specific\n tmp_x = np.array([np.mean(n.raw_date_constraint) if n.raw_date_constraint else None\n for n in self.tree.get_terminals()])\n ax.scatter(tmp_x[ind], yi[ind], label=\"ignored tips\", c='r')\n except:\n pass\n if add_internal:\n ax.scatter(xi_int[~ind_int], yi_int[~ind_int], label=\"internal nodes\")\n\n ax.set_ylabel('root-to-tip distance', fontsize=fs)\n ax.set_xlabel('date', fontsize=fs)\n ax.ticklabel_format(useOffset=False)\n ax.tick_params(labelsize=fs*0.8)\n ax.set_ylim([0, 1.1*np.max(yi)])\n plt.tight_layout()\n plt.legend(fontsize=fs*0.8)"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef F81(mu=1.0, pi=None, alphabet=\"nuc\", **kwargs):\n\n if pi is None:\n pi=0.25*np.ones(4, dtype=float)\n num_chars = len(alphabets[alphabet])\n\n pi = np.array(pi, dtype=float)\n\n if num_chars != len(pi) :\n pi = np.ones((num_chars, ), dtype=float)\n print (\"GTR: Warning! The number of characters in the alphabet does not match the \"\n \"shape of the vector of equilibrium frequencies Pi -- assuming equal frequencies for all states.\")\n\n W = np.ones((num_chars,num_chars))\n pi /= (1.0 * np.sum(pi))\n gtr = GTR(alphabet=alphabets[alphabet])\n gtr.assign_rates(mu=mu, pi=pi, W=W)\n return gtr", "response": "Felsenstein 1981 (F81) substitution model: uniform transition rates with arbitrary equilibrium frequencies pi."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef T92(mu=1.0, pi_GC=0.5, kappa=0.1, **kwargs):\n\n W = _create_transversion_transition_W(kappa)\n # A C G T\n if pi_GC >=1.:\n raise ValueError(\"The relative GC content specified is larger than 1.0!\")\n pi = np.array([(1.-pi_GC)*0.5, pi_GC*0.5, pi_GC*0.5, (1-pi_GC)*0.5])\n gtr = GTR(alphabet=alphabets['nuc_nogap'])\n gtr.assign_rates(mu=mu, pi=pi, W=W)\n return gtr", "response": "Tamura 1992 (T92) nucleotide substitution model, parameterized by the GC content pi_GC and the transition/transversion ratio kappa."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _create_transversion_transition_W(kappa):\n W = np.ones((4,4))\n W[0, 2]=W[1, 3]=W[2, 0]=W[3,1]=kappa\n return W", "response": "Creates the symmetric rate matrix W in which the transitions (A<->G and C<->T) are weighted by kappa relative to transversions."}
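A quick standalone check of the matrix built by _create_transversion_transition_W: in the A, C, G, T ordering, entries (0,2)/(2,0) and (1,3)/(3,1) are the transitions A<->G and C<->T, which get weight kappa while all transversions keep weight 1:

    import numpy as np

    def transversion_transition_W(kappa):
        W = np.ones((4, 4))                            # transversions: weight 1
        W[0, 2] = W[1, 3] = W[2, 0] = W[3, 1] = kappa  # transitions: weight kappa
        return W

    W = transversion_transition_W(0.1)
    assert np.allclose(W, W.T)                         # the matrix stays symmetric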
{"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ninitializing the merger model with a coalescent time.", "response": "def set_Tc(self, Tc, T=None):\n '''\n initialize the merger model with a coalescent time\n\n Args:\n - Tc: a float or an iterable, if iterable another argument T of same shape is required\n - T: an array like of same shape as Tc that specifies the time pivots corresponding to Tc\n Returns:\n - None\n '''\n if isinstance(Tc, Iterable):\n if len(Tc)==len(T):\n x = np.concatenate(([-ttconf.BIG_NUMBER], T, [ttconf.BIG_NUMBER]))\n y = np.concatenate(([Tc[0]], Tc, [Tc[-1]]))\n self.Tc = interp1d(x,y)\n else:\n self.logger(\"need Tc values and Timepoints of equal length\",2,warn=True)\n self.Tc = interp1d([-ttconf.BIG_NUMBER, ttconf.BIG_NUMBER], [1e-5, 1e-5])\n else:\n self.Tc = interp1d([-ttconf.BIG_NUMBER, ttconf.BIG_NUMBER],\n [Tc+ttconf.TINY_NUMBER, Tc+ttconf.TINY_NUMBER])\n self.calc_integral_merger_rate()"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef calc_branch_count(self):\n '''\n calculates an interpolation object that maps time to the number of\n concurrent branches in the tree. The result is stored in self.nbranches\n '''\n\n # make a list of (time, merger or loss event) by root first iteration\n self.tree_events = np.array(sorted([(n.time_before_present, len(n.clades)-1)\n for n in self.tree.find_clades() if not n.bad_branch],\n key=lambda x:-x[0]))\n\n # collapse multiple events at one time point into sum of changes\n from collections import defaultdict\n dn_branch = defaultdict(int)\n for (t, dn) in self.tree_events:\n dn_branch[t]+=dn\n unique_mergers = np.array(sorted(dn_branch.items(), key = lambda x:-x[0]))\n\n # calculate the branch count at each point summing the delta branch counts\n nbranches = [[ttconf.BIG_NUMBER, 1], [unique_mergers[0,0]+ttconf.TINY_NUMBER, 1]]\n for ti, (t, dn) in enumerate(unique_mergers[:-1]):\n new_n = nbranches[-1][1]+dn\n next_t = unique_mergers[ti+1,0]+ttconf.TINY_NUMBER\n nbranches.append([t, new_n])\n nbranches.append([next_t, new_n])\n\n new_n += unique_mergers[-1,1]\n nbranches.append([next_t, new_n])\n nbranches.append([-ttconf.BIG_NUMBER, new_n])\n nbranches=np.array(nbranches)\n\n self.nbranches = interp1d(nbranches[:,0], nbranches[:,1], kind='linear')", "response": "Calculates an interpolation object that maps time to the number of concurrent branches in the tree. The result is stored in self.nbranches."}
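calc_branch_count encodes the lineage count as a piecewise-constant function by duplicating time points with the old and new value; a small self-contained sketch of the same interp1d idiom (illustrative numbers):

    import numpy as np
    from scipy.interpolate import interp1d

    # 3 lineages before t=-5, 2 lineages between -5 and -2, 1 lineage afterwards
    x = np.array([-1e10, -5., -5., -2., -2., 1e10])
    y = np.array([3., 3., 2., 2., 1., 1.])
    nbranches = interp1d(x, y, kind='linear')
    print(float(nbranches(-3.0)))   # -> 2.0 concurrent lineages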
{"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ncalculates the integral int_0^t (k(t')-1)/(2Tc(t')) dt' and stores it as self.integral_merger_rate.", "response": "def calc_integral_merger_rate(self):\n '''\n calculates the integral int_0^t (k(t')-1)/2Tc(t') dt' and stores it as\n self.integral_merger_rate. Differences of this quantity evaluated at\n different time points give the cost of a branch.\n '''\n # integrate the piecewise constant branch count function.\n tvals = np.unique(self.nbranches.x[1:-1])\n rate = self.branch_merger_rate(tvals)\n avg_rate = 0.5*(rate[1:] + rate[:-1])\n cost = np.concatenate(([0],np.cumsum(np.diff(tvals)*avg_rate)))\n # make interpolation objects for the branch count and its integral\n # the latter is scaled by 0.5/Tc\n # need to add extra point at very large time before present to\n # prevent 'out of interpolation range' errors\n self.integral_merger_rate = interp1d(np.concatenate(([-ttconf.BIG_NUMBER], tvals,[ttconf.BIG_NUMBER])),\n np.concatenate(([cost[0]], cost,[cost[-1]])), kind='linear')"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nreturn the cost associated with a branch starting at t_node with length branch_length.", "response": "def cost(self, t_node, branch_length, multiplicity=2.0):\n '''\n returns the cost associated with a branch starting at t_node\n t_node is time before present, the branch goes back in time\n\n Args:\n - t_node: time of the node\n - branch_length: branch length, determines when this branch merges with sister\n - multiplicity: 2 if merger is binary, higher if this is a polytomy\n '''\n merger_time = t_node+branch_length\n return self.integral_merger_rate(merger_time) - self.integral_merger_rate(t_node)\\\n - np.log(self.total_merger_rate(merger_time))*(multiplicity-1.0)/multiplicity"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nattach the merger cost to each branch length interpolator in the tree.", "response": "def attach_to_tree(self):\n '''\n attaches the merger cost to each branch length interpolator in the tree.\n '''\n for clade in self.tree.find_clades():\n if clade.up is not None:\n clade.branch_length_interpolator.merger_cost = self.cost"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef optimize_Tc(self):\n '''\n determines the coalescent time scale that optimizes the coalescent likelihood of the tree\n '''\n from scipy.optimize import minimize_scalar\n initial_Tc = self.Tc\n def cost(Tc):\n self.set_Tc(Tc)\n return -self.total_LH()\n\n sol = minimize_scalar(cost, bounds=[ttconf.TINY_NUMBER,10.0])\n if \"success\" in sol and sol[\"success\"]:\n self.set_Tc(sol['x'])\n else:\n self.logger(\"merger_models:optimize_Tc: optimization of coalescent time scale failed: \" + str(sol), 0, warn=True)\n self.set_Tc(initial_Tc.y, T=initial_Tc.x)", "response": "This function determines the coalescent time scale that optimizes the coalescent likelihood of the tree."}
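For a constant time scale Tc and a constant lineage count k, the integral in cost reduces to a closed form. A sketch of the arithmetic, assuming Kingman-style rates of (k-1)/(2*Tc) per branch and k*(k-1)/(2*Tc) in total (these rate definitions are assumptions of this illustration, not quoted from the module):

    import numpy as np

    Tc, k = 2.0, 5                     # coalescent time scale, concurrent lineages (assumed)
    t_node, branch_length = 1.0, 0.5
    multiplicity = 2.0                 # binary merger

    integral = (k - 1) / (2 * Tc) * branch_length  # integral of the branch merger rate
    total_rate = k * (k - 1) / (2 * Tc)            # assumed total merger rate at merger time
    cost = integral - np.log(total_rate) * (multiplicity - 1.0) / multiplicity
    print(cost)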
{"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef optimize_skyline(self, n_points=20, stiffness=2.0, method = 'SLSQP',\n tol=0.03, regularization=10.0, **kwarks):\n '''\n optimize the trajectory of the merger rate 1./T_c to maximize the\n coalescent likelihood.\n parameters:\n n_points -- number of pivots of the Tc interpolation object\n stiffness -- penalty for rapid changes in log(Tc)\n methods -- method used to optimize\n tol -- optimization tolerance\n regularization -- cost of moving logTc outside of the range [-100,0]\n merger rate is measured in branch length units, no\n plausible rates should ever be outside this window\n '''\n self.logger(\"Coalescent:optimize_skyline:... current LH: %f\"%self.total_LH(),2)\n from scipy.optimize import minimize\n initial_Tc = self.Tc\n tvals = np.linspace(self.tree_events[0,0], self.tree_events[-1,0], n_points)\n def cost(logTc):\n # cap log Tc to avoid under or overflow and nan in logs\n self.set_Tc(np.exp(np.maximum(-200,np.minimum(100,logTc))), tvals)\n neglogLH = -self.total_LH() + stiffness*np.sum(np.diff(logTc)**2) \\\n + np.sum((logTc>0)*logTc*regularization)\\\n - np.sum((logTc<-100)*logTc*regularization)\n return neglogLH\n\n sol = minimize(cost, np.ones_like(tvals)*np.log(self.Tc.y.mean()), method=method, tol=tol)\n if \"success\" in sol and sol[\"success\"]:\n dlogTc = 0.1\n opt_logTc = sol['x']\n dcost = []\n for ii in range(len(opt_logTc)):\n tmp = opt_logTc.copy()\n tmp[ii]+=dlogTc\n cost_plus = cost(tmp)\n tmp[ii]-=2*dlogTc\n cost_minus = cost(tmp)\n dcost.append([cost_minus, cost_plus])\n\n dcost = np.array(dcost)\n optimal_cost = cost(opt_logTc)\n self.confidence = -dlogTc/(2*optimal_cost - dcost[:,0] - dcost[:,1])\n self.logger(\"Coalescent:optimize_skyline:...done. new LH: %f\"%self.total_LH(),2)\n else:\n self.set_Tc(initial_Tc.y, T=initial_Tc.x)\n self.logger(\"Coalescent:optimize_skyline:...failed:\"+str(sol),0, warn=True)", "response": "Optimize the trajectory of the merger rate 1./T_c to maximize the coalescent likelihood."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef skyline_empirical(self, gen=1.0, n_points = 20):\n '''\n returns the skyline, i.e., an estimate of the inverse rate of coalesence.\n Here, the skyline is estimated from a sliding window average of the observed\n mergers, i.e., without reference to the coalescence likelihood.\n parameters:\n gen -- number of generations per year.\n '''\n\n mergers = self.tree_events[:,1]>0\n merger_tvals = self.tree_events[mergers,0]\n nlineages = self.nbranches(merger_tvals-ttconf.TINY_NUMBER)\n expected_merger_density = nlineages*(nlineages-1)*0.5\n\n nmergers = len(mergers)\n et = merger_tvals\n ev = 1.0/expected_merger_density\n # reduce the window size if there are few events in the tree\n if 2*n_points>len(expected_merger_density):\n n_points = len(ev)//4\n\n # smoothes with a sliding window over data points\n avg = np.sum(ev)/np.abs(et[0]-et[-1])\n dt = et[0]-et[-1]\n mid_points = np.concatenate(([et[0]-0.5*(et[1]-et[0])],\n 0.5*(et[1:] + et[:-1]),\n [et[-1]+0.5*(et[-1]-et[-2])]))\n\n # this smoothes the ratio of expected and observed merger rate\n self.Tc_inv = interp1d(mid_points[n_points:-n_points],\n [np.sum(ev[(et>=l)&(et<u)])/(u-l)\n for u,l in zip(mid_points[:-2*n_points],\n mid_points[2*n_points:])])\n return interp1d(self.Tc_inv.x, gen/self.Tc_inv.y)", "response": "Returns an empirical skyline, i.e., an estimate of the inverse coalescence rate obtained from a sliding window average over the observed mergers."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nconverting a profile to a sequence and normalizing the profile across sites.", "response": "def prof2seq(profile, gtr, sample_from_prof=False, normalize=True):\n \"\"\"\n Convert profile to sequence and normalize profile across sites.\n \"\"\"\n if normalize:\n tmp_profile, pre = normalize_profile(profile, return_offset=False)\n else:\n tmp_profile = profile\n\n # sample sequence according to the probabilities in the profile\n if sample_from_prof:\n cumdis = tmp_profile.cumsum(axis=1).T\n randnum = np.random.random(size=cumdis.shape[1])\n idx = np.argmax(cumdis>=randnum, axis=0)\n else:\n idx = tmp_profile.argmax(axis=1)\n seq = gtr.alphabet[idx] # max LH over the alphabet\n\n prof_values = tmp_profile[np.arange(tmp_profile.shape[0]), idx]\n\n return seq, prof_values, idx"}
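The sampling branch of prof2seq draws one state per site from the per-site distribution via a cumulative sum; a standalone illustration:

    import numpy as np

    profile = np.array([[0.9, 0.1, 0.0, 0.0],    # one row per site, rows sum to 1
                        [0.2, 0.3, 0.4, 0.1]])
    cumdis = profile.cumsum(axis=1).T            # cumulative distribution, one column per site
    randnum = np.random.random(size=cumdis.shape[1])
    idx = np.argmax(cumdis >= randnum, axis=0)   # first state whose cdf reaches the draw
    alphabet = np.array(list('ACGT'))
    print(alphabet[idx])                         # sampled sequence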
{"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nreturn a normalized version of a profile matrix", "response": "def normalize_profile(in_profile, log=False, return_offset = True):\n \"\"\"return a normalized version of a profile matrix\n\n Parameters\n ----------\n in_profile : np.array\n shape Lxq, will be normalized to one across each row\n log : bool, optional\n treat the input as log probabilities\n return_offset : bool, optional\n return the log of the scale factor for each row\n\n Returns\n -------\n tuple\n normalized profile (fresh np object) and offset (if return_offset==True)\n \"\"\"\n if log:\n tmp_prefactor = in_profile.max(axis=1)\n tmp_prof = np.exp(in_profile.T - tmp_prefactor).T\n else:\n tmp_prefactor = 0.0\n tmp_prof = in_profile\n\n norm_vector = tmp_prof.sum(axis=1)\n return (np.copy(np.einsum('ai,a->ai',tmp_prof,1.0/norm_vector)),\n (np.log(norm_vector) + tmp_prefactor) if return_offset else None)"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nprints log message msg to stdout.", "response": "def logger(self, msg, level, warn=False):\n \"\"\"\n Print log message *msg* to stdout.\n\n Parameters\n -----------\n\n msg : str\n String to print on the screen\n\n level : int\n Log-level. Only the messages with a level higher than the\n current verbose level will be shown.\n\n warn : bool\n Warning flag. If True, the message will be displayed\n regardless of its log-level.\n\n \"\"\"\n if level<self.verbose or (warn and level<=self.verbose):\n dt = time.time() - self.t_start\n outstr = '\\n' if level<2 else ''\n outstr += format(dt, '4.2f')+'\\t'\n outstr += level*'-'\n outstr += msg\n print(outstr, file=sys.stdout)"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef aln(self, in_aln):\n\n # attach the alignment; warn if an amino-acid alphabet is used on\n # data that consists almost entirely of nucleotide characters\n if len(self.gtr.alphabet)>10:\n self.logger('WARNING: almost exclusively ACGT-N in alignment. Really a protein alignment?', 1, warn=True)\n\n if hasattr(self, '_tree') and (self.tree is not None):\n self._attach_sequences_to_nodes()\n else:\n self.logger(\"TreeAnc.aln: sequences not yet attached to tree\", 3, warn=True)", "response": "Reads in the alignment and sets tree-related parameters and attaches sequences\n to the tree nodes."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nset the length of the uncompressed sequence.", "response": "def seq_len(self,L):\n \"\"\"set the length of the uncompressed sequence. its inverse 'one_mutation'\n is frequently used as a general length scale. This can't be changed once\n it is set.\n\n Parameters\n ----------\n L : int\n length of the sequence alignment\n \"\"\"\n if (not hasattr(self, '_seq_len')) or self._seq_len is None:\n if L:\n self._seq_len = int(L)\n else:\n self.logger(\"TreeAnc: one_mutation and sequence length can't be reset\",1)"}
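The log branch of normalize_profile is the usual log-sum-exp stabilization: subtract the per-row maximum before exponentiating so the normalizer cannot under- or overflow. A standalone check of the same steps:

    import numpy as np

    log_prof = np.array([[-1000., -1001.],        # np.exp would underflow to 0 directly
                         [-3., -2.]])
    prefactor = log_prof.max(axis=1)
    prof = np.exp(log_prof.T - prefactor).T
    norm = prof.sum(axis=1)
    normalized = np.einsum('ai,a->ai', prof, 1.0/norm)  # rows now sum to one
    offset = np.log(norm) + prefactor                   # log of the discarded scale factor
    assert np.allclose(normalized.sum(axis=1), 1.0)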
{"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _attach_sequences_to_nodes(self):\n '''\n For each node of the tree, check whether there is a sequence available\n in the alignment and assign this sequence as a character array\n '''\n failed_leaves= 0\n if self.is_vcf:\n # if alignment is specified as difference from ref\n dic_aln = self.aln\n else:\n # if full alignment is specified\n dic_aln = {k.name: seq2array(k.seq, fill_overhangs=self.fill_overhangs,\n ambiguous_character=self.gtr.ambiguous)\n for k in self.aln} #\n\n # loop over leaves and assign multiplicities of leaves (e.g. number of identical reads)\n for l in self.tree.get_terminals():\n if l.name in self.seq_multiplicity:\n l.count = self.seq_multiplicity[l.name]\n else:\n l.count = 1.0\n\n\n # loop over tree, and assign sequences\n for l in self.tree.find_clades():\n if l.name in dic_aln:\n l.sequence= dic_aln[l.name]\n elif l.is_terminal():\n self.logger(\"***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: %s\" % l.name, 0, warn=True)\n failed_leaves += 1\n l.sequence = seq2array(self.gtr.ambiguous*self.seq_len, fill_overhangs=self.fill_overhangs,\n ambiguous_character=self.gtr.ambiguous)\n if failed_leaves > self.tree.count_terminals()/3:\n self.logger(\"ERROR: At least 30\\\\% of terminal nodes cannot be assigned a sequence!\\n\", 0, warn=True)\n self.logger(\"Are you sure the alignment belongs to the tree?\", 2, warn=True)\n break\n else: # could not assign sequence for internal node - is OK\n pass\n\n if failed_leaves:\n self.logger(\"***WARNING: TreeAnc: %d nodes don't have a matching sequence in the alignment.\"\n \" POSSIBLE ERROR.\"%failed_leaves, 0, warn=True)\n\n # extend profile to contain additional unknown characters\n self.extend_profile()\n return self.make_reduced_alignment()", "response": "This function is used to assign sequences to nodes in the tree."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef make_reduced_alignment(self):\n\n self.logger(\"TreeAnc: making reduced alignment...\", 1)\n\n # bind positions in real sequence to that of the reduced (compressed) sequence\n self.full_to_reduced_sequence_map = np.zeros(self.seq_len, dtype=int)\n\n # bind position in reduced sequence to the array of positions in real (expanded) sequence\n self.reduced_to_full_sequence_map = {}\n\n #if aln is a dict, we want to be efficient and not iterate over a bunch of const_sites,\n #so pre-load alignment_patterns with the location of const sites\n #and get the sites that we want to iterate over only!\n if self.is_vcf:\n tmp_reduced_aln, alignment_patterns, positions = self.process_alignment_dict()\n seqNames = self.aln.keys() #store seqName order to put back on tree\n elif self.reduce_alignment:\n # transpose real alignment, for ease of iteration\n alignment_patterns = {}\n tmp_reduced_aln = []\n # NOTE the order of tree traversal must be the same as below\n # for assigning the cseq attributes to the nodes.\n seqs = [n.sequence for n in self.tree.find_clades() if hasattr(n, 'sequence')]\n if len(np.unique([len(x) for x in seqs]))>1:\n self.logger(\"TreeAnc: Sequences differ in length! 
ABORTING\",0, warn=True)\n aln_transpose = None\n return\n else:\n aln_transpose = np.array(seqs).T\n positions = range(aln_transpose.shape[0])\n else:\n self.multiplicity = np.ones(self.seq_len, dtype=float)\n self.full_to_reduced_sequence_map = np.arange(self.seq_len)\n self.reduced_to_full_sequence_map = {p:np.array([p]) for p in np.arange(self.seq_len)}\n for n in self.tree.find_clades():\n if hasattr(n, 'sequence'):\n n.original_cseq = np.copy(n.sequence)\n n.cseq = np.copy(n.sequence)\n return ttconf.SUCCESS\n\n for pi in positions:\n if self.is_vcf:\n pattern = [ self.aln[k][pi] if pi in self.aln[k].keys()\n else self.ref[pi] for k,v in self.aln.items() ]\n else:\n pattern = aln_transpose[pi]\n\n str_pat = \"\".join(pattern)\n # if the column contains only one state and ambiguous nucleotides, replace\n # those with the state in other strains right away\n unique_letters = list(np.unique(pattern))\n if hasattr(self.gtr, \"ambiguous\"):\n if len(unique_letters)==2 and self.gtr.ambiguous in unique_letters:\n other = [c for c in unique_letters if c!=self.gtr.ambiguous][0]\n str_pat = str_pat.replace(self.gtr.ambiguous, other)\n unique_letters = [other]\n # if there is a mutation in this column, give it its private pattern\n # this is required when sampling mutations from reconstructed profiles.\n # otherwise, all mutations corresponding to the same pattern will be coupled.\n if len(unique_letters)>1:\n str_pat += '_%d'%pi\n\n # if the pattern is not yet seen,\n if str_pat not in alignment_patterns:\n # bind the index in the reduced aln, index in sequence to the pattern string\n alignment_patterns[str_pat] = (len(tmp_reduced_aln), [pi])\n # append this pattern to the reduced alignment\n tmp_reduced_aln.append(pattern)\n else:\n # if the pattern is already seen, append the position in the real\n # sequence to the reduced aln<->sequence_pos_indexes map\n alignment_patterns[str_pat][1].append(pi)\n\n # add constant alignment column not in the alignment. We don't know where they\n # are, so just add them to the end. 
First, determine sequence composition.\n if self.additional_constant_sites:\n character_counts = {c:np.sum(aln_transpose==c) for c in self.gtr.alphabet\n if c not in [self.gtr.ambiguous, '-']}\n total = np.sum(list(character_counts.values()))\n additional_columns = [(c,int(np.round(self.additional_constant_sites*n/total)))\n for c, n in character_counts.items()]\n columns_left = self.additional_constant_sites\n pi = len(positions)\n for c,n in additional_columns:\n if c==additional_columns[-1][0]: # make sure all additions add up to the correct number to avoid rounding\n n = columns_left\n str_pat = c*len(self.aln)\n pos_list = list(range(pi, pi+n))\n\n if str_pat in alignment_patterns:\n alignment_patterns[str_pat][1].extend(pos_list)\n else:\n alignment_patterns[str_pat] = (len(tmp_reduced_aln), pos_list)\n tmp_reduced_aln.append(np.array(list(str_pat)))\n pi += n\n columns_left -= n\n\n\n # count how many times each column is repeated in the real alignment\n self.multiplicity = np.zeros(len(alignment_patterns))\n for p, pos in alignment_patterns.values():\n self.multiplicity[p]=len(pos)\n\n # create the reduced alignment as np array\n self.reduced_alignment = np.array(tmp_reduced_aln).T\n\n # create map to compress a sequence\n for p, pos in alignment_patterns.values():\n self.full_to_reduced_sequence_map[np.array(pos)]=p\n\n # create a map to reconstruct full sequence from the reduced (compressed) sequence\n for p, val in alignment_patterns.items():\n self.reduced_to_full_sequence_map[val[0]]=np.array(val[1], dtype=int)\n\n # assign compressed sequences to all nodes of the tree, which have sequence assigned\n # for dict we cannot assume this is in the same order, as it does below!\n # so do it explicitly\n #\n # sequences are overwritten during reconstruction and\n # ambiguous sites change. 
Keep originals for reference\n if self.is_vcf:\n seq_reduce_align = {n:self.reduced_alignment[i]\n for i, n in enumerate(seqNames)}\n for n in self.tree.find_clades():\n if hasattr(n, 'sequence'):\n n.original_cseq = seq_reduce_align[n.name]\n n.cseq = np.copy(n.original_cseq)\n else:\n # NOTE the order of tree traversal must be the same as above to catch the\n # index in the reduced alignment correctly\n seq_count = 0\n for n in self.tree.find_clades():\n if hasattr(n, 'sequence'):\n n.original_cseq = self.reduced_alignment[seq_count]\n n.cseq = np.copy(n.original_cseq)\n seq_count+=1\n\n self.logger(\"TreeAnc: constructed reduced alignment...\", 1)\n return ttconf.SUCCESS", "response": "Create the reduced alignment from the full sequences attached to the tree nodes."}
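The pattern-collapsing idea behind make_reduced_alignment, reduced to its core: identical alignment columns are stored once, with a multiplicity recording how often each pattern occurs (toy alignment; the real method additionally gives polymorphic columns private patterns):

    import numpy as np

    aln = np.array([list('ACCA'), list('ACCA'), list('ATTA')])  # 3 sequences x 4 sites
    patterns = {}                      # column string -> (reduced index, positions)
    full_to_reduced = np.zeros(aln.shape[1], dtype=int)
    for pos, column in enumerate(aln.T):
        key = ''.join(column)
        if key not in patterns:
            patterns[key] = (len(patterns), [])
        patterns[key][1].append(pos)
        full_to_reduced[pos] = patterns[key][0]

    multiplicity = np.array([len(pos) for _, pos in patterns.values()])
    print(full_to_reduced, multiplicity)   # -> [0 1 1 0] [2 2]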
{"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef process_alignment_dict(self):\n\n # number of sequences in alignment\n nseq = len(self.aln)\n\n inv_map = defaultdict(list)\n for k,v in self.aln.items():\n for pos, bs in v.items():\n inv_map[pos].append(bs)\n\n self.nonref_positions = np.sort(list(inv_map.keys()))\n self.inferred_const_sites = []\n\n ambiguous_char = self.gtr.ambiguous\n nonref_const = []\n nonref_alleles = []\n ambiguous_const = []\n variable_pos = []\n for pos, bs in inv_map.items(): #loop over positions and patterns\n bases = \"\".join(np.unique(bs))\n if len(bs) == nseq:\n if (len(bases)<=2 and ambiguous_char in bases) or len(bases)==1:\n # all sequences different from reference, but only one state\n # (other than ambiguous_char) in column\n nonref_const.append(pos)\n nonref_alleles.append(bases.replace(ambiguous_char, ''))\n if ambiguous_char in bases: #keep track of sites 'made constant'\n self.inferred_const_sites.append(pos)\n else:\n # at least two non-reference alleles\n variable_pos.append(pos)\n else:\n # not every sequence different from reference\n if bases==ambiguous_char:\n ambiguous_const.append(pos)\n self.inferred_const_sites.append(pos) #keep track of sites 'made constant'\n else:\n # at least one non ambiguous non-reference allele not in\n # every sequence\n variable_pos.append(pos)\n\n refMod = np.array(list(self.ref))\n # place constant non reference positions by their respective allele\n refMod[nonref_const] = nonref_alleles\n # mask variable positions\n states = self.gtr.alphabet\n # maybe states = np.unique(refMod)\n refMod[variable_pos] = '.'\n\n # for each base in the gtr, make constant alignment pattern and\n # assign it to all const positions in the modified reference sequence\n reduced_alignment_const = []\n alignment_patterns_const = {}\n for base in states:\n p = base*nseq\n pos = list(np.where(refMod==base)[0])\n #if the alignment doesn't have a const site of this base, don't add! (ex: no '----' site!)\n if len(pos):\n alignment_patterns_const[p] = [len(reduced_alignment_const), pos]\n reduced_alignment_const.append(list(p))\n\n\n return reduced_alignment_const, alignment_patterns_const, variable_pos", "response": "Processes the alignment dictionary into constant non-reference positions, ambiguous positions, and variable (non-reference) positions."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef prepare_tree(self):\n self.tree.root.branch_length = 0.001\n self.tree.root.mutation_length = self.tree.root.branch_length\n self.tree.root.mutations = []\n self.tree.ladderize()\n self._prepare_nodes()\n self._leaves_lookup = {node.name:node for node in self.tree.get_terminals()}", "response": "Prepares the tree for downstream analysis: sets the root branch length, ladderizes the tree, and initializes auxiliary node attributes."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nsets auxiliary parameters to every node of the tree.", "response": "def _prepare_nodes(self):\n \"\"\"\n Set auxiliary parameters to every node of the tree.\n \"\"\"\n self.tree.root.up = None\n self.tree.root.bad_branch=self.tree.root.bad_branch if hasattr(self.tree.root, 'bad_branch') else False\n internal_node_count = 0\n for clade in self.tree.get_nonterminals(order='preorder'): # parents first\n internal_node_count+=1\n if clade.name is None:\n clade.name = \"NODE_\" + format(self._internal_node_count, '07d')\n self._internal_node_count += 1\n for c in clade.clades:\n if c.is_terminal():\n c.bad_branch = c.bad_branch if hasattr(c, 'bad_branch') else False\n c.up = clade\n\n for clade in self.tree.get_nonterminals(order='postorder'): # children first\n clade.bad_branch = all([c.bad_branch for c in clade])\n\n self._calc_dist2root()\n self._internal_node_count = max(internal_node_count, self._internal_node_count)"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef _calc_dist2root(self):\n self.tree.root.dist2root = 0.0\n for clade in self.tree.get_nonterminals(order='preorder'): # parents first\n for c in clade.clades:\n if not hasattr(c, 'mutation_length'):\n c.mutation_length=c.branch_length\n c.dist2root = c.up.dist2root + c.mutation_length", "response": "Calculates the distance to the root for every node of the tree."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef infer_gtr(self, print_raw=False, marginal=False, normalized_rate=True,\n fixed_pi=None, pc=5.0, **kwargs):\n \"\"\"\n Calculates a GTR model given the multiple sequence alignment and the tree.\n It performs ancestral sequence inference (joint or marginal), followed by\n the branch lengths optimization. 
Then, the numbers of mutations are counted\n in the optimal tree and related to the time within which the mutation happened.\n From these statistics, the relative state transition probabilities are inferred,\n and the transition matrix is computed.\n\n The result is used to construct the new GTR model of type 'custom'.\n The model is assigned to the TreeAnc and is used in subsequent analysis.\n\n Parameters\n -----------\n\n print_raw : bool\n If True, print the inferred GTR model\n\n marginal : bool\n If True, use marginal sequence reconstruction\n\n normalized_rate : bool\n If True, sets the mutation rate prefactor to 1.0.\n\n fixed_pi : np.array\n Provide the equilibrium character concentrations.\n If None is passed, the concentrations will be inferred from the alignment.\n\n pc: float\n Number of pseudo counts to use in gtr inference\n\n Returns\n -------\n\n gtr : GTR\n The inferred GTR model\n \"\"\"\n\n # decide which type of the Maximum-likelihood reconstruction to use\n # (marginal) or (joint)\n if marginal:\n _ml_anc = self._ml_anc_marginal\n else:\n _ml_anc = self._ml_anc_joint\n\n self.logger(\"TreeAnc.infer_gtr: inferring the GTR model from the tree...\", 1)\n if (self.tree is None) or (self.aln is None):\n self.logger(\"TreeAnc.infer_gtr: ERROR, alignment or tree are missing\", 0)\n return ttconf.ERROR\n\n _ml_anc(final=True, **kwargs) # call one of the reconstruction types\n alpha = list(self.gtr.alphabet)\n n=len(alpha)\n # matrix of mutations n_{ij}: i = derived state, j=ancestral state\n nij = np.zeros((n,n))\n Ti = np.zeros(n)\n\n self.logger(\"TreeAnc.infer_gtr: counting mutations...\", 2)\n for node in self.tree.find_clades():\n if hasattr(node,'mutations'):\n for a,pos, d in node.mutations:\n i,j = alpha.index(d), alpha.index(a)\n nij[i,j]+=1\n Ti[j] += 0.5*self._branch_length_to_gtr(node)\n Ti[i] -= 0.5*self._branch_length_to_gtr(node)\n for ni,nuc in enumerate(node.cseq):\n i = alpha.index(nuc)\n Ti[i] += self._branch_length_to_gtr(node)*self.multiplicity[ni]\n self.logger(\"TreeAnc.infer_gtr: counting mutations...done\", 3)\n if print_raw:\n print('alphabet:',alpha)\n print('n_ij:', nij, nij.sum())\n print('T_i:', Ti, Ti.sum())\n root_state = np.array([np.sum((self.tree.root.cseq==nuc)*self.multiplicity) for nuc in alpha])\n\n self._gtr = GTR.infer(nij, Ti, root_state, fixed_pi=fixed_pi, pc=pc,\n alphabet=self.gtr.alphabet, logger=self.logger,\n prof_map = self.gtr.profile_map)\n\n if normalized_rate:\n self.logger(\"TreeAnc.infer_gtr: setting overall rate to 1.0...\", 2)\n self._gtr.mu=1.0\n return self._gtr", "response": "Infers a GTR model from the tree and alignment: reconstructs ancestral sequences, counts mutations and waiting times, and fits the transition matrix using the given pseudo counts."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nreconstructs the ancestral sequences for a particular tree.", "response": "def reconstruct_anc(self, method='probabilistic', infer_gtr=False,\n marginal=False, **kwargs):\n \"\"\"Reconstruct ancestral sequences\n\n Parameters\n ----------\n method : str\n Method to use. 
Supported values are \"fitch\" and \"ml\"\n\n infer_gtr : bool\n Infer a GTR model before reconstructing the sequences\n\n marginal : bool\n Assign sequences that are most likely after averaging over all other nodes\n instead of the jointly most likely sequences.\n **kwargs\n additional keyword arguments that are passed down to :py:meth:`TreeAnc.infer_gtr` and :py:meth:`TreeAnc._ml_anc`\n\n Returns\n -------\n N_diff : int\n Number of nucleotides different from the previous\n reconstruction. If there were no pre-set sequences, returns N*L\n\n \"\"\"\n self.logger(\"TreeAnc.infer_ancestral_sequences with method: %s, %s\"%(method, 'marginal' if marginal else 'joint'), 1)\n if (self.tree is None) or (self.aln is None):\n self.logger(\"TreeAnc.infer_ancestral_sequences: ERROR, alignment or tree are missing\", 0)\n return ttconf.ERROR\n\n if method in ['ml', 'probabilistic']:\n if marginal:\n _ml_anc = self._ml_anc_marginal\n else:\n _ml_anc = self._ml_anc_joint\n else:\n _ml_anc = self._fitch_anc\n\n if infer_gtr:\n tmp = self.infer_gtr(marginal=marginal, **kwargs)\n if tmp==ttconf.ERROR:\n return tmp\n N_diff = _ml_anc(**kwargs)\n else:\n N_diff = _ml_anc(**kwargs)\n\n return N_diff"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ngetting the mutations on a tree branch.", "response": "def get_mutations(self, node, keep_var_ambigs=False):\n \"\"\"\n Get the mutations on a tree branch. Take compressed sequences from both sides\n of the branch (attached to the node), compute mutations between them, and\n expand these mutations to the positions in the real sequences.\n\n Parameters\n ----------\n node : PhyloTree.Clade\n Tree node, which is the child node attached to the branch.\n\n keep_var_ambigs : boolean\n If true, generates mutations based on the *original* compressed sequence, which\n may include ambiguities. Note sites that only have 1 unambiguous base and ambiguous\n bases (\"AAAAANN\") are stripped of ambiguous bases *before* compression, so ambiguous\n bases will **not** be preserved.\n\n Returns\n -------\n muts : list\n List of mutations. 
Each mutation is represented as tuple of\n :code:`(parent_state, position, child_state)`.\n \"\"\"\n\n # if ambiguous sites are to be restored and node is terminal,\n # assign original sequence, else reconstructed cseq\n node_seq = node.cseq\n if keep_var_ambigs and hasattr(node, \"original_cseq\") and node.is_terminal():\n node_seq = node.original_cseq\n\n muts = []\n diff_pos = np.where(node.up.cseq!=node_seq)[0]\n for p in diff_pos:\n anc = node.up.cseq[p]\n der = node_seq[p]\n # expand to the positions in real sequence\n muts.extend([(anc, pos, der) for pos in self.reduced_to_full_sequence_map[p]])\n\n #sort by position\n return sorted(muts, key=lambda x:x[1])"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef get_branch_mutation_matrix(self, node, full_sequence=False):\n pp,pc = self.marginal_branch_profile(node)\n\n # calculate pc_i [e^Qt]_ij pp_j for each site\n expQt = self.gtr.expQt(self._branch_length_to_gtr(node))\n if len(expQt.shape)==3: # site specific model\n mut_matrix_stack = np.einsum('ai,aj,ija->aij', pc, pp, expQt)\n else:\n mut_matrix_stack = np.einsum('ai,aj,ij->aij', pc, pp, expQt)\n\n # normalize this distribution\n normalizer = mut_matrix_stack.sum(axis=2).sum(axis=1)\n mut_matrix_stack = np.einsum('aij,a->aij', mut_matrix_stack, 1.0/normalizer)\n\n # expand to full sequence if requested\n if full_sequence:\n return mut_matrix_stack[self.full_to_reduced_sequence_map]\n else:\n return mut_matrix_stack", "response": "uses results from marginal ancestral inference to return a joint mutation matrix of the sequence states at both ends of the branch."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef expanded_sequence(self, node, include_additional_constant_sites=False):\n if include_additional_constant_sites:\n L = self.seq_len\n else:\n L = self.seq_len - self.additional_constant_sites\n\n return node.cseq[self.full_to_reduced_sequence_map[:L]]", "response": "Expand a node's compressed sequence into the real sequence"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef dict_sequence(self, node, keep_var_ambigs=False):\n seq = {}\n\n node_seq = node.cseq\n if keep_var_ambigs and hasattr(node, \"original_cseq\") and node.is_terminal():\n node_seq = node.original_cseq\n\n for pos in self.nonref_positions:\n cseqLoc = self.full_to_reduced_sequence_map[pos]\n base = node_seq[cseqLoc]\n if self.ref[pos] != base:\n seq[pos] = base\n\n return seq", "response": "Returns a dict of the node's variant bases keyed by their positions."}
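The expansion step of get_mutations in isolation: differences are located on the compressed sequences and then mapped back to every original position through the reduced-to-full map (toy data):

    import numpy as np

    parent = np.array(list('ACG'))    # compressed parent sequence
    child = np.array(list('ATG'))     # compressed child sequence
    reduced_to_full = {0: np.array([0, 5]), 1: np.array([1]), 2: np.array([2, 3, 4])}

    muts = []
    for p in np.where(parent != child)[0]:
        muts.extend((parent[p], pos, child[p]) for pos in reduced_to_full[p])
    print(sorted(muts, key=lambda x: x[1]))   # -> [('C', 1, 'T')]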
{"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _fitch_anc(self, **kwargs):\n # set Fitch profiles to each terminal node\n\n for l in self.tree.get_terminals():\n l.state = [[k] for k in l.cseq]\n\n L = len(self.tree.get_terminals()[0].cseq)\n\n self.logger(\"TreeAnc._fitch_anc: Walking up the tree, creating the Fitch profiles\",2)\n for node in self.tree.get_nonterminals(order='postorder'):\n node.state = [self._fitch_state(node, k) for k in range(L)]\n\n ambs = [i for i in range(L) if len(self.tree.root.state[i])>1]\n if len(ambs) > 0:\n for amb in ambs:\n self.logger(\"Ambiguous state of the root sequence \"\n \"in the position %d: %s, \"\n \"choosing %s\" % (amb, str(self.tree.root.state[amb]),\n self.tree.root.state[amb][0]), 4)\n self.tree.root.cseq = np.array([k[np.random.randint(len(k)) if len(k)>1 else 0]\n for k in self.tree.root.state])\n\n if self.is_vcf:\n self.tree.root.sequence = self.dict_sequence(self.tree.root)\n else:\n self.tree.root.sequence = self.expanded_sequence(self.tree.root)\n\n\n self.logger(\"TreeAnc._fitch_anc: Walking down the tree, generating sequences from the \"\n \"Fitch profiles.\", 2)\n N_diff = 0\n for node in self.tree.get_nonterminals(order='preorder'):\n if node.up != None: # not root\n sequence = np.array([node.up.cseq[i]\n if node.up.cseq[i] in node.state[i]\n else node.state[i][0] for i in range(L)])\n if hasattr(node, 'sequence'):\n N_diff += (sequence!=node.cseq).sum()\n else:\n N_diff += L\n node.cseq = sequence\n if self.is_vcf:\n node.sequence = self.dict_sequence(node)\n else:\n node.sequence = self.expanded_sequence(node)\n node.mutations = self.get_mutations(node)\n\n node.profile = seq2prof(node.cseq, self.gtr.profile_map)\n del node.state # no need to store Fitch states\n self.logger(\"Done ancestral state reconstruction\",3)\n for node in self.tree.get_terminals():\n node.profile = seq2prof(node.original_cseq, self.gtr.profile_map)\n return N_diff", "response": "This method reconstructs the ancestral states using Fitch's algorithm."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _fitch_state(self, node, pos):\n state = self._fitch_intersect([k.state[pos] for k in node.clades])\n if len(state) == 0:\n state = np.concatenate([k.state[pos] for k in node.clades])\n return state", "response": "Returns the Fitch state for a single character of the given node at the given position."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nfinds the intersection of any number of 1D arrays.", "response": "def _fitch_intersect(self, arrays):\n \"\"\"\n Find the intersection of any number of 1D arrays.\n Return the sorted, unique values that are in all of the input arrays.\n Adapted from numpy.lib.arraysetops.intersect1d\n \"\"\"\n def pairwise_intersect(arr1, arr2):\n s2 = set(arr2)\n b3 = [val for val in arr1 if val in s2]\n return b3\n\n arrays = list(arrays) # allow assignment\n N = len(arrays)\n while N > 1:\n arr1 = arrays.pop()\n arr2 = arrays.pop()\n arr = pairwise_intersect(arr1, arr2)\n arrays.append(arr)\n N = len(arrays)\n\n return arrays[0]"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef sequence_LH(self, pos=None, full_sequence=False):\n if not hasattr(self.tree, \"total_sequence_LH\"):\n self.logger(\"TreeAnc.sequence_LH: you need to run marginal ancestral inference first!\", 1)\n self.infer_ancestral_sequences(marginal=True)\n if pos is not None:\n if full_sequence:\n compressed_pos = self.full_to_reduced_sequence_map[pos]\n else:\n compressed_pos = pos\n return self.tree.sequence_LH[compressed_pos]\n else:\n return self.tree.total_sequence_LH", "response": "return the likelihood of the observed sequences given the tree"}
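The indexing trick at the heart of ancestral_likelihood below, standalone: per-site transition log-probabilities are gathered from log(expQt) by pairing parent- and child-state indices (toy two-state model):

    import numpy as np

    alphabet = np.array(list('AB'))
    expQt = np.array([[0.9, 0.1],
                      [0.2, 0.8]])                 # toy transition matrix
    parent = np.array(list('AAB'))
    child = np.array(list('ABB'))
    indices = np.array([(np.argmax(alphabet == a), np.argmax(alphabet == b))
                        for a, b in zip(parent, child)])
    logQt = np.log(expQt)
    log_lh = logQt[indices[:, 1], indices[:, 0]]   # child-state row, parent-state column
    print(log_lh.sum())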
self.tree.find_clades(order='postorder'):\n\n if node.up is None: # root node\n # 0-1 profile\n profile = seq2prof(node.cseq, self.gtr.profile_map)\n # get the probabilities to observe each nucleotide\n profile *= self.gtr.Pi\n profile = profile.sum(axis=1)\n log_lh += np.log(profile) # product over all characters\n continue\n\n t = node.branch_length\n\n indices = np.array([(np.argmax(self.gtr.alphabet==a),\n np.argmax(self.gtr.alphabet==b)) for a, b in zip(node.up.cseq, node.cseq)])\n\n logQt = np.log(self.gtr.expQt(t))\n lh = logQt[indices[:, 1], indices[:, 0]]\n log_lh += lh\n\n return log_lh"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nsetting the branch lengths to either mutation lengths or given branch lengths.", "response": "def _branch_length_to_gtr(self, node):\n \"\"\"\n Set branch lengths to either mutation lengths or given branch lengths.\n The assigned values are to be used in the subsequent ML analysis.\n \"\"\"\n if self.use_mutation_length:\n return max(ttconf.MIN_BRANCH_LENGTH*self.one_mutation, node.mutation_length)\n else:\n return max(ttconf.MIN_BRANCH_LENGTH*self.one_mutation, node.branch_length)"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nperform marginal ML reconstruction of the ancestral states. In contrast to joint reconstructions, this needs to access the probabilities rather than only log probabilities and is hence handled by a separate function. Parameters ---------- store_compressed : bool, default True attach a reduced representation of sequence changed to each branch final : bool, default True restore full length by expanding sites with identical alignment patterns sample_from_profile : bool or str assign sequences probabilistically according to the inferred probabilities of ancestral states instead of to their ML value. This parameter can also take the value 'root' in which case probabilistic sampling will happen at the root but at no other node.", "response": "def _ml_anc_marginal(self, store_compressed=False, final=True, sample_from_profile=False,\n debug=False, **kwargs):\n \"\"\"\n Perform marginal ML reconstruction of the ancestral states. In contrast to\n joint reconstructions, this needs to access the probabilities rather than only\n log probabilities and is hence handled by a separate function.\n\n Parameters\n ----------\n\n store_compressed : bool, default True\n attach a reduced representation of sequence changed to each branch\n\n final : bool, default True\n restore full length by expanding sites with identical alignment patterns\n\n sample_from_profile : bool or str\n assign sequences probabilistically according to the inferred probabilities\n of ancestral states instead of to their ML value. This parameter can also\n take the value 'root' in which case probabilistic sampling will happen\n at the root but at no other node.\n\n \"\"\"\n\n tree = self.tree\n # number of nucleotides changed from prev reconstruction\n N_diff = 0\n\n L = self.multiplicity.shape[0]\n n_states = self.gtr.alphabet.shape[0]\n self.logger(\"TreeAnc._ml_anc_marginal: type of reconstruction: Marginal\", 2)\n\n self.logger(\"Attaching sequence profiles to leaves... \", 3)\n # set the leaves profiles\n for leaf in tree.get_terminals():\n # in any case, set the profile\n leaf.marginal_subtree_LH = seq2prof(leaf.original_cseq, self.gtr.profile_map)\n leaf.marginal_subtree_LH_prefactor = np.zeros(L)\n\n self.logger(\"Walking up the tree, computing likelihoods... 
\", 3)\n # propagate leaves --> root, set the marginal-likelihood messages\n for node in tree.get_nonterminals(order='postorder'): #leaves -> root\n # regardless of what was before, set the profile to ones\n tmp_log_subtree_LH = np.zeros((L,n_states), dtype=float)\n node.marginal_subtree_LH_prefactor = np.zeros(L, dtype=float)\n for ch in node.clades:\n ch.marginal_log_Lx = self.gtr.propagate_profile(ch.marginal_subtree_LH,\n self._branch_length_to_gtr(ch), return_log=True) # raw prob to transfer prob up\n tmp_log_subtree_LH += ch.marginal_log_Lx\n node.marginal_subtree_LH_prefactor += ch.marginal_subtree_LH_prefactor\n\n node.marginal_subtree_LH, offset = normalize_profile(tmp_log_subtree_LH, log=True)\n node.marginal_subtree_LH_prefactor += offset # and store log-prefactor\n\n self.logger(\"Computing root node sequence and total tree likelihood...\",3)\n # Msg to the root from the distant part (equ frequencies)\n if len(self.gtr.Pi.shape)==1:\n tree.root.marginal_outgroup_LH = np.repeat([self.gtr.Pi], tree.root.marginal_subtree_LH.shape[0], axis=0)\n else:\n tree.root.marginal_outgroup_LH = np.copy(self.gtr.Pi.T)\n\n tree.root.marginal_profile, pre = normalize_profile(tree.root.marginal_outgroup_LH*tree.root.marginal_subtree_LH)\n marginal_LH_prefactor = tree.root.marginal_subtree_LH_prefactor + pre\n\n # choose sequence characters from this profile.\n # treat root node differently to avoid piling up mutations on the longer branch\n if sample_from_profile=='root':\n root_sample_from_profile = True\n other_sample_from_profile = False\n elif isinstance(sample_from_profile, bool):\n root_sample_from_profile = sample_from_profile\n other_sample_from_profile = sample_from_profile\n\n seq, prof_vals, idxs = prof2seq(tree.root.marginal_profile,\n self.gtr, sample_from_prof=root_sample_from_profile, normalize=False)\n\n self.tree.sequence_LH = marginal_LH_prefactor\n self.tree.total_sequence_LH = (self.tree.sequence_LH*self.multiplicity).sum()\n self.tree.root.cseq = seq\n gc.collect()\n\n if final:\n if self.is_vcf:\n self.tree.root.sequence = self.dict_sequence(self.tree.root)\n else:\n self.tree.root.sequence = self.expanded_sequence(self.tree.root)\n\n self.logger(\"Walking down the tree, computing maximum likelihood sequences...\",3)\n # propagate root -->> leaves, reconstruct the internal node sequences\n # provided the upstream message + the message from the complementary subtree\n for node in tree.find_clades(order='preorder'):\n if node.up is None: # skip if node is root\n continue\n\n # integrate the information coming from parents with the information\n # of all children my multiplying it to the prev computed profile\n node.marginal_outgroup_LH, pre = normalize_profile(np.log(node.up.marginal_profile) - node.marginal_log_Lx,\n log=True, return_offset=False)\n tmp_msg_from_parent = self.gtr.evolve(node.marginal_outgroup_LH,\n self._branch_length_to_gtr(node), return_log=False)\n\n node.marginal_profile, pre = normalize_profile(node.marginal_subtree_LH * tmp_msg_from_parent, return_offset=False)\n\n # choose sequence based maximal marginal LH.\n seq, prof_vals, idxs = prof2seq(node.marginal_profile, self.gtr,\n sample_from_prof=other_sample_from_profile, normalize=False)\n\n if hasattr(node, 'cseq') and node.cseq is not None:\n N_diff += (seq!=node.cseq).sum()\n else:\n N_diff += L\n\n #assign new sequence\n node.cseq = seq\n if final:\n if self.is_vcf:\n node.sequence = self.dict_sequence(node)\n else:\n node.sequence = self.expanded_sequence(node)\n node.mutations = 
self.get_mutations(node)\n\n\n # note that the root doesn't contribute to N_diff (intended, since root sequence is often ambiguous)\n self.logger(\"TreeAnc._ml_anc_marginal: ...done\", 3)\n if store_compressed:\n self._store_compressed_sequence_pairs()\n\n # do clean-up:\n if not debug:\n for node in self.tree.find_clades():\n try:\n del node.marginal_log_Lx\n del node.marginal_subtree_LH_prefactor\n except:\n pass\n gc.collect()\n\n return N_diff"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _ml_anc_joint(self, store_compressed=True, final=True, sample_from_profile=False,\n debug=False, **kwargs):\n\n \"\"\"\n Perform joint ML reconstruction of the ancestral states. In contrast to\n marginal reconstructions, this only needs to compare and multiply LH and\n can hence operate in log space.\n\n Parameters\n ----------\n\n store_compressed : bool, default True\n attach a reduced representation of sequence changed to each branch\n\n final : bool, default True\n restore full length by expanding sites with identical alignment patterns\n\n sample_from_profile : str\n This parameter can take the value 'root' in which case probabilistic\n sampling will happen at the root. Otherwise sequences at ALL nodes are\n set to the value that jointly optimized the likelihood.\n\n \"\"\"\n N_diff = 0 # number of sites that differ from prev reconstruction\n L = self.multiplicity.shape[0]\n n_states = self.gtr.alphabet.shape[0]\n\n self.logger(\"TreeAnc._ml_anc_joint: type of reconstruction: Joint\", 2)\n\n self.logger(\"TreeAnc._ml_anc_joint: Walking up the tree, computing likelihoods... \", 3)\n # for the internal nodes, scan over all states j of this node, maximize the likelihood\n for node in self.tree.find_clades(order='postorder'):\n if node.up is None:\n node.joint_Cx=None # not needed for root\n continue\n\n # preallocate storage\n node.joint_Lx = np.zeros((L, n_states)) # likelihood array\n node.joint_Cx = np.zeros((L, n_states), dtype=int) # max LH indices\n branch_len = self._branch_length_to_gtr(node)\n # transition matrix from parent states to the current node states.\n # denoted as Pij(i), where j - parent state, i - node state\n log_transitions = np.log(np.maximum(ttconf.TINY_NUMBER, self.gtr.expQt(branch_len)))\n if node.is_terminal():\n try:\n msg_from_children = np.log(np.maximum(seq2prof(node.original_cseq, self.gtr.profile_map), ttconf.TINY_NUMBER))\n except:\n raise ValueError(\"sequence assignment to node \"+node.name+\" failed\")\n msg_from_children[np.isnan(msg_from_children) | np.isinf(msg_from_children)] = -ttconf.BIG_NUMBER\n else:\n # Product (sum-Log) over all child subtree likelihoods.\n # this is prod_ch L_x(i)\n msg_from_children = np.sum(np.stack([c.joint_Lx for c in node.clades], axis=0), axis=0)\n\n # for every possible state of the parent node,\n # get the best state of the current node\n # and compute the likelihood of this state\n for char_i, char in enumerate(self.gtr.alphabet):\n # Pij(i) * L_ch(i) for given parent state j\n msg_to_parent = (log_transitions[:,char_i].T + msg_from_children)\n # For this parent state, choose the best state of the current node:\n node.joint_Cx[:, char_i] = msg_to_parent.argmax(axis=1)\n # compute the likelihood of the best state of the current node\n # given the state of the parent (char_i)\n node.joint_Lx[:, char_i] = msg_to_parent.max(axis=1)\n\n # root node profile = likelihood of the total tree\n msg_from_children = np.sum(np.stack([c.joint_Lx for c in self.tree.root], axis = 
0), axis=0)\n # Pi(i) * Prod_ch Lch(i)\n self.tree.root.joint_Lx = msg_from_children + np.log(self.gtr.Pi).T\n normalized_profile = (self.tree.root.joint_Lx.T - self.tree.root.joint_Lx.max(axis=1)).T\n\n # choose sequence characters from this profile.\n # treat root node differently to avoid piling up mutations on the longer branch\n if sample_from_profile=='root':\n root_sample_from_profile = True\n elif isinstance(sample_from_profile, bool):\n root_sample_from_profile = sample_from_profile\n\n seq, anc_lh_vals, idxs = prof2seq(np.exp(normalized_profile),\n self.gtr, sample_from_prof = root_sample_from_profile)\n\n # compute the likelihood of the most probable root sequence\n self.tree.sequence_LH = np.choose(idxs, self.tree.root.joint_Lx.T)\n self.tree.sequence_joint_LH = (self.tree.sequence_LH*self.multiplicity).sum()\n self.tree.root.cseq = seq\n self.tree.root.seq_idx = idxs\n if final:\n if self.is_vcf:\n self.tree.root.sequence = self.dict_sequence(self.tree.root)\n else:\n self.tree.root.sequence = self.expanded_sequence(self.tree.root)\n\n self.logger(\"TreeAnc._ml_anc_joint: Walking down the tree, computing maximum likelihood sequences...\",3)\n # for each node, resolve the conditioning on the parent node\n for node in self.tree.find_clades(order='preorder'):\n\n # root node has no mutations, everything else has been already set\n if node.up is None:\n node.mutations = []\n continue\n\n # choose the value of the Cx(i), corresponding to the state of the\n # parent node i. This is the state of the current node\n node.seq_idx = np.choose(node.up.seq_idx, node.joint_Cx.T)\n # reconstruct seq, etc\n tmp_sequence = np.choose(node.seq_idx, self.gtr.alphabet)\n if hasattr(node, 'sequence') and node.cseq is not None:\n N_diff += (tmp_sequence!=node.cseq).sum()\n else:\n N_diff += L\n\n node.cseq = tmp_sequence\n if final:\n node.mutations = self.get_mutations(node)\n if self.is_vcf:\n node.sequence = self.dict_sequence(node)\n else:\n node.sequence = self.expanded_sequence(node)\n\n\n self.logger(\"TreeAnc._ml_anc_joint: ...done\", 3)\n if store_compressed:\n self._store_compressed_sequence_pairs()\n\n # do clean-up\n if not debug:\n for node in self.tree.find_clades(order='preorder'):\n del node.joint_Lx\n del node.joint_Cx\n del node.seq_idx\n\n return N_diff", "response": "Compute joint ML reconstruction of the ancestral states."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef _store_compressed_sequence_pairs(self):\n self.logger(\"TreeAnc._store_compressed_sequence_pairs...\",2)\n for node in self.tree.find_clades():\n if node.up is None:\n continue\n self._store_compressed_sequence_to_node(node)\n self.logger(\"TreeAnc._store_compressed_sequence_pairs...done\",3)", "response": "Store the compressed sequence pairs in the tree."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nperforming optimization for the branch lengths of the entire tree. This method only does a single pass and needs to be iterated. **Note** this method assumes that each node stores information about its sequence as numpy.array object (node.sequence attribute). Therefore, before calling this method, sequence reconstruction with either of the available models must be performed. 
Parameters ---------- mode : str Optimize branch length assuming the joint ML sequence assignment of both ends of the branch (:code:`joint`), or trace over all possible sequence assignments on both ends of the branch (:code:`marginal`) (slower, experimental). **kwargs : Keyword arguments Keyword Args ------------ verbose : int Output level store_old : bool If True, the old lengths will be saved in :code:`node._old_dist` attribute. Useful for testing, and special post-processing.", "response": "def optimize_branch_length(self, mode='joint', **kwargs):\n \"\"\"\n Perform optimization for the branch lengths of the entire tree.\n This method only does a single pass and needs to be iterated.\n\n **Note** this method assumes that each node stores information\n about its sequence as numpy.array object (node.sequence attribute).\n Therefore, before calling this method, sequence reconstruction with\n either of the available models must be performed.\n\n Parameters\n ----------\n\n mode : str\n Optimize branch length assuming the joint ML sequence assignment\n of both ends of the branch (:code:`joint`), or trace over all possible sequence\n assignments on both ends of the branch (:code:`marginal`) (slower, experimental).\n\n **kwargs :\n Keyword arguments\n\n Keyword Args\n ------------\n\n verbose : int\n Output level\n\n store_old : bool\n If True, the old lengths will be saved in :code:`node._old_dist` attribute.\n Useful for testing, and special post-processing.\n\n\n \"\"\"\n\n self.logger(\"TreeAnc.optimize_branch_length: running branch length optimization in mode %s...\"%mode,1)\n if (self.tree is None) or (self.aln is None):\n self.logger(\"TreeAnc.optimize_branch_length: ERROR, alignment or tree are missing\", 0)\n return ttconf.ERROR\n\n store_old_dist = False\n\n if 'store_old' in kwargs:\n store_old_dist = kwargs['store_old']\n\n if mode=='marginal':\n # a marginal ancestral reconstruction is required for\n # marginal branch length inference\n if not hasattr(self.tree.root, \"marginal_profile\"):\n self.infer_ancestral_sequences(marginal=True)\n\n max_bl = 0\n for node in self.tree.find_clades(order='postorder'):\n if node.up is None: continue # this is the root\n if store_old_dist:\n node._old_length = node.branch_length\n\n if mode=='marginal':\n new_len = self.optimal_marginal_branch_length(node)\n elif mode=='joint':\n new_len = self.optimal_branch_length(node)\n else:\n self.logger(\"treeanc.optimize_branch_length: unsupported optimization mode\",4, warn=True)\n new_len = node.branch_length\n\n if new_len < 0:\n continue\n\n self.logger(\"Optimization results: old_len=%.4e, new_len=%.4e, naive=%.4e\"\n \" Updating branch length...\"%(node.branch_length, new_len, len(node.mutations)*self.one_mutation), 5)\n\n node.branch_length = new_len\n node.mutation_length=new_len\n max_bl = max(max_bl, new_len)\n\n # as branch lengths changed, the params must be fixed\n self.tree.root.up = None\n self.tree.root.dist2root = 0.0\n if max_bl>0.15 and mode=='joint':\n self.logger(\"TreeAnc.optimize_branch_length: THIS TREE HAS LONG BRANCHES.\"\n \" \\n\\t ****TreeTime IS NOT DESIGNED TO OPTIMIZE LONG BRANCHES.\"\n \" \\n\\t ****PLEASE OPTIMIZE BRANCHES WITH ANOTHER TOOL AND RERUN WITH\"\n \" \\n\\t ****branch_length_mode='input'\", 0, warn=True)\n self._prepare_nodes()\n return ttconf.SUCCESS"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef optimal_branch_length(self, node):\n '''\n Calculate optimal branch length given the sequences 
of node and parent\n\n Parameters\n ----------\n node : PhyloTree.Clade\n TreeNode, attached to the branch.\n\n Returns\n -------\n new_len : float\n Optimal length of the given branch\n\n '''\n if node.up is None:\n return self.one_mutation\n\n parent = node.up\n if hasattr(node, 'compressed_sequence'):\n new_len = self.gtr.optimal_t_compressed(node.compressed_sequence['pair'],\n node.compressed_sequence['multiplicity'])\n else:\n new_len = self.gtr.optimal_t(parent.cseq, node.cseq,\n pattern_multiplicity=self.multiplicity,\n ignore_gaps=self.ignore_gaps)\n return new_len", "response": "Calculate the optimal branch length given the sequences of node and parent."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef marginal_branch_profile(self, node):\n '''\n calculate the marginal distribution of sequence states on both ends\n of the branch leading to node,\n\n Parameters\n ----------\n node : PhyloTree.Clade\n TreeNode, attached to the branch.\n\n\n Returns\n -------\n pp, pc : Pair of vectors (profile parent, pp) and (profile child, pc)\n that are of shape (L,n) where L is sequence length and n is alphabet size.\n note that this corresponds to the compressed sequences.\n '''\n parent = node.up\n if parent is None:\n raise Exception(\"Branch profiles can't be calculated for the root!\")\n if not hasattr(node, 'marginal_outgroup_LH'):\n raise Exception(\"marginal ancestral inference needs to be performed first!\")\n\n pc = node.marginal_subtree_LH\n pp = node.marginal_outgroup_LH\n return pp, pc", "response": "Calculates the marginal distribution of sequence states on both ends of the branch leading to node and returns the parent profile and child profile."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef optimal_marginal_branch_length(self, node, tol=1e-10):\n '''\n calculate the optimal length of the branch leading to node using the\n marginal distributions of sequence states on both ends of the branch,\n\n Parameters\n ----------\n node : PhyloTree.Clade\n TreeNode, attached to the branch.\n\n Returns\n -------\n branch_length : float\n branch length of the branch leading to the node.\n note: this can be unstable on iteration\n '''\n\n if node.up is None:\n return self.one_mutation\n pp, pc = self.marginal_branch_profile(node)\n return self.gtr.optimal_t_compressed((pp, pc), self.multiplicity, profiles=True, tol=tol)", "response": "Calculates the optimal branch length from the marginal sequence distributions on both ends of the branch leading to node."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef prune_short_branches(self):\n self.logger(\"TreeAnc.prune_short_branches: pruning short branches (max prob at zero)...\", 1)\n for node in self.tree.find_clades():\n if node.up is None or node.is_terminal():\n continue\n\n # probability of the two seqs separated by zero time is not zero\n if self.gtr.prob_t(node.up.cseq, node.cseq, 0.0,\n pattern_multiplicity=self.multiplicity) > 0.1:\n # re-assign the node children directly to its parent\n node.up.clades = [k for k in node.up.clades if k != node] + node.clades\n for clade in node.clades:\n clade.up = node.up", "response": "Removes the short branches from the tree."} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef optimize_seq_and_branch_len(self,reuse_branch_len=True, prune_short=True,\n marginal_sequences=False, branch_length_mode='joint',\n max_iter=5, infer_gtr=False, **kwargs):\n \"\"\"\n 
Iteratively set branch lengths and reconstruct ancestral sequences until\n the values of either former or latter do not change. The algorithm assumes\n knowing only the topology of the tree, and requires that sequences are assigned\n to all leaves of the tree.\n\n The first step is to pre-reconstruct ancestral\n states using Fitch reconstruction algorithm or ML using existing branch length\n estimates. Then, optimize branch lengths and re-do reconstruction until\n convergence using ML method.\n\n Parameters\n -----------\n\n reuse_branch_len : bool\n If True, rely on the initial branch lengths, and start with the\n maximum-likelihood ancestral sequence inference using existing branch\n lengths. Otherwise, do initial reconstruction of ancestral states with\n Fitch algorithm, which uses only the tree topology.\n\n prune_short : bool\n If True, the branches with zero optimal length will be pruned from\n the tree, creating polytomies. The polytomies could be further\n processed using :py:meth:`treetime.TreeTime.resolve_polytomies` from the TreeTime class.\n\n marginal_sequences : bool\n Assign sequences to their marginally most likely value, rather than\n the values that are jointly most likely across all nodes.\n\n branch_length_mode : str\n 'joint', 'marginal', or 'input'. Branch lengths are left unchanged in case\n of 'input'. 'joint' and 'marginal' cause branch length optimization\n while setting sequences to the ML value or tracing over all possible\n internal sequence states.\n\n max_iter : int\n Maximal number of iterations of sequence and branch length optimization\n\n infer_gtr : bool\n Infer a GTR model from the observed substitutions.\n\n \"\"\"\n if branch_length_mode=='marginal':\n marginal_sequences = True\n\n self.logger(\"TreeAnc.optimize_sequences_and_branch_length: sequences...\", 1)\n if reuse_branch_len:\n N_diff = self.reconstruct_anc(method='probabilistic', infer_gtr=infer_gtr,\n marginal=marginal_sequences, **kwargs)\n self.optimize_branch_len(verbose=0, store_old=False, mode=branch_length_mode)\n else:\n N_diff = self.reconstruct_anc(method='fitch', infer_gtr=infer_gtr, **kwargs)\n\n self.optimize_branch_len(verbose=0, store_old=False, marginal=False)\n\n n = 0\n while n < max_iter:\n n += 1\n if prune_short:\n self.prune_short_branches()\n N_diff = self.reconstruct_anc(method='probabilistic', infer_gtr=False,\n marginal=marginal_sequences, **kwargs)\n self.optimize_branch_len(verbose=0, store_old=False, mode=branch_length_mode)\n if N_diff < 1:\n break\n\n return ttconf.SUCCESS", "response": "Iteratively optimizes branch lengths and reconstructs ancestral sequences until convergence."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\ncomputes the rate matrix Q of the site-specific GTR model from the equilibrium frequencies Pi and the substitution matrix W.", "response": "def Q(self):\n tmp = np.einsum('ia,ij->ija', self.Pi, self.W)\n diag_vals = np.sum(tmp, axis=0)\n for x in range(tmp.shape[-1]):\n np.fill_diagonal(tmp[:,:,x], -diag_vals[:,x])\n return tmp"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef assign_rates(self, mu=1.0, pi=None, W=None):\n n = len(self.alphabet)\n self.mu = np.copy(mu)\n\n if pi is not None and pi.shape[0]==n:\n self.seq_len = pi.shape[-1]\n Pi = np.copy(pi)\n else:\n if pi is not None and len(pi)!=n:\n self.logger(\"length of equilibrium frequency vector does not match alphabet length\", 4, warn=True)\n self.logger(\"Ignoring input equilibrium frequencies\", 4, warn=True)\n Pi = np.ones(shape=(n,self.seq_len))\n\n self.Pi = Pi/np.sum(Pi, axis=0)\n\n if W is None or W.shape!=(n,n):\n if (W is not None) and W.shape!=(n,n):\n self.logger(\"Substitution matrix size does not match alphabet size\", 4, warn=True)\n self.logger(\"Ignoring input substitution matrix\", 4, warn=True)\n # flow matrix\n W = np.ones((n,n))\n else:\n W=0.5*(np.copy(W)+np.copy(W).T)\n\n np.fill_diagonal(W,0)\n avg_pi = self.Pi.mean(axis=-1)\n average_rate = W.dot(avg_pi).dot(avg_pi)\n self.W = W/average_rate\n self.mu *=average_rate\n\n self._eig()", "response": "Assigns the rate parameters (mu, equilibrium frequencies pi, and substitution matrix W) to the GTR model."} {"SOURCE": 
"codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef random(cls, L=1, avg_mu=1.0, alphabet='nuc', pi_dirichlet_alpha=1,\n W_dirichlet_alpha=3.0, mu_gamma_alpha=3.0):\n \"\"\"\n Creates a random GTR model\n\n Parameters\n ----------\n\n mu : float\n Substitution rate\n\n alphabet : str\n Alphabet name (should be standard: 'nuc', 'nuc_gap', 'aa', 'aa_gap')\n\n\n \"\"\"\n from scipy.stats import gamma\n alphabet=alphabets[alphabet]\n gtr = cls(alphabet=alphabet, seq_len=L)\n n = gtr.alphabet.shape[0]\n\n if pi_dirichlet_alpha:\n pi = 1.0*gamma.rvs(pi_dirichlet_alpha, size=(n,L))\n else:\n pi = np.ones((n,L))\n\n pi /= pi.sum(axis=0)\n if W_dirichlet_alpha:\n tmp = 1.0*gamma.rvs(W_dirichlet_alpha, size=(n,n))\n else:\n tmp = np.ones((n,n))\n tmp = np.tril(tmp,k=-1)\n W = tmp + tmp.T\n\n if mu_gamma_alpha:\n mu = gamma.rvs(mu_gamma_alpha, size=(L,))\n else:\n mu = np.ones(L)\n\n gtr.assign_rates(mu=mu, pi=pi, W=W)\n gtr.mu *= avg_mu/np.mean(gtr.mu)\n\n return gtr", "response": "Creates a random GTR model from a set of random values."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ncreates a custom GTR model by specifying the matrix explicitly passed as .", "response": "def custom(cls, mu=1.0, pi=None, W=None, **kwargs):\n \"\"\"\n Create a GTR model by specifying the matrix explicitly\n\n Parameters\n ----------\n\n mu : float\n Substitution rate\n\n W : nxn matrix\n Substitution matrix\n\n pi : n vector\n Equilibrium frequencies\n\n **kwargs:\n Key word arguments to be passed\n\n Keyword Args\n ------------\n\n alphabet : str\n Specify alphabet when applicable. If the alphabet specification is\n required, but no alphabet is specified, the nucleotide alphabet will be used as\n default.\n\n \"\"\"\n gtr = cls(**kwargs)\n gtr.assign_rates(mu=mu, pi=pi, W=W)\n return gtr"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\ninferring a GTR model by specifying the number of transitions and time spent in each character state.", "response": "def infer(cls, sub_ija, T_ia, root_state, pc=0.01,\n gap_limit=0.01, Nit=30, dp=1e-5, **kwargs):\n \"\"\"\n Infer a GTR model by specifying the number of transitions and time spent in each\n character. The basic equation that is being solved is\n\n :math:`n_{ij} = pi_i W_{ij} T_j`\n\n where :math:`n_{ij}` are the transitions, :math:`pi_i` are the equilibrium\n state frequencies, :math:`W_{ij}` is the \"substitution attempt matrix\",\n while :math:`T_i` is the time on the tree spent in character state\n :math:`i`. To regularize the process, we add pseudocounts and also need\n to account for the fact that the root of the tree is in a particular\n state. the modified equation is\n\n :math:`n_{ij} + pc = pi_i W_{ij} (T_j+pc+root\\_state)`\n\n Parameters\n ----------\n\n nija : nxn matrix\n The number of times a change in character state is observed\n between state j and i at position a\n\n Tia :n vector\n The time spent in each character state at position a\n\n root_state : n vector\n The number of characters in state i in the sequence\n of the root node.\n\n pc : float\n Pseudocounts, this determines the lower cutoff on the rate when\n no substitutions are observed\n\n **kwargs:\n Key word arguments to be passed\n\n Keyword Args\n ------------\n\n alphabet : str\n Specify alphabet when applicable. 
If the alphabet specification\n is required, but no alphabet is specified, the nucleotide alphabet will be used as default.\n\n \"\"\"\n from scipy import linalg as LA\n gtr = cls(**kwargs)\n gtr.logger(\"GTR: model inference \",1)\n q = len(gtr.alphabet)\n L = sub_ija.shape[-1]\n\n n_iter = 0\n n_ija = np.copy(sub_ija)\n n_ija[range(q),range(q),:] = 0\n n_ij = n_ija.sum(axis=-1)\n\n m_ia = np.sum(n_ija,axis=1) + root_state + pc\n n_a = n_ija.sum(axis=1).sum(axis=0) + pc\n\n Lambda = np.sum(root_state,axis=0) + q*pc\n p_ia_old=np.zeros((q,L))\n p_ia = np.ones((q,L))/q\n mu_a = np.ones(L)\n\n W_ij = np.ones((q,q)) - np.eye(q)\n\n while (LA.norm(p_ia_old-p_ia)>dp) and n_iter<Nit:\n n_iter += 1\n p_ia_old = np.copy(p_ia)\n S_ij = np.einsum('a,ia,aj', mu_a, p_ia, T_ia)\n W_ij = (n_ij + n_ij.T + pc)/(S_ij + S_ij.T + pc)\n\n avg_pi = p_ia.mean(axis=-1)\n average_rate = W_ij.dot(avg_pi).dot(avg_pi)\n W_ij = W_ij/average_rate\n mu_a *= average_rate\n\n p_ia = m_ia/(mu_a*np.dot(W_ij,T_ia)+Lambda)\n p_ia = p_ia/p_ia.sum(axis=0)\n mu_a = n_a/(pc+np.einsum('ia,ij,ja->a', p_ia, W_ij, T_ia))\n\n if n_iter >= Nit:\n gtr.logger('WARNING: maximum number of iterations has been reached in GTR inference',3, warn=True)\n if LA.norm(p_ia_old-p_ia) > dp:\n gtr.logger('the iterative scheme has not converged',3,warn=True)\n if gtr.gap_index is not None:\n for p in range(p_ia.shape[-1]):\n if p_ia[gtr.gap_index,p] < gap_limit:\n p_ia[gtr.gap_index,p] = gap_limit\n p_ia[:,p] /= p_ia[:,p].sum()\n\n gtr.assign_rates(mu=mu_a, pi=p_ia, W=W_ij)\n return gtr", "response": "Infers a site-specific GTR model from the observed substitutions and the time spent in each character state."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef infer(cls, nij, Ti, root_state, fixed_pi=None, pc=5.0,\n gap_limit=0.01, Nit=40, dp=1e-5, **kwargs):\n from scipy import linalg as LA\n gtr = cls(**kwargs)\n gtr.logger(\"GTR: model inference \",1)\n\n pc_mat = pc*np.ones_like(nij)\n np.fill_diagonal(pc_mat, 0.0)\n np.fill_diagonal(nij, 0.0)\n\n count = 0\n pi_old = np.zeros_like(Ti)\n if fixed_pi is None:\n pi = np.ones_like(Ti)\n else:\n pi = np.copy(fixed_pi)\n pi /= pi.sum()\n W_ij = np.ones_like(nij)\n mu = nij.sum()/Ti.sum()\n\n while LA.norm(pi_old-pi) > dp and count < Nit:\n gtr.logger(' '.join(map(str, ['GTR inference iteration',count,'change:',LA.norm(pi_old-pi)])), 3)\n count += 1\n pi_old = np.copy(pi)\n W_ij = (nij+nij.T+2*pc_mat)/mu/(np.outer(pi,Ti) + np.outer(Ti,pi)\n + ttconf.TINY_NUMBER + 2*pc_mat)\n\n np.fill_diagonal(W_ij, 0)\n scale_factor = np.einsum('i,ij,j',pi,W_ij,pi)\n\n W_ij = W_ij/scale_factor\n if fixed_pi is None:\n pi = (np.sum(nij+pc_mat,axis=1)+root_state)/(ttconf.TINY_NUMBER + mu*np.dot(W_ij,Ti)+root_state.sum()+np.sum(pc_mat, axis=1))\n pi /= pi.sum()\n mu = nij.sum()/(ttconf.TINY_NUMBER + np.sum(pi * (W_ij.dot(Ti))))\n if count >= Nit:\n gtr.logger('WARNING: maximum number of iterations has been reached in GTR inference',3, warn=True)\n if LA.norm(pi_old-pi) > dp:\n gtr.logger('the iterative scheme has not converged',3,warn=True)\n elif np.abs(1-np.max(pi.sum(axis=0))) > dp:\n gtr.logger('the iterative scheme has converged, but proper normalization was not reached',3,warn=True)\n if gtr.gap_index is not None:\n if pi[gtr.gap_index] < gap_limit:\n gtr.logger('the estimated equilibrium gap frequency is below the gap limit; resetting it to %1.4f'%gap_limit, 2, warn=True)\n pi[gtr.gap_index] = gap_limit\n pi /= pi.sum()\n\n gtr.assign_rates(mu=mu, pi=pi, W=W_ij)\n return gtr", "response": "Infers a GTR model from the observed substitution counts and the time spent in each character state."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef optimal_t_compressed(self, seq_pair, multiplicity, profiles=False, tol=1e-10):\n\n def _neg_prob(t, seq_pair, multiplicity):\n if profiles:\n return -1.0*self.prob_t_profiles(seq_pair, multiplicity, t, return_log=True)\n else:\n return -1.0*self.prob_t_compressed(seq_pair, multiplicity, t, return_log=True)\n\n from scipy.optimize import minimize_scalar\n opt = minimize_scalar(_neg_prob, bounds=[0, ttconf.MAX_BRANCH_LENGTH],\n method='bounded', args=(seq_pair, multiplicity), options={'xatol': tol})\n new_len = opt[\"x\"]\n\n if new_len > .9 * ttconf.MAX_BRANCH_LENGTH:\n self.logger(\"WARNING: GTR.optimal_t_compressed -- The branch length seems to be very long!\", 4, warn=True)\n\n if opt[\"success\"] != True:\n # return hamming distance: number of state pairs where state differs/all pairs\n new_len = np.sum(multiplicity[seq_pair[:,1]!=seq_pair[:,0]])/np.sum(multiplicity)\n\n return new_len", "response": "This function calculates the optimal evolutionary distance (branch length) between two compressed sequences."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef prob_t_profiles(self, profile_pair, multiplicity, t,\n return_log=False, ignore_gaps=True):\n '''\n Calculate the probability of observing a node pair at a distance t\n\n Parameters\n ----------\n\n profile_pair: numpy arrays\n Probability distributions of the nucleotides at either\n end of the branch. 
pp[0] = parent, pp[1] = child\n\n multiplicity : numpy array\n The number of times an alignment pattern is observed\n\n t : float\n Length of the branch separating parent and child\n\n ignore_gaps: bool\n If True, ignore mutations to and from gaps in distance calculations\n\n return_log : bool\n If True, return the log probability rather than the probability\n\n '''\n if t<0:\n logP = -ttconf.BIG_NUMBER\n else:\n Qt = self.expQt(t)\n if len(Qt.shape)==3:\n res = np.einsum('ai,ija,aj->a', profile_pair[1], Qt, profile_pair[0])\n else:\n res = np.einsum('ai,ij,aj->a', profile_pair[1], Qt, profile_pair[0])\n if ignore_gaps and (self.gap_index is not None): # calculate the probability that neither outgroup/node has a gap\n non_gap_frac = (1-profile_pair[0][:,self.gap_index])*(1-profile_pair[1][:,self.gap_index])\n # weigh log LH by the non-gap probability\n logP = np.sum(multiplicity*np.log(res)*non_gap_frac)\n else:\n logP = np.sum(multiplicity*np.log(res))\n\n return logP if return_log else np.exp(logP)", "response": "Calculates the probability of observing a node pair at a given distance t."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef propagate_profile(self, profile, t, return_log=False):\n Qt = self.expQt(t)\n res = profile.dot(Qt)\n\n return np.log(res) if return_log else res", "response": "Given the profile of the child, compute the probability of the parent sequence state a time t earlier."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef evolve(self, profile, t, return_log=False):\n Qt = self.expQt(t).T\n res = profile.dot(Qt)\n return np.log(res) if return_log else res", "response": "Compute the probability of the sequence state of the child at time t later given the parent profile."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nreturning the array of values exp(lambda_i * t) for a given time t", "response": "def _exp_lt(self, t):\n \"\"\"\n Parameters\n ----------\n\n t : float\n time to propagate\n\n Returns\n --------\n\n exp_lt : numpy.array\n Array of values exp(lambda(i) * t),\n where (i) - alphabet index (the eigenvalue number).\n \"\"\"\n return np.exp(self.mu * t * self.eigenvals)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef expQt(self, t):\n '''\n Parameters\n ----------\n\n t : float\n Time to propagate\n\n Returns\n --------\n\n expQt : numpy.array\n Matrix exponential exp(Qt)\n '''\n eLambdaT = np.diag(self._exp_lt(t)) # vector length = a\n Qs = self.v.dot(eLambdaT.dot(self.v_inv)) # This is P(nuc1 | given nuc_2)\n return np.maximum(0,Qs)", "response": "Calculates the matrix exponential exp(Qt) for propagation time t."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef expQsds(self, s):\n '''\n Returns\n -------\n Qtds : Returns 2 V_{ij} \\lambda_j s e^{\\lambda_j s^2} V^{-1}_{jk}\n This is the derivative of the branch probability with respect to s=\\sqrt{t}\n '''\n lambda_eLambdaT = np.diag(2.0*self._exp_lt(s**2)*self.eigenvals*s) # vector length = a\n Qsds = self.v.dot(lambda_eLambdaT.dot(self.v_inv))\n return Qsds", "response": "Returns the derivative of the branch probability with respect to s."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nreturns the second derivative of 
the branch probability wrt time", "response": "def expQsdsds(self, s):\n '''\n Returns\n -------\n Qtdtdt : Returns V_{ij} \\lambda_j^2 e^{\\lambda_j s^2} V^{-1}_{jk}\n This is the second derivative of the branch probability wrt time\n '''\n t=s**2\n elt = self._exp_lt(t)\n lambda_eLambdaT = np.diag(elt*(4.0*t*self.eigenvals**2 + 2.0*self.eigenvals))\n Qsdsds = self.v.dot(lambda_eLambdaT.dot(self.v_inv))\n return Qsdsds"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef sequence_logLH(self,seq, pattern_multiplicity=None):\n if pattern_multiplicity is None:\n pattern_multiplicity = np.ones_like(seq, dtype=float)\n return np.sum([np.sum((seq==state)*pattern_multiplicity*np.log(self.Pi[si]))\n for si,state in enumerate(self.alphabet)])", "response": "Returns the log-likelihood of sampling a sequence from the equilibrium frequencies."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef plot_vs_years(tt, step = None, ax=None, confidence=None, ticks=True, **kwargs):\n '''\n Converts branch length to years and plots the time tree on a time axis.\n\n Parameters\n ----------\n tt : TreeTime object\n A TreeTime instance after a time tree is inferred\n\n step : int\n Width of shaded boxes indicating blocks of years. Will be inferred if not specified.\n To switch off drawing of boxes, set to 0\n\n ax : matplotlib axes\n Axes to be used to plot, will create new axis if None\n\n confidence : tuple, float\n Draw confidence intervals. This assumes that marginal time tree inference was run.\n Confidence intervals are either specified as an interval of the posterior distribution\n like (0.05, 0.95) or as the weight of the maximal posterior region, e.g. 0.9\n\n **kwargs : dict\n Key word arguments that are passed down to Phylo.draw\n\n '''\n import matplotlib.pyplot as plt\n tt.branch_length_to_years()\n nleafs = tt.tree.count_terminals()\n\n if ax is None:\n fig = plt.figure(figsize=(12,10))\n ax = plt.subplot(111)\n else:\n fig = None\n # draw tree\n if \"label_func\" not in kwargs:\n kwargs[\"label_func\"] = lambda x:x.name if (x.is_terminal() and nleafs<30) else \"\"\n Phylo.draw(tt.tree, axes=ax, **kwargs)\n\n offset = tt.tree.root.numdate - tt.tree.root.branch_length\n date_range = np.max([n.numdate for n in tt.tree.get_terminals()])-offset\n\n # estimate year intervals if not explicitly specified\n if step is None or (step>0 and date_range/step>100):\n step = 10**np.floor(np.log10(date_range))\n if date_range/step<2:\n step/=5\n elif date_range/step<5:\n step/=2\n step = max(1.0/12,step)\n\n # set axis labels\n if step:\n dtick = step\n min_tick = step*(offset//step)\n extra = dtick if dtick<date_range else 0\n tick_vals = np.arange(min_tick, min_tick+date_range+extra, dtick)\n\n # draw shaded boxes to delineate blocks of years and add tick labels\n ylim = ax.get_ylim()\n xlim = ax.get_xlim()\n from matplotlib.patches import Rectangle\n for yi, year in enumerate(tick_vals):\n pos = year - offset\n r = Rectangle((pos, ylim[1]-5), dtick, ylim[0]-ylim[1]+10,\n facecolor=[0.7+0.1*(1+yi%2)]*3,\n edgecolor=[1,1,1])\n ax.add_patch(r)\n if pos>=xlim[0] and pos<=xlim[1] and ticks:\n label_str = str(step*(year//step)) if step<1 else str(int(year))\n ax.text(pos,ylim[0]-0.04*(ylim[1]-ylim[0]), label_str,\n horizontalalignment='center')\n ax.set_axis_off()\n\n # add confidence intervals to the tree graph -- grey bars\n if confidence:\n tree_layout(tt.tree)\n if not hasattr(tt.tree.root, \"marginal_inverse_cdf\"):\n print(\"marginal time tree reconstruction required for confidence intervals\")\n return ttconf.ERROR\n elif type(confidence) is float:\n cfunc = tt.get_max_posterior_region\n elif len(confidence)==2:\n cfunc = tt.get_confidence_interval\n else:\n print(\"confidence needs to be either a float (for max posterior region) or two numbers specifying lower and upper bounds\")\n return ttconf.ERROR\n\n for n in tt.tree.find_clades():\n pos = 
cfunc(n, confidence)\n ax.plot(pos-offset, np.ones(len(pos))*n.ypos, lw=3, c=(0.5,0.5,0.5))\n return fig, ax", "response": "Plots the time tree on a calendar-year axis."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef run(self, root=None, infer_gtr=True, relaxed_clock=None, n_iqd = None,\n resolve_polytomies=True, max_iter=0, Tc=None, fixed_clock_rate=None,\n time_marginal=False, sequence_marginal=False, branch_length_mode='auto',\n vary_rate=False, use_covariation=False, **kwargs):\n\n \"\"\"\n Run TreeTime reconstruction. Based on the input parameters, it divides\n the analysis into semi-independent jobs and conquers them one-by-one,\n gradually optimizing the tree given the temporal constraints and leaf\n node sequences.\n\n Parameters\n ----------\n root : str\n Try to find better root position on a given tree. If string is passed,\n the root will be searched according to the specified method. If None,\n use tree as-is.\n\n See :py:meth:`treetime.TreeTime.reroot` for available rooting methods.\n\n infer_gtr : bool\n If True, infer GTR model\n\n relaxed_clock : dict\n If not None, use autocorrelated molecular clock model. Specify the\n clock parameters as :code:`{slack:, coupling:}` dictionary.\n\n n_iqd : int\n If not None, filter tree nodes which do not obey the molecular clock\n for the particular tree. The nodes, which deviate more than\n :code:`n_iqd` interquantile intervals from the molecular clock\n regression will be marked as 'BAD' and not used in the TreeTime\n analysis\n\n resolve_polytomies : bool\n If True, attempt to resolve multiple mergers\n\n max_iter : int\n Maximum number of iterations to optimize the tree\n\n Tc : float, str\n If not None, use coalescent model to correct the branch lengths by\n introducing merger costs.\n\n If Tc is float, it is interpreted as the coalescence time scale\n\n If Tc is str, it should be one of (:code:`opt`, :code:`const`, :code:`skyline`)\n\n fixed_clock_rate : float\n Fixed clock rate to be used. 
If None, infer clock rate from the molecular clock.\n\n time_marginal : bool\n If True, perform a final round of marginal reconstruction of the nodes' positions.\n\n sequence_marginal : bool, optional\n use marginal reconstruction for ancestral sequences\n\n branch_length_mode : str\n Should be one of: :code:`joint`, :code:`marginal`, :code:`input`.\n\n If 'input', rely on the branch lengths in the input tree and skip directly\n to the maximum-likelihood ancestral sequence reconstruction.\n Otherwise, perform preliminary sequence reconstruction using parsimony\n algorithm and do branch length optimization\n\n vary_rate : bool or float, optional\n redo the time tree estimation for rates +/- one standard deviation.\n if a float is passed, it is interpreted as standard deviation,\n otherwise this standard deviation is estimated from the root-to-tip regression\n\n use_covariation : bool, optional\n default False, if False, rate estimates will be performed using simple\n regression ignoring phylogenetic covariation between nodes.\n\n **kwargs\n Keyword arguments needed by the downstream functions\n\n\n Returns\n -------\n TreeTime error/success code : str\n return value depending on success or error\n\n\n \"\"\"\n\n # register the specified covariation mode\n self.use_covariation = use_covariation\n\n if (self.tree is None) or (self.aln is None and self.seq_len is None):\n self.logger(\"TreeTime.run: ERROR, alignment or tree are missing\", 0)\n return ttconf.ERROR\n if (self.aln is None):\n branch_length_mode='input'\n\n self._set_branch_length_mode(branch_length_mode)\n\n # determine how to reconstruct and sample sequences\n seq_kwargs = {\"marginal_sequences\":sequence_marginal or (self.branch_length_mode=='marginal'),\n \"sample_from_profile\":\"root\"}\n\n tt_kwargs = {'clock_rate':fixed_clock_rate, 'time_marginal':False}\n tt_kwargs.update(kwargs)\n\n seq_LH = 0\n if \"fixed_pi\" in kwargs:\n seq_kwargs[\"fixed_pi\"] = kwargs[\"fixed_pi\"]\n if \"do_marginal\" in kwargs:\n time_marginal=kwargs[\"do_marginal\"]\n\n # initially, infer ancestral sequences and infer gtr model if desired\n if self.branch_length_mode=='input':\n if self.aln:\n self.infer_ancestral_sequences(infer_gtr=infer_gtr, **seq_kwargs)\n self.prune_short_branches()\n else:\n self.optimize_sequences_and_branch_length(infer_gtr=infer_gtr,\n max_iter=1, prune_short=True, **seq_kwargs)\n avg_root_to_tip = np.mean([x.dist2root for x in self.tree.get_terminals()])\n\n # optionally reroot the tree either by oldest, best regression or with a specific leaf\n if n_iqd or root=='clock_filter':\n if \"plot_rtt\" in kwargs and kwargs[\"plot_rtt\"]:\n plot_rtt=True\n else:\n plot_rtt=False\n reroot_mechanism = 'least-squares' if root=='clock_filter' else root\n if self.clock_filter(reroot=reroot_mechanism, n_iqd=n_iqd, plot=plot_rtt)==ttconf.ERROR:\n return ttconf.ERROR\n elif root is not None:\n if self.reroot(root=root)==ttconf.ERROR:\n return ttconf.ERROR\n\n if self.branch_length_mode=='input':\n if self.aln:\n self.infer_ancestral_sequences(**seq_kwargs)\n else:\n self.optimize_sequences_and_branch_length(max_iter=1, prune_short=False,\n **seq_kwargs)\n\n # infer time tree and optionally resolve polytomies\n self.logger(\"###TreeTime.run: INITIAL ROUND\",0)\n self.make_time_tree(**tt_kwargs)\n\n if self.aln:\n seq_LH = self.tree.sequence_marginal_LH if seq_kwargs['marginal_sequences'] else self.tree.sequence_joint_LH\n self.LH =[[seq_LH, self.tree.positional_joint_LH, 0]]\n\n if root is not None and max_iter:\n if 
self.reroot(root='least-squares' if root=='clock_filter' else root)==ttconf.ERROR:\n return ttconf.ERROR\n\n # iteratively reconstruct ancestral sequences and re-infer\n # time tree to ensure convergence.\n niter = 0\n ndiff = 0\n need_new_time_tree=False\n while niter < max_iter:\n self.logger(\"###TreeTime.run: ITERATION %d out of %d iterations\"%(niter+1,max_iter),0)\n # add coalescent prior\n if Tc:\n if Tc=='skyline' and niter0.1:\n bl_mode = 'input'\n else:\n bl_mode = 'joint'\n self.logger(\"TreeTime._set_branch_length_mode: maximum branch length is %1.3e, using branch length mode %s\"%(max_bl, bl_mode),1)\n self.branch_length_mode = bl_mode\n else:\n self.branch_length_mode = 'input'", "response": "Set the branch length mode according to the empirical branch length distribution in the input tree."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef clock_filter(self, reroot='least-squares', n_iqd=None, plot=False):\n '''\n Labels outlier branches that don't seem to follow a molecular clock\n and excludes them from subsequent molecular clock estimation and\n the timetree propagation.\n\n Parameters\n ----------\n reroot : str\n Method to find the best root in the tree (see :py:meth:`treetime.TreeTime.reroot` for options)\n\n n_iqd : int\n Number of iqd intervals. The outlier nodes are those which do not fall\n into :math:`IQD\\cdot n_iqd` interval (:math:`IQD` is the interval between\n 75\\ :sup:`th` and 25\\ :sup:`th` percentiles)\n\n If None, the default (3) is assumed\n\n plot : bool\n If True, plot the results\n\n '''\n if n_iqd is None:\n n_iqd = ttconf.NIQD\n if type(reroot) is list and len(reroot)==1:\n reroot=str(reroot[0])\n\n terminals = self.tree.get_terminals()\n if reroot:\n if self.reroot(root='least-squares' if reroot=='best' else reroot, covariation=False)==ttconf.ERROR:\n return ttconf.ERROR\n else:\n self.get_clock_model(covariation=False)\n\n clock_rate = self.clock_model['slope']\n icpt = self.clock_model['intercept']\n res = {}\n for node in terminals:\n if hasattr(node, 'raw_date_constraint') and (node.raw_date_constraint is not None):\n res[node] = node.dist2root - clock_rate*np.mean(node.raw_date_constraint) - icpt\n\n residuals = np.array(list(res.values()))\n iqd = np.percentile(residuals,75) - np.percentile(residuals,25)\n for node,r in res.items():\n if abs(r)>n_iqd*iqd and node.up.up is not None:\n self.logger('TreeTime.ClockFilter: marking %s as outlier, residual %f interquartile distances'%(node.name,r/iqd), 3, warn=True)\n node.bad_branch=True\n else:\n node.bad_branch=False\n\n # redo root estimation after outlier removal\n if reroot and self.reroot(root=reroot)==ttconf.ERROR:\n return ttconf.ERROR\n\n if plot:\n self.plot_root_to_tip()\n\n return ttconf.SUCCESS", "response": "This method filters outlier branches that do not follow a molecular clock and excludes them from subsequent molecular clock estimation and the timetree propagation."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nplotting the root-to-tip regression of the current tree.", "response": "def plot_root_to_tip(self, add_internal=False, label=True, ax=None):\n \"\"\"\n Plot root-to-tip regression\n\n Parameters\n ----------\n add_internal : bool\n If true, plot internal node positions\n\n label : bool\n If true, label the plots\n\n ax : matplotlib axes\n If not None, use the provided matplotlib axes to plot the results\n \"\"\"\n Treg = self.setup_TreeRegression()\n if self.clock_model and 'cov' 
in self.clock_model:\n cf = self.clock_model['valid_confidence']\n else:\n cf = False\n Treg.clock_plot(ax=ax, add_internal=add_internal, confidence=cf, n_sigma=2,\n regression=self.clock_model)"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nfinding best root and re-root the tree to the new root Parameters ---------- root : str Which method should be used to find the best root. Available methods are: :code:`best`, `least-squares` - minimize squared residual or likelihood of root-to-tip regression :code:`min_dev` - minimize variation of root-to-tip distance :code:`oldest` - reroot on the oldest node :code:`<node_name>` - reroot to the node with name :code:`<node_name>` :code:`[<node_name1>, <node_name2>, ...]` - reroot to the MRCA of these nodes force_positive : bool only consider positive rates when searching for the optimal root covariation : bool account for covariation in root-to-tip regression", "response": "def reroot(self, root='least-squares', force_positive=True, covariation=None):\n \"\"\"\n Find best root and re-root the tree to the new root\n\n Parameters\n ----------\n\n root : str\n Which method should be used to find the best root. Available methods are:\n\n :code:`best`, `least-squares` - minimize squared residual or likelihood of root-to-tip regression\n\n :code:`min_dev` - minimize variation of root-to-tip distance\n\n :code:`oldest` - reroot on the oldest node\n\n :code:`<node_name>` - reroot to the node with name :code:`<node_name>`\n\n :code:`[<node_name1>, <node_name2>, ...]` - reroot to the MRCA of these nodes\n\n force_positive : bool\n only consider positive rates when searching for the optimal root\n\n covariation : bool\n account for covariation in root-to-tip regression\n \"\"\"\n if type(root) is list and len(root)==1:\n root=str(root[0])\n\n if root=='best':\n root='least-squares'\n\n use_cov = self.use_covariation if covariation is None else covariation\n\n self.logger(\"TreeTime.reroot: with method or node: %s\"%root,0)\n for n in self.tree.find_clades():\n n.branch_length=n.mutation_length\n\n if (type(root) is str) and \\\n (root in rerooting_mechanisms or root in deprecated_rerooting_mechanisms):\n if root in deprecated_rerooting_mechanisms:\n if \"ML\" in root:\n use_cov=True\n self.logger('TreeTime.reroot: rerooting mechanism %s has been renamed to %s'\n %(root, deprecated_rerooting_mechanisms[root]), 1, warn=True)\n root = deprecated_rerooting_mechanisms[root]\n\n self.logger(\"TreeTime.reroot: rerooting will %s covariance and shared ancestry.\"%(\"account for\" if use_cov else \"ignore\"),0)\n new_root = self._find_best_root(covariation=use_cov,\n slope = 0.0 if root.startswith('min_dev') else None,\n force_positive=force_positive and (not root.startswith('min_dev')))\n else:\n if isinstance(root,Phylo.BaseTree.Clade):\n new_root = root\n elif isinstance(root, list):\n new_root = self.tree.common_ancestor(*root)\n elif root in self._leaves_lookup:\n new_root = self._leaves_lookup[root]\n elif root=='oldest':\n new_root = sorted([n for n in self.tree.get_terminals()\n if n.raw_date_constraint is not None],\n key=lambda x:np.mean(x.raw_date_constraint))[0]\n else:\n self.logger('TreeTime.reroot -- ERROR: unsupported rooting mechanisms or root not found',0,warn=True)\n return ttconf.ERROR\n\n #this forces a bifurcating root, as we want. 
Branch lengths will be reoptimized anyway.\n #(Without outgroup_branch_length, gives a trifurcating root, but this will mean\n #mutations may have to occur multiple times.)\n self.tree.root_with_outgroup(new_root, outgroup_branch_length=new_root.branch_length/2)\n self.get_clock_model(covariation=use_cov)\n\n\n if new_root == ttconf.ERROR:\n return ttconf.ERROR\n\n self.logger(\"TreeTime.reroot: Tree was re-rooted to node \"\n +('new_node' if new_root.name is None else new_root.name), 2)\n\n self.tree.root.branch_length = self.one_mutation\n self.tree.root.clock_length = self.one_mutation\n self.tree.root.raw_date_constraint = None\n if hasattr(new_root, 'time_before_present'):\n self.tree.root.time_before_present = new_root.time_before_present\n if hasattr(new_root, 'numdate'):\n self.tree.root.numdate = new_root.numdate\n # set root.gamma bc root doesn't have a branch_length_interpolator but gamma is needed\n if not hasattr(self.tree.root, 'gamma'):\n self.tree.root.gamma = 1.0\n for n in self.tree.find_clades():\n n.mutation_length = n.branch_length\n if not hasattr(n, 'clock_length'):\n n.clock_length = n.branch_length\n self.prepare_tree()\n\n self.get_clock_model(covariation=self.use_covariation)\n\n return ttconf.SUCCESS"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef resolve_polytomies(self, merge_compressed=False):\n self.logger(\"TreeTime.resolve_polytomies: resolving multiple mergers...\",1)\n poly_found=0\n\n for n in self.tree.find_clades():\n if len(n.clades) > 2:\n prior_n_clades = len(n.clades)\n self._poly(n, merge_compressed)\n poly_found+=prior_n_clades - len(n.clades)\n\n obsolete_nodes = [n for n in self.tree.find_clades() if len(n.clades)==1 and n.up is not None]\n for node in obsolete_nodes:\n self.logger('TreeTime.resolve_polytomies: remove obsolete node '+node.name,4)\n if node.up is not None:\n self.tree.collapse(node)\n\n if poly_found:\n self.logger('TreeTime.resolve_polytomies: introduces %d new nodes'%poly_found,3)\n else:\n self.logger('TreeTime.resolve_polytomies: No more polytomies to resolve',3)\n return poly_found", "response": "Resolves the polytomies on the tree."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef _poly(self, clade, merge_compressed):\n\n \"\"\"\n Function to resolve polytomies for a given parent node. If the\n number of the direct descendants is less than three (not a polytomy), does\n nothing. Otherwise, for each pair of nodes, assess the possible LH increase\n which could be gained by merging the two nodes. 
The increase in the LH is\n basically the tradeoff between the gain of the LH due to changing the\n branch lengths towards the optimal values and the decrease due to the\n introduction of the new branch with zero optimal length.\n \"\"\"\n\n from .branch_len_interpolator import BranchLenInterpolator\n\n zero_branch_slope = self.gtr.mu*self.seq_len\n\n def _c_gain(t, n1, n2, parent):\n \"\"\"\n cost gain if nodes n1, n2 are joined and their parent is placed at time t\n cost gain = (LH loss now) - (LH loss when placed at time t)\n \"\"\"\n cg2 = n2.branch_length_interpolator(parent.time_before_present - n2.time_before_present) - n2.branch_length_interpolator(t - n2.time_before_present)\n cg1 = n1.branch_length_interpolator(parent.time_before_present - n1.time_before_present) - n1.branch_length_interpolator(t - n1.time_before_present)\n cg_new = - zero_branch_slope * (parent.time_before_present - t) # loss in LH due to the new branch\n return -(cg2+cg1+cg_new)\n\n def cost_gain(n1, n2, parent):\n \"\"\"\n cost gained if the two nodes would have been connected.\n \"\"\"\n try:\n cg = sciopt.minimize_scalar(_c_gain,\n bounds=[max(n1.time_before_present,n2.time_before_present), parent.time_before_present],\n method='Bounded',args=(n1,n2, parent))\n return cg['x'], - cg['fun']\n except:\n self.logger(\"TreeTime._poly.cost_gain: optimization of gain failed\", 3, warn=True)\n return parent.time_before_present, 0.0\n\n\n def merge_nodes(source_arr, isall=False):\n mergers = np.array([[cost_gain(n1,n2, clade) if i1<i2 else (0.0,-1.0)\n for i1,n1 in enumerate(source_arr)]\n for i2, n2 in enumerate(source_arr)])\n LH = 0\n while len(source_arr) > 1 + int(isall):\n # max possible gains of the cost when connecting the nodes:\n # this is only a rough approximation because it assumes the new node positions\n # to be optimal\n new_positions = mergers[:,:,0]\n cost_gains = mergers[:,:,1]\n # set zero to large negative value and find optimal pair\n np.fill_diagonal(cost_gains, -1e11)\n idxs = np.unravel_index(cost_gains.argmax(),cost_gains.shape)\n if (idxs[0] == idxs[1]) or cost_gains.max()<0:\n self.logger(\"TreeTime._poly.merge_nodes: node is not fully resolved \"+clade.name,4)\n return LH\n\n n1, n2 = source_arr[idxs[0]], source_arr[idxs[1]]\n LH += cost_gains[idxs]\n\n new_node = Phylo.BaseTree.Clade()\n\n # fix positions and branch lengths\n new_node.time_before_present = new_positions[idxs]\n new_node.branch_length = clade.time_before_present - new_node.time_before_present\n new_node.clades = [n1,n2]\n n1.branch_length = new_node.time_before_present - n1.time_before_present\n n2.branch_length = new_node.time_before_present - n2.time_before_present\n\n # set parameters for the new node\n new_node.up = clade\n n1.up = new_node\n n2.up = new_node\n if hasattr(clade, \"cseq\"):\n new_node.cseq = clade.cseq\n self._store_compressed_sequence_to_node(new_node)\n\n new_node.mutations = []\n new_node.mutation_length = 0.0\n new_node.branch_length_interpolator = BranchLenInterpolator(new_node, self.gtr, one_mutation=self.one_mutation,\n branch_length_mode = self.branch_length_mode)\n clade.clades.remove(n1)\n clade.clades.remove(n2)\n clade.clades.append(new_node)\n self.logger('TreeTime._poly.merge_nodes: creating new node as child of '+clade.name,3)\n self.logger(\"TreeTime._poly.merge_nodes: Delta-LH = \" + str(cost_gains[idxs].round(3)), 3)\n\n # and modify source_arr array for the next loop\n if len(source_arr)>2: # if more than 3 nodes in polytomy, replace row/column\n for ii in np.sort(idxs)[::-1]:\n tmp_ind = np.arange(mergers.shape[0])!=ii\n mergers = mergers[tmp_ind].swapaxes(0,1)\n mergers = mergers[tmp_ind].swapaxes(0,1)\n\n 
source_arr.remove(n1)\n source_arr.remove(n2)\n new_gains = np.array([[cost_gain(n1,new_node, clade) for n1 in source_arr]])\n mergers = np.vstack((mergers, new_gains)).swapaxes(0,1)\n\n source_arr.append(new_node)\n new_gains = np.array([[cost_gain(n1,new_node, clade) for n1 in source_arr]])\n mergers = np.vstack((mergers, new_gains)).swapaxes(0,1)\n else: # otherwise just recalculate matrix\n source_arr.remove(n1)\n source_arr.remove(n2)\n source_arr.append(new_node)\n mergers = np.array([[cost_gain(n1,n2, clade) for n1 in source_arr]\n for n2 in source_arr])\n\n return LH\n\n stretched = [c for c in clade.clades if c.mutation_length < c.clock_length]\n compressed = [c for c in clade.clades if c not in stretched]\n\n if len(stretched)==1 and merge_compressed is False:\n return 0.0\n\n LH = merge_nodes(stretched, isall=len(stretched)==len(clade.clades))\n if merge_compressed and len(compressed)>1:\n LH += merge_nodes(compressed, isall=len(compressed)==len(clade.clades))\n\n return LH", "response": "Function to resolve the polytomies for a given parent node."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nprints the total likelihood of the tree given the constrained leaves.", "response": "def print_lh(self, joint=True):\n \"\"\"\n Print the total likelihood of the tree given the constrained leaves\n\n Parameters\n ----------\n\n joint : bool\n If true, print joint LH, else print marginal LH\n\n \"\"\"\n try:\n u_lh = self.tree.unconstrained_sequence_LH\n if joint:\n s_lh = self.tree.sequence_joint_LH\n t_lh = self.tree.positional_joint_LH\n c_lh = self.tree.coalescent_joint_LH\n else:\n s_lh = self.tree.sequence_marginal_LH\n t_lh = self.tree.positional_marginal_LH\n c_lh = 0\n\n print (\"### Tree Log-Likelihood ###\\n\"\n \" Sequence log-LH without constraints: \\t%1.3f\\n\"\n \" Sequence log-LH with constraints: \\t%1.3f\\n\"\n \" TreeTime sequence log-LH: \\t%1.3f\\n\"\n \" Coalescent log-LH: \\t%1.3f\\n\"\n \"#########################\"%(u_lh, s_lh,t_lh, c_lh))\n except:\n print(\"ERROR. Did you run the corresponding inference (joint/marginal)?\")"} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nadds a coalescent model to the tree and optionally optimizes the coalescent model with the specified time scale.", "response": "def add_coalescent_model(self, Tc, **kwargs):\n \"\"\"Add a coalescent model to the tree and optionally optimize\n\n Parameters\n ----------\n Tc : float,str\n If this is a float, it will be interpreted as the inverse merger\n rate in molecular clock units; if it is a string, it selects the\n coalescent model ('opt', 'const', or 'skyline')\n \"\"\"\n from .merger_models import Coalescent\n self.logger('TreeTime.run: adding coalescent prior with Tc='+str(Tc),1)\n self.merger_model = Coalescent(self.tree,\n date2dist=self.date2dist, logger=self.logger)\n\n if Tc=='skyline': # restrict skyline model optimization to last iteration\n self.merger_model.optimize_skyline(**kwargs)\n self.logger(\"optimized a skyline \", 2)\n else:\n if Tc in ['opt', 'const']:\n self.merger_model.optimize_Tc()\n self.logger(\"optimized Tc to %f\"%self.merger_model.Tc.y[0], 2)\n else:\n try:\n self.merger_model.set_Tc(Tc)\n except:\n self.logger(\"setting of coalescent time scale failed\", 1, warn=True)\n\n self.merger_model.attach_to_tree()"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nallowing the mutation rate to vary on the tree (relaxed molecular clock). 
Changes of the mutation rates from one branch to another are penalized. In addition, deviation of the mutation rate from the mean rate is penalized. Parameters ---------- slack : float Maximum change in substitution rate between parent and child nodes coupling : float Maximum difference in substitution rates in sibling nodes", "response": "def relaxed_clock(self, slack=None, coupling=None, **kwargs):\n \"\"\"\n Allow the mutation rate to vary on the tree (relaxed molecular clock).\n Changes of the mutation rates from one branch to another are penalized.\n In addition, deviation of the mutation rate from the mean rate is\n penalized.\n\n Parameters\n ----------\n slack : float\n Maximum change in substitution rate between parent and child nodes\n\n coupling : float\n Maximum difference in substitution rates in sibling nodes\n\n \"\"\"\n if slack is None: slack=ttconf.MU_ALPHA\n if coupling is None: coupling=ttconf.MU_BETA\n self.logger(\"TreeTime.relaxed_clock: slack=%f, coupling=%f\"%(slack, coupling),2)\n\n c=1.0/self.one_mutation\n for node in self.tree.find_clades(order='postorder'):\n opt_len = node.mutation_length\n act_len = node.clock_length if hasattr(node, 'clock_length') else node.branch_length\n\n # opt_len \\approx 1.0*len(node.mutations)/node.profile.shape[0] but calculated via gtr model\n # stiffness is the expectation of the inverse variance of branch length (one_mutation/opt_len)\n # contact term: stiffness*(g*bl - bl_opt)^2 + slack(g-1)^2 =\n # (slack+bl^2) g^2 - 2 (bl*bl_opt+1) g + C= k2 g^2 + k1 g + C\n node._k2 = slack + c*act_len**2/(opt_len+self.one_mutation)\n node._k1 = -2*(c*act_len*opt_len/(opt_len+self.one_mutation) + slack)\n # coupling term: \\sum_c coupling*(g-g_c)^2 + Cost_c(g_c|g)\n # given g, g_c needs to be optimal-> 2*coupling*(g-g_c) = 2*child.k2 g_c + child.k1\n # hence g_c = (coupling*g - 0.5*child.k1)/(coupling+child.k2)\n # substituting yields\n for child in node.clades:\n denom = coupling+child._k2\n node._k2 += coupling*(1.0-coupling/denom)**2 + child._k2*coupling**2/denom**2\n node._k1 += (coupling*(1.0-coupling/denom)*child._k1/denom \\\n - coupling*child._k1*child._k2/denom**2 \\\n + coupling*child._k1/denom)\n\n for node in self.tree.find_clades(order='preorder'):\n if node.up is None:\n node.gamma = max(0.1, -0.5*node._k1/node._k2)\n else:\n if node.up.up is None:\n g_up = node.up.gamma\n else:\n g_up = node.up.branch_length_interpolator.gamma\n node.branch_length_interpolator.gamma = max(0.1,(coupling*g_up - 0.5*node._k1)/(coupling+node._k2))"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef _find_best_root(self, covariation=True, force_positive=True, slope=0, **kwarks):\n '''\n Determine the node that, when the tree is rooted on this node, results\n in the best regression of temporal constraints and root to tip distances.\n\n Parameters\n ----------\n\n infer_gtr : bool\n If True, infer new GTR model after re-root\n\n covariation : bool\n account for covariation structure when rerooting the tree\n\n force_positive : bool\n only accept positive evolutionary rate estimates when rerooting the tree\n\n '''\n for n in self.tree.find_clades():\n n.branch_length=n.mutation_length\n self.logger(\"TreeTime._find_best_root: searching for the best root position...\",2)\n Treg = self.setup_TreeRegression(covariation=covariation)\n return Treg.optimal_reroot(force_positive=force_positive, slope=slope)['node']", "response": "Find the best root node in the tree."} {"SOURCE": 
"codesearchnet", "instruction": "How would you code a function in Python 3 to\nfunction that attempts to load a tree and build it from the alignment", "response": "def assure_tree(params, tmp_dir='treetime_tmp'):\n \"\"\"\n Function that attempts to load a tree and build it from the alignment\n if no tree is provided.\n \"\"\"\n if params.tree is None:\n params.tree = os.path.basename(params.aln)+'.nwk'\n print(\"No tree given: inferring tree\")\n utils.tree_inference(params.aln, params.tree, tmp_dir = tmp_dir)\n\n if os.path.isdir(tmp_dir):\n shutil.rmtree(tmp_dir)\n\n try:\n tt = TreeAnc(params.tree)\n except:\n print(\"Tree loading/building failed.\")\n return 1\n return 0"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ncreating a GTR model from the input arguments.", "response": "def create_gtr(params):\n \"\"\"\n parse the arguments referring to the GTR model and return a GTR structure\n \"\"\"\n model = params.gtr\n gtr_params = params.gtr_params\n if model == 'infer':\n gtr = GTR.standard('jc', alphabet='aa' if params.aa else 'nuc')\n else:\n try:\n kwargs = {}\n if gtr_params is not None:\n for param in gtr_params:\n keyval = param.split('=')\n if len(keyval)!=2: continue\n if keyval[0] in ['pis', 'pi', 'Pi', 'Pis']:\n keyval[0] = 'pi'\n keyval[1] = list(map(float, keyval[1].split(',')))\n elif keyval[0] not in ['alphabet']:\n keyval[1] = float(keyval[1])\n kwargs[keyval[0]] = keyval[1]\n else:\n print (\"GTR params are not specified. Creating GTR model with default parameters\")\n\n gtr = GTR.standard(model, **kwargs)\n infer_gtr = False\n except:\n print (\"Could not create GTR model from input arguments. Using default (Jukes-Cantor 1969)\")\n gtr = GTR.standard('jc', alphabet='aa' if params.aa else 'nuc')\n infer_gtr = False\n return gtr"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef read_if_vcf(params):\n ref = None\n aln = params.aln\n fixed_pi = None\n if hasattr(params, 'aln') and params.aln is not None:\n if any([params.aln.lower().endswith(x) for x in ['.vcf', '.vcf.gz']]):\n if not params.vcf_reference:\n print(\"ERROR: a reference Fasta is required with VCF-format alignments\")\n return -1\n compress_seq = read_vcf(params.aln, params.vcf_reference)\n sequences = compress_seq['sequences']\n ref = compress_seq['reference']\n aln = sequences\n\n if not hasattr(params, 'gtr') or params.gtr==\"infer\": #if not specified, set it:\n alpha = alphabets['aa'] if params.aa else alphabets['nuc']\n fixed_pi = [ref.count(base)/len(ref) for base in alpha]\n if fixed_pi[-1] == 0:\n fixed_pi[-1] = 0.05\n fixed_pi = [v-0.01 for v in fixed_pi]\n\n return aln, ref, fixed_pi", "response": "Checks if input is VCF and reads in appropriately if it is\n "} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef scan_homoplasies(params):\n if assure_tree(params, tmp_dir='homoplasy_tmp'):\n return 1\n\n gtr = create_gtr(params)\n\n ###########################################################################\n ### READ IN VCF\n ###########################################################################\n #sets ref and fixed_pi to None if not VCF\n aln, ref, fixed_pi = read_if_vcf(params)\n is_vcf = True if ref is not None else False\n\n ###########################################################################\n ### ANCESTRAL RECONSTRUCTION\n ###########################################################################\n treeanc = TreeAnc(params.tree, aln=aln, ref=ref, gtr=gtr, 
verbose=1,\n fill_overhangs=True)\n if treeanc.aln is None: # if alignment didn't load, exit\n return 1\n\n if is_vcf:\n L = len(ref) + params.const\n else:\n L = treeanc.aln.get_alignment_length() + params.const\n\n N_seq = len(treeanc.aln)\n N_tree = treeanc.tree.count_terminals()\n if params.rescale!=1.0:\n for n in treeanc.tree.find_clades():\n n.branch_length *= params.rescale\n n.mutation_length = n.branch_length\n\n print(\"read alignment from file %s with %d sequences of length %d\"%(params.aln,N_seq,L))\n print(\"read tree from file %s with %d leaves\"%(params.tree,N_tree))\n print(\"\\ninferring ancestral sequences...\")\n\n ndiff = treeanc.infer_ancestral_sequences('ml', infer_gtr=params.gtr=='infer',\n marginal=False, fixed_pi=fixed_pi)\n print(\"...done.\")\n if ndiff==ttconf.ERROR: # if reconstruction failed, exit\n print(\"Something went wrong during ancestral reconstruction, please check your input files.\", file=sys.stderr)\n return 1\n else:\n print(\"...done.\")\n\n if is_vcf:\n treeanc.recover_var_ambigs()\n\n ###########################################################################\n ### analysis of reconstruction\n ###########################################################################\n from collections import defaultdict\n from scipy.stats import poisson\n offset = 0 if params.zero_based else 1\n\n if params.drms:\n DRM_info = read_in_DRMs(params.drms, offset)\n drms = DRM_info['DRMs']\n\n # construct dictionaries gathering mutations and positions\n mutations = defaultdict(list)\n positions = defaultdict(list)\n terminal_mutations = defaultdict(list)\n for n in treeanc.tree.find_clades():\n if n.up is None:\n continue\n\n if len(n.mutations):\n for (a,pos, d) in n.mutations:\n if '-' not in [a,d] and 'N' not in [a,d]:\n mutations[(a,pos+offset,d)].append(n)\n positions[pos+offset].append(n)\n if n.is_terminal():\n for (a,pos, d) in n.mutations:\n if '-' not in [a,d] and 'N' not in [a,d]:\n terminal_mutations[(a,pos+offset,d)].append(n)\n\n # gather homoplasic mutations by strain\n mutation_by_strain = defaultdict(list)\n for n in treeanc.tree.get_terminals():\n for a,pos,d in n.mutations:\n if pos+offset in positions and len(positions[pos+offset])>1:\n if '-' not in [a,d] and 'N' not in [a,d]:\n mutation_by_strain[n.name].append([(a,pos+offset,d), len(positions[pos])])\n\n\n # total_branch_length is the expected number of substitutions\n # corrected_branch_length is the expected number of observable substitutions\n # (probability of an odd number of substitutions at a particular site)\n total_branch_length = treeanc.tree.total_branch_length()\n corrected_branch_length = np.sum([np.exp(-x.branch_length)*np.sinh(x.branch_length)\n for x in treeanc.tree.find_clades()])\n corrected_terminal_branch_length = np.sum([np.exp(-x.branch_length)*np.sinh(x.branch_length)\n for x in treeanc.tree.get_terminals()])\n expected_mutations = L*corrected_branch_length\n expected_terminal_mutations = L*corrected_terminal_branch_length\n\n # make histograms and sum mutations in different categories\n multiplicities = np.bincount([len(x) for x in mutations.values()])\n total_mutations = np.sum([len(x) for x in mutations.values()])\n\n multiplicities_terminal = np.bincount([len(x) for x in terminal_mutations.values()])\n terminal_mutation_count = np.sum([len(x) for x in terminal_mutations.values()])\n\n multiplicities_positions = np.bincount([len(x) for x in positions.values()])\n multiplicities_positions[0] = L - np.sum(multiplicities_positions)\n\n 
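# NB: bin zero of multiplicities_positions now counts the positions that were never hit\n\n 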
###########################################################################\n ### Output the distribution of times particular mutations are observed\n ###########################################################################\n print(\"\\nThe TOTAL tree length is %1.3e and %d mutations were observed.\"\n %(total_branch_length,total_mutations))\n print(\"Of these %d mutations,\"%total_mutations\n +\"\".join(['\\n\\t - %d occur %d times'%(n,mi)\n for mi,n in enumerate(multiplicities) if n]))\n # additional optional output this for terminal mutations only\n if params.detailed:\n print(\"\\nThe TERMINAL branch length is %1.3e and %d mutations were observed.\"\n %(corrected_terminal_branch_length,terminal_mutation_count))\n print(\"Of these %d mutations,\"%terminal_mutation_count\n +\"\".join(['\\n\\t - %d occur %d times'%(n,mi)\n for mi,n in enumerate(multiplicities_terminal) if n]))\n\n\n ###########################################################################\n ### Output the distribution of times mutations at particular positions are observed\n ###########################################################################\n print(\"\\nOf the %d positions in the genome,\"%L\n +\"\".join(['\\n\\t - %d were hit %d times (expected %1.2f)'%(n,mi,L*poisson.pmf(mi,1.0*total_mutations/L))\n for mi,n in enumerate(multiplicities_positions) if n]))\n\n\n # compare that distribution to a Poisson distribution with the same mean\n p = poisson.pmf(np.arange(10*multiplicities_positions.max()),1.0*total_mutations/L)\n print(\"\\nlog-likelihood difference to Poisson distribution with same mean: %1.3e\"%(\n - L*np.sum(p*np.log(p+1e-100))\n + np.sum(multiplicities_positions*np.log(p[:len(multiplicities_positions)]+1e-100))))\n\n\n ###########################################################################\n ### Output the mutations that are observed most often\n ###########################################################################\n if params.drms:\n print(\"\\n\\nThe ten most homoplasic mutations are:\\n\\tmut\\tmultiplicity\\tDRM details (gene drug AAmut)\")\n mutations_sorted = sorted(mutations.items(), key=lambda x:len(x[1])-0.1*x[0][1]/L, reverse=True)\n for mut, val in mutations_sorted[:params.n]:\n if len(val)>1:\n print(\"\\t%s%d%s\\t%d\\t%s\"%(mut[0], mut[1], mut[2], len(val),\n \" \".join([drms[mut[1]]['gene'], drms[mut[1]]['drug'], drms[mut[1]]['alt_base'][mut[2]]]) if mut[1] in drms else \"\"))\n else:\n break\n else:\n print(\"\\n\\nThe ten most homoplasic mutations are:\\n\\tmut\\tmultiplicity\")\n mutations_sorted = sorted(mutations.items(), key=lambda x:len(x[1])-0.1*x[0][1]/L, reverse=True)\n for mut, val in mutations_sorted[:params.n]:\n if len(val)>1:\n print(\"\\t%s%d%s\\t%d\"%(mut[0], mut[1], mut[2], len(val)))\n else:\n break\n\n # optional output specifically for mutations on terminal branches\n if params.detailed:\n if params.drms:\n print(\"\\n\\nThe ten most homoplasic mutation on terminal branches are:\\n\\tmut\\tmultiplicity\\tDRM details (gene drug AAmut)\")\n terminal_mutations_sorted = sorted(terminal_mutations.items(), key=lambda x:len(x[1])-0.1*x[0][1]/L, reverse=True)\n for mut, val in terminal_mutations_sorted[:params.n]:\n if len(val)>1:\n print(\"\\t%s%d%s\\t%d\\t%s\"%(mut[0], mut[1], mut[2], len(val),\n \" \".join([drms[mut[1]]['gene'], drms[mut[1]]['drug'], drms[mut[1]]['alt_base'][mut[2]]]) if mut[1] in drms else \"\"))\n else:\n break\n else:\n print(\"\\n\\nThe ten most homoplasic mutation on terminal branches are:\\n\\tmut\\tmultiplicity\")\n 
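# sort by multiplicity; the small position-dependent term only breaks ties\n 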
terminal_mutations_sorted = sorted(terminal_mutations.items(), key=lambda x:len(x[1])-0.1*x[0][1]/L, reverse=True)\n for mut, val in terminal_mutations_sorted[:params.n]:\n if len(val)>1:\n print(\"\\t%s%d%s\\t%d\"%(mut[0], mut[1], mut[2], len(val)))\n else:\n break\n\n ###########################################################################\n ### Output strains that have many homoplasic mutations\n ###########################################################################\n # TODO: add statistical criterion\n if params.detailed:\n if params.drms:\n print(\"\\n\\nTaxons that carry positions that mutated elsewhere in the tree:\\n\\ttaxon name\\t#of homoplasic mutations\\t# DRM\")\n mutation_by_strain_sorted = sorted(mutation_by_strain.items(), key=lambda x:len(x[1]), reverse=True)\n for name, val in mutation_by_strain_sorted[:params.n]:\n if len(val):\n print(\"\\t%s\\t%d\\t%d\"%(name, len(val),\n len([mut for mut,l in val if mut[1] in drms])))\n else:\n print(\"\\n\\nTaxons that carry positions that mutated elsewhere in the tree:\\n\\ttaxon name\\t#of homoplasic mutations\")\n mutation_by_strain_sorted = sorted(mutation_by_strain.items(), key=lambda x:len(x[1]), reverse=True)\n for name, val in mutation_by_strain_sorted[:params.n]:\n if len(val):\n print(\"\\t%s\\t%d\"%(name, len(val)))\n\n\n return 0", "response": "function implementing treetime homoplasies\n "} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nreconstructs the ancestral tree", "response": "def ancestral_reconstruction(params):\n \"\"\"\n implementing treetime ancestral\n \"\"\"\n\n # set up\n if assure_tree(params, tmp_dir='ancestral_tmp'):\n return 1\n\n outdir = get_outdir(params, '_ancestral')\n basename = get_basename(params, outdir)\n\n gtr = create_gtr(params)\n\n ###########################################################################\n ### READ IN VCF\n ###########################################################################\n #sets ref and fixed_pi to None if not VCF\n aln, ref, fixed_pi = read_if_vcf(params)\n is_vcf = True if ref is not None else False\n\n treeanc = TreeAnc(params.tree, aln=aln, ref=ref, gtr=gtr, verbose=1,\n fill_overhangs=not params.keep_overhangs)\n ndiff =treeanc.infer_ancestral_sequences('ml', infer_gtr=params.gtr=='infer',\n marginal=params.marginal, fixed_pi=fixed_pi)\n if ndiff==ttconf.ERROR: # if reconstruction failed, exit\n return 1\n\n ###########################################################################\n ### OUTPUT and saving of results\n ###########################################################################\n if params.gtr==\"infer\":\n print('\\nInferred GTR model:')\n print(treeanc.gtr)\n\n export_sequences_and_tree(treeanc, basename, is_vcf, params.zero_based,\n report_ambiguous=params.report_ambiguous)\n\n return 0"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef mugration(params):\n\n ###########################################################################\n ### Parse states\n ###########################################################################\n if os.path.isfile(params.states):\n states = pd.read_csv(params.states, sep='\\t' if params.states[-3:]=='tsv' else ',',\n skipinitialspace=True)\n else:\n print(\"file with states does not exist\")\n return 1\n\n outdir = get_outdir(params, '_mugration')\n\n taxon_name = 'name' if 'name' in states.columns else states.columns[0]\n if params.attribute:\n if params.attribute in states.columns:\n 
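# the requested attribute exists as a metadata column; use it for the inference\n 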
attr = params.attribute\n else:\n print(\"The specified attribute was not found in the metadata file \"+params.states, file=sys.stderr)\n print(\"Available columns are: \"+\", \".join(states.columns), file=sys.stderr)\n return 1\n else:\n attr = states.columns[1]\n print(\"Attribute for mugration inference was not specified. Using \"+attr, file=sys.stderr)\n\n leaf_to_attr = {x[taxon_name]:x[attr] for xi, x in states.iterrows()\n if x[attr]!=params.missing_data}\n unique_states = sorted(set(leaf_to_attr.values()))\n nc = len(unique_states)\n if nc>180:\n print(\"mugration: can't have more than 180 states!\", file=sys.stderr)\n return 1\n elif nc<2:\n print(\"mugration: only one or zero states found -- this doesn't make any sense\", file=sys.stderr)\n return 1\n\n ###########################################################################\n ### make a single character alphabet that maps to discrete states\n ###########################################################################\n alphabet = [chr(65+i) for i,state in enumerate(unique_states)]\n missing_char = chr(65+nc)\n letter_to_state = {a:unique_states[i] for i,a in enumerate(alphabet)}\n letter_to_state[missing_char]=params.missing_data\n reverse_alphabet = {v:k for k,v in letter_to_state.items()}\n\n ###########################################################################\n ### construct gtr model\n ###########################################################################\n if params.weights:\n params.infer_gtr = True\n tmp_weights = pd.read_csv(params.weights, sep='\\t' if params.states[-3:]=='tsv' else ',',\n skipinitialspace=True)\n weights = {row[0]:row[1] for ri,row in tmp_weights.iterrows()}\n mean_weight = np.mean(list(weights.values()))\n weights = np.array([weights[c] if c in weights else mean_weight for c in unique_states], dtype=float)\n weights/=weights.sum()\n else:\n weights = np.ones(nc, dtype=float)/nc\n\n # set up dummy matrix\n W = np.ones((nc,nc), dtype=float)\n\n mugration_GTR = GTR.custom(pi = weights, W=W, alphabet = np.array(alphabet))\n mugration_GTR.profile_map[missing_char] = np.ones(nc)\n mugration_GTR.ambiguous=missing_char\n\n ###########################################################################\n ### set up treeanc\n ###########################################################################\n treeanc = TreeAnc(params.tree, gtr=mugration_GTR, verbose=params.verbose,\n convert_upper=False, one_mutation=0.001)\n pseudo_seqs = [SeqRecord(id=n.name,name=n.name,\n seq=Seq(reverse_alphabet[leaf_to_attr[n.name]]\n if n.name in leaf_to_attr else missing_char))\n for n in treeanc.tree.get_terminals()]\n treeanc.aln = MultipleSeqAlignment(pseudo_seqs)\n\n ndiff = treeanc.infer_ancestral_sequences(method='ml', infer_gtr=True,\n store_compressed=False, pc=params.pc, marginal=True, normalized_rate=False,\n fixed_pi=weights if params.weights else None)\n if ndiff==ttconf.ERROR: # if reconstruction failed, exit\n return 1\n\n\n ###########################################################################\n ### output\n ###########################################################################\n print(\"\\nCompleted mugration model inference of attribute '%s' for\"%attr,params.tree)\n\n basename = get_basename(params, outdir)\n gtr_name = basename + 'GTR.txt'\n with open(gtr_name, 'w') as ofile:\n ofile.write('Character to attribute mapping:\\n')\n for state in unique_states:\n ofile.write(' %s: %s\\n'%(reverse_alphabet[state], state))\n ofile.write('\\n\\n'+str(treeanc.gtr)+'\\n')\n print(\"\\nSaved 
inferred mugration model as:\", gtr_name)\n\n terminal_count = 0\n for n in treeanc.tree.find_clades():\n if n.up is None:\n continue\n n.confidence=None\n # due to a bug in older versions of biopython that truncated filenames in nexus export\n # we truncate them by hand and make them unique.\n if n.is_terminal() and len(n.name)>40 and bioversion<\"1.69\":\n n.name = n.name[:35]+'_%03d'%terminal_count\n terminal_count+=1\n n.comment= '&%s=\"'%attr + letter_to_state[n.sequence[0]] +'\"'\n\n if params.confidence:\n conf_name = basename+'confidence.csv'\n with open(conf_name, 'w') as ofile:\n ofile.write('#name, '+', '.join(unique_states)+'\\n')\n for n in treeanc.tree.find_clades():\n ofile.write(n.name + ', '+', '.join([str(x) for x in n.marginal_profile[0]])+'\\n')\n print(\"Saved table with ancestral state confidences as:\", conf_name)\n\n # write tree to file\n outtree_name = basename+'annotated_tree.nexus'\n Phylo.write(treeanc.tree, outtree_name, 'nexus')\n print(\"Saved annotated tree as:\",outtree_name)\n\n return 0", "response": "Parse the states file and return the treetime mugration sequence."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef estimate_clock_model(params):\n\n if assure_tree(params, tmp_dir='clock_model_tmp'):\n return 1\n dates = utils.parse_dates(params.dates)\n if len(dates)==0:\n return 1\n\n outdir = get_outdir(params, '_clock')\n\n ###########################################################################\n ### READ IN VCF\n ###########################################################################\n #sets ref and fixed_pi to None if not VCF\n aln, ref, fixed_pi = read_if_vcf(params)\n is_vcf = True if ref is not None else False\n\n ###########################################################################\n ### ESTIMATE ROOT (if requested) AND DETERMINE TEMPORAL SIGNAL\n ###########################################################################\n if params.aln is None and params.sequence_length is None:\n print(\"one of arguments '--aln' and '--sequence-length' is required.\", file=sys.stderr)\n return 1\n\n basename = get_basename(params, outdir)\n myTree = TreeTime(dates=dates, tree=params.tree, aln=aln, gtr='JC69',\n verbose=params.verbose, seq_len=params.sequence_length,\n ref=ref)\n myTree.tip_slack=params.tip_slack\n if myTree.tree is None:\n print(\"ERROR: tree loading failed. 
exiting...\")\n return 1\n\n if params.clock_filter:\n n_bad = [n.name for n in myTree.tree.get_terminals() if n.bad_branch]\n myTree.clock_filter(n_iqd=params.clock_filter, reroot=params.reroot or 'least-squares')\n n_bad_after = [n.name for n in myTree.tree.get_terminals() if n.bad_branch]\n if len(n_bad_after)>len(n_bad):\n print(\"The following leaves don't follow a loose clock and \"\n \"will be ignored in rate estimation:\\n\\t\"\n +\"\\n\\t\".join(set(n_bad_after).difference(n_bad)))\n\n if not params.keep_root:\n # reroot to optimal root, this assigns clock_model to myTree\n if params.covariation: # this requires branch length estimates\n myTree.run(root=\"least-squares\", max_iter=0,\n use_covariation=params.covariation)\n\n res = myTree.reroot(params.reroot,\n force_positive=not params.allow_negative_rate)\n myTree.get_clock_model(covariation=params.covariation)\n\n if res==ttconf.ERROR:\n print(\"ERROR: unknown root or rooting mechanism!\\n\"\n \"\\tvalid choices are 'least-squares', 'ML', and 'ML-rough'\")\n return 1\n else:\n myTree.get_clock_model(covariation=params.covariation)\n\n d2d = utils.DateConversion.from_regression(myTree.clock_model)\n print('\\n',d2d)\n print('The R^2 value indicates the fraction of variation in'\n '\\nroot-to-tip distance explained by the sampling times.'\n '\\nHigher values corresponds more clock-like behavior (max 1.0).')\n\n print('\\nThe rate is the slope of the best fit of the date to'\n '\\nthe root-to-tip distance and provides an estimate of'\n '\\nthe substitution rate. The rate needs to be positive!'\n '\\nNegative rates suggest an inappropriate root.\\n')\n\n print('\\nThe estimated rate and tree correspond to a root date:')\n if params.covariation:\n reg = myTree.clock_model\n dp = np.array([reg['intercept']/reg['slope']**2,-1./reg['slope']])\n droot = np.sqrt(reg['cov'][:2,:2].dot(dp).dot(dp))\n print('\\n--- root-date:\\t %3.2f +/- %1.2f (one std-dev)\\n\\n'%(-d2d.intercept/d2d.clock_rate, droot))\n else:\n print('\\n--- root-date:\\t %3.2f\\n\\n'%(-d2d.intercept/d2d.clock_rate))\n\n if not params.keep_root:\n # write rerooted tree to file\n outtree_name = basename+'rerooted.newick'\n Phylo.write(myTree.tree, outtree_name, 'newick')\n print(\"--- re-rooted tree written to \\n\\t%s\\n\"%outtree_name)\n\n table_fname = basename+'rtt.csv'\n with open(table_fname, 'w') as ofile:\n ofile.write(\"#name, date, root-to-tip distance\\n\")\n ofile.write(\"#Dates of nodes that didn't have a specified date are inferred from the root-to-tip regression.\\n\")\n for n in myTree.tree.get_terminals():\n if hasattr(n, \"raw_date_constraint\") and (n.raw_date_constraint is not None):\n if np.isscalar(n.raw_date_constraint):\n tmp_str = str(n.raw_date_constraint)\n elif len(n.raw_date_constraint):\n tmp_str = str(n.raw_date_constraint[0])+'-'+str(n.raw_date_constraint[1])\n else:\n tmp_str = ''\n ofile.write(\"%s, %s, %f\\n\"%(n.name, tmp_str, n.dist2root))\n else:\n ofile.write(\"%s, %f, %f\\n\"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root))\n for n in myTree.tree.get_nonterminals(order='preorder'):\n ofile.write(\"%s, %f, %f\\n\"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root))\n print(\"--- wrote dates and root-to-tip distances to \\n\\t%s\\n\"%table_fname)\n\n\n ###########################################################################\n ### PLOT AND SAVE RESULT\n ###########################################################################\n plot_rtt(myTree, outdir+params.plot_rtt)\n return 0", "response": "estimate 
the clock model for a single time tree"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\ncreate delta function distribution.", "response": "def delta_function(cls, x_pos, weight=1., min_width=MIN_INTEGRATION_PEAK):\n \"\"\"\n Create delta function distribution.\n \"\"\"\n\n distribution = cls(x_pos,0.,is_log=True, min_width=min_width)\n distribution.weight = weight\n return distribution"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef multiply(dists):\n '''\n multiplies a list of Distribution objects\n '''\n if not all([isinstance(k, Distribution) for k in dists]):\n raise NotImplementedError(\"Can only multiply Distribution objects\")\n\n n_delta = np.sum([k.is_delta for k in dists])\n min_width = np.max([k.min_width for k in dists])\n if n_delta>1:\n raise ArithmeticError(\"Cannot multiply more than one delta functions!\")\n elif n_delta==1:\n delta_dist_ii = np.where([k.is_delta for k in dists])[0][0]\n delta_dist = dists[delta_dist_ii]\n new_xpos = delta_dist.peak_pos\n new_weight = np.prod([k.prob(new_xpos) for k in dists if k!=delta_dist_ii]) * delta_dist.weight\n res = Distribution.delta_function(new_xpos, weight = new_weight,min_width=min_width)\n else:\n new_xmin = np.max([k.xmin for k in dists])\n new_xmax = np.min([k.xmax for k in dists])\n\n x_vals = np.unique(np.concatenate([k.x for k in dists]))\n x_vals = x_vals[(x_vals>new_xmin-TINY_NUMBER)&(x_vals<new_xmax+TINY_NUMBER)]\n y_vals = np.sum([k.__call__(x_vals) for k in dists], axis=0)\n peak = y_vals.min()\n ind = (y_vals-peak)<BIG_NUMBER/1000\n res = Distribution(x_vals[ind], y_vals[ind], is_log=True, min_width=min_width, kind='linear')\n\n return res", "response": "Multiplies a list of Distribution objects and returns the resulting Distribution."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nassigning date constraints to the nodes of the tree", "response": "def _assign_dates(self):\n ...\n if np.sum([n.bad_branch for n in self.tree.get_terminals()])>self.tree.count_terminals()-3:\n self.logger(\"ERROR: ALMOST NO VALID DATE CONSTRAINTS, EXITING\", 1, warn=True)\n return ttconf.ERROR\n\n return ttconf.SUCCESS"} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nsetting the precision of the current locale", "response": "def _set_precision(self, precision):\n '''\n function that sets precision to a (hopefully) reasonable guess based\n on the length of the sequence if not explicitly set\n '''\n # if precision is explicitly specified, use it.\n\n if self.one_mutation:\n self.min_width = 10*self.one_mutation\n else:\n self.min_width = 0.001\n if precision in [0,1,2,3]:\n self.precision=precision\n if self.one_mutation and self.one_mutation<1e-4 and precision<2:\n self.logger(\"ClockTree._set_precision: FOR LONG SEQUENCES (>1e4) precision>=2 IS RECOMMENDED.\"\n \" \\n\\t **** precision %d was specified by the user\"%precision, level=0)\n else:\n # otherwise adjust it depending on the minimal sensible branch length\n if self.one_mutation:\n if self.one_mutation>1e-4:\n self.precision=1\n else:\n self.precision=2\n else:\n self.precision=1\n self.logger(\"ClockTree: Setting precision to level %s\"%self.precision, 2)\n\n if self.precision==0:\n self.node_grid_points = ttconf.NODE_GRID_SIZE_ROUGH\n self.branch_grid_points = ttconf.BRANCH_GRID_SIZE_ROUGH\n self.n_integral = ttconf.N_INTEGRAL_ROUGH\n elif self.precision==2:\n self.node_grid_points = ttconf.NODE_GRID_SIZE_FINE\n self.branch_grid_points = ttconf.BRANCH_GRID_SIZE_FINE\n self.n_integral = ttconf.N_INTEGRAL_FINE\n elif self.precision==3:\n self.node_grid_points = ttconf.NODE_GRID_SIZE_ULTRA\n self.branch_grid_points = ttconf.BRANCH_GRID_SIZE_ULTRA\n self.n_integral = ttconf.N_INTEGRAL_ULTRA\n else:\n self.node_grid_points = ttconf.NODE_GRID_SIZE\n self.branch_grid_points = ttconf.BRANCH_GRID_SIZE\n self.n_integral = ttconf.N_INTEGRAL"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef init_date_constraints(self, ancestral_inference=False, clock_rate=None, 
**kwarks):\n self.logger(\"ClockTree.init_date_constraints...\",2)\n self.tree.coalescent_joint_LH = 0\n if self.aln and (ancestral_inference or (not hasattr(self.tree.root, 'sequence'))):\n self.infer_ancestral_sequences('probabilistic', marginal=self.branch_length_mode=='marginal',\n sample_from_profile='root',**kwarks)\n\n # set the None for the date-related attributes in the internal nodes.\n # make interpolation objects for the branches\n self.logger('ClockTree.init_date_constraints: Initializing branch length interpolation objects...',3)\n has_clock_length = []\n for node in self.tree.find_clades(order='postorder'):\n if node.up is None:\n node.branch_length_interpolator = None\n else:\n has_clock_length.append(hasattr(node, 'clock_length'))\n # copy the merger rate and gamma if they are set\n if hasattr(node,'branch_length_interpolator') and node.branch_length_interpolator is not None:\n gamma = node.branch_length_interpolator.gamma\n merger_cost = node.branch_length_interpolator.merger_cost\n else:\n gamma = 1.0\n merger_cost = None\n\n if self.branch_length_mode=='marginal':\n node.profile_pair = self.marginal_branch_profile(node)\n\n node.branch_length_interpolator = BranchLenInterpolator(node, self.gtr,\n pattern_multiplicity = self.multiplicity, min_width=self.min_width,\n one_mutation=self.one_mutation, branch_length_mode=self.branch_length_mode)\n\n node.branch_length_interpolator.merger_cost = merger_cost\n node.branch_length_interpolator.gamma = gamma\n\n # use covariance in clock model only after initial timetree estimation is done\n use_cov = (np.sum(has_clock_length) > len(has_clock_length)*0.7) and self.use_covariation\n self.get_clock_model(covariation=use_cov, slope=clock_rate)\n\n # make node distribution objects\n for node in self.tree.find_clades(order=\"postorder\"):\n # node is constrained\n if hasattr(node, 'raw_date_constraint') and node.raw_date_constraint is not None:\n # set the absolute time before present in branch length units\n if np.isscalar(node.raw_date_constraint):\n tbp = self.date2dist.get_time_before_present(node.raw_date_constraint)\n node.date_constraint = Distribution.delta_function(tbp, weight=1.0, min_width=self.min_width)\n else:\n tbp = self.date2dist.get_time_before_present(np.array(node.raw_date_constraint))\n node.date_constraint = Distribution(tbp, np.ones_like(tbp), is_log=False, min_width=self.min_width)\n\n if hasattr(node, 'bad_branch') and node.bad_branch is True:\n self.logger(\"ClockTree.init_date_constraints -- WARNING: Branch is marked as bad\"\n \", excluding it from the optimization process.\\n\"\n \"\\t\\tDate constraint will be ignored!\", 4, warn=True)\n else: # node without sampling date set\n node.raw_date_constraint = None\n node.date_constraint = None", "response": "Initializes the internal branch length interpolation objects for the internal nodes."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef make_time_tree(self, time_marginal=False, clock_rate=None, **kwargs):\n '''\n Use the date constraints to calculate the most likely positions of\n unconstrained nodes.\n\n Parameters\n ----------\n\n time_marginal : bool\n If true, use marginal reconstruction for node positions\n\n **kwargs\n Key word arguments to initialize dates constraints\n\n '''\n self.logger(\"ClockTree: Maximum likelihood tree optimization with temporal constraints\",1)\n\n self.init_date_constraints(clock_rate=clock_rate, **kwargs)\n\n if time_marginal:\n self._ml_t_marginal(assign_dates = 
time_marginal==\"assign\")\n else:\n self._ml_t_joint()\n\n self.convert_dates()", "response": "This function creates a time tree based on the given parameters."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef _ml_t_joint(self):\n\n def _cleanup():\n for node in self.tree.find_clades():\n del node.joint_pos_Lx\n del node.joint_pos_Cx\n\n\n self.logger(\"ClockTree - Joint reconstruction: Propagating leaves -> root...\", 2)\n # go through the nodes from leaves towards the root:\n for node in self.tree.find_clades(order='postorder'): # children first, msg to parents\n # Lx is the maximal likelihood of a subtree given the parent position\n # Cx is the branch length corresponding to the maximally likely subtree\n if node.bad_branch:\n # no information at the node\n node.joint_pos_Lx = None\n node.joint_pos_Cx = None\n else: # all other nodes\n if node.date_constraint is not None and node.date_constraint.is_delta: # there is a time constraint\n # subtree probability given the position of the parent node\n # Lx.x is the position of the parent node\n # Lx.y is the probability of the subtree (consisting of one terminal node in this case)\n # Cx.y is the branch length corresponding to the optimal subtree\n bl = node.branch_length_interpolator.x\n x = bl + node.date_constraint.peak_pos\n node.joint_pos_Lx = Distribution(x, node.branch_length_interpolator(bl),\n min_width=self.min_width, is_log=True)\n node.joint_pos_Cx = Distribution(x, bl, min_width=self.min_width) # map back to the branch length\n else: # all nodes without precise constraint but positional information\n msgs_to_multiply = [node.date_constraint] if node.date_constraint is not None else []\n msgs_to_multiply.extend([child.joint_pos_Lx for child in node.clades\n if child.joint_pos_Lx is not None])\n\n # subtree likelihood given the node's constraint and child messages\n if len(msgs_to_multiply) == 0: # there are no constraints\n node.joint_pos_Lx = None\n node.joint_pos_Cx = None\n continue\n elif len(msgs_to_multiply)>1: # combine the different msgs and constraints\n subtree_distribution = Distribution.multiply(msgs_to_multiply)\n else: # there is exactly one constraint.\n subtree_distribution = msgs_to_multiply[0]\n\n if node.up is None: # this is the root, set dates\n subtree_distribution._adjust_grid(rel_tol=self.rel_tol_prune)\n # set root position and joint likelihood of the tree\n node.time_before_present = subtree_distribution.peak_pos\n node.joint_pos_Lx = subtree_distribution\n node.joint_pos_Cx = None\n node.clock_length = node.branch_length\n else: # otherwise propagate to parent\n res, res_t = NodeInterpolator.convolve(subtree_distribution,\n node.branch_length_interpolator,\n max_or_integral='max',\n inverse_time=True,\n n_grid_points = self.node_grid_points,\n n_integral=self.n_integral,\n rel_tol=self.rel_tol_refine)\n\n res._adjust_grid(rel_tol=self.rel_tol_prune)\n\n node.joint_pos_Lx = res\n node.joint_pos_Cx = res_t\n\n\n # go through the nodes from root towards the leaves and assign joint ML positions:\n self.logger(\"ClockTree - Joint reconstruction: Propagating root -> leaves...\", 2)\n for node in self.tree.find_clades(order='preorder'): # root first, msgs to children\n\n if node.up is None: # root node\n continue # the position was already set on the previous step\n\n if node.joint_pos_Cx is None: # no constraints or branch is bad - reconstruct from the branch len interpolator\n node.branch_length = node.branch_length_interpolator.peak_pos\n\n elif 
isinstance(node.joint_pos_Cx, Distribution):\n # NOTE the Lx distribution is the likelihood, given the position of the parent\n # (Lx.x = parent position, Lx.y = LH of the node_pos given Lx.x,\n # the length of the branch corresponding to the most likely\n # subtree is node.Cx(node.time_before_present))\n subtree_LH = node.joint_pos_Lx(node.up.time_before_present)\n node.branch_length = node.joint_pos_Cx(max(node.joint_pos_Cx.xmin,\n node.up.time_before_present)+ttconf.TINY_NUMBER)\n\n node.time_before_present = node.up.time_before_present - node.branch_length\n node.clock_length = node.branch_length\n\n # just sanity check, should never happen:\n if node.branch_length < 0 or node.time_before_present < 0:\n if node.branch_length<0 and node.branch_length>-ttconf.TINY_NUMBER:\n self.logger(\"ClockTree - Joint reconstruction: correcting rounding error of %s\"%node.name, 4)\n node.branch_length = 0\n\n self.tree.positional_joint_LH = self.timetree_likelihood()\n # cleanup, if required\n if not self.debug:\n _cleanup()", "response": "Compute the joint maximum likelihood assignment of the internal nodes positions by propagating from the tree leaves towards the root."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nreturning the likelihood of the data given the current branch length in the tree.", "response": "def timetree_likelihood(self):\n '''\n Return the likelihood of the data given the current branch length in the tree\n '''\n LH = 0\n for node in self.tree.find_clades(order='preorder'): # sum the likelihood contributions of all branches\n if node.up is None: # root node\n continue\n LH -= node.branch_length_interpolator(node.branch_length)\n\n # add the root sequence LH and return\n if self.aln:\n LH += self.gtr.sequence_logLH(self.tree.root.cseq, pattern_multiplicity=self.multiplicity)\n return LH"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ncomputing the marginal probability distribution of the internal nodes positions by propagating from the tree leaves towards the root.", "response": "def _ml_t_marginal(self, assign_dates=False):\n \"\"\"\n Compute the marginal probability distribution of the internal nodes positions by\n propagating from the tree leaves towards the root. The results of\n this operation are the probability distributions of each internal node,\n conditional on the constraints on all leaves of the tree, which have sampling dates.\n The probability distributions are set as marginal_pos_LH attributes to the nodes.\n\n Parameters\n ----------\n\n assign_dates : bool, default False\n If True, the inferred dates will be assigned to the nodes as\n :code:`time_before_present` attributes, and their branch lengths\n will be corrected accordingly.\n .. 
Note::\n Normally, the dates are assigned by running joint reconstruction.\n\n Returns\n -------\n\n None\n Every internal node is assigned the probability distribution in form\n of an interpolation object and sends this distribution further towards the\n root.\n\n \"\"\"\n\n def _cleanup():\n for node in self.tree.find_clades():\n try:\n del node.marginal_pos_Lx\n del node.subtree_distribution\n del node.msg_from_parent\n #del node.marginal_pos_LH\n except:\n pass\n\n\n self.logger(\"ClockTree - Marginal reconstruction: Propagating leaves -> root...\", 2)\n # go through the nodes from leaves towards the root:\n for node in self.tree.find_clades(order='postorder'): # children first, msg to parents\n if node.bad_branch:\n # no information\n node.marginal_pos_Lx = None\n else: # all other nodes\n if node.date_constraint is not None and node.date_constraint.is_delta: # there is a time constraint\n # initialize the Lx for nodes with precise date constraint:\n # subtree probability given the position of the parent node\n # position of the parent node is given by the branch length\n # distribution attached to the child node position\n node.subtree_distribution = node.date_constraint\n bl = node.branch_length_interpolator.x\n x = bl + node.date_constraint.peak_pos\n node.marginal_pos_Lx = Distribution(x, node.branch_length_interpolator(bl),\n min_width=self.min_width, is_log=True)\n\n else: # all nodes without precise constraint but positional information\n # subtree likelihood given the node's constraint and child msg:\n msgs_to_multiply = [node.date_constraint] if node.date_constraint is not None else []\n msgs_to_multiply.extend([child.marginal_pos_Lx for child in node.clades\n if child.marginal_pos_Lx is not None])\n\n # combine the different msgs and constraints\n if len(msgs_to_multiply)==0:\n # no information\n node.marginal_pos_Lx = None\n continue\n elif len(msgs_to_multiply)==1:\n node.subtree_distribution = msgs_to_multiply[0]\n else: # combine the different msgs and constraints\n node.subtree_distribution = Distribution.multiply(msgs_to_multiply)\n\n if node.up is None: # this is the root, set dates\n node.subtree_distribution._adjust_grid(rel_tol=self.rel_tol_prune)\n node.marginal_pos_Lx = node.subtree_distribution\n node.marginal_pos_LH = node.subtree_distribution\n self.tree.positional_marginal_LH = -node.subtree_distribution.peak_val\n else: # otherwise propagate to parent\n res, res_t = NodeInterpolator.convolve(node.subtree_distribution,\n node.branch_length_interpolator,\n max_or_integral='integral',\n n_grid_points = self.node_grid_points,\n n_integral=self.n_integral,\n rel_tol=self.rel_tol_refine)\n res._adjust_grid(rel_tol=self.rel_tol_prune)\n node.marginal_pos_Lx = res\n\n self.logger(\"ClockTree - Marginal reconstruction: Propagating root -> leaves...\", 2)\n from scipy.interpolate import interp1d\n for node in self.tree.find_clades(order='preorder'):\n\n ## The root node\n if node.up is None:\n node.msg_from_parent = None # nothing beyond the root\n # all other cases (All internal nodes + unconstrained terminals)\n else:\n parent = node.up\n # messages from the complementary subtree (iterate over all sister nodes)\n complementary_msgs = [sister.marginal_pos_Lx for sister in parent.clades\n if (sister != node) and (sister.marginal_pos_Lx is not None)]\n\n # if parent itself got smth from the root node, include it\n if parent.msg_from_parent is not None:\n complementary_msgs.append(parent.msg_from_parent)\n elif parent.marginal_pos_Lx is not None:\n 
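# the parent here is the root: use its marginal distribution as the message from above\n 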
complementary_msgs.append(parent.marginal_pos_LH)\n\n if len(complementary_msgs):\n msg_parent_to_node = NodeInterpolator.multiply(complementary_msgs)\n msg_parent_to_node._adjust_grid(rel_tol=self.rel_tol_prune)\n else:\n x = [parent.numdate, numeric_date()]\n msg_parent_to_node = NodeInterpolator(x, [1.0, 1.0],min_width=self.min_width)\n\n # integral message, which delivers to the node the positional information\n # from the complementary subtree\n res, res_t = NodeInterpolator.convolve(msg_parent_to_node, node.branch_length_interpolator,\n max_or_integral='integral',\n inverse_time=False,\n n_grid_points = self.node_grid_points,\n n_integral=self.n_integral,\n rel_tol=self.rel_tol_refine)\n\n node.msg_from_parent = res\n if node.marginal_pos_Lx is None:\n node.marginal_pos_LH = node.msg_from_parent\n else:\n node.marginal_pos_LH = NodeInterpolator.multiply((node.msg_from_parent, node.subtree_distribution))\n\n self.logger('ClockTree._ml_t_root_to_leaves: computed convolution'\n ' with %d points at node %s'%(len(res.x),node.name),4)\n\n if self.debug:\n tmp = np.diff(res.y-res.peak_val)\n nsign_changed = np.sum((tmp[1:]*tmp[:-1]<0)&(res.y[1:-1]-res.peak_val<500))\n if nsign_changed>1:\n import matplotlib.pyplot as plt\n plt.ion()\n plt.plot(res.x, res.y-res.peak_val, '-o')\n plt.plot(res.peak_pos - node.branch_length_interpolator.x,\n node.branch_length_interpolator(node.branch_length_interpolator.x)-node.branch_length_interpolator.peak_val, '-o')\n plt.plot(msg_parent_to_node.x,msg_parent_to_node.y-msg_parent_to_node.peak_val, '-o')\n plt.ylim(0,100)\n plt.xlim(-0.05, 0.05)\n import ipdb; ipdb.set_trace()\n\n # assign positions of nodes and branch length only when desired\n # since marginal reconstruction can result in negative branch length\n if assign_dates:\n node.time_before_present = node.marginal_pos_LH.peak_pos\n if node.up:\n node.clock_length = node.up.time_before_present - node.time_before_present\n node.branch_length = node.clock_length\n\n # construct the inverse cumulant distribution to evaluate confidence intervals\n if node.marginal_pos_LH.is_delta:\n node.marginal_inverse_cdf=interp1d([0,1], node.marginal_pos_LH.peak_pos*np.ones(2), kind=\"linear\")\n else:\n dt = np.diff(node.marginal_pos_LH.x)\n y = node.marginal_pos_LH.prob_relative(node.marginal_pos_LH.x)\n int_y = np.concatenate(([0], np.cumsum(dt*(y[1:]+y[:-1])/2.0)))\n int_y/=int_y[-1]\n node.marginal_inverse_cdf = interp1d(int_y, node.marginal_pos_LH.x, kind=\"linear\")\n node.marginal_cdf = interp1d(node.marginal_pos_LH.x, int_y, kind=\"linear\")\n\n if not self.debug:\n _cleanup()\n\n return"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef branch_length_to_years(self):\n '''\n This function sets branch length to reflect the date differences between parent and child\n nodes measured in years. 
Should only be called after :py:meth:`timetree.ClockTree.convert_dates` has been called.\n\n Returns\n -------\n None\n All manipulations are done in place on the tree\n\n '''\n self.logger('ClockTree.branch_length_to_years: setting node positions in units of years', 2)\n if not hasattr(self.tree.root, 'numdate'):\n self.logger('ClockTree.branch_length_to_years: infer ClockTree first', 2,warn=True)\n self.tree.root.branch_length = 0.1\n for n in self.tree.find_clades(order='preorder'):\n if n.up is not None:\n n.branch_length = n.numdate - n.up.numdate", "response": "This function sets the branch length to reflect the date differences between parent and child nodes measured in years."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ncalculating the time tree estimate of evolutionary rates +/- one standard deviation from the ML estimate", "response": "def calc_rate_susceptibility(self, rate_std=None, params=None):\n \"\"\"return the time tree estimation of evolutionary rates +/- one\n standard deviation from the ML estimate.\n\n Returns\n -------\n TreeTime.return_code : str\n success or failure\n \"\"\"\n params = params or {}\n if rate_std is None:\n if not (self.clock_model['valid_confidence'] and 'cov' in self.clock_model):\n self.logger(\"ClockTree.calc_rate_susceptibility: need valid standard deviation of the clock rate to estimate dating error.\", 1, warn=True)\n return ttconf.ERROR\n rate_std = np.sqrt(self.clock_model['cov'][0,0])\n\n current_rate = np.abs(self.clock_model['slope'])\n upper_rate = self.clock_model['slope'] + rate_std\n lower_rate = max(0.1*current_rate, self.clock_model['slope'] - rate_std)\n for n in self.tree.find_clades():\n if n.up:\n n._orig_gamma = n.branch_length_interpolator.gamma\n n.branch_length_interpolator.gamma = n._orig_gamma*upper_rate/current_rate\n\n self.logger(\"###ClockTree.calc_rate_susceptibility: run with upper bound of rate estimate\", 1)\n self.make_time_tree(**params)\n self.logger(\"###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f\"%(upper_rate, self.tree.positional_joint_LH), 2)\n for n in self.tree.find_clades():\n n.numdate_rate_variation = [(upper_rate, n.numdate)]\n if n.up:\n n.branch_length_interpolator.gamma = n._orig_gamma*lower_rate/current_rate\n\n self.logger(\"###ClockTree.calc_rate_susceptibility: run with lower bound of rate estimate\", 1)\n self.make_time_tree(**params)\n self.logger(\"###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f\"%(lower_rate, self.tree.positional_joint_LH), 2)\n for n in self.tree.find_clades():\n n.numdate_rate_variation.append((lower_rate, n.numdate))\n if n.up:\n n.branch_length_interpolator.gamma = n._orig_gamma\n\n self.logger(\"###ClockTree.calc_rate_susceptibility: run with central rate estimate\", 1)\n self.make_time_tree(**params)\n self.logger(\"###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f\"%(current_rate, self.tree.positional_joint_LH), 2)\n for n in self.tree.find_clades():\n n.numdate_rate_variation.append((current_rate, n.numdate))\n n.numdate_rate_variation.sort(key=lambda x:x[1]) # sort estimates for different rates by numdate\n\n return ttconf.SUCCESS"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef date_uncertainty_due_to_rate(self, node, interval=(0.05, 0.095)):\n if hasattr(node, \"numdate_rate_variation\"):\n from scipy.special import erfinv\n nsig = [np.sqrt(2.0)*erfinv(-1.0 + 2.0*x) if x*(1.0-x) else 0\n for x in interval]\n l,c,u = [x[1] for x in 
node.numdate_rate_variation]\n return np.array([c + x*np.abs(y-c) for x,y in zip(nsig, (l,u))])\n\n else:\n return None", "response": "estimate the uncertainty due to rate variation of a particular node"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef get_max_posterior_region(self, node, fraction = 0.9):\n '''\n If temporal reconstruction was done using the marginal ML mode, the entire distribution of\n times is available. This function determines the interval around the highest\n posterior probability region that contains the specified fraction of the probability mass.\n In absence of marginal reconstruction, it will return uncertainty based on rate\n variation. If both are present, the wider interval will be returned.\n\n Parameters\n ----------\n\n node : PhyloTree.Clade\n The node for which the posterior region is to be calculated\n\n fraction : float\n Float specifying how much of the posterior probability is\n to be contained in the region\n\n Returns\n -------\n max_posterior_region : numpy array\n Array with two numerical dates delineating the high posterior region\n\n '''\n if node.marginal_inverse_cdf==\"delta\":\n return np.array([node.numdate, node.numdate])\n\n\n min_max = (node.marginal_pos_LH.xmin, node.marginal_pos_LH.xmax)\n min_date, max_date = [self.date2dist.to_numdate(x) for x in min_max][::-1]\n if node.marginal_pos_LH.peak_pos == min_max[0]: #peak on the left\n return self.get_confidence_interval(node, (0, fraction))\n elif node.marginal_pos_LH.peak_pos == min_max[1]: #peak on the right\n return self.get_confidence_interval(node, (1.0-fraction ,1.0))\n else: # peak in the center of the distribution\n rate_contribution = self.date_uncertainty_due_to_rate(node, ((1-fraction)*0.5, 1.0-(1.0-fraction)*0.5))\n\n # construct height to position interpolators left and right of the peak\n # this assumes there is only one peak --- might fail in odd cases\n from scipy.interpolate import interp1d\n from scipy.optimize import minimize_scalar as minimize\n pidx = np.argmin(node.marginal_pos_LH.y)\n pval = np.min(node.marginal_pos_LH.y)\n left = interp1d(node.marginal_pos_LH.y[:(pidx+1)]-pval, node.marginal_pos_LH.x[:(pidx+1)],\n kind='linear', fill_value=min_max[0], bounds_error=False)\n right = interp1d(node.marginal_pos_LH.y[pidx:]-pval, node.marginal_pos_LH.x[pidx:],\n kind='linear', fill_value=min_max[1], bounds_error=False)\n\n # function to minimize -- squared difference between prob mass and desired fraction\n def func(x, thres):\n interval = np.array([left(x), right(x)]).squeeze()\n return (thres - np.diff(node.marginal_cdf(np.array(interval))))**2\n\n # minimize and determine success\n sol = minimize(func, bracket=[0,10], args=(fraction,))\n if sol['success']:\n mutation_contribution = self.date2dist.to_numdate(np.array([right(sol['x']), left(sol['x'])]).squeeze())\n else: # on failure, return standard confidence interval\n mutation_contribution = None\n\n return self.combine_confidence(node.numdate, (min_date, max_date),\n c1=rate_contribution, c2=mutation_contribution)", "response": "This function determines the interval around the highest posterior probability region that contains the specified fraction of the probability mass."} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nreads in a vcf file and associated reference sequence fasta file and returns a dictionary with the appropriate fields.", "response": "def read_vcf(vcf_file, ref_file):\r\n \"\"\"\r\n Reads in a 
vcf/vcf.gz file and associated\r\n reference sequence fasta (to which the VCF file is mapped).\r\n\r\n Parses mutations, insertions, and deletions and stores them in a nested dict,\r\n see 'returns' for the dict structure.\r\n\r\n Calls with heterozygous values 0/1, 0/2, etc and no-calls (./.) are\r\n replaced with Ns at the associated sites.\r\n\r\n Positions are stored to correspond to the location in the reference sequence\r\n in Python (numbering is transformed to start at 0)\r\n\r\n Parameters\r\n ----------\r\n vcf_file : string\r\n Path to the vcf or vcf.gz file to be read in\r\n ref_file : string\r\n Path to the fasta reference file to be read in\r\n\r\n Returns\r\n --------\r\n compress_seq : nested dict\r\n In the format: ::\r\n\r\n {\r\n 'reference':'AGCTCGA..A',\r\n 'sequences': { 'seq1':{4:'A', 7:'-'}, 'seq2':{100:'C'} },\r\n 'insertions': { 'seq1':{4:'ATT'}, 'seq3':{1:'TT', 10:'CAG'} },\r\n 'positions': [1,4,7,10,100...]\r\n }\r\n\r\n references : string\r\n String of the reference sequence read from the Fasta, to which\r\n the variable sites are mapped\r\n sequences : nested dict\r\n Dict containing sequence names as keys which map to dicts\r\n that have position as key and the single-base mutation (or deletion)\r\n as values\r\n insertions : nested dict\r\n Dict in the same format as the above, which stores insertions and their\r\n locations. The first base of the insertion is the same as whatever is\r\n currently in that position (Ref if no mutation, mutation in 'sequences'\r\n otherwise), so the current base can be directly replaced by the bases held here.\r\n positions : list\r\n Python list of all positions with a mutation, insertion, or deletion.\r\n\r\n \"\"\"\r\n\r\n #Programming Note:\r\n # Note on VCF Format\r\n # -------------------\r\n # 'Insertion where there are also deletions' (special handling)\r\n # Ex:\r\n # REF ALT Seq1 Seq2\r\n # GC GCC,G 1/1 2/2\r\n # Insertions formatted differently - don't know how many bp match\r\n # the Ref (unlike simple insert below). Could be mutations, also.\r\n # 'Deletion'\r\n # Ex:\r\n # REF ALT\r\n # GC G\r\n # Alt does not have to be 1 bp - any length shorter than Ref.\r\n # 'Insertion'\r\n # Ex:\r\n # REF ALT\r\n # A ATT\r\n # First base always matches Ref.\r\n # 'No indel'\r\n # Ex:\r\n # REF ALT\r\n # A G\r\n\r\n #define here, so that all sub-functions can access them\r\n sequences = defaultdict(dict)\r\n insertions = defaultdict(dict) #Currently not used, but kept in case of future use.\r\n\r\n #TreeTime handles 2-3 base ambig codes, this will allow that.\r\n def getAmbigCode(bp1, bp2, bp3=\"\"):\r\n bps = [bp1,bp2,bp3]\r\n bps.sort()\r\n key = \"\".join(bps)\r\n\r\n return {\r\n 'CT': 'Y',\r\n 'AG': 'R',\r\n 'AT': 'W',\r\n 'CG': 'S',\r\n 'GT': 'K',\r\n 'AC': 'M',\r\n 'AGT': 'D',\r\n 'ACG': 'V',\r\n 'ACT': 'H',\r\n 'CGT': 'B'\r\n }[key]\r\n\r\n #Parses a 'normal' (not hetero or no-call) call depending on whether it is an insertion+deletion, insertion,\r\n #deletion, or single bp substitution\r\n def parseCall(snps, ins, pos, ref, alt):\r\n\r\n #Insertion where there are also deletions (special handling)\r\n if len(ref) > 1 and len(alt)>len(ref):\r\n for i in range(len(ref)):\r\n #if the pos doesn't match, store in sequences\r\n if ref[i] != alt[i]:\r\n snps[pos+i] = alt[i] if alt[i] != '.' else 'N' #'.' 
= no-call\r\n #if about to run out of ref, store rest:\r\n if (i+1) >= len(ref):\r\n ins[pos+i] = alt[i:]\r\n #Deletion\r\n elif len(ref) > 1:\r\n for i in range(len(ref)):\r\n #if ref is longer than alt, these are deletion positions\r\n if i+1 > len(alt):\r\n snps[pos+i] = '-'\r\n #if not, there may be mutations\r\n else:\r\n if ref[i] != alt[i]:\r\n snps[pos+i] = alt[i] if alt[i] != '.' else 'N' #'.' = no-call\r\n #Insertion\r\n elif len(alt) > 1:\r\n ins[pos] = alt\r\n #No indel\r\n else:\r\n snps[pos] = alt\r\n\r\n\r\n #Parses a 'bad' (hetero or no-call) call depending on what it is\r\n def parseBadCall(snps, ins, pos, ref, ALT):\r\n #Deletion\r\n # REF ALT Seq1 Seq2 Seq3\r\n # GCC G 1/1 0/1 ./.\r\n # Seq1 (processed by parseCall, above) will become 'G--'\r\n # Seq2 will become 'GNN'\r\n # Seq3 will become 'GNN'\r\n if len(ref) > 1:\r\n #Deleted part becomes Ns\r\n if gen[0] == '0' or gen[0] == '.':\r\n if gen[0] == '0': #if het, get first bp\r\n alt = str(ALT[int(gen[2])-1])\r\n else: #if no-call, there is no alt, so just put Ns after 1st ref base\r\n alt = ref[0]\r\n for i in range(len(ref)):\r\n #if ref is longer than alt, these are deletion positions\r\n if i+1 > len(alt):\r\n snps[pos+i] = 'N'\r\n #if not, there may be mutations\r\n else:\r\n if ref[i] != alt[i]:\r\n snps[pos+i] = alt[i] if alt[i] != '.' else 'N' #'.' = no-call\r\n\r\n #If not deletion, need to know call type\r\n #if het, see if proposed alt is 1bp mutation\r\n elif gen[0] == '0':\r\n alt = str(ALT[int(gen[2])-1])\r\n if len(alt)==1:\r\n #alt = getAmbigCode(ref,alt) #if want to allow ambig\r\n alt = 'N' #if you want to disregard ambig\r\n snps[pos] = alt\r\n #else a het-call insertion, so ignore.\r\n\r\n #else it's a no-call; see if all alts have a length of 1\r\n #(meaning a simple 1bp mutation)\r\n elif len(ALT)==len(\"\".join(ALT)):\r\n alt = 'N'\r\n snps[pos] = alt\r\n #else a no-call insertion, so ignore.\r\n\r\n\r\n #House code is *much* faster than pyvcf because we don't care about all info\r\n #about coverage, quality, counts, etc, which pyvcf goes to effort to parse\r\n #(and it's not easy as there's no standard ordering). Custom code can completely\r\n #ignore all of this.\r\n import gzip\r\n from Bio import SeqIO\r\n import numpy as np\r\n\r\n nsamp = 0\r\n posLoc = 0\r\n refLoc = 0\r\n altLoc = 0\r\n sampLoc = 9\r\n\r\n #Use different openers depending on whether compressed\r\n opn = gzip.open if vcf_file.endswith(('.gz', '.GZ')) else open\r\n\r\n with opn(vcf_file, mode='rt') as f:\r\n for line in f:\r\n if line[0] != '#':\r\n #actual data - most common so first in 'if-list'!\r\n line = line.strip()\r\n dat = line.split('\\t')\r\n POS = int(dat[posLoc])\r\n REF = dat[refLoc]\r\n ALT = dat[altLoc].split(',')\r\n calls = np.array(dat[sampLoc:])\r\n\r\n #get samples that differ from Ref at this site\r\n recCalls = {}\r\n for sname, sa in zip(samps, calls):\r\n if ':' in sa: #if proper VCF file (followed by quality/coverage info)\r\n gt = sa.split(':')[0]\r\n else: #if 'pseudo' VCF file (nextstrain output, or otherwise stripped)\r\n gt = sa\r\n if gt == '0' or gt == '1': #for haploid calls in VCF\r\n gt = '0/0' if gt == '0' else '1/1'\r\n\r\n #ignore if ref call: '.' 
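The genotype normalization in the main loop (stripping FORMAT extras after ':', promoting haploid calls to diploid notation, distinguishing hets and no-calls) can be sketched in isolation. classify_call is a hypothetical standalone helper, not part of the module:

# Standalone sketch of the genotype handling in the reader's main loop.
def classify_call(sample_field):
    gt = sample_field.split(':')[0]          # drop quality/coverage extras
    if gt in ('0', '1'):                     # haploid VCF call
        gt = '0/0' if gt == '0' else '1/1'
    alleles = gt.replace('|', '/').split('/')
    if '.' in alleles:
        return gt, 'no-call'
    if len(set(alleles)) > 1:
        return gt, 'heterozygous'
    return gt, 'reference' if alleles[0] == '0' else 'homozygous-alt'

for field in ('0/0', '1/1:33:12', '0/1', './.', '1'):
    print(field, '->', classify_call(field))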
or '0/0', depending on VCF\r\n if ('/' in gt and gt != '0/0') or ('|' in gt and gt != '0|0'):\r\n recCalls[sname] = gt\r\n\r\n #store the position and the alt\r\n for seq, gen in recCalls.items():\r\n ref = REF\r\n pos = POS-1 #VCF numbering starts from 1, but Reference seq numbering\r\n #will be from 0 because it's python!\r\n #Accepts only calls that are 1/1, 2/2 etc. Rejects hets and no-calls\r\n if gen[0] != '0' and gen[2] != '0' and gen[0] != '.' and gen[2] != '.':\r\n alt = str(ALT[int(gen[0])-1]) #get the index of the alternate\r\n if seq not in sequences.keys():\r\n sequences[seq] = {}\r\n\r\n parseCall(sequences[seq],insertions[seq], pos, ref, alt)\r\n\r\n #If is heterozygote call (0/1) or no call (./.)\r\n else:\r\n #alt will differ here depending on het or no-call, must pass original\r\n parseBadCall(sequences[seq],insertions[seq], pos, ref, ALT)\r\n\r\n elif line[0] == '#' and line[1] == 'C':\r\n #header line, get all the information\r\n header = line.strip().split('\\t')\r\n posLoc = header.index(\"POS\")\r\n refLoc = header.index('REF')\r\n altLoc = header.index('ALT')\r\n sampLoc = header.index('FORMAT')+1\r\n samps = header[sampLoc:]\r\n samps = [ x.strip() for x in samps ] #ensure no leading/trailing spaces\r\n nsamp = len(samps)\r\n\r\n #else you are a comment line, ignore.\r\n\r\n #Gather all variable positions\r\n positions = set()\r\n for seq, muts in sequences.items():\r\n positions.update(muts.keys())\r\n\r\n #One or more seqs are same as ref! (No non-ref calls) So haven't been 'seen' yet\r\n if nsamp > len(sequences):\r\n missings = set(samps).difference(sequences.keys())\r\n for s in missings:\r\n sequences[s] = {}\r\n\r\n refSeq = SeqIO.read(ref_file, format='fasta')\r\n refSeq = refSeq.upper() #convert to uppercase to avoid unknown chars later\r\n refSeqStr = str(refSeq.seq)\r\n\r\n compress_seq = {'reference':refSeqStr,\r\n 'sequences': sequences,\r\n 'insertions': insertions,\r\n 'positions': sorted(positions)}\r\n\r\n return compress_seq"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef write_vcf(tree_dict, file_name):#, compress=False):\r\n\r\n# Programming Logic Note:\r\n#\r\n# For a sequence like:\r\n# Pos 1 2 3 4 5 6\r\n# Ref A C T T A C\r\n# Seq1 A C - - - G\r\n#\r\n# In a dict it is stored:\r\n# Seq1:{3:'-', 4:'-', 5:'-', 6:'G'} (Numbering from 1 for simplicity)\r\n#\r\n# In a VCF it needs to be:\r\n# POS REF ALT Seq1\r\n# 2 CTTA C 1/1\r\n# 6 C G 1/1\r\n#\r\n# If a position is deleted (pos 3), need to get invariable position preceeding it\r\n#\r\n# However, in alternative case, the base before a deletion is mutant, so need to check\r\n# that next position isn't a deletion (as otherwise won't be found until after the\r\n# current single bp mutation is written out)\r\n#\r\n# When deleted position found, need to gather up all adjacent mutant positions with deletions,\r\n# but not include adjacent mutant positions that aren't deletions (pos 6)\r\n#\r\n# Don't run off the 'end' of the position list if deletion is the last thing to be included\r\n# in the VCF file\r\n\r\n sequences = tree_dict['sequences']\r\n ref = tree_dict['reference']\r\n positions = tree_dict['positions']\r\n\r\n def handleDeletions(i, pi, pos, ref, delete, pattern):\r\n refb = ref[pi]\r\n if delete: #Need to get the position before\r\n i-=1 #As we'll next go to this position again\r\n pi-=1\r\n pos = pi+1\r\n refb = ref[pi]\r\n #re-get pattern\r\n pattern = []\r\n for k,v in sequences.items():\r\n try:\r\n 
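To make the anchor-base convention from the programming note above concrete, here is a hedged, self-contained sketch of how a deleted run maps to one VCF row. The sequence and coordinates are the same toy example as in the comment; none of this is the module's code:

# A deletion must be reported from the invariant base *before* the deleted
# run: REF carries the anchor base plus the deleted bases, ALT the anchor.
ref = "ACTTAC"                 # positions 1..6 (VCF is 1-based)
deleted = {3, 4, 5}            # a sequence lost bases 3-5

anchor = min(deleted) - 1             # invariant base preceding the run
REF = ref[anchor - 1: max(deleted)]   # "CTTA" (anchor + deleted bases)
ALT = ref[anchor - 1]                 # "C" (anchor base only)
print("POS=%d REF=%s ALT=%s" % (anchor, REF, ALT))   # POS=2 REF=CTTA ALT=C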
pattern.append(sequences[k][pi])\r\n except KeyError:\r\n pattern.append(ref[pi])\r\n pattern = np.array(pattern)\r\n\r\n sites = []\r\n sites.append(pattern)\r\n\r\n #Gather all positions affected by deletion - but don't run off end of position list\r\n while (i+1) < len(positions) and positions[i+1] == pi+1:\r\n i+=1\r\n pi = positions[i]\r\n pattern = []\r\n for k,v in sequences.items():\r\n try:\r\n pattern.append(sequences[k][pi])\r\n except KeyError:\r\n pattern.append(ref[pi])\r\n pattern = np.array(pattern)\r\n\r\n #Stops 'greedy' behaviour from adding mutations adjacent to deletions\r\n if any(pattern == '-'): #if part of deletion, append\r\n sites.append(pattern)\r\n refb = refb+ref[pi]\r\n else: #this is another mutation next to the deletion!\r\n i-=1 #don't append, break this loop\r\n\r\n #Rotate them into 'calls'\r\n sites = np.asarray(sites)\r\n align = np.rot90(sites)\r\n align = np.flipud(align)\r\n\r\n #Get rid of '-', and put '.' for calls that match ref\r\n #Only removes trailing '-'. This breaks VCF convention, but the standard\r\n #VCF way of handling this* is really complicated, and the situation is rare.\r\n #(*deletions and mutations at the same locations)\r\n fullpat = []\r\n for pt in align:\r\n gp = len(pt)-1\r\n while pt[gp] == '-':\r\n pt[gp] = ''\r\n gp-=1\r\n pat = \"\".join(pt)\r\n if pat == refb:\r\n fullpat.append('.')\r\n else:\r\n fullpat.append(pat)\r\n\r\n pattern = np.array(fullpat)\r\n\r\n return i, pi, pos, refb, pattern\r\n\r\n\r\n #prepare the header of the VCF & write out\r\n header=[\"#CHROM\",\"POS\",\"ID\",\"REF\",\"ALT\",\"QUAL\",\"FILTER\",\"INFO\",\"FORMAT\"]+list(sequences.keys())\r\n with open(file_name, 'w') as the_file:\r\n the_file.write( \"##fileformat=VCFv4.2\\n\"+\r\n \"##source=NextStrain\\n\"+\r\n \"##FORMAT=<ID=GT,Number=1,Type=String,Description=\\\"Genotype\\\">\\n\")\r\n the_file.write(\"\\t\".join(header)+\"\\n\")\r\n\r\n vcfWrite = []\r\n errorPositions = []\r\n explainedErrors = 0\r\n\r\n #Why so basic? 
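The rot90/flipud step in handleDeletions is easy to misread. A tiny demo with made-up two-sample data shows how per-position patterns become per-sample call strings:

# `sites` collects one array per *position* (rows = positions, columns =
# samples); rot90 followed by flipud yields one row per *sample*, in the
# original sample order.
import numpy as np

sites = np.array([['-', 'C'],     # position 1: sample1='-', sample2='C'
                  ['-', 'T']])    # position 2: sample1='-', sample2='T'

per_sample = np.flipud(np.rot90(sites))
print(per_sample)   # [['-' '-'] ['C' 'T']] -> sample1='--', sample2='CT'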
Because we sometimes have to back up a position!\r\n i=0\r\n while i < len(positions):\r\n #Get the 'pattern' of all calls at this position.\r\n #Look out specifically for current (this pos) or upcoming (next pos) deletions\r\n #But also distinguish these two, as handled differently.\r\n\r\n pi = positions[i]\r\n pos = pi+1 #change numbering to match VCF, not python, for output\r\n refb = ref[pi] #reference base at this position\r\n delete = False #deletion at this position - need to grab previous base (invariable)\r\n deleteGroup = False #deletion at next position (mutation at this pos) - do not need to get prev base\r\n\r\n #try/except is much more efficient than 'if' statements for constructing patterns,\r\n #as on average a 'variable' location will not be variable for any given sequence\r\n pattern = []\r\n #pattern2 gets the pattern at next position to check for upcoming deletions\r\n #it's more efficient to get both here rather than loop through sequences twice!\r\n pattern2 = []\r\n for k,v in sequences.items():\r\n try:\r\n pattern.append(sequences[k][pi])\r\n except KeyError:\r\n pattern.append(ref[pi])\r\n\r\n try:\r\n pattern2.append(sequences[k][pi+1])\r\n except KeyError:\r\n pattern2.append(ref[pi+1])\r\n\r\n pattern = np.array(pattern)\r\n pattern2 = np.array(pattern2)\r\n\r\n #If a deletion here, need to gather up all bases, and position before\r\n if any(pattern == '-'):\r\n if pos != 1:\r\n deleteGroup = True\r\n delete = True\r\n else:\r\n #If theres a deletion in 1st pos, VCF files do not handle this well.\r\n #Proceed keeping it as '-' for alt (violates VCF), but warn user to check output.\r\n #(This is rare)\r\n print (\"WARNING: You have a deletion in the first position of your alignment. VCF format does not handle this well. Please check the output to ensure it is correct.\")\r\n else:\r\n #If a deletion in next pos, need to gather up all bases\r\n if any(pattern2 == '-'):\r\n deleteGroup = True\r\n\r\n #If deletion, treat affected bases as 1 'call':\r\n if delete or deleteGroup:\r\n i, pi, pos, refb, pattern = handleDeletions(i, pi, pos, ref, delete, pattern)\r\n #If no deletion, replace ref with '.', as in VCF format\r\n else:\r\n pattern[pattern==refb] = '.'\r\n\r\n #Get the list of ALTs - minus any '.'!\r\n uniques = np.unique(pattern)\r\n uniques = uniques[np.where(uniques!='.')]\r\n\r\n #Convert bases to the number that matches the ALT\r\n j=1\r\n for u in uniques:\r\n pattern[np.where(pattern==u)[0]] = str(j)\r\n j+=1\r\n #Now convert these calls to #/# (VCF format)\r\n calls = [ j+\"/\"+j if j!='.' else '.' 
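A minimal sketch of the ALT-numbering step at this point of the loop, using an invented pattern array; it mirrors the logic but is not the module's code:

# Unique non-'.' patterns become ALT alleles 1, 2, ... and every cell is
# rewritten as n/n (or kept as '.' if it matched the reference).
import numpy as np

pattern = np.array(['.', 'G', '.', 'T', 'G'])
uniques = np.unique(pattern)
uniques = uniques[uniques != '.']            # ALT list, here ['G', 'T']

for j, u in enumerate(uniques, start=1):
    pattern[pattern == u] = str(j)

calls = [c + '/' + c if c != '.' else '.' for c in pattern]
print(",".join(uniques), calls)   # G,T ['.', '1/1', '.', '2/2', '1/1']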
for j in pattern ]\r\n\r\n #What if there's no variation at a variable site??\r\n #This can happen when sites are modified by TreeTime - see below.\r\n printPos = True\r\n if len(uniques)==0:\r\n #If we expect it (it was made constant by TreeTime), it's fine.\r\n if 'inferred_const_sites' in tree_dict and pi in tree_dict['inferred_const_sites']:\r\n explainedErrors += 1\r\n printPos = False #and don't output position to the VCF\r\n else:\r\n #If we don't expect, raise an error\r\n errorPositions.append(str(pi))\r\n\r\n #Write it out - Increment positions by 1 so it's in VCF numbering\r\n #If no longer variable, and explained, don't write it out\r\n if printPos:\r\n output = [\"MTB_anc\", str(pos), \".\", refb, \",\".join(uniques), \".\", \"PASS\", \".\", \"GT\"] + calls\r\n vcfWrite.append(\"\\t\".join(output))\r\n\r\n i+=1\r\n\r\n #Note: The number of 'inferred_const_sites' passed back by TreeTime will often be longer\r\n #than the number of 'site that were made constant' that prints below. This is because given the site:\r\n # Ref Alt Seq\r\n # G A AANAA\r\n #This will be converted to 'AAAAA' and listed as an 'inferred_const_sites'. However, for VCF\r\n #purposes, because the site is 'variant' against the ref, it is variant, as expected, and so\r\n #won't be counted in the below list, which is only sites removed from the VCF.\r\n\r\n if 'inferred_const_sites' in tree_dict and explainedErrors != 0:\r\n print ( \"Sites that were constant except for ambiguous bases were made constant by TreeTime. This happened {} times. These sites are now excluded from the VCF.\".format(explainedErrors))\r\n\r\n if len(errorPositions) != 0:\r\n print (\"\\n***WARNING: vcf_utils.py\"\r\n \"\\n{} sites were found that had no alternative bases. If this data has been \"\r\n \"run through TreeTime and contains ambiguous bases, try calling get_tree_dict with \"\r\n \"var_ambigs=True to see if this clears the error.\"\r\n \"\\n\\nAlternative causes:\"\r\n \"\\n- Not all sequences in your alignment are in the tree (if you are running TreeTime via commandline \"\r\n \"this is most likely)\"\r\n \"\\n- In TreeTime, can be caused by overwriting variants in tips with small branch lengths (debug)\"\r\n \"\\n\\nThese are the positions affected (numbering starts at 0):\".format(str(len(errorPositions))))\r\n print (\",\".join(errorPositions))\r\n\r\n with open(file_name, 'a') as the_file:\r\n the_file.write(\"\\n\".join(vcfWrite))\r\n\r\n if file_name.endswith(('.gz', '.GZ')):\r\n import os\r\n #must temporarily remove .gz ending, or gzip won't zip it!\r\n os.rename(file_name, file_name[:-3])\r\n call = [\"gzip\", file_name[:-3]]\r\n os.system(\" \".join(call))", "response": "Writes out a VCF - style file for the given tree_dict."} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _convolution_integrand(t_val, f, g,\n inverse_time=None, return_log=False):\n '''\n Evaluates int_tau f(t+tau)*g(tau) or int_tau f(t-tau)g(tau) if inverse time is TRUE\n\n Parameters\n -----------\n\n t_val : double\n Time point\n\n f : Interpolation object\n First multiplier in convolution\n\n g : Interpolation object\n Second multiplier in convolution\n\n inverse_time : bool, None\n time direction. 
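A hedged numeric sketch of the convolution described here, assuming f and g are unnormalized negative log-densities stored as interpolators; the grids and densities are invented, and scipy's interp1d stands in for TreeTime's Distribution objects:

# For fixed t the integrand is exp(-f(t - tau) - g(tau)), integrated over
# the overlap of the two supports (the inverse_time=True direction).
import numpy as np
from scipy.interpolate import interp1d

xf = np.linspace(-3, 3, 301)
xg = np.linspace(0, 2, 201)
f = interp1d(xf, 0.5 * xf ** 2, bounds_error=False, fill_value=np.inf)  # -log Gaussian
g = interp1d(xg, 2.0 * xg, bounds_error=False, fill_value=np.inf)       # -log Exp(2), truncated

def convolution_point(t):
    tau_min = max(t - xf.max(), xg.min())    # tau > g.xmin and t - tau < f.xmax
    tau_max = min(t - xf.min(), xg.max())    # tau < g.xmax and t - tau > f.xmin
    if tau_max <= tau_min:
        return 0.0                           # supports do not overlap
    tau = np.linspace(tau_min, tau_max, 500)
    return np.trapz(np.exp(-f(t - tau) - g(tau)), tau)

print(convolution_point(0.7))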
If True, then the f(t-tau)*g(tau) is calculated, otherwise,\n f(t+tau)*g(tau)\n\n return_log : bool\n If True, the logarithm will be returned\n\n\n Returns\n -------\n\n FG : Distribution\n The function to be integrated as Distribution object (interpolator)\n\n '''\n\n if inverse_time is None:\n raise Exception(\"Inverse time argument must be set!\")\n\n # determine integration boundaries:\n if inverse_time:\n ## tau>g.xmin and t-tauf.xmin\n tau_max = min(t_val - f.xmin, g.xmax)\n else:\n ## tau>g.xmin and t+tau>f.xmin\n tau_min = max(f.xmin-t_val, g.xmin)\n ## tautau_min-ttconf.TINY_NUMBER)&(tau4*center_width:\n grid_right = grid_center[-1] + right_range*(np.linspace(0, 1, n)**2.0)\n elif right_range>0: # use linear grid the right_range is comparable to center_width\n grid_right = grid_center[-1] + right_range*np.linspace(0,1, int(min(n,1+0.5*n*right_range/center_width)))\n else:\n grid_right =[]\n\n left_range = grid_center[0]-tmin\n if left_range>4*center_width:\n grid_left = tmin + left_range*(np.linspace(0, 1, n)**2.0)\n elif left_range>0:\n grid_left = tmin + left_range*np.linspace(0,1, int(min(n,1+0.5*n*left_range/center_width)))\n else:\n grid_left =[]\n\n\n if tmin>-1:\n grid_zero_left = tmin + (tmax-tmin)*np.linspace(0,0.01,11)**2\n else:\n grid_zero_left = [tmin]\n if tmax<1:\n grid_zero_right = tmax - (tmax-tmin)*np.linspace(0,0.01,11)**2\n else:\n grid_zero_right = [tmax]\n\n # make grid and calculate convolution\n t_grid_0 = np.unique(np.concatenate([grid_zero_left, grid_left[:-1], grid_center, grid_right[1:], grid_zero_right]))\n t_grid_0 = t_grid_0[(t_grid_0 > tmin-ttconf.TINY_NUMBER) & (t_grid_0 < tmax+ttconf.TINY_NUMBER)]\n\n # res0 - the values of the convolution (integral or max)\n # t_0 - the value, at which the res0 achieves maximum\n # (when determining the maximum of the integrand, otherwise meaningless)\n res_0, t_0 = np.array([conv_in_point(t_val) for t_val in t_grid_0]).T\n\n # refine grid as necessary and add new points\n # calculate interpolation error at all internal points [2:-2] bc end points are sometime off scale\n interp_error = np.abs(res_0[3:-1]+res_0[1:-3]-2*res_0[2:-2])\n # determine the number of extra points needed, criterion depends on distance from peak dy\n dy = (res_0[2:-2]-res_0.min())\n dx = np.diff(t_grid_0)\n refine_factor = np.minimum(np.minimum(np.array(np.floor(np.sqrt(interp_error/(rel_tol*(1+(dy/yc)**4)))), dtype=int),\n np.array(100*(dx[1:-2]+dx[2:-1])/min_fwhm, dtype=int)), 10)\n\n insert_point_idx = np.zeros(interp_error.shape[0]+1, dtype=int)\n insert_point_idx[1:] = refine_factor\n insert_point_idx[:-1] += refine_factor\n # add additional points if there are any to add\n if np.sum(insert_point_idx):\n add_x = np.concatenate([np.linspace(t1,t2,n+2)[1:-1] for t1,t2,n in\n zip(t_grid_0[1:-2], t_grid_0[2:-1], insert_point_idx) if n>0])\n # calculate convolution at these points\n add_y, add_t = np.array([conv_in_point(t_val) for t_val in add_x]).T\n\n t_grid_0 = np.concatenate((t_grid_0, add_x))\n res_0 = np.concatenate ((res_0, add_y))\n t_0 = np.concatenate ((t_0, add_t))\n\n # instantiate the new interpolation object and return\n res_y = cls(t_grid_0, res_0, is_log=True, kind='linear')\n\n # the interpolation object, which is used to store the value of the\n # grid, which maximizes the convolution (for 'max' option),\n # or flat -1 distribution (for 'integral' option)\n # this grid is the optimal branch length\n res_t = Distribution(t_grid_0, t_0, is_log=True,\n min_width=node_interp.min_width, kind='linear')\n\n return res_y, 
res_t"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef min_interp(interp_object):\n try:\n return interp_object.x[interp_object(interp_object.x).argmin()]\n except Exception as e:\n s = \"Cannot find minimum of the interpolation object\" + str(interp_object.x) + \\\n \"Minimal x: \" + str(interp_object.x.min()) + \"Maximal x: \" + str(interp_object.x.max())\n raise e", "response": "Find the global minimum of a function represented as an interpolation object."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef median_interp(interp_object):\n new_grid = np.sort(np.concatenate([interp_object.x[:-1] + 0.1*ii*np.diff(interp_object.x)\n for ii in range(10)]).flatten())\n\n tmp_prop = np.exp(-(interp_object(new_grid)-interp_object.y.min()))\n tmp_cumsum = np.cumsum(0.5*(tmp_prop[1:]+tmp_prop[:-1])*np.diff(new_grid))\n median_index = min(len(tmp_cumsum)-3, max(2,np.searchsorted(tmp_cumsum, tmp_cumsum[-1]*0.5)+1))\n return new_grid[median_index]", "response": "Find the median of the function represented as an interpolation object."} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nconverts datetime object to numeric date.", "response": "def numeric_date(dt=None):\n \"\"\"\n Convert datetime object to the numeric date.\n The numeric date format is YYYY.F, where F is the fraction of the year passed\n\n Parameters\n ----------\n dt: datetime.datetime, None\n date of to be converted. if None, assume today\n\n \"\"\"\n if dt is None:\n dt = datetime.datetime.now()\n\n try:\n res = dt.year + dt.timetuple().tm_yday / 365.25\n except:\n res = None\n\n return res"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef parse_dates(date_file):\n print(\"\\nAttempting to parse dates...\")\n dates = {}\n if not os.path.isfile(date_file):\n print(\"\\n\\tERROR: file %s does not exist, exiting...\"%date_file)\n return dates\n # separator for the csv/tsv file. If csv, we'll strip extra whitespace around ','\n full_sep = '\\t' if date_file.endswith('.tsv') else r'\\s*,\\s*'\n\n try:\n # read the metadata file into pandas dataframe.\n df = pd.read_csv(date_file, sep=full_sep, engine='python')\n # check the metadata has strain names in the first column\n # look for the column containing sampling dates\n # We assume that the dates might be given either in human-readable format\n # (e.g. 
ISO dates), or be already converted to the numeric format.\n potential_date_columns = []\n potential_numdate_columns = []\n potential_index_columns = []\n # Scan the dataframe columns and find ones which likely to store the\n # dates\n for ci,col in enumerate(df.columns):\n d = df.iloc[0,ci]\n # strip quotation marks\n if type(d)==str and d[0] in ['\"', \"'\"] and d[-1] in ['\"', \"'\"]:\n for i,tmp_d in enumerate(df.iloc[:,ci]):\n df.iloc[i,ci] = tmp_d.strip(d[0])\n if 'date' in col.lower():\n potential_date_columns.append((ci, col))\n if any([x==col.lower() for x in ['name', 'strain', 'accession']]):\n potential_index_columns.append((ci, col))\n\n dates = {}\n # if a potential numeric date column was found, use it\n # (use the first, if there are more than one)\n if not len(potential_index_columns):\n print(\"ERROR: Cannot read metadata: need at least one column that contains the taxon labels.\"\n \" Looking for the first column that contains 'name', 'strain', or 'accession' in the header.\", file=sys.stderr)\n return dates\n else:\n # use the first column that is either 'name', 'strain', 'accession'\n index_col = sorted(potential_index_columns)[0][1]\n print(\"\\tUsing column '%s' as name. This needs match the taxon names in the tree!!\"%index_col)\n\n if len(potential_date_columns)>=1:\n #try to parse the csv file with dates in the idx column:\n idx = potential_date_columns[0][0]\n col_name = potential_date_columns[0][1]\n print(\"\\tUsing column '%s' as date.\"%col_name)\n for ri, row in df.iterrows():\n date_str = row.loc[col_name]\n k = row.loc[index_col]\n # try parsing as a float first\n try:\n dates[k] = float(date_str)\n continue\n except ValueError:\n # try whether the date string can be parsed as [2002.2:2004.3]\n # to indicate general ambiguous ranges\n if date_str[0]=='[' and date_str[-1]==']' and len(date_str[1:-1].split(':'))==2:\n try:\n dates[k] = [float(x) for x in date_str[1:-1].split(':')]\n continue\n except ValueError:\n pass\n # try date format parsing 2017-08-12\n try:\n tmp_date = pd.to_datetime(date_str)\n dates[k] = numeric_date(tmp_date)\n except ValueError: # try ambiguous date format parsing 2017-XX-XX\n lower, upper = ambiguous_date_to_date_range(date_str, '%Y-%m-%d')\n if lower is not None:\n dates[k] = [numeric_date(x) for x in [lower, upper]]\n\n else:\n print(\"ERROR: Metadata file has no column which looks like a sampling date!\", file=sys.stderr)\n\n if all(v is None for v in dates.values()):\n print(\"ERROR: Cannot parse dates correctly! Check date format.\", file=sys.stderr)\n return {}\n return dates\n except:\n print(\"ERROR: Cannot read the metadata file!\", file=sys.stderr)\n return {}", "response": "Parse the dates from the arguments and return a dictionary mapping taxon names to numerical dates."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nparsing an ambiguous date string such as 2017 - xx - xx to 2017 - xx.", "response": "def ambiguous_date_to_date_range(mydate, fmt=\"%Y-%m-%d\", min_max_year=None):\n \"\"\"parse an abiguous date such as 2017-XX-XX to [2017,2017.999]\n\n Parameters\n ----------\n mydate : str\n date string to be parsed\n fmt : str\n format descriptor. default is %Y-%m-%d\n min_max_year : None, optional\n if date is completely unknown, use this as bounds.\n\n Returns\n -------\n tuple\n upper and lower bounds on the date. 
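A minimal standalone sketch of the bound-resolution idea described here, assuming the year is always concrete. Note that calendar.monthrange also accounts for leap years, which the simpler day clamping in the function body does not:

# 'XX' fields collapse to their extremes, with the day clamped to the
# actual month length. date_bounds is an illustrative helper only.
import calendar
from datetime import date

def date_bounds(ymd):
    y, m, d = ymd.split('-')
    lo_m, hi_m = (1, 12) if 'X' in m else (int(m), int(m))
    lo_d = 1 if 'X' in d else int(d)
    hi_d = calendar.monthrange(int(y), hi_m)[1] if 'X' in d else int(d)
    return date(int(y), lo_m, lo_d), date(int(y), hi_m, hi_d)

print(date_bounds('2017-XX-XX'))   # (2017-01-01, 2017-12-31)
print(date_bounds('2017-02-XX'))   # (2017-02-01, 2017-02-28)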
return (None, None) if errors\n \"\"\"\n from datetime import datetime\n sep = fmt.split('%')[1][-1]\n min_date, max_date = {}, {}\n today = datetime.today().date()\n\n for val, field in zip(mydate.split(sep), fmt.split(sep+'%')):\n f = 'year' if 'y' in field.lower() else ('day' if 'd' in field.lower() else 'month')\n if 'XX' in val:\n if f=='year':\n if min_max_year:\n min_date[f]=min_max_year[0]\n if len(min_max_year)>1:\n max_date[f]=min_max_year[1]\n elif len(min_max_year)==1:\n max_date[f]=4000 #will be replaced by 'today' below.\n else:\n return None, None\n elif f=='month':\n min_date[f]=1\n max_date[f]=12\n elif f=='day':\n min_date[f]=1\n max_date[f]=31\n else:\n try:\n min_date[f]=int(val)\n max_date[f]=int(val)\n except ValueError:\n print(\"Can't parse date string: \"+mydate, file=sys.stderr)\n return None, None\n max_date['day'] = min(max_date['day'], 31 if max_date['month'] in [1,3,5,7,8,10,12]\n else 28 if max_date['month']==2 else 30)\n lower_bound = datetime(year=min_date['year'], month=min_date['month'], day=min_date['day']).date()\n upper_bound = datetime(year=max_date['year'], month=max_date['month'], day=max_date['day']).date()\n return (lower_bound, upper_bound if upper_bound> args = decode_instruction('4.size,4.1024;')\n >> args == ['size', '1024']\n >> True\n\n :param instruction: Instruction string.\n\n :return: list\n \"\"\"\n if not instruction.endswith(INST_TERM):\n raise InvalidInstruction('Instruction termination not found.')\n\n # Use proper encoding\n instruction = utf8(instruction)\n\n # Get arg size\n elems = instruction.split(ELEM_SEP, 1)\n\n try:\n arg_size = int(elems[0])\n except Exception:\n # Expected ValueError\n raise InvalidInstruction(\n 'Invalid arg length.' +\n ' Possibly due to missing element separator!')\n\n arg_str = elems[1][:arg_size]\n\n remaining = elems[1][arg_size:]\n\n args = [arg_str]\n\n if remaining.startswith(ARG_SEP):\n # Ignore the ARG_SEP to parse next arg.\n remaining = remaining[1:]\n elif remaining == INST_TERM:\n # This was the last arg!\n return args\n else:\n # The remaining is neither starting with ARG_SEP nor INST_TERM.\n raise InvalidInstruction(\n 'Instruction arg (%s) has invalid length.' 
% arg_str)\n\n next_args = GuacamoleInstruction.decode_instruction(remaining)\n\n if next_args:\n args = args + next_args\n\n return args"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef encode_arg(arg):\n arg_utf8 = utf8(arg)\n\n return ELEM_SEP.join([str(len(str(arg_utf8))), str(arg_utf8)])", "response": "Encode an argument to be sent in a valid GuacamoleInstruction."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef encode(self):\n instruction_iter = itertools.chain([self.opcode], self.args)\n\n elems = ARG_SEP.join(self.encode_arg(arg) for arg in instruction_iter)\n\n return elems + INST_TERM", "response": "Encode the instruction to be sent over the wire."} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\nreturns a versioned URI string for this class", "response": "def class_url(cls):\n \"\"\"Returns a versioned URI string for this class\"\"\"\n base = 'v{0}'.format(getattr(cls, 'RESOURCE_VERSION', '1'))\n return \"/{0}/{1}\".format(base, class_to_api_name(cls.class_name()))"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nget instance URL by ID", "response": "def instance_url(self):\n \"\"\"Get instance URL by ID\"\"\"\n id_ = self.get(self.ID_ATTR)\n base = self.class_url()\n\n if id_:\n return '/'.join([base, six.text_type(id_)])\n else:\n raise Exception(\n 'Could not determine which URL to request: %s instance '\n 'has invalid ID: %r' % (type(self).__name__, id_),\n self.ID_ATTR)"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nreturn a versioned URI string for this class and don t pluralize the class name.", "response": "def class_url(cls):\n \"\"\"\n Returns a versioned URI string for this class,\n and don't pluralize the class name.\n \"\"\"\n base = 'v{0}'.format(getattr(cls, 'RESOURCE_VERSION', '1'))\n return \"/{0}/{1}\".format(base, class_to_api_name(\n cls.class_name(), pluralize=False))"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\ndownloads the file to the specified directory or file path.", "response": "def download(self, path=None, **kwargs):\n \"\"\"\n Download the file to the specified directory or file path.\n Downloads to a temporary directory if no path is specified.\n\n Returns the absolute path to the file.\n \"\"\"\n download_url = self.download_url(**kwargs)\n try:\n # For vault objects, use the object's filename\n # as the fallback if none is specified.\n filename = self.filename\n except AttributeError:\n # If the object has no filename attribute,\n # extract one from the download URL.\n filename = download_url.split('%3B%20filename%3D')[1]\n # Remove additional URL params from the name and \"unquote\" it.\n filename = unquote(filename.split('&')[0])\n\n if path:\n path = os.path.expanduser(path)\n # If the path is a dir, use the extracted filename\n if os.path.isdir(path):\n path = os.path.join(path, filename)\n else:\n # Create a temporary directory for the file\n path = os.path.join(tempfile.gettempdir(), filename)\n\n try:\n response = requests.request(method='get', url=download_url)\n except Exception as e:\n _handle_request_error(e)\n\n if not (200 <= response.status_code < 400):\n _handle_api_error(response)\n\n with open(path, 'wb') as fileobj:\n fileobj.write(response._content)\n\n return path"} {"SOURCE": "codesearchnet", "instruction": "How 
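The length-prefixed wire format handled by decode_instruction above can be shown end to end with a toy encoder and decoder. These helpers are illustrative, not the library's API, and they skip the separator validation and InvalidInstruction errors the real parser performs:

# Each arg is sent as "<length>.<value>", args are comma-separated, and ';'
# terminates the instruction: "4.size,4.1024;" decodes to ['size', '1024'].
def encode_instruction(args):
    return ",".join("%d.%s" % (len(a), a) for a in args) + ";"

def decode(instruction):
    assert instruction.endswith(';'), 'missing terminator'
    args, rest = [], instruction[:-1]
    while rest:
        length, _, rest = rest.partition('.')  # length prefix never holds '.'
        n = int(length)
        args.append(rest[:n])
        rest = rest[n:].lstrip(',')            # skip the arg separator
    return args

wire = encode_instruction(['size', '1024'])
assert wire == '4.size,4.1024;'
assert decode(wire) == ['size', '1024']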
would you implement a function in Python 3 that\ngets the commit objects parent Import or Migration", "response": "def parent_object(self):\n \"\"\" Get the commit objects parent Import or Migration \"\"\"\n from . import types\n parent_klass = types.get(self.parent_job_model.split('.')[1])\n return parent_klass.retrieve(self.parent_job_id, client=self._client)"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nask the user for their email and password.", "response": "def _ask_for_credentials():\n \"\"\"\n Asks the user for their email and password.\n \"\"\"\n _print_msg('Please enter your SolveBio credentials')\n domain = raw_input('Domain (e.g. .solvebio.com): ')\n # Check to see if this domain supports password authentication\n try:\n account = client.request('get', '/p/accounts/{}'.format(domain))\n auth = account['authentication']\n except:\n raise SolveError('Invalid domain: {}'.format(domain))\n\n # Account must support password-based login\n if auth.get('login') or auth.get('SAML', {}).get('simple_login'):\n email = raw_input('Email: ')\n password = getpass.getpass('Password (typing will be hidden): ')\n return (domain, email, password)\n else:\n _print_msg(\n 'Your domain uses Single Sign-On (SSO). '\n 'Please visit https://{}.solvebio.com/settings/security '\n 'for instructions on how to log in.'.format(domain))\n sys.exit(1)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef login(*args, **kwargs):\n if args and args[0].api_key:\n # Handle command-line arguments if provided.\n solvebio.login(api_key=args[0].api_key)\n elif kwargs:\n # Run the global login() if kwargs are provided\n # or local credentials are found.\n solvebio.login(**kwargs)\n else:\n interactive_login()\n\n # Print information about the current user\n user = client.whoami()\n\n if user:\n print_user(user)\n save_credentials(user['email'].lower(), solvebio.api_key)\n _print_msg('Updated local credentials.')\n return True\n else:\n _print_msg('Invalid credentials. 
You may not be logged-in.')\n return False", "response": "Prompt user for login information."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef interactive_login():\n solvebio.access_token = None\n solvebio.api_key = None\n client.set_token()\n\n domain, email, password = _ask_for_credentials()\n if not all([domain, email, password]):\n print(\"Domain, email, and password are all required.\")\n return\n\n try:\n response = client.post('/v1/auth/token', {\n 'domain': domain.replace('.solvebio.com', ''),\n 'email': email,\n 'password': password\n })\n except SolveError as e:\n print('Login failed: {0}'.format(e))\n else:\n solvebio.api_key = response['token']\n client.set_token()", "response": "Force an interactive login via the command line."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nprinting information about the current user.", "response": "def whoami(*args, **kwargs):\n \"\"\"\n Prints information about the current user.\n Assumes the user is already logged-in.\n \"\"\"\n user = client.whoami()\n\n if user:\n print_user(user)\n else:\n print('You are not logged-in.')"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef print_user(user):\n email = user['email']\n domain = user['account']['domain']\n role = user['role']\n print('You are logged-in to the \"{0}\" domain '\n 'as {1} with role {2}.'\n .format(domain, email, role))", "response": "Print the user information about the current user."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef from_string(cls, string, exact=False):\n try:\n chromosome, pos = string.split(':')\n except ValueError:\n raise ValueError('Please use UCSC-style format: \"chr2:1000-2000\"')\n\n if '-' in pos:\n start, stop = pos.replace(',', '').split('-')\n else:\n start = stop = pos.replace(',', '')\n\n return cls(chromosome, start, stop, exact=exact)", "response": "Create a new instance of the class from a string."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef filter(self, *filters, **kwargs):\n f = list(filters)\n\n if kwargs:\n f += [Filter(**kwargs)]\n\n return self._clone(filters=f)", "response": "Returns a new Query instance with the specified filters combined with existing set with AND."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef range(self, chromosome, start, stop, exact=False):\n return self._clone(\n filters=[GenomicFilter(chromosome, start, stop, exact)])", "response": "Returns a new object with the range of genomic data for the specified chromosome."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nreturn a new instance with a single position filter on genomic datasets.", "response": "def position(self, chromosome, position, exact=False):\n \"\"\"\n Shortcut to do a single position filter on genomic datasets.\n \"\"\"\n return self._clone(\n filters=[GenomicFilter(chromosome, position, exact=exact)])"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\nreturning a dictionary with the requested facets.", "response": "def facets(self, *args, **kwargs):\n \"\"\"\n Returns a dictionary with the requested facets.\n\n The facets function supports string args, and keyword\n args.\n\n q.facets('field_1', 'field_2') will return facets for\n field_1 and field_2.\n 
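The UCSC-style parsing accepted by from_string above fits in a few lines; parse_ucsc is a hypothetical standalone helper, not the class method itself:

# "chr2:1,000-2,000" -> (chromosome, start, stop); a single position
# doubles as both bounds, commas in numbers are tolerated.
def parse_ucsc(region):
    try:
        chromosome, pos = region.split(':')
    except ValueError:
        raise ValueError('Please use UCSC-style format: "chr2:1000-2000"')
    pos = pos.replace(',', '')
    start, _, stop = pos.partition('-')
    return chromosome, int(start), int(stop or start)

assert parse_ucsc('chr2:1,000-2,000') == ('chr2', 1000, 2000)
assert parse_ucsc('chr2:500') == ('chr2', 500, 500)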
q.facets(field_1={'limit': 0}, field_2={'limit': 10})\n will return all facets for field_1 and 10 facets for field_2.\n \"\"\"\n # Combine args and kwargs into facet format.\n facets = dict((a, {}) for a in args)\n facets.update(kwargs)\n\n if not facets:\n raise AttributeError('Faceting requires at least one field')\n\n for f in facets.keys():\n if not isinstance(f, six.string_types):\n raise AttributeError('Facet field arguments must be strings')\n\n q = self._clone()\n q._limit = 0\n q.execute(offset=0, facets=facets)\n return q._response.get('facets')"} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ntakes a list of filters and returns a list of JSON API filters that can be used to create a new object.", "response": "def _process_filters(cls, filters):\n \"\"\"Takes a list of filters and returns JSON\n\n :Parameters:\n - `filters`: List of Filters, (key, val) tuples, or dicts\n\n Returns: List of JSON API filters\n \"\"\"\n data = []\n\n # Filters should always be a list\n for f in filters:\n if isinstance(f, Filter):\n if f.filters:\n data.extend(cls._process_filters(f.filters))\n elif isinstance(f, dict):\n key = list(f.keys())[0]\n val = f[key]\n\n if isinstance(val, dict):\n # pass val (a dict) as list\n # so that it gets processed properly\n filter_filters = cls._process_filters([val])\n if len(filter_filters) == 1:\n filter_filters = filter_filters[0]\n data.append({key: filter_filters})\n else:\n data.append({key: cls._process_filters(val)})\n else:\n data.extend((f,))\n\n return data"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef next(self):\n if not hasattr(self, '_cursor'):\n # Iterator not initialized yet\n self.__iter__()\n\n # len(self) returns `min(limit, total)` results\n if self._cursor == len(self):\n raise StopIteration()\n\n if self._buffer_idx == len(self._buffer):\n self.execute(self._page_offset + self._buffer_idx)\n self._buffer_idx = 0\n\n self._cursor += 1\n self._buffer_idx += 1\n return self._buffer[self._buffer_idx - 1]", "response": "Returns the next result from the cache."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef execute(self, offset=0, **query):\n _params = self._build_query(**query)\n self._page_offset = offset\n\n _params.update(\n offset=self._page_offset,\n limit=min(self._page_size, self._limit)\n )\n\n logger.debug('executing query. 
from/limit: %6d/%d' %\n (_params['offset'], _params['limit']))\n\n # If the request results in a SolveError (ie bad filter) set the error.\n try:\n self._response = self._client.post(self._data_url, _params)\n except SolveError as e:\n self._error = e\n raise\n\n logger.debug('query response took: %(took)d ms, total: %(total)d'\n % self._response)\n return _params, self._response", "response": "Executes a query and returns the request parameters and the raw query response."} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nmigrate the data from the Query to a target dataset.", "response": "def migrate(self, target, follow=True, **kwargs):\n \"\"\"\n Migrate the data from the Query to a target dataset.\n\n Valid optional kwargs include:\n\n * target_fields\n * include_errors\n * validation_params\n * metadata\n * commit_mode\n\n \"\"\"\n from solvebio import Dataset\n from solvebio import DatasetMigration\n\n # Target can be provided as a Dataset, or as an ID.\n if isinstance(target, Dataset):\n target_id = target.id\n else:\n target_id = target\n\n # If a limit is set in the Query and not overridden here, use it.\n limit = kwargs.pop('limit', None)\n if not limit and self._limit < float('inf'):\n limit = self._limit\n\n # Build the source_params\n params = self._build_query(limit=limit)\n params.pop('offset', None)\n params.pop('ordering', None)\n\n migration = DatasetMigration.create(\n source_id=self._dataset_id,\n target_id=target_id,\n source_params=params,\n client=self._client,\n **kwargs)\n\n if follow:\n migration.follow()\n\n return migration"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nlogs in to SolveBio.", "response": "def login(**kwargs):\n \"\"\"\n Sets up the auth credentials using the provided key/token,\n or checks the credentials file (if no token provided).\n\n Lookup order:\n 1. access_token\n 2. api_key\n 3. local credentials\n\n No errors are raised if no key is found.\n \"\"\"\n from .cli.auth import get_credentials\n global access_token, api_key, api_host\n\n # Clear any existing auth keys\n access_token, api_key = None, None\n # Update the host\n api_host = kwargs.get('api_host') or api_host\n\n if kwargs.get('access_token'):\n access_token = kwargs.get('access_token')\n elif kwargs.get('api_key'):\n api_key = kwargs.get('api_key')\n else:\n api_key = get_credentials()\n\n if not (api_key or access_token):\n print('No credentials found. 
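A hedged sketch of the offset/limit paging pattern that next() and execute() implement above: fetch one page into a buffer, hand records out one at a time, and re-fetch at the page boundary. fetch_page stands in for the API call; all names are illustrative:

# Generator form of a buffered, offset-based cursor with an overall limit.
def paged(fetch_page, page_size=100, limit=float('inf')):
    offset, served = 0, 0
    while served < limit:
        buffer = fetch_page(offset=offset, limit=page_size)
        if not buffer:
            return                      # no more results
        for record in buffer:
            if served >= limit:
                return
            yield record
            served += 1
        offset += len(buffer)

# Toy backend: 250 records served in pages of 100.
data = list(range(250))
fetch = lambda offset, limit: data[offset:offset + limit]
assert list(paged(fetch, page_size=100, limit=120)) == data[:120]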
Requests to SolveBio may fail.')\n else:\n from solvebio.client import client\n # Update the client host and token\n client.set_host()\n client.set_token()"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nadd subcommands to the base parser", "response": "def _add_subcommands(self):\n \"\"\"\n The _add_subcommands method must be separate from the __init__\n method, as infinite recursion will occur otherwise, due to the fact\n that the __init__ method itself will be called when instantiating\n a subparser, as we do below\n \"\"\"\n subcmd_params = {\n 'title': 'SolveBio Commands',\n 'dest': 'subcommands'\n }\n subcmd = self.add_subparsers(\n **subcmd_params) # pylint: disable=star-args\n\n subcommands = copy.deepcopy(self.subcommands)\n for name, params in subcommands.items():\n p = subcmd.add_parser(name, help=params['help'])\n p.set_defaults(func=params['func'])\n for arg in params.get('arguments', []):\n name_or_flags = arg.pop('name', None) or arg.pop('flags', None)\n p.add_argument(name_or_flags, **arg)"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function that can\nparse the args and add subcommands.", "response": "def parse_solvebio_args(self, args=None, namespace=None):\n \"\"\"\n Try to parse the args first, and then add the subparsers. We want\n to do this so that we can check to see if there are any unknown\n args. We can assume that if, by this point, there are no unknown\n args, we can append shell to the unknown args as a default.\n However, to do this, we have to suppress stdout/stderr during the\n initial parsing, in case the user calls the help method (in which\n case we want to add the additional arguments and *then* call the\n help method. This is a hack to get around the fact that argparse\n doesn't allow default subcommands.\n \"\"\"\n try:\n sys.stdout = sys.stderr = open(os.devnull, 'w')\n _, unknown_args = self.parse_known_args(args, namespace)\n if not unknown_args:\n args.insert(0, 'shell')\n except SystemExit:\n pass\n finally:\n sys.stdout.flush()\n sys.stderr.flush()\n sys.stdout, sys.stderr = sys.__stdout__, sys.__stderr__\n self._add_subcommands()\n return super(SolveArgumentParser, self).parse_args(args, namespace)"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef download_vault_folder(remote_path, local_path, dry_run=False, force=False):\n\n local_path = os.path.normpath(os.path.expanduser(local_path))\n if not os.access(local_path, os.W_OK):\n raise Exception(\n 'Write access to local path ({}) is required'\n .format(local_path))\n\n full_path, path_dict = solvebio.Object.validate_full_path(remote_path)\n vault = solvebio.Vault.get_by_full_path(path_dict['vault'])\n print('Downloading all files from {} to {}'.format(full_path, local_path))\n\n if path_dict['path'] == '/':\n parent_object_id = None\n else:\n parent_object = solvebio.Object.get_by_full_path(\n remote_path, assert_type='folder')\n parent_object_id = parent_object.id\n\n # Scan the folder for all sub-folders and create them locally\n print('Creating local directory structure at: {}'.format(local_path))\n if not os.path.exists(local_path):\n if not dry_run:\n os.makedirs(local_path)\n\n folders = vault.folders(parent_object_id=parent_object_id)\n for f in folders:\n path = os.path.normpath(local_path + f.path)\n if not os.path.exists(path):\n print('Creating folder: {}'.format(path))\n if not dry_run:\n os.makedirs(path)\n\n files = vault.files(parent_object_id=parent_object_id)\n for f in 
files:\n path = os.path.normpath(local_path + f.path)\n if os.path.exists(path):\n if force:\n # Delete the local copy\n print('Deleting local file (force download): {}'.format(path))\n if not dry_run:\n os.remove(path)\n else:\n print('Skipping file (already exists): {}'.format(path))\n continue\n\n print('Downloading file: {}'.format(path))\n if not dry_run:\n f.download(path)", "response": "Recursively downloads a folder in a vault to a local directory."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef construct_from(cls, values, **kwargs):\n instance = cls(values.get(cls.ID_ATTR), **kwargs)\n instance.refresh_from(values)\n return instance", "response": "Used to create a new object from an HTTP response"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef logout(self):\n if self._oauth_client_secret:\n try:\n oauth_token = flask.request.cookies[self.TOKEN_COOKIE_NAME]\n # Revoke the token\n requests.post(\n urljoin(self._api_host, self.OAUTH2_REVOKE_TOKEN_PATH),\n data={\n 'client_id': self._oauth_client_id,\n 'client_secret': self._oauth_client_secret,\n 'token': oauth_token\n })\n except:\n pass\n\n response = flask.redirect('/')\n self.clear_cookies(response)\n return response", "response": "Revoke the token and remove the cookie."} {"SOURCE": "codesearchnet", "instruction": "Can you write a function in Python 3 where it\nlaunches the SolveBio Python shell.", "response": "def launch_ipython_shell(args): # pylint: disable=unused-argument\n \"\"\"Open the SolveBio shell (IPython wrapper)\"\"\"\n try:\n import IPython # noqa\n except ImportError:\n _print(\"The SolveBio Python shell requires IPython.\\n\"\n \"To install, type: 'pip install ipython'\")\n return False\n\n if hasattr(IPython, \"version_info\"):\n if IPython.version_info > (5, 0, 0, ''):\n return launch_ipython_5_shell(args)\n\n _print(\"WARNING: Please upgrade IPython (you are running version: {})\"\n .format(IPython.__version__))\n return launch_ipython_legacy_shell(args)"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nlaunches the SolveBio Python shell with IPython 5 +", "response": "def launch_ipython_5_shell(args):\n \"\"\"Open the SolveBio shell (IPython wrapper) with IPython 5+\"\"\"\n import IPython # noqa\n from traitlets.config import Config\n\n c = Config()\n path = os.path.dirname(os.path.abspath(__file__))\n\n try:\n # see if we're already inside IPython\n get_ipython # pylint: disable=undefined-variable\n _print(\"WARNING: Running IPython within IPython.\")\n except NameError:\n c.InteractiveShell.banner1 = 'SolveBio Python shell started.\\n'\n\n c.InteractiveShellApp.exec_files = ['{}/ipython_init.py'.format(path)]\n IPython.start_ipython(argv=[], config=c)"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nlaunches the SolveBio Python shell for older IPython versions.", "response": "def launch_ipython_legacy_shell(args): # pylint: disable=unused-argument\n \"\"\"Open the SolveBio shell (IPython wrapper) for older IPython versions\"\"\"\n try:\n from IPython.config.loader import Config\n except ImportError:\n _print(\"The SolveBio Python shell requires IPython.\\n\"\n \"To install, type: 'pip install ipython'\")\n return False\n\n try:\n # see if we're already inside IPython\n get_ipython # pylint: disable=undefined-variable\n except NameError:\n cfg = Config()\n prompt_config = cfg.PromptManager\n 
prompt_config.in_template = '[SolveBio] In <\\\\#>: '\n prompt_config.in2_template = ' .\\\\D.: '\n prompt_config.out_template = 'Out<\\\\#>: '\n banner1 = '\\nSolveBio Python shell started.'\n\n exit_msg = 'Quitting SolveBio shell.'\n else:\n _print(\"Running nested copies of IPython.\")\n cfg = Config()\n banner1 = exit_msg = ''\n\n # First import the embeddable shell class\n try:\n from IPython.terminal.embed import InteractiveShellEmbed\n except ImportError:\n # pylint: disable=import-error,no-name-in-module\n from IPython.frontend.terminal.embed import InteractiveShellEmbed\n\n path = os.path.dirname(os.path.abspath(__file__))\n init_file = '{}/ipython_init.py'.format(path)\n exec(compile(open(init_file).read(), init_file, 'exec'),\n globals(), locals())\n\n InteractiveShellEmbed(config=cfg, banner1=banner1, exit_msg=exit_msg)()"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nissue an HTTP GET across the wire via the Python requests library.", "response": "def get(self, url, params, **kwargs):\n \"\"\"Issues an HTTP GET across the wire via the Python requests\n library. See *request()* for information on keyword args.\"\"\"\n kwargs['params'] = params\n return self.request('GET', url, **kwargs)"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef delete(self, url, data, **kwargs):\n kwargs['data'] = data\n return self.request('DELETE', url, **kwargs)", "response": "Issues an HTTP DELETE across the wire via the Python requests\n library."} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef request(self, method, url, **kwargs):\n\n opts = {\n 'allow_redirects': True,\n 'auth': self._auth,\n 'data': {},\n 'files': None,\n 'headers': dict(self._headers),\n 'params': {},\n 'timeout': 80,\n 'verify': True\n }\n\n raw = kwargs.pop('raw', False)\n debug = kwargs.pop('debug', False)\n opts.update(kwargs)\n method = method.upper()\n\n if opts['files']:\n # Don't use application/json for file uploads or GET requests\n opts['headers'].pop('Content-Type', None)\n else:\n opts['data'] = json.dumps(opts['data'])\n\n if not url.startswith(self._host):\n url = urljoin(self._host, url)\n\n logger.debug('API %s Request: %s' % (method, url))\n\n if debug:\n self._log_raw_request(method, url, **opts)\n\n try:\n response = self._session.request(method, url, **opts)\n except Exception as e:\n _handle_request_error(e)\n\n if 429 == response.status_code:\n delay = int(response.headers['retry-after']) + 1\n logger.warn('Too many requests. Retrying in {0}s.'.format(delay))\n time.sleep(delay)\n return self.request(method, url, **kwargs)\n\n if not (200 <= response.status_code < 400):\n _handle_api_error(response)\n\n # 204 is used on deletion. There is no JSON here.\n if raw or response.status_code in [204, 301, 302]:\n return response\n\n return response.json()", "response": "Issues an HTTP request across the wire."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef child_object(self):\n from . 
import types\n child_klass = types.get(self.task_type.split('.')[1])\n return child_klass.retrieve(self.task_id, client=self._client)", "response": "Get Task child object class"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\nspecialize INFO field parser for SnpEff ANN fields.", "response": "def _parse_info_snpeff(self, info):\n \"\"\"\n Specialized INFO field parser for SnpEff ANN fields.\n Requires self._snpeff_ann_fields to be set.\n \"\"\"\n ann = info.pop('ANN', []) or []\n # Overwrite the existing ANN with something parsed\n # Split on '|', merge with the ANN keys parsed above.\n # Ensure empty values are None rather than empty string.\n items = []\n for a in ann:\n # For multi-allelic records, we may have already\n # processed ANN. If so, quit now.\n if isinstance(a, dict):\n info['ANN'] = ann\n return info\n\n values = [i or None for i in a.split('|')]\n item = dict(zip(self._snpeff_ann_fields, values))\n\n # Further split the Annotation field by '&'\n if item.get('Annotation'):\n item['Annotation'] = item['Annotation'].split('&')\n\n items.append(item)\n\n info['ANN'] = items\n return info"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef next(self):\n\n def _alt(alt):\n \"\"\"Parses the VCF row ALT object.\"\"\"\n # If alt is '.' in VCF, PyVCF returns None, convert back to '.'\n if not alt:\n return '.'\n else:\n return str(alt)\n\n if not self._next:\n row = next(self.reader)\n alternate_alleles = list(map(_alt, row.ALT))\n\n for allele in alternate_alleles:\n self._next.append(\n self.row_to_dict(\n row,\n allele=allele,\n alternate_alleles=alternate_alleles))\n\n # Source line number, only increment if reading a new row.\n self._line_number += 1\n\n return self._next.pop()", "response": "Parses the next record in the VCF file and returns the record ID."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef row_to_dict(self, row, allele, alternate_alleles):\n\n def _variant_sbid(**kwargs):\n \"\"\"Generates a SolveBio variant ID (SBID).\"\"\"\n return '{build}-{chromosome}-{start}-{stop}-{allele}'\\\n .format(**kwargs).upper()\n\n if allele == '.':\n # Try to use the ref, if '.' is supplied for alt.\n allele = row.REF or allele\n\n genomic_coordinates = {\n 'build': self.genome_build,\n 'chromosome': row.CHROM,\n 'start': row.POS,\n 'stop': row.POS + len(row.REF) - 1\n }\n\n # SolveBio standard variant format\n variant_sbid = _variant_sbid(allele=allele,\n **genomic_coordinates)\n\n return {\n 'genomic_coordinates': genomic_coordinates,\n 'variant': variant_sbid,\n 'allele': allele,\n 'row_id': row.ID,\n 'reference_allele': row.REF,\n 'alternate_alleles': alternate_alleles,\n 'info': self._parse_info(row.INFO),\n 'qual': row.QUAL,\n 'filter': row.FILTER\n }", "response": "Return a parsed dictionary for JSON."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nreturn the user s stored API key if a valid credentials file is found. 
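The ANN handling in _parse_info_snpeff above reduces to two splits. A short demo with an invented annotation string and a truncated field list (the real SnpEff ANN spec defines many more fields):

# Each raw annotation is a '|'-separated record keyed by a fixed field
# list; the Annotation field itself may hold several '&'-joined terms.
ANN_FIELDS = ['Allele', 'Annotation', 'Annotation_Impact', 'Gene_Name']

raw = 'A|missense_variant&splice_region_variant|MODERATE|BRCA1'
item = dict(zip(ANN_FIELDS, (v or None for v in raw.split('|'))))
item['Annotation'] = item['Annotation'].split('&')
print(item)
# {'Allele': 'A', 'Annotation': ['missense_variant', 'splice_region_variant'],
#  'Annotation_Impact': 'MODERATE', 'Gene_Name': 'BRCA1'}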
Raises CredentialsError if no valid credentials file is found.", "response": "def get_credentials():\n \"\"\"\n Returns the user's stored API key if a valid credentials file is found.\n Raises CredentialsError if no valid credentials file is found.\n \"\"\"\n try:\n netrc_path = netrc.path()\n auths = netrc(netrc_path).authenticators(\n urlparse(solvebio.api_host).netloc)\n except (IOError, TypeError, NetrcParseError) as e:\n raise CredentialsError(\n 'Could not open credentials file: ' + str(e))\n\n if auths:\n # auths = (login, account, password)\n return auths[2]\n else:\n return None"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef save(self, path):\n rep = \"\"\n for host in self.hosts.keys():\n attrs = self.hosts[host]\n rep = rep + \"machine \" + host + \"\\n\\tlogin \" \\\n + six.text_type(attrs[0]) + \"\\n\"\n if attrs[1]:\n rep = rep + \"account \" + six.text_type(attrs[1])\n rep = rep + \"\\tpassword \" + six.text_type(attrs[2]) + \"\\n\"\n for macro in self.macros.keys():\n rep = rep + \"macdef \" + macro + \"\\n\"\n for line in self.macros[macro]:\n rep = rep + line\n rep = rep + \"\\n\"\n\n f = open(path, 'w')\n f.write(rep)\n f.close()", "response": "Save the class data in the format of a. netrc file."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nchecks if a string is an integer.", "response": "def _isint(string):\n \"\"\"\n >>> _isint(\"123\")\n True\n >>> _isint(\"123.45\")\n False\n \"\"\"\n return type(string) is int or \\\n (isinstance(string, _binary_type) or\n isinstance(string, string_types)) and \\\n _isconvertible(int, string)"} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\naligns a list of strings in a column of size minwidth.", "response": "def _align_column(strings, alignment, minwidth=0, has_invisible=True):\n \"\"\"\n [string] -> [padded_string]\n\n >>> list(map(str,_align_column( \\\n [\"12.345\", \"-1234.5\", \"1.23\", \"1234.5\", \\\n \"1e+234\", \"1.0e234\"], \"decimal\")))\n [' 12.345 ', '-1234.5 ', ' 1.23 ', \\\n ' 1234.5 ', ' 1e+234 ', ' 1.0e234']\n\n \"\"\"\n if alignment == \"right\":\n strings = [s.strip() for s in strings]\n padfn = _padleft\n elif alignment in \"center\":\n strings = [s.strip() for s in strings]\n padfn = _padboth\n elif alignment in \"decimal\":\n decimals = [_afterpoint(s) for s in strings]\n maxdecimals = max(decimals)\n strings = [s + (maxdecimals - decs) * \" \"\n for s, decs in zip(strings, decimals)]\n padfn = _padleft\n else:\n strings = [s.strip() for s in strings]\n padfn = _padright\n\n if has_invisible:\n width_fn = _visible_width\n else:\n width_fn = len\n\n maxwidth = max(max(list(map(width_fn, strings))), minwidth)\n padded_strings = [padfn(maxwidth, s, has_invisible) for s in strings]\n return padded_strings"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nformatting a value accoding to its type.", "response": "def _format(val, valtype, floatfmt, missingval=\"\"):\n \"\"\"\n Format a value accoding to its type.\n\n Unicode is supported:\n\n >>> hrow = ['\\u0431\\u0443\\u043a\\u0432\\u0430', \\\n '\\u0446\\u0438\\u0444\\u0440\\u0430'] ; \\\n tbl = [['\\u0430\\u0437', 2], ['\\u0431\\u0443\\u043a\\u0438', 4]] ; \\\n good_result = '\\\\u0431\\\\u0443\\\\u043a\\\\u0432\\\\u0430 \\\n \\\\u0446\\\\u0438\\\\u0444\\\\u0440\\\\u0430\\\\n-------\\\n -------\\\\n\\\\u0430\\\\u0437 \\\n 2\\\\n\\\\u0431\\\\u0443\\\\u043a\\\\u0438 4' ; \\\n tabulate(tbl, 
headers=hrow) == good_result\n    True\n\n    \"\"\"\n    if val is None:\n        return missingval\n\n    if valtype in [int, _binary_type, _text_type]:\n        return \"{0}\".format(val)\n    elif valtype is float:\n        return format(float(val), floatfmt)\n    else:\n        return \"{0}\".format(val)"} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ntransforms a tabular data type to a list of lists and a list of headers.", "response": "def _normalize_tabular_data(tabular_data, headers, sort=True):\n    \"\"\"\n    Transform a supported data type to a list of lists, and a list of headers.\n\n    Supported tabular data types:\n\n    * list-of-lists or another iterable of iterables\n\n    * 2D NumPy arrays\n\n    * dict of iterables (usually used with headers=\"keys\")\n\n    * pandas.DataFrame (usually used with headers=\"keys\")\n\n    The first row can be used as headers if headers=\"firstrow\",\n    column indices can be used as headers if headers=\"keys\".\n\n    \"\"\"\n\n    if hasattr(tabular_data, \"keys\") and hasattr(tabular_data, \"values\"):\n        # dict-like and pandas.DataFrame?\n        if hasattr(tabular_data.values, \"__call__\"):\n            # likely a conventional dict\n            keys = list(tabular_data.keys())\n            # columns have to be transposed\n            rows = list(izip_longest(*list(tabular_data.values())))\n        elif hasattr(tabular_data, \"index\"):\n            # values is a property, has .index then\n            # it's likely a pandas.DataFrame (pandas 0.11.0)\n            keys = list(tabular_data.keys())\n            # values matrix doesn't need to be transposed\n            vals = tabular_data.values\n            names = tabular_data.index\n            rows = [[v] + list(row) for v, row in zip(names, vals)]\n        else:\n            raise ValueError(\"tabular data doesn't appear to be a dict \"\n                             \"or a DataFrame\")\n\n        if headers == \"keys\":\n            headers = list(map(_text_type, keys))  # headers should be strings\n\n    else:  # it's, as usual, an iterable of iterables, or a NumPy array\n        rows = list(tabular_data)\n\n        if headers == \"keys\" and len(rows) > 0:  # keys are column indices\n            headers = list(map(_text_type, list(range(len(rows[0])))))\n\n    # take headers from the first row if necessary\n    if headers == \"firstrow\" and len(rows) > 0:\n        headers = list(map(_text_type, rows[0]))  # headers should be strings\n        rows = rows[1:]\n\n    headers = list(headers)\n\n    rows = list(map(list, rows))\n\n    if sort and len(rows) > 1:\n        rows = sorted(rows, key=lambda x: x[0])\n\n    # pad with empty headers for initial columns if necessary\n    if headers and len(rows) > 0:\n        nhs = len(headers)\n        ncols = len(rows[0])\n        if nhs < ncols:\n            headers = [\"\"] * (ncols - nhs) + headers\n\n    return rows, headers"} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef _build_row(cells, padding, begin, sep, end):\n    \"Return a string which represents a row of data cells.\"\n\n    pad = \" \" * padding\n    padded_cells = [pad + cell + pad for cell in cells]\n\n    # SolveBio: we're only displaying Key-Value tuples (dimension of 2).\n    # enforce that we don't wrap lines by setting a max\n    # limit on row width which is equal to TTY_COLS (see printing)\n    rendered_cells = (begin + sep.join(padded_cells) + end).rstrip()\n    if len(rendered_cells) > TTY_COLS:\n        if not cells[-1].endswith(\" \") and not cells[-1].endswith(\"-\"):\n            terminating_str = \" ...
\"\n else:\n terminating_str = \"\"\n rendered_cells = \"{0}{1}{2}\".format(\n rendered_cells[:TTY_COLS - len(terminating_str) - 1],\n terminating_str, end)\n\n return rendered_cells", "response": "Return a string which represents a row of data cells."} {"SOURCE": "codesearchnet", "instruction": "Implement a Python 3 function for\nreturning a string which represents a horizontal line.", "response": "def _build_line(colwidths, padding, begin, fill, sep, end):\n \"Return a string which represents a horizontal line.\"\n cells = [fill * (w + 2 * padding) for w in colwidths]\n return _build_row(cells, 0, begin, sep, end)"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nprefixes every cell in a row with an HTML alignment attribute.", "response": "def _mediawiki_cell_attrs(row, colaligns):\n \"Prefix every cell in a row with an HTML alignment attribute.\"\n alignment = {\"left\": '',\n \"right\": 'align=\"right\"| ',\n \"center\": 'align=\"center\"| ',\n \"decimal\": 'align=\"right\"| '}\n row2 = [alignment[a] + c for c, a in zip(row, colaligns)]\n return row2"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef _line_segment_with_colons(linefmt, align, colwidth):\n fill = linefmt.hline\n w = colwidth\n if align in [\"right\", \"decimal\"]:\n return (fill[0] * (w - 1)) + \":\"\n elif align == \"center\":\n return \":\" + (fill[0] * (w - 2)) + \":\"\n elif align == \"left\":\n return \":\" + (fill[0] * (w - 1))\n else:\n return fill[0] * w", "response": "Return a horizontal line with optional colons which\n indicate column s alignment."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nproducing a plain - text representation of the table.", "response": "def _format_table(fmt, headers, rows, colwidths, colaligns):\n \"\"\"Produce a plain-text representation of the table.\"\"\"\n lines = []\n hidden = fmt.with_header_hide if headers else fmt.without_header_hide\n pad = fmt.padding\n headerrow = fmt.headerrow if fmt.headerrow else fmt.datarow\n\n if fmt.lineabove and \"lineabove\" not in hidden:\n lines.append(_build_line(colwidths, pad, *fmt.lineabove))\n\n if headers:\n lines.append(_build_row(headers, pad, *headerrow))\n\n if fmt.linebelowheader and \"linebelowheader\" not in hidden:\n begin, fill, sep, end = fmt.linebelowheader\n if fmt.usecolons:\n segs = [\n _line_segment_with_colons(fmt.linebelowheader, a, w + 2 * pad)\n for w, a in zip(colwidths, colaligns)]\n lines.append(_build_row(segs, 0, begin, sep, end))\n else:\n lines.append(_build_line(colwidths, pad, *fmt.linebelowheader))\n\n if rows and fmt.linebetweenrows and \"linebetweenrows\" not in hidden:\n # initial rows with a line below\n for row in rows[:-1]:\n lines.append(_build_row(row, pad, *fmt.datarow))\n lines.append(_build_line(colwidths, pad, *fmt.linebetweenrows))\n # the last row without a line below\n lines.append(_build_row(rows[-1], pad, *fmt.datarow))\n else:\n for row in rows:\n lines.append(_build_row(row, pad, *fmt.datarow))\n\n if fmt.linebelow and \"linebelow\" not in hidden:\n lines.append(_build_line(colwidths, pad, *fmt.linebelow))\n\n return \"\\n\".join(lines)"} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef import_file(self, path, **kwargs):\n from . import Manifest\n from . import DatasetImport\n\n if 'id' not in self or not self['id']:\n raise Exception(\n 'No Dataset ID found. 
'\n                'Please instantiate or retrieve a dataset '\n                'with an ID.')\n\n        manifest = Manifest()\n        manifest.add(path)\n        return DatasetImport.create(\n            dataset_id=self['id'],\n            manifest=manifest.manifest,\n            **kwargs)", "response": "Creates a new DatasetImport object for the given path."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef migrate(self, target, follow=True, **kwargs):\n        if 'id' not in self or not self['id']:\n            raise Exception(\n                'No source dataset ID found. '\n                'Please instantiate the Dataset '\n                'object with an ID.')\n\n        # Target can be provided as a Dataset, or as an ID.\n        if isinstance(target, Dataset):\n            target_id = target.id\n        else:\n            target_id = target\n\n        migration = DatasetMigration.create(\n            source_id=self['id'],\n            target_id=target_id,\n            **kwargs)\n\n        if follow:\n            migration.follow()\n\n        return migration", "response": "Migrate the data from this dataset to a target dataset."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef validate_full_path(cls, full_path, **kwargs):\n        from solvebio.resource.vault import Vault\n\n        _client = kwargs.pop('client', None) or cls._client or client\n\n        if not full_path:\n            raise Exception(\n                'Invalid path: ',\n                'Full path must be in one of the following formats: '\n                '\"vault:/path\", \"domain:vault:/path\", or \"~/path\"')\n\n        # Parse the vault's full_path, using overrides if any\n        input_vault = kwargs.get('vault') or full_path\n        try:\n            vault_full_path, path_dict = \\\n                Vault.validate_full_path(input_vault, client=_client)\n        except Exception as err:\n            raise Exception('Could not determine vault from \"{0}\": {1}'\n                            .format(input_vault, err))\n\n        if kwargs.get('path'):\n            # Allow override of the object_path.\n            full_path = '{0}:/{1}'.format(vault_full_path, kwargs['path'])\n\n        match = cls.PATH_RE.match(full_path)\n        if match:\n            object_path = match.groupdict()['path']\n        else:\n            raise Exception(\n                'Cannot find a valid object path in \"{0}\". '\n                'Full path must be in one of the following formats: '\n                '\"vault:/path\", \"domain:vault:/path\", or \"~/path\"'\n                .format(full_path))\n\n        # Remove double slashes\n        object_path = re.sub('//+', '/', object_path)\n        if object_path != '/':\n            # Remove trailing slash\n            object_path = object_path.rstrip('/')\n\n        path_dict['path'] = object_path\n        # TODO: parent_path and filename\n        full_path = '{domain}:{vault}:{path}'.format(**path_dict)\n        path_dict['full_path'] = full_path\n        return full_path, path_dict", "response": "Validate a full path and return a dict containing the full path parts."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\ncreate a new dataset from a template ID or a JSON file.", "response": "def create_dataset(args):\n    \"\"\"\n    Attempt to create a new dataset given the following params:\n\n        * template_id\n        * template_file\n        * capacity\n        * create_vault\n        * [argument] dataset name or full path\n\n    NOTE: genome_build has been deprecated and is no longer used.\n\n    \"\"\"\n    # For backwards compatibility, the \"full_path\" argument\n    # can be a dataset filename, but only if vault and path\n    # are set.
If vault/path are both provided and there\n    # are no forward-slashes in the \"full_path\", assume\n    # the user has provided a dataset filename.\n    if '/' not in args.full_path and args.vault and args.path:\n        full_path, path_dict = Object.validate_full_path(\n            '{0}:/{1}/{2}'.format(args.vault, args.path, args.full_path))\n    else:\n        full_path, path_dict = Object.validate_full_path(\n            args.full_path, vault=args.vault, path=args.path)\n\n    # Accept a template_id or a template_file\n    if args.template_id:\n        # Validate the template ID\n        try:\n            tpl = solvebio.DatasetTemplate.retrieve(args.template_id)\n        except solvebio.SolveError as e:\n            if e.status_code != 404:\n                raise e\n            print(\"No template with ID {0} found!\"\n                  .format(args.template_id))\n            sys.exit(1)\n    elif args.template_file:\n        mode = 'r'\n        fopen = open\n        if check_gzip_path(args.template_file):\n            mode = 'rb'\n            fopen = gzip.open\n\n        # Validate the template file\n        with fopen(args.template_file, mode) as fp:\n            try:\n                tpl_json = json.load(fp)\n            except:\n                print('Template file {0} could not be loaded. Please '\n                      'pass valid JSON'.format(args.template_file))\n                sys.exit(1)\n\n            tpl = solvebio.DatasetTemplate.create(**tpl_json)\n            print(\"A new dataset template was created with id: {0}\".format(tpl.id))\n    else:\n        print(\"Creating a new dataset {0} without a template.\"\n              .format(full_path))\n        tpl = None\n        fields = []\n        entity_type = None\n        description = None\n\n    if tpl:\n        print(\"Creating new dataset {0} using the template '{1}'.\"\n              .format(full_path, tpl.name))\n        fields = tpl.fields\n        entity_type = tpl.entity_type\n        # include template used to create\n        description = 'Created with dataset template: {0}'.format(str(tpl.id))\n\n    return solvebio.Dataset.get_or_create_by_full_path(\n        full_path,\n        capacity=args.capacity,\n        entity_type=entity_type,\n        fields=fields,\n        description=description,\n        create_vault=args.create_vault,\n    )"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef upload(args):\n    base_remote_path, path_dict = Object.validate_full_path(\n        args.full_path, vault=args.vault, path=args.path)\n\n    # Assert the vault exists and is accessible\n    vault = Vault.get_by_full_path(path_dict['vault_full_path'])\n\n    # If not the vault root, validate remote path exists and is a folder\n    if path_dict['path'] != '/':\n        Object.get_by_full_path(base_remote_path, assert_type='folder')\n\n    for local_path in args.local_path:\n        local_path = local_path.rstrip('/')\n        local_start = os.path.basename(local_path)\n\n        if os.path.isdir(local_path):\n            _upload_folder(path_dict['domain'], vault,\n                           base_remote_path, local_path, local_start)\n        else:\n            Object.upload_file(local_path, path_dict['path'],\n                               vault.full_path)", "response": "Upload local files, and the folders and files contained within local folders, to the remote path."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function to\nupload and import a file into a new dataset.", "response": "def import_file(args):\n    \"\"\"\n    Given a dataset and a local path, upload and import the file(s).\n\n    Command arguments (args):\n\n        * create_dataset\n        * template_id\n        * full_path\n        * vault (optional, overrides the vault in full_path)\n        * path (optional, overrides the path in full_path)\n        * commit_mode\n        * capacity\n        * file (list)\n        * follow (default: False)\n\n    \"\"\"\n    full_path, path_dict = Object.validate_full_path(\n        args.full_path, vault=args.vault, path=args.path)\n\n    # Ensure the dataset exists.
Create if necessary.\n    if args.create_dataset:\n        dataset = create_dataset(args)\n    else:\n        try:\n            dataset = solvebio.Dataset.get_by_full_path(full_path)\n        except solvebio.SolveError as e:\n            if e.status_code != 404:\n                raise e\n\n            print(\"Dataset not found: {0}\".format(full_path))\n            print(\"Tip: use the --create-dataset flag \"\n                  \"to create one from a template\")\n            sys.exit(1)\n\n    # Generate a manifest from the local files\n    manifest = solvebio.Manifest()\n    manifest.add(*args.file)\n\n    # Create the manifest-based import\n    imp = solvebio.DatasetImport.create(\n        dataset_id=dataset.id,\n        manifest=manifest.manifest,\n        commit_mode=args.commit_mode\n    )\n\n    if args.follow:\n        imp.follow()\n    else:\n        mesh_url = 'https://my.solvebio.com/activity/'\n        print(\"Your import has been submitted, view details at: {0}\"\n              .format(mesh_url))"} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef validate_full_path(cls, full_path, **kwargs):\n        _client = kwargs.pop('client', None) or cls._client or client\n\n        full_path = full_path.strip()\n        if not full_path:\n            raise Exception(\n                'Vault path \"{0}\" is invalid. Path must be in the format: '\n                '\"domain:vault:/path\" or \"vault:/path\".'.format(full_path)\n            )\n\n        match = cls.VAULT_PATH_RE.match(full_path)\n        if not match:\n            raise Exception(\n                'Vault path \"{0}\" is invalid. Path must be in the format: '\n                '\"domain:vault:/path\" or \"vault:/path\".'.format(full_path)\n            )\n        path_parts = match.groupdict()\n\n        # Handle the special case where \"~\" means personal vault\n        if path_parts.get('vault') == '~':\n            path_parts = dict(domain=None, vault=None)\n\n        # If any values are None, set defaults from the user.\n        if None in path_parts.values():\n            user = _client.get('/v1/user', {})\n            defaults = {\n                'domain': user['account']['domain'],\n                'vault': 'user-{0}'.format(user['id'])\n            }\n            path_parts = dict((k, v or defaults.get(k))\n                              for k, v in path_parts.items())\n\n        # Rebuild the full path\n        full_path = '{domain}:{vault}'.format(**path_parts)\n        path_parts['vault_full_path'] = full_path\n        return full_path, path_parts", "response": "Validate a full path."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 function for\nvalidating SolveBio API host url.", "response": "def validate_api_host_url(url):\n    \"\"\"\n    Validate SolveBio API host url.\n\n    Valid urls must not be empty and\n    must contain either HTTP or HTTPS scheme.\n    \"\"\"\n    if not url:\n        raise SolveError('No SolveBio API host is set')\n\n    parsed = urlparse(url)\n    if parsed.scheme not in ['http', 'https']:\n        raise SolveError(\n            'Invalid API host: %s. '\n            'Missing url scheme (HTTP or HTTPS).' % url\n        )\n\n    elif not parsed.netloc:\n        raise SolveError('Invalid API host: %s.' % url)\n\n    return True"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nadd one or more files or URLs to the manifest.", "response": "def add(self, *args):\n        \"\"\"\n        Add one or more files or URLs to the manifest.\n        If a path contains a glob, it is expanded.\n\n        All files are uploaded to SolveBio.
The Upload\n        object is used to fill the manifest.\n        \"\"\"\n        def _is_url(path):\n            p = urlparse(path)\n            return bool(p.scheme)\n\n        for path in args:\n            path = os.path.expanduser(path)\n            if _is_url(path):\n                self.add_url(path)\n            elif os.path.isfile(path):\n                self.add_file(path)\n            elif os.path.isdir(path):\n                for f in os.listdir(path):\n                    self.add_file(os.path.join(path, f))\n            elif glob.glob(path):\n                for f in glob.glob(path):\n                    self.add_file(f)\n            else:\n                raise ValueError(\n                    'Path: \"{0}\" is not a valid format or does not exist. '\n                    'Manifest paths must be files, directories, or URLs.'\n                    .format(path)\n                )"} {"SOURCE": "codesearchnet", "instruction": "Can you generate a brief explanation for the following Python 3 code\ndef annotate(self, records, **kwargs):\n        # Update annotator_params with any kwargs\n        self.annotator_params.update(**kwargs)\n        chunk_size = self.annotator_params.get('chunk_size', self.CHUNK_SIZE)\n\n        chunk = []\n        for i, record in enumerate(records):\n            chunk.append(record)\n            if (i + 1) % chunk_size == 0:\n                for r in self._execute(chunk):\n                    yield r\n                chunk = []\n\n        if chunk:\n            for r in self._execute(chunk):\n                yield r\n            chunk = []", "response": "Annotate a set of records with stored fields."} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nevaluates the expression with the provided context and format.", "response": "def evaluate(self, data=None, data_type='string', is_list=False):\n        \"\"\"Evaluates the expression with the provided context and format.\"\"\"\n        payload = {\n            'data': data,\n            'expression': self.expr,\n            'data_type': data_type,\n            'is_list': is_list\n        }\n        res = self._client.post('/v1/evaluate', payload)\n        return res['result']"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef format_output(data, headers, format_name, **kwargs):\n    formatter = TabularOutputFormatter(format_name=format_name)\n    return formatter.format_output(data, headers, **kwargs)", "response": "Format output using format_name."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef format_name(self, format_name):\n        if format_name in self.supported_formats:\n            self._format_name = format_name\n        else:\n            raise ValueError('unrecognized format_name \"{}\"'.format(\n                format_name))", "response": "Set the default format name."} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nregister a new output formatter.", "response": "def register_new_formatter(cls, format_name, handler, preprocessors=(),\n                               kwargs=None):\n        \"\"\"Register a new output formatter.\n\n        :param str format_name: The name of the format.\n        :param callable handler: The function that formats the data.\n        :param tuple preprocessors: The preprocessors to call before\n            formatting.\n        :param dict kwargs: Keys/values for keyword argument defaults.\n\n        \"\"\"\n        cls._output_formats[format_name] = OutputFormatHandler(\n            format_name, preprocessors, handler, kwargs or {})"} {"SOURCE": "codesearchnet", "instruction": "Implement a function in Python 3 to\nformat the data using a specific formatter.", "response": "def format_output(self, data, headers, format_name=None,\n                      preprocessors=(), column_types=None, **kwargs):\n        \"\"\"Format the headers and data using a specific formatter.\n\n        *format_name* must be a supported formatter (see\n        :attr:`supported_formats`).\n\n        :param iterable data: An :term:`iterable` (e.g.
list) of rows.\n        :param iterable headers: The column headers.\n        :param str format_name: The display format to use (optional, if the\n            :class:`TabularOutputFormatter` object has a default format set).\n        :param tuple preprocessors: Additional preprocessors to call before\n            any formatter preprocessors.\n        :param \*\*kwargs: Optional arguments for the formatter.\n        :return: The formatted data.\n        :rtype: str\n        :raises ValueError: If the *format_name* is not recognized.\n\n        \"\"\"\n        format_name = format_name or self._format_name\n        if format_name not in self.supported_formats:\n            raise ValueError('unrecognized format \"{}\"'.format(format_name))\n\n        (_, _preprocessors, formatter,\n         fkwargs) = self._output_formats[format_name]\n        fkwargs.update(kwargs)\n        if column_types is None:\n            data = list(data)\n            column_types = self._get_column_types(data)\n        for f in unique_items(preprocessors + _preprocessors):\n            data, headers = f(data, headers, column_types=column_types,\n                              **fkwargs)\n        return formatter(list(data), headers, column_types=column_types, **fkwargs)"} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nget a list of the data types for each column in data.", "response": "def _get_column_types(self, data):\n        \"\"\"Get a list of the data types for each column in *data*.\"\"\"\n        columns = list(zip_longest(*data))\n        return [self._get_column_type(column) for column in columns]"} {"SOURCE": "codesearchnet", "instruction": "How would you implement a function in Python 3 that\ngets the most generic data type for iterable column.", "response": "def _get_column_type(self, column):\n        \"\"\"Get the most generic data type for iterable *column*.\"\"\"\n        type_values = [TYPES[self._get_type(v)] for v in column]\n        inverse_types = {v: k for k, v in TYPES.items()}\n        return inverse_types[max(type_values)]"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ngetting the data type for the value.", "response": "def _get_type(self, value):\n        \"\"\"Get the data type for *value*.\"\"\"\n        if value is None:\n            return type(None)\n        elif type(value) in int_types:\n            return int\n        elif type(value) in float_types:\n            return float\n        elif isinstance(value, binary_type):\n            return binary_type\n        else:\n            return text_type"} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nwraps tabulate inside a function for TabularOutputFormatter.", "response": "def adapter(data, headers, table_format=None, preserve_whitespace=False,\n            **kwargs):\n    \"\"\"Wrap tabulate inside a function for TabularOutputFormatter.\"\"\"\n    keys = ('floatfmt', 'numalign', 'stralign', 'showindex', 'disable_numparse')\n    tkwargs = {'tablefmt': table_format}\n    tkwargs.update(filter_dict_by_key(kwargs, keys))\n\n    if table_format in supported_markup_formats:\n        tkwargs.update(numalign=None, stralign=None)\n\n    tabulate.PRESERVE_WHITESPACE = preserve_whitespace\n\n    return iter(tabulate.tabulate(data, headers, **tkwargs).split('\\n'))"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nreturns the user's config folder for the application.", "response": "def get_user_config_dir(app_name, app_author, roaming=True, force_xdg=True):\n    \"\"\"Returns the config folder for the application.
The default behavior\n    is to return whatever is most appropriate for the operating system.\n\n    For an example application called ``\"My App\"`` by ``\"Acme\"``,\n    something like the following folders could be returned:\n\n    macOS (non-XDG):\n      ``~/Library/Application Support/My App``\n    Mac OS X (XDG):\n      ``~/.config/my-app``\n    Unix:\n      ``~/.config/my-app``\n    Windows 7 (roaming):\n      ``C:\\\\Users\\\\AppData\\Roaming\\Acme\\My App``\n    Windows 7 (not roaming):\n      ``C:\\\\Users\\\\AppData\\Local\\Acme\\My App``\n\n    :param app_name: the application name. This should be properly capitalized\n        and can contain whitespace.\n    :param app_author: The app author's name (or company). This should be\n        properly capitalized and can contain whitespace.\n    :param roaming: controls if the folder should be roaming or not on Windows.\n        Has no effect on non-Windows systems.\n    :param force_xdg: if this is set to `True`, then on macOS the XDG Base\n        Directory Specification will be followed. Has no effect\n        on non-macOS systems.\n\n    \"\"\"\n    if WIN:\n        key = 'APPDATA' if roaming else 'LOCALAPPDATA'\n        folder = os.path.expanduser(os.environ.get(key, '~'))\n        return os.path.join(folder, app_author, app_name)\n    if MAC and not force_xdg:\n        return os.path.join(os.path.expanduser(\n            '~/Library/Application Support'), app_name)\n    return os.path.join(\n        os.path.expanduser(os.environ.get('XDG_CONFIG_HOME', '~/.config')),\n        _pathify(app_name))"} {"SOURCE": "codesearchnet", "instruction": "Explain what the following Python 3 code does\ndef get_system_config_dirs(app_name, app_author, force_xdg=True):\n    if WIN:\n        folder = os.environ.get('PROGRAMDATA')\n        return [os.path.join(folder, app_author, app_name)]\n    if MAC and not force_xdg:\n        return [os.path.join('/Library/Application Support', app_name)]\n    dirs = os.environ.get('XDG_CONFIG_DIRS', '/etc/xdg')\n    paths = [os.path.expanduser(x) for x in dirs.split(os.pathsep)]\n    return [os.path.join(d, _pathify(app_name)) for d in paths]", "response": "Returns a list of system-wide config folders for the application."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nread the default config file.", "response": "def read_default_config(self):\n        \"\"\"Read the default config file.\n\n        :raises DefaultConfigValidationError: There was a validation error with\n            the *default* file.\n        \"\"\"\n        if self.validate:\n            self.default_config = ConfigObj(configspec=self.default_file,\n                                            list_values=False, _inspec=True,\n                                            encoding='utf8')\n            valid = self.default_config.validate(Validator(), copy=True,\n                                                 preserve_errors=True)\n            if valid is not True:\n                for name, section in valid.items():\n                    if section is True:\n                        continue\n                    for key, value in section.items():\n                        if isinstance(value, ValidateError):\n                            raise DefaultConfigValidationError(\n                                'section [{}], key \"{}\": {}'.format(\n                                    name, key, value))\n        elif self.default_file:\n            self.default_config, _ = self.read_config_file(self.default_file)\n\n        self.update(self.default_config)"} {"SOURCE": "codesearchnet", "instruction": "How would you explain what the following Python 3 function does\ndef read(self):\n        if self.default_file:\n            self.read_default_config()\n        return self.read_config_files(self.all_config_files())", "response": "Read the default, additional, system, and user config files."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef user_config_file(self):\n        return os.path.join(\n            get_user_config_dir(self.app_name, self.app_author),\n            self.filename)", "response": "Get the absolute path to
the user config file."} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\ngetting a list of absolute paths to the system config files.", "response": "def system_config_files(self):\n        \"\"\"Get a list of absolute paths to the system config files.\"\"\"\n        return [os.path.join(f, self.filename) for f in get_system_config_dirs(\n            self.app_name, self.app_author)]"} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef additional_files(self):\n        return [os.path.join(f, self.filename) for f in self.additional_dirs]", "response": "Get a list of absolute paths to the additional config files."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script to\nwrite the default config to the user's config file.", "response": "def write_default_config(self, overwrite=False):\n        \"\"\"Write the default config to the user's config file.\n\n        :param bool overwrite: Write over an existing config if it exists.\n        \"\"\"\n        destination = self.user_config_file()\n        if not overwrite and os.path.exists(destination):\n            return\n\n        with io.open(destination, mode='wb') as f:\n            self.default_config.write(f)"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef write(self, outfile=None, section=None):\n        with io.open(outfile or self.user_config_file(), 'wb') as f:\n            self.data.write(outfile=f, section=section)", "response": "Write the current config to a file."} {"SOURCE": "codesearchnet", "instruction": "Given the following Python 3 function, write the documentation\ndef read_config_file(self, f):\n        configspec = self.default_file if self.validate else None\n        try:\n            config = ConfigObj(infile=f, configspec=configspec,\n                               interpolation=False, encoding='utf8')\n        except ConfigObjError as e:\n            logger.warning(\n                'Unable to parse line {} of config file {}'.format(\n                    e.line_number, f))\n            config = e.config\n\n        valid = True\n        if self.validate:\n            valid = config.validate(Validator(), preserve_errors=True,\n                                    copy=True)\n        if bool(config):\n            self.config_filenames.append(config.filename)\n\n        return config, valid", "response": "Read a config file."} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef read_config_files(self, files):\n        errors = {}\n        for _file in files:\n            config, valid = self.read_config_file(_file)\n            self.update(config)\n            if valid is not True:\n                errors[_file] = valid\n        return errors or True", "response": "Read a list of config files."} {"SOURCE": "codesearchnet", "instruction": "Can you tell what is the following Python 3 function doing\ndef bytes_to_string(b):\n    if isinstance(b, binary_type):\n        try:\n            return b.decode('utf8')\n        except UnicodeDecodeError:\n            return '0x' + binascii.hexlify(b).decode('ascii')\n    return b", "response": "Convert bytes to a string."} {"SOURCE": "codesearchnet", "instruction": "Make a summary of the following Python 3 code\ndef filter_dict_by_key(d, keys):\n    return {k: v for k, v in d.items() if k in keys}", "response": "Filter the dict d to remove keys not in keys."} {"SOURCE": "codesearchnet", "instruction": "Can you generate the documentation for the following Python 3 function\ndef unique_items(seq):\n    seen = set()\n    return [x for x in seq if not (x in seen or seen.add(x))]", "response": "Return the unique items from iterable seq in order."} {"SOURCE": "codesearchnet", "instruction": "Can you implement a function in Python 3 that\nreplaces multiple values in a string", "response": "def replace(s, replace):\n
\"\"\"Replace multiple values in a string\"\"\"\n for r in replace:\n s = s.replace(*r)\n return s"} {"SOURCE": "codesearchnet", "instruction": "Can you create a Python 3 function that\nwraps the formatting inside a function for TabularOutputFormatter.", "response": "def adapter(data, headers, **kwargs):\n \"\"\"Wrap the formatting inside a function for TabularOutputFormatter.\"\"\"\n for row in chain((headers,), data):\n yield \"\\t\".join((replace(r, (('\\n', r'\\n'), ('\\t', r'\\t'))) for r in row))"} {"SOURCE": "codesearchnet", "instruction": "Here you have a function in Python 3, explain what it does\ndef call_and_exit(self, cmd, shell=True):\n sys.exit(subprocess.call(cmd, shell=shell))", "response": "Run the cmd and exit with the proper exit code."} {"SOURCE": "codesearchnet", "instruction": "Write a Python 3 script for\nrunning multiple commmands in a row exiting if one fails.", "response": "def call_in_sequence(self, cmds, shell=True):\n \"\"\"Run multiple commmands in a row, exiting if one fails.\"\"\"\n for cmd in cmds:\n if subprocess.call(cmd, shell=shell) == 1:\n sys.exit(1)"} {"SOURCE": "codesearchnet", "instruction": "Create a Python 3 function for\napplying command - line options.", "response": "def apply_options(self, cmd, options=()):\n \"\"\"Apply command-line options.\"\"\"\n for option in (self.default_cmd_options + options):\n cmd = self.apply_option(cmd, option,\n active=getattr(self, option, False))\n return cmd"} {"SOURCE": "codesearchnet", "instruction": "How would you code a function in Python 3 to\napply a command - line option.", "response": "def apply_option(self, cmd, option, active=True):\n \"\"\"Apply a command-line option.\"\"\"\n return re.sub(r'{{{}\\:(?P