[ { "text": "10. Brief Tour of the Standard Library\n**************************************\n\n10.1. Operating System Interface\n================================\n\nThe \"os\" module provides dozens of functions for interacting with the\noperating system:\n\n >>> import os\n >>> os.getcwd() # Return the current working directory\n 'C:\\\\Python314'\n >>> os.chdir('/server/accesslogs') # Change current working directory\n >>> os.system('mkdir today') # Run the command mkdir in the system shell\n 0\n\nBe sure to use the \"import os\" style instead of \"from os import *\".\nThis will keep \"os.open()\" from shadowing the built-in \"open()\"\nfunction which operates much differently.\n\nThe built-in \"dir()\" and \"help()\" functions are useful as interactive\naids for working with large modules like \"os\":\n\n >>> import os\n >>> dir(os)\n \n >>> help(os)\n \n\nFor daily file and directory management tasks, the \"shutil\" module\nprovides a higher level interface that is easier to use:\n\n >>> import shutil\n >>> shutil.copyfile('data.db', 'archive.db')\n 'archive.db'\n >>> shutil.move('/build/executables', 'installdir')\n 'installdir'\n\n10.2. File Wildcards\n====================\n\nThe \"glob\" module provides a function for making file lists from\ndirectory wildcard searches:\n\n >>> import glob\n >>> glob.glob('*.py')\n ['primes.py', 'random.py', 'quote.py']\n\n10.3. Command Line Arguments\n============================\n\nCommon utility scripts often need to process command line arguments.\nThese arguments are stored in the \"sys\" module's *argv* attribute as a\nlist. For instance, let's take the following \"demo.py\" file:\n\n # File demo.py\n import sys\n print(sys.argv)\n\nHere is the output from running \"python demo.py one two three\" at the\ncommand line:\n\n ['demo.py', 'one', 'two', 'three']\n\nThe \"argparse\" module provides a more sophisticated mechanism to\nprocess command line arguments. The following script extracts one or\nmore filenames and an optional number of lines to be displayed:\n\n import argparse\n\n parser = argparse.ArgumentParser(\n prog='top',\n description='Show top lines from each file')\n parser.add_argument('filenames', nargs='+')\n parser.add_argument('-l', '--lines', type=int, default=10)\n args = parser.parse_args()\n print(args)\n\nWhen run at the command line with \"python top.py --lines=5 alpha.txt\nbeta.txt\", the script sets \"args.lines\" to \"5\" and \"args.filenames\" to\n\"['alpha.txt', 'beta.txt']\".\n\n10.4. Error Output Redirection and Program Termination\n======================================================\n\nThe \"sys\" module also has attributes for *stdin*, *stdout*, and\n*stderr*. The latter is useful for emitting warnings and error\nmessages to make them visible even when *stdout* has been redirected:\n\n >>> sys.stderr.write('Warning, log file not found starting a new one\\n')\n Warning, log file not found starting a new one\n\nThe most direct way to terminate a script is to use \"sys.exit()\".\n\n10.5. String Pattern Matching\n=============================\n\nThe \"re\" module provides regular expression tools for advanced string\nprocessing. For complex matching and manipulation, regular expressions\noffer succinct, optimized solutions:\n\n >>> import re\n >>> re.findall(r'\\bf[a-z]*', 'which foot or hand fell fastest')\n ['foot', 'fell', 'fastest']\n >>> re.sub(r'(\\b[a-z]+) \\1', r'\\1', 'cat in the the hat')\n 'cat in the hat'\n\nWhen only simple capabilities are needed, string methods are preferred\nbecause they are easier to read and debug:\n\n >>> 'tea for too'.replace('too', 'two')\n 'tea for two'\n\n10.6. Mathematics\n=================\n\nThe \"math\" module gives access to the underlying C library functions\nfor floating-point math:\n\n >>> import math\n >>> math.cos(math.pi / 4)\n 0.70710678118654757\n >>> math.log(1024, 2)\n 10.0\n\nThe \"random\" module provides tools for making random selections:\n\n >>> import random\n >>> random.choice(['apple', 'pear', 'banana'])\n 'apple'\n >>> random.sample(range(100), 10) # sampling without replacement\n [30, 83, 16, 4, 8, 81, 41, 50, 18, 33]\n >>> random.random() # random float from the interval [0.0, 1.0)\n 0.17970987693706186\n >>> random.randrange(6) # random integer chosen from range(6)\n 4\n\nThe \"statistics\" module calculates basic statistical properties (the\nmean, median, variance, etc.) of numeric data:\n\n >>> import statistics\n >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]\n >>> statistics.mean(data)\n 1.6071428571428572\n >>> statistics.median(data)\n 1.25\n >>> statistics.variance(data)\n 1.3720238095238095\n\nThe SciPy project has many other modules for\nnumerical computations.\n\n10.7. Internet Access\n=====================\n\nThere are a number of modules for accessing the internet and\nprocessing internet protocols. Two of the simplest are\n\"urllib.request\" for retrieving data from URLs and \"smtplib\" for\nsending mail:\n\n >>> from urllib.request import urlopen\n >>> with urlopen('https://docs.python.org/3/') as response:\n ... for line in response:\n ... line = line.decode() # Convert bytes to a str\n ... if 'updated' in line:\n ... print(line.rstrip()) # Remove trailing newline\n ...\n Last updated on Nov 11, 2025 (20:11 UTC).\n\n >>> import smtplib\n >>> server = smtplib.SMTP('localhost')\n >>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org',\n ... \"\"\"To: jcaesar@example.org\n ... From: soothsayer@example.org\n ...\n ... Beware the Ides of March.\n ... \"\"\")\n >>> server.quit()\n\n(Note that the second example needs a mailserver running on\nlocalhost.)\n\n10.8. Dates and Times\n=====================\n\nThe \"datetime\" module supplies classes for manipulating dates and\ntimes in both simple and complex ways. While date and time arithmetic\nis supported, the focus of the implementation is on efficient member\nextraction for output formatting and manipulation. The module also\nsupports objects that are timezone aware.\n\n >>> # dates are easily constructed and formatted\n >>> from datetime import date\n >>> now = date.today()\n >>> now\n datetime.date(2003, 12, 2)\n >>> now.strftime(\"%m-%d-%y. %d %b %Y is a %A on the %d day of %B.\")\n '12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'\n\n >>> # dates support calendar arithmetic\n >>> birthday = date(1964, 7, 31)\n >>> age = now - birthday\n >>> age.days\n 14368\n\n10.9. Data Compression\n======================\n\nCommon data archiving and compression formats are directly supported\nby modules including: \"zlib\", \"gzip\", \"bz2\", \"lzma\", \"zipfile\" and\n\"tarfile\".\n\n >>> import zlib\n >>> s = b'witch which has which witches wrist watch'\n >>> len(s)\n 41\n >>> t = zlib.compress(s)\n >>> len(t)\n 37\n >>> zlib.decompress(t)\n b'witch which has which witches wrist watch'\n >>> zlib.crc32(s)\n 226805979\n\n10.10. Performance Measurement\n==============================\n\nSome Python users develop a deep interest in knowing the relative\nperformance of different approaches to the same problem. Python\nprovides a measurement tool that answers those questions immediately.\n\nFor example, it may be tempting to use the tuple packing and unpacking\nfeature instead of the traditional approach to swapping arguments. The\n\"timeit\" module quickly demonstrates a modest performance advantage:\n\n >>> from timeit import Timer\n >>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()\n 0.57535828626024577\n >>> Timer('a,b = b,a', 'a=1; b=2').timeit()\n 0.54962537085770791\n\nIn contrast to \"timeit\"'s fine level of granularity, the \"profile\" and\n\"pstats\" modules provide tools for identifying time critical sections\nin larger blocks of code.\n\n10.11. Quality Control\n======================\n\nOne approach for developing high quality software is to write tests\nfor each function as it is developed and to run those tests frequently\nduring the development process.\n\nThe \"doctest\" module provides a tool for scanning a module and\nvalidating tests embedded in a program's docstrings. Test\nconstruction is as simple as cutting-and-pasting a typical call along\nwith its results into the docstring. This improves the documentation\nby providing the user with an example and it allows the doctest module\nto make sure the code remains true to the documentation:\n\n def average(values):\n \"\"\"Computes the arithmetic mean of a list of numbers.\n\n >>> print(average([20, 30, 70]))\n 40.0\n \"\"\"\n return sum(values) / len(values)\n\n import doctest\n doctest.testmod() # automatically validate the embedded tests\n\nThe \"unittest\" module is not as effortless as the \"doctest\" module,\nbut it allows a more comprehensive set of tests to be maintained in a\nseparate file:\n\n import unittest\n\n class TestStatisticalFunctions(unittest.TestCase):\n\n def test_average(self):\n self.assertEqual(average([20, 30, 70]), 40.0)\n self.assertEqual(round(average([1, 5, 7]), 1), 4.3)\n with self.assertRaises(ZeroDivisionError):\n average([])\n with self.assertRaises(TypeError):\n average(20, 30, 70)\n\n unittest.main() # Calling from the command line invokes all tests\n\n10.12. Batteries Included\n=========================\n\nPython has a \"batteries included\" philosophy. This is best seen\nthrough the sophisticated and robust capabilities of its larger\npackages. For example:\n\n* The \"xmlrpc.client\" and \"xmlrpc.server\" modules make implementing\n remote procedure calls into an almost trivial task. Despite the\n modules' names, no direct knowledge or handling of XML is needed.\n\n* The \"email\" package is a library for managing email messages,\n including MIME and other **RFC 5322**-based message documents.\n Unlike \"smtplib\" and \"poplib\" which actually send and receive\n messages, the email package has a complete toolset for building or\n decoding complex message structures (including attachments) and for\n implementing internet encoding and header protocols.\n\n* The \"json\" package provides robust support for parsing this popular\n data interchange format. The \"csv\" module supports direct reading\n and writing of files in Comma-Separated Value format, commonly\n supported by databases and spreadsheets. XML processing is\n supported by the \"xml.etree.ElementTree\", \"xml.dom\" and \"xml.sax\"\n packages. Together, these modules and packages greatly simplify data\n interchange between Python applications and other tools.\n\n* The \"sqlite3\" module is a wrapper for the SQLite database library,\n providing a persistent database that can be updated and accessed\n using slightly nonstandard SQL syntax.\n\n* Internationalization is supported by a number of modules including\n \"gettext\", \"locale\", and the \"codecs\" package.", "source": "python_docs:python-3.14-docs-text/tutorial/stdlib.txt", "domain": "software" }, { "text": "1. Whetting Your Appetite\n*************************\n\nIf you do much work on computers, eventually you find that there's\nsome task you'd like to automate. For example, you may wish to\nperform a search-and-replace over a large number of text files, or\nrename and rearrange a bunch of photo files in a complicated way.\nPerhaps you'd like to write a small custom database, or a specialized\nGUI application, or a simple game.\n\nIf you're a professional software developer, you may have to work with\nseveral C/C++/Java libraries but find the usual write/compile/test/re-\ncompile cycle is too slow. Perhaps you're writing a test suite for\nsuch a library and find writing the testing code a tedious task. Or\nmaybe you've written a program that could use an extension language,\nand you don't want to design and implement a whole new language for\nyour application.\n\nPython is just the language for you.\n\nYou could write a Unix shell script or Windows batch files for some of\nthese tasks, but shell scripts are best at moving around files and\nchanging text data, not well-suited for GUI applications or games. You\ncould write a C/C++/Java program, but it can take a lot of development\ntime to get even a first-draft program. Python is simpler to use,\navailable on Windows, macOS, and Unix operating systems, and will help\nyou get the job done more quickly.\n\nPython is simple to use, but it is a real programming language,\noffering much more structure and support for large programs than shell\nscripts or batch files can offer. On the other hand, Python also\noffers much more error checking than C, and, being a *very-high-level\nlanguage*, it has high-level data types built in, such as flexible\narrays and dictionaries. Because of its more general data types\nPython is applicable to a much larger problem domain than Awk or even\nPerl, yet many things are at least as easy in Python as in those\nlanguages.\n\nPython allows you to split your program into modules that can be\nreused in other Python programs. It comes with a large collection of\nstandard modules that you can use as the basis of your programs --- or\nas examples to start learning to program in Python. Some of these\nmodules provide things like file I/O, system calls, sockets, and even\ninterfaces to graphical user interface toolkits like Tk.\n\nPython is an interpreted language, which can save you considerable\ntime during program development because no compilation and linking is\nnecessary. The interpreter can be used interactively, which makes it\neasy to experiment with features of the language, to write throw-away\nprograms, or to test functions during bottom-up program development.\nIt is also a handy desk calculator.\n\nPython enables programs to be written compactly and readably.\nPrograms written in Python are typically much shorter than equivalent\nC, C++, or Java programs, for several reasons:\n\n* the high-level data types allow you to express complex operations in\n a single statement;\n\n* statement grouping is done by indentation instead of beginning and\n ending brackets;\n\n* no variable or argument declarations are necessary.\n\nPython is *extensible*: if you know how to program in C it is easy to\nadd a new built-in function or module to the interpreter, either to\nperform critical operations at maximum speed, or to link Python\nprograms to libraries that may only be available in binary form (such\nas a vendor-specific graphics library). Once you are really hooked,\nyou can link the Python interpreter into an application written in C\nand use it as an extension or command language for that application.\n\nBy the way, the language is named after the BBC show \"Monty Python's\nFlying Circus\" and has nothing to do with reptiles. Making references\nto Monty Python skits in documentation is not only allowed, it is\nencouraged!\n\nNow that you are all excited about Python, you'll want to examine it\nin some more detail. Since the best way to learn a language is to use\nit, the tutorial invites you to play with the Python interpreter as\nyou read.\n\nIn the next chapter, the mechanics of using the interpreter are\nexplained. This is rather mundane information, but essential for\ntrying out the examples shown later.\n\nThe rest of the tutorial introduces various features of the Python\nlanguage and system through examples, beginning with simple\nexpressions, statements and data types, through functions and modules,\nand finally touching upon advanced concepts like exceptions and user-\ndefined classes.", "source": "python_docs:python-3.14-docs-text/tutorial/appetite.txt", "domain": "software" }, { "text": "2. Using the Python Interpreter\n*******************************\n\n2.1. Invoking the Interpreter\n=============================\n\nThe Python interpreter is usually installed as\n\"/usr/local/bin/python3.14\" on those machines where it is available;\nputting \"/usr/local/bin\" in your Unix shell's search path makes it\npossible to start it by typing the command:\n\n python3.14\n\nto the shell. [1] Since the choice of the directory where the\ninterpreter lives is an installation option, other places are\npossible; check with your local Python guru or system administrator.\n(E.g., \"/usr/local/python\" is a popular alternative location.)\n\nOn Windows machines where you have installed Python from the Microsoft\nStore, the \"python3.14\" command will be available. If you have the\npy.exe launcher installed, you can use the \"py\" command. See Python\ninstall manager for other ways to launch Python.\n\nTyping an end-of-file character (\"Control\"-\"D\" on Unix, \"Control\"-\"Z\"\non Windows) at the primary prompt causes the interpreter to exit with\na zero exit status. If that doesn't work, you can exit the\ninterpreter by typing the following command: \"quit()\".\n\nThe interpreter's line-editing features include interactive editing,\nhistory substitution and code completion on most systems. Perhaps the\nquickest check to see whether command line editing is supported is\ntyping a word in on the Python prompt, then pressing Left arrow (or\n\"Control\"-\"b\"). If the cursor moves, you have command line editing;\nsee Appendix Interactive Input Editing and History Substitution for an\nintroduction to the keys. If nothing appears to happen, or if a\nsequence like \"^[[D\" or \"^B\" appears, command line editing isn't\navailable; you'll only be able to use backspace to remove characters\nfrom the current line.\n\nThe interpreter operates somewhat like the Unix shell: when called\nwith standard input connected to a tty device, it reads and executes\ncommands interactively; when called with a file name argument or with\na file as standard input, it reads and executes a *script* from that\nfile.\n\nA second way of starting the interpreter is \"python -c command [arg]\n...\", which executes the statement(s) in *command*, analogous to the\nshell's \"-c\" option. Since Python statements often contain spaces or\nother characters that are special to the shell, it is usually advised\nto quote *command* in its entirety.\n\nSome Python modules are also useful as scripts. These can be invoked\nusing \"python -m module [arg] ...\", which executes the source file for\n*module* as if you had spelled out its full name on the command line.\n\nWhen a script file is used, it is sometimes useful to be able to run\nthe script and enter interactive mode afterwards. This can be done by\npassing \"-i\" before the script.\n\nAll command line options are described in Command line and\nenvironment.\n\n2.1.1. Argument Passing\n-----------------------\n\nWhen known to the interpreter, the script name and additional\narguments thereafter are turned into a list of strings and assigned to\nthe \"argv\" variable in the \"sys\" module. You can access this list by\nexecuting \"import sys\". The length of the list is at least one; when\nno script and no arguments are given, \"sys.argv[0]\" is an empty\nstring. When the script name is given as \"'-'\" (meaning standard\ninput), \"sys.argv[0]\" is set to \"'-'\". When \"-c\" *command* is used,\n\"sys.argv[0]\" is set to \"'-c'\". When \"-m\" *module* is used,\n\"sys.argv[0]\" is set to the full name of the located module. Options\nfound after \"-c\" *command* or \"-m\" *module* are not consumed by the\nPython interpreter's option processing but left in \"sys.argv\" for the\ncommand or module to handle.\n\n2.1.2. Interactive Mode\n-----------------------\n\nWhen commands are read from a tty, the interpreter is said to be in\n*interactive mode*. In this mode it prompts for the next command with\nthe *primary prompt*, usually three greater-than signs (\">>>\"); for\ncontinuation lines it prompts with the *secondary prompt*, by default\nthree dots (\"...\"). The interpreter prints a welcome message stating\nits version number and a copyright notice before printing the first\nprompt:\n\n $ python3.14\n Python 3.14 (default, April 4 2024, 09:25:04)\n [GCC 10.2.0] on linux\n Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n >>>\n\nContinuation lines are needed when entering a multi-line construct. As\nan example, take a look at this \"if\" statement:\n\n >>> the_world_is_flat = True\n >>> if the_world_is_flat:\n ... print(\"Be careful not to fall off!\")\n ...\n Be careful not to fall off!\n\nFor more on interactive mode, see Interactive Mode.\n\n2.2. The Interpreter and Its Environment\n========================================\n\n2.2.1. Source Code Encoding\n---------------------------\n\nBy default, Python source files are treated as encoded in UTF-8. In\nthat encoding, characters of most languages in the world can be used\nsimultaneously in string literals, identifiers and comments ---\nalthough the standard library only uses ASCII characters for\nidentifiers, a convention that any portable code should follow. To\ndisplay all these characters properly, your editor must recognize that\nthe file is UTF-8, and it must use a font that supports all the\ncharacters in the file.\n\nTo declare an encoding other than the default one, a special comment\nline should be added as the *first* line of the file. The syntax is\nas follows:\n\n # -*- coding: encoding -*-\n\nwhere *encoding* is one of the valid \"codecs\" supported by Python.\n\nFor example, to declare that Windows-1252 encoding is to be used, the\nfirst line of your source code file should be:\n\n # -*- coding: cp1252 -*-\n\nOne exception to the *first line* rule is when the source code starts\nwith a UNIX \"shebang\" line. In this case, the encoding declaration\nshould be added as the second line of the file. For example:\n\n #!/usr/bin/env python3\n # -*- coding: cp1252 -*-\n\n-[ Footnotes ]-\n\n[1] On Unix, the Python 3.x interpreter is by default not installed\n with the executable named \"python\", so that it does not conflict\n with a simultaneously installed Python 2.x executable.", "source": "python_docs:python-3.14-docs-text/tutorial/interpreter.txt", "domain": "software" }, { "text": "3. An Informal Introduction to Python\n*************************************\n\nIn the following examples, input and output are distinguished by the\npresence or absence of prompts (*>>>* and *...*): to repeat the\nexample, you must type everything after the prompt, when the prompt\nappears; lines that do not begin with a prompt are output from the\ninterpreter. Note that a secondary prompt on a line by itself in an\nexample means you must type a blank line; this is used to end a multi-\nline command.\n\nMany of the examples in this manual, even those entered at the\ninteractive prompt, include comments. Comments in Python start with\nthe hash character, \"#\", and extend to the end of the physical line.\nA comment may appear at the start of a line or following whitespace or\ncode, but not within a string literal. A hash character within a\nstring literal is just a hash character. Since comments are to clarify\ncode and are not interpreted by Python, they may be omitted when\ntyping in examples.\n\nSome examples:\n\n # this is the first comment\n spam = 1 # and this is the second comment\n # ... and now a third!\n text = \"# This is not a comment because it's inside quotes.\"\n\n3.1. Using Python as a Calculator\n=================================\n\nLet's try some simple Python commands. Start the interpreter and wait\nfor the primary prompt, \">>>\". (It shouldn't take long.)\n\n3.1.1. Numbers\n--------------\n\nThe interpreter acts as a simple calculator: you can type an\nexpression into it and it will write the value. Expression syntax is\nstraightforward: the operators \"+\", \"-\", \"*\" and \"/\" can be used to\nperform arithmetic; parentheses (\"()\") can be used for grouping. For\nexample:\n\n >>> 2 + 2\n 4\n >>> 50 - 5*6\n 20\n >>> (50 - 5*6) / 4\n 5.0\n >>> 8 / 5 # division always returns a floating-point number\n 1.6\n\nThe integer numbers (e.g. \"2\", \"4\", \"20\") have type \"int\", the ones\nwith a fractional part (e.g. \"5.0\", \"1.6\") have type \"float\". We will\nsee more about numeric types later in the tutorial.\n\nDivision (\"/\") always returns a float. To do *floor division* and get\nan integer result you can use the \"//\" operator; to calculate the\nremainder you can use \"%\":\n\n >>> 17 / 3 # classic division returns a float\n 5.666666666666667\n >>>\n >>> 17 // 3 # floor division discards the fractional part\n 5\n >>> 17 % 3 # the % operator returns the remainder of the division\n 2\n >>> 5 * 3 + 2 # floored quotient * divisor + remainder\n 17\n\nWith Python, it is possible to use the \"**\" operator to calculate\npowers [1]:\n\n >>> 5 ** 2 # 5 squared\n 25\n >>> 2 ** 7 # 2 to the power of 7\n 128\n\nThe equal sign (\"=\") is used to assign a value to a variable.\nAfterwards, no result is displayed before the next interactive prompt:\n\n >>> width = 20\n >>> height = 5 * 9\n >>> width * height\n 900\n\nIf a variable is not \"defined\" (assigned a value), trying to use it\nwill give you an error:\n\n >>> n # try to access an undefined variable\n Traceback (most recent call last):\n File \"\", line 1, in \n NameError: name 'n' is not defined\n\nThere is full support for floating point; operators with mixed type\noperands convert the integer operand to floating point:\n\n >>> 4 * 3.75 - 1\n 14.0\n\nIn interactive mode, the last printed expression is assigned to the\nvariable \"_\". This means that when you are using Python as a desk\ncalculator, it is somewhat easier to continue calculations, for\nexample:\n\n >>> tax = 12.5 / 100\n >>> price = 100.50\n >>> price * tax\n 12.5625\n >>> price + _\n 113.0625\n >>> round(_, 2)\n 113.06\n\nThis variable should be treated as read-only by the user. Don't\nexplicitly assign a value to it --- you would create an independent\nlocal variable with the same name masking the built-in variable with\nits magic behavior.\n\nIn addition to \"int\" and \"float\", Python supports other types of\nnumbers, such as \"Decimal\" and \"Fraction\". Python also has built-in\nsupport for complex numbers, and uses the \"j\" or \"J\" suffix to\nindicate the imaginary part (e.g. \"3+5j\").\n\n3.1.2. Text\n-----------\n\nPython can manipulate text (represented by type \"str\", so-called\n\"strings\") as well as numbers. This includes characters \"\"!\"\", words\n\"\"rabbit\"\", names \"\"Paris\"\", sentences \"\"Got your back.\"\", etc. \"\"Yay!\n:)\"\". They can be enclosed in single quotes (\"'...'\") or double quotes\n(\"\"...\"\") with the same result [2].\n\n >>> 'spam eggs' # single quotes\n 'spam eggs'\n >>> \"Paris rabbit got your back :)! Yay!\" # double quotes\n 'Paris rabbit got your back :)! Yay!'\n >>> '1975' # digits and numerals enclosed in quotes are also strings\n '1975'\n\nTo quote a quote, we need to \"escape\" it, by preceding it with \"\\\".\nAlternatively, we can use the other type of quotation marks:\n\n >>> 'doesn\\'t' # use \\' to escape the single quote...\n \"doesn't\"\n >>> \"doesn't\" # ...or use double quotes instead\n \"doesn't\"\n >>> '\"Yes,\" they said.'\n '\"Yes,\" they said.'\n >>> \"\\\"Yes,\\\" they said.\"\n '\"Yes,\" they said.'\n >>> '\"Isn\\'t,\" they said.'\n '\"Isn\\'t,\" they said.'\n\nIn the Python shell, the string definition and output string can look\ndifferent. The \"print()\" function produces a more readable output, by\nomitting the enclosing quotes and by printing escaped and special\ncharacters:\n\n >>> s = 'First line.\\nSecond line.' # \\n means newline\n >>> s # without print(), special characters are included in the string\n 'First line.\\nSecond line.'\n >>> print(s) # with print(), special characters are interpreted, so \\n produces new line\n First line.\n Second line.\n\nIf you don't want characters prefaced by \"\\\" to be interpreted as\nspecial characters, you can use *raw strings* by adding an \"r\" before\nthe first quote:\n\n >>> print('C:\\this\\name') # here \\t means tab, \\n means newline\n C: his\n ame\n >>> print(r'C:\\this\\name') # note the r before the quote\n C:\\this\\name\n\nThere is one subtle aspect to raw strings: a raw string may not end in\nan odd number of \"\\\" characters; see the FAQ entry for more\ninformation and workarounds.\n\nString literals can span multiple lines. One way is using triple-\nquotes: \"\"\"\"...\"\"\"\" or \"'''...'''\". End-of-line characters are\nautomatically included in the string, but it's possible to prevent\nthis by adding a \"\\\" at the end of the line. In the following\nexample, the initial newline is not included:\n\n >>> print(\"\"\"\\\n ... Usage: thingy [OPTIONS]\n ... -h Display this usage message\n ... -H hostname Hostname to connect to\n ... \"\"\")\n Usage: thingy [OPTIONS]\n -h Display this usage message\n -H hostname Hostname to connect to\n\n >>>\n\nStrings can be concatenated (glued together) with the \"+\" operator,\nand repeated with \"*\":\n\n >>> # 3 times 'un', followed by 'ium'\n >>> 3 * 'un' + 'ium'\n 'unununium'\n\nTwo or more *string literals* (i.e. the ones enclosed between quotes)\nnext to each other are automatically concatenated.\n\n >>> 'Py' 'thon'\n 'Python'\n\nThis feature is particularly useful when you want to break long\nstrings:\n\n >>> text = ('Put several strings within parentheses '\n ... 'to have them joined together.')\n >>> text\n 'Put several strings within parentheses to have them joined together.'\n\nThis only works with two literals though, not with variables or\nexpressions:\n\n >>> prefix = 'Py'\n >>> prefix 'thon' # can't concatenate a variable and a string literal\n File \"\", line 1\n prefix 'thon'\n ^^^^^^\n SyntaxError: invalid syntax\n >>> ('un' * 3) 'ium'\n File \"\", line 1\n ('un' * 3) 'ium'\n ^^^^^\n SyntaxError: invalid syntax\n\nIf you want to concatenate variables or a variable and a literal, use\n\"+\":\n\n >>> prefix + 'thon'\n 'Python'\n\nStrings can be *indexed* (subscripted), with the first character\nhaving index 0. There is no separate character type; a character is\nsimply a string of size one:\n\n >>> word = 'Python'\n >>> word[0] # character in position 0\n 'P'\n >>> word[5] # character in position 5\n 'n'\n\nIndices may also be negative numbers, to start counting from the\nright:\n\n >>> word[-1] # last character\n 'n'\n >>> word[-2] # second-last character\n 'o'\n >>> word[-6]\n 'P'\n\nNote that since -0 is the same as 0, negative indices start from -1.\n\nIn addition to indexing, *slicing* is also supported. While indexing\nis used to obtain individual characters, *slicing* allows you to\nobtain a substring:\n\n >>> word[0:2] # characters from position 0 (included) to 2 (excluded)\n 'Py'\n >>> word[2:5] # characters from position 2 (included) to 5 (excluded)\n 'tho'\n\nSlice indices have useful defaults; an omitted first index defaults to\nzero, an omitted second index defaults to the size of the string being\nsliced.\n\n >>> word[:2] # character from the beginning to position 2 (excluded)\n 'Py'\n >>> word[4:] # characters from position 4 (included) to the end\n 'on'\n >>> word[-2:] # characters from the second-last (included) to the end\n 'on'\n\nNote how the start is always included, and the end always excluded.\nThis makes sure that \"s[:i] + s[i:]\" is always equal to \"s\":\n\n >>> word[:2] + word[2:]\n 'Python'\n >>> word[:4] + word[4:]\n 'Python'\n\nOne way to remember how slices work is to think of the indices as\npointing *between* characters, with the left edge of the first\ncharacter numbered 0. Then the right edge of the last character of a\nstring of *n* characters has index *n*, for example:\n\n +---+---+---+---+---+---+\n | P | y | t | h | o | n |\n +---+---+---+---+---+---+\n 0 1 2 3 4 5 6\n -6 -5 -4 -3 -2 -1\n\nThe first row of numbers gives the position of the indices 0...6 in\nthe string; the second row gives the corresponding negative indices.\nThe slice from *i* to *j* consists of all characters between the edges\nlabeled *i* and *j*, respectively.\n\nFor non-negative indices, the length of a slice is the difference of\nthe indices, if both are within bounds. For example, the length of\n\"word[1:3]\" is 2.\n\nAttempting to use an index that is too large will result in an error:\n\n >>> word[42] # the word only has 6 characters\n Traceback (most recent call last):\n File \"\", line 1, in \n IndexError: string index out of range\n\nHowever, out of range slice indexes are handled gracefully when used\nfor slicing:\n\n >>> word[4:42]\n 'on'\n >>> word[42:]\n ''\n\nPython strings cannot be changed --- they are *immutable*. Therefore,\nassigning to an indexed position in the string results in an error:\n\n >>> word[0] = 'J'\n Traceback (most recent call last):\n File \"\", line 1, in \n TypeError: 'str' object does not support item assignment\n >>> word[2:] = 'py'\n Traceback (most recent call last):\n File \"\", line 1, in \n TypeError: 'str' object does not support item assignment\n\nIf you need a different string, you should create a new one:\n\n >>> 'J' + word[1:]\n 'Jython'\n >>> word[:2] + 'py'\n 'Pypy'\n\nThe built-in function \"len()\" returns the length of a string:\n\n >>> s = 'supercalifragilisticexpialidocious'\n >>> len(s)\n 34\n\nSee also:\n\n Text Sequence Type --- str\n Strings are examples of *sequence types*, and support the common\n operations supported by such types.\n\n String Methods\n Strings support a large number of methods for basic\n transformations and searching.\n\n f-strings\n String literals that have embedded expressions.\n\n Format String Syntax\n Information about string formatting with \"str.format()\".\n\n printf-style String Formatting\n The old formatting operations invoked when strings are the left\n operand of the \"%\" operator are described in more detail here.\n\n3.1.3. Lists\n------------\n\nPython knows a number of *compound* data types, used to group together\nother values. The most versatile is the *list*, which can be written\nas a list of comma-separated values (items) between square brackets.\nLists might contain items of different types, but usually the items\nall have the same type.\n\n >>> squares = [1, 4, 9, 16, 25]\n >>> squares\n [1, 4, 9, 16, 25]\n\nLike strings (and all other built-in *sequence* types), lists can be\nindexed and sliced:\n\n >>> squares[0] # indexing returns the item\n 1\n >>> squares[-1]\n 25\n >>> squares[-3:] # slicing returns a new list\n [9, 16, 25]\n\nLists also support operations like concatenation:\n\n >>> squares + [36, 49, 64, 81, 100]\n [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]\n\nUnlike strings, which are *immutable*, lists are a *mutable* type,\ni.e. it is possible to change their content:\n\n >>> cubes = [1, 8, 27, 65, 125] # something's wrong here\n >>> 4 ** 3 # the cube of 4 is 64, not 65!\n 64\n >>> cubes[3] = 64 # replace the wrong value\n >>> cubes\n [1, 8, 27, 64, 125]\n\nYou can also add new items at the end of the list, by using the\n\"list.append()\" *method* (we will see more about methods later):\n\n >>> cubes.append(216) # add the cube of 6\n >>> cubes.append(7 ** 3) # and the cube of 7\n >>> cubes\n [1, 8, 27, 64, 125, 216, 343]\n\nSimple assignment in Python never copies data. When you assign a list\nto a variable, the variable refers to the *existing list*. Any changes\nyou make to the list through one variable will be seen through all\nother variables that refer to it.:\n\n >>> rgb = [\"Red\", \"Green\", \"Blue\"]\n >>> rgba = rgb\n >>> id(rgb) == id(rgba) # they reference the same object\n True\n >>> rgba.append(\"Alph\")\n >>> rgb\n [\"Red\", \"Green\", \"Blue\", \"Alph\"]\n\nAll slice operations return a new list containing the requested\nelements. This means that the following slice returns a shallow copy\nof the list:\n\n >>> correct_rgba = rgba[:]\n >>> correct_rgba[-1] = \"Alpha\"\n >>> correct_rgba\n [\"Red\", \"Green\", \"Blue\", \"Alpha\"]\n >>> rgba\n [\"Red\", \"Green\", \"Blue\", \"Alph\"]\n\nAssignment to slices is also possible, and this can even change the\nsize of the list or clear it entirely:\n\n >>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n >>> letters\n ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n >>> # replace some values\n >>> letters[2:5] = ['C', 'D', 'E']\n >>> letters\n ['a', 'b', 'C', 'D', 'E', 'f', 'g']\n >>> # now remove them\n >>> letters[2:5] = []\n >>> letters\n ['a', 'b', 'f', 'g']\n >>> # clear the list by replacing all the elements with an empty list\n >>> letters[:] = []\n >>> letters\n []\n\nThe built-in function \"len()\" also applies to lists:\n\n >>> letters = ['a', 'b', 'c', 'd']\n >>> len(letters)\n 4\n\nIt is possible to nest lists (create lists containing other lists),\nfor example:\n\n >>> a = ['a', 'b', 'c']\n >>> n = [1, 2, 3]\n >>> x = [a, n]\n >>> x\n [['a', 'b', 'c'], [1, 2, 3]]\n >>> x[0]\n ['a', 'b', 'c']\n >>> x[0][1]\n 'b'\n\n3.2. First Steps Towards Programming\n====================================\n\nOf course, we can use Python for more complicated tasks than adding\ntwo and two together. For instance, we can write an initial sub-\nsequence of the Fibonacci series as follows:\n\n >>> # Fibonacci series:\n >>> # the sum of two elements defines the next\n >>> a, b = 0, 1\n >>> while a < 10:\n ... print(a)\n ... a, b = b, a+b\n ...\n 0\n 1\n 1\n 2\n 3\n 5\n 8\n\nThis example introduces several new features.\n\n* The first line contains a *multiple assignment*: the variables \"a\"\n and \"b\" simultaneously get the new values 0 and 1. On the last line\n this is used again, demonstrating that the expressions on the right-\n hand side are all evaluated first before any of the assignments take\n place. The right-hand side expressions are evaluated from the left\n to the right.\n\n* The \"while\" loop executes as long as the condition (here: \"a < 10\")\n remains true. In Python, like in C, any non-zero integer value is\n true; zero is false. The condition may also be a string or list\n value, in fact any sequence; anything with a non-zero length is\n true, empty sequences are false. The test used in the example is a\n simple comparison. The standard comparison operators are written\n the same as in C: \"<\" (less than), \">\" (greater than), \"==\" (equal\n to), \"<=\" (less than or equal to), \">=\" (greater than or equal to)\n and \"!=\" (not equal to).\n\n* The *body* of the loop is *indented*: indentation is Python's way of\n grouping statements. At the interactive prompt, you have to type a\n tab or space(s) for each indented line. In practice you will\n prepare more complicated input for Python with a text editor; all\n decent text editors have an auto-indent facility. When a compound\n statement is entered interactively, it must be followed by a blank\n line to indicate completion (since the parser cannot guess when you\n have typed the last line). Note that each line within a basic block\n must be indented by the same amount.\n\n* The \"print()\" function writes the value of the argument(s) it is\n given. It differs from just writing the expression you want to write\n (as we did earlier in the calculator examples) in the way it handles\n multiple arguments, floating-point quantities, and strings. Strings\n are printed without quotes, and a space is inserted between items,\n so you can format things nicely, like this:\n\n >>> i = 256*256\n >>> print('The value of i is', i)\n The value of i is 65536\n\n The keyword argument *end* can be used to avoid the newline after\n the output, or end the output with a different string:\n\n >>> a, b = 0, 1\n >>> while a < 1000:\n ... print(a, end=',')\n ... a, b = b, a+b\n ...\n 0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,\n\n-[ Footnotes ]-\n\n[1] Since \"**\" has higher precedence than \"-\", \"-3**2\" will be\n interpreted as \"-(3**2)\" and thus result in \"-9\". To avoid this\n and get \"9\", you can use \"(-3)**2\".\n\n[2] Unlike other languages, special characters such as \"\\n\" have the\n same meaning with both single (\"'...'\") and double (\"\"...\"\")\n quotes. The only difference between the two is that within single\n quotes you don't need to escape \"\"\" (but you have to escape \"\\'\")\n and vice versa.", "source": "python_docs:python-3.14-docs-text/tutorial/introduction.txt", "domain": "software" }, { "text": "11. Brief Tour of the Standard Library --- Part II\n**************************************************\n\nThis second tour covers more advanced modules that support\nprofessional programming needs. These modules rarely occur in small\nscripts.\n\n11.1. Output Formatting\n=======================\n\nThe \"reprlib\" module provides a version of \"repr()\" customized for\nabbreviated displays of large or deeply nested containers:\n\n >>> import reprlib\n >>> reprlib.repr(set('supercalifragilisticexpialidocious'))\n \"{'a', 'c', 'd', 'e', 'f', 'g', ...}\"\n\nThe \"pprint\" module offers more sophisticated control over printing\nboth built-in and user defined objects in a way that is readable by\nthe interpreter. When the result is longer than one line, the \"pretty\nprinter\" adds line breaks and indentation to more clearly reveal data\nstructure:\n\n >>> import pprint\n >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',\n ... 'yellow'], 'blue']]]\n ...\n >>> pprint.pprint(t, width=30)\n [[[['black', 'cyan'],\n 'white',\n ['green', 'red']],\n [['magenta', 'yellow'],\n 'blue']]]\n\nThe \"textwrap\" module formats paragraphs of text to fit a given screen\nwidth:\n\n >>> import textwrap\n >>> doc = \"\"\"The wrap() method is just like fill() except that it returns\n ... a list of strings instead of one big string with newlines to separate\n ... the wrapped lines.\"\"\"\n ...\n >>> print(textwrap.fill(doc, width=40))\n The wrap() method is just like fill()\n except that it returns a list of strings\n instead of one big string with newlines\n to separate the wrapped lines.\n\nThe \"locale\" module accesses a database of culture specific data\nformats. The grouping attribute of locale's format function provides a\ndirect way of formatting numbers with group separators:\n\n >>> import locale\n >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')\n 'English_United States.1252'\n >>> conv = locale.localeconv() # get a mapping of conventions\n >>> x = 1234567.8\n >>> locale.format_string(\"%d\", x, grouping=True)\n '1,234,567'\n >>> locale.format_string(\"%s%.*f\", (conv['currency_symbol'],\n ... conv['frac_digits'], x), grouping=True)\n '$1,234,567.80'\n\n11.2. Templating\n================\n\nThe \"string\" module includes a versatile \"Template\" class with a\nsimplified syntax suitable for editing by end-users. This allows\nusers to customize their applications without having to alter the\napplication.\n\nThe format uses placeholder names formed by \"$\" with valid Python\nidentifiers (alphanumeric characters and underscores). Surrounding\nthe placeholder with braces allows it to be followed by more\nalphanumeric letters with no intervening spaces. Writing \"$$\" creates\na single escaped \"$\":\n\n >>> from string import Template\n >>> t = Template('${village}folk send $$10 to $cause.')\n >>> t.substitute(village='Nottingham', cause='the ditch fund')\n 'Nottinghamfolk send $10 to the ditch fund.'\n\nThe \"substitute()\" method raises a \"KeyError\" when a placeholder is\nnot supplied in a dictionary or a keyword argument. For mail-merge\nstyle applications, user supplied data may be incomplete and the\n\"safe_substitute()\" method may be more appropriate --- it will leave\nplaceholders unchanged if data is missing:\n\n >>> t = Template('Return the $item to $owner.')\n >>> d = dict(item='unladen swallow')\n >>> t.substitute(d)\n Traceback (most recent call last):\n ...\n KeyError: 'owner'\n >>> t.safe_substitute(d)\n 'Return the unladen swallow to $owner.'\n\nTemplate subclasses can specify a custom delimiter. For example, a\nbatch renaming utility for a photo browser may elect to use percent\nsigns for placeholders such as the current date, image sequence\nnumber, or file format:\n\n >>> import time, os.path\n >>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']\n >>> class BatchRename(Template):\n ... delimiter = '%'\n ...\n >>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ')\n Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f\n\n >>> t = BatchRename(fmt)\n >>> date = time.strftime('%d%b%y')\n >>> for i, filename in enumerate(photofiles):\n ... base, ext = os.path.splitext(filename)\n ... newname = t.substitute(d=date, n=i, f=ext)\n ... print('{0} --> {1}'.format(filename, newname))\n\n img_1074.jpg --> Ashley_0.jpg\n img_1076.jpg --> Ashley_1.jpg\n img_1077.jpg --> Ashley_2.jpg\n\nAnother application for templating is separating program logic from\nthe details of multiple output formats. This makes it possible to\nsubstitute custom templates for XML files, plain text reports, and\nHTML web reports.\n\n11.3. Working with Binary Data Record Layouts\n=============================================\n\nThe \"struct\" module provides \"pack()\" and \"unpack()\" functions for\nworking with variable length binary record formats. The following\nexample shows how to loop through header information in a ZIP file\nwithout using the \"zipfile\" module. Pack codes \"\"H\"\" and \"\"I\"\"\nrepresent two and four byte unsigned numbers respectively. The \"\"<\"\"\nindicates that they are standard size and in little-endian byte order:\n\n import struct\n\n with open('myfile.zip', 'rb') as f:\n data = f.read()\n\n start = 0\n for i in range(3): # show the first 3 file headers\n start += 14\n fields = struct.unpack('>> import weakref, gc\n >>> class A:\n ... def __init__(self, value):\n ... self.value = value\n ... def __repr__(self):\n ... return str(self.value)\n ...\n >>> a = A(10) # create a reference\n >>> d = weakref.WeakValueDictionary()\n >>> d['primary'] = a # does not create a reference\n >>> d['primary'] # fetch the object if it is still alive\n 10\n >>> del a # remove the one reference\n >>> gc.collect() # run garbage collection right away\n 0\n >>> d['primary'] # entry was automatically removed\n Traceback (most recent call last):\n File \"\", line 1, in \n d['primary'] # entry was automatically removed\n File \"C:/python314/lib/weakref.py\", line 46, in __getitem__\n o = self.data[key]()\n KeyError: 'primary'\n\n11.7. Tools for Working with Lists\n==================================\n\nMany data structure needs can be met with the built-in list type.\nHowever, sometimes there is a need for alternative implementations\nwith different performance trade-offs.\n\nThe \"array\" module provides an \"array\" object that is like a list that\nstores only homogeneous data and stores it more compactly. The\nfollowing example shows an array of numbers stored as two byte\nunsigned binary numbers (typecode \"\"H\"\") rather than the usual 16\nbytes per entry for regular lists of Python int objects:\n\n >>> from array import array\n >>> a = array('H', [4000, 10, 700, 22222])\n >>> sum(a)\n 26932\n >>> a[1:3]\n array('H', [10, 700])\n\nThe \"collections\" module provides a \"deque\" object that is like a list\nwith faster appends and pops from the left side but slower lookups in\nthe middle. These objects are well suited for implementing queues and\nbreadth first tree searches:\n\n >>> from collections import deque\n >>> d = deque([\"task1\", \"task2\", \"task3\"])\n >>> d.append(\"task4\")\n >>> print(\"Handling\", d.popleft())\n Handling task1\n\n unsearched = deque([starting_node])\n def breadth_first_search(unsearched):\n node = unsearched.popleft()\n for m in gen_moves(node):\n if is_goal(m):\n return m\n unsearched.append(m)\n\nIn addition to alternative list implementations, the library also\noffers other tools such as the \"bisect\" module with functions for\nmanipulating sorted lists:\n\n >>> import bisect\n >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]\n >>> bisect.insort(scores, (300, 'ruby'))\n >>> scores\n [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]\n\nThe \"heapq\" module provides functions for implementing heaps based on\nregular lists. The lowest valued entry is always kept at position\nzero. This is useful for applications which repeatedly access the\nsmallest element but do not want to run a full list sort:\n\n >>> from heapq import heapify, heappop, heappush\n >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]\n >>> heapify(data) # rearrange the list into heap order\n >>> heappush(data, -5) # add a new entry\n >>> [heappop(data) for i in range(3)] # fetch the three smallest entries\n [-5, 0, 1]\n\n11.8. Decimal Floating-Point Arithmetic\n=======================================\n\nThe \"decimal\" module offers a \"Decimal\" datatype for decimal floating-\npoint arithmetic. Compared to the built-in \"float\" implementation of\nbinary floating point, the class is especially helpful for\n\n* financial applications and other uses which require exact decimal\n representation,\n\n* control over precision,\n\n* control over rounding to meet legal or regulatory requirements,\n\n* tracking of significant decimal places, or\n\n* applications where the user expects the results to match\n calculations done by hand.\n\nFor example, calculating a 5% tax on a 70 cent phone charge gives\ndifferent results in decimal floating point and binary floating point.\nThe difference becomes significant if the results are rounded to the\nnearest cent:\n\n >>> from decimal import *\n >>> round(Decimal('0.70') * Decimal('1.05'), 2)\n Decimal('0.74')\n >>> round(.70 * 1.05, 2)\n 0.73\n\nThe \"Decimal\" result keeps a trailing zero, automatically inferring\nfour place significance from multiplicands with two place\nsignificance. Decimal reproduces mathematics as done by hand and\navoids issues that can arise when binary floating point cannot exactly\nrepresent decimal quantities.\n\nExact representation enables the \"Decimal\" class to perform modulo\ncalculations and equality tests that are unsuitable for binary\nfloating point:\n\n >>> Decimal('1.00') % Decimal('.10')\n Decimal('0.00')\n >>> 1.00 % 0.10\n 0.09999999999999995\n\n >>> sum([Decimal('0.1')]*10) == Decimal('1.0')\n True\n >>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 1.0\n False\n\nThe \"decimal\" module provides arithmetic with as much precision as\nneeded:\n\n >>> getcontext().prec = 36\n >>> Decimal(1) / Decimal(7)\n Decimal('0.142857142857142857142857142857142857')", "source": "python_docs:python-3.14-docs-text/tutorial/stdlib2.txt", "domain": "software" }, { "text": "13. What Now?\n*************\n\nReading this tutorial has probably reinforced your interest in using\nPython --- you should be eager to apply Python to solving your real-\nworld problems. Where should you go to learn more?\n\nThis tutorial is part of Python's documentation set. Some other\ndocuments in the set are:\n\n* The Python Standard Library:\n\n You should browse through this manual, which gives complete (though\n terse) reference material about types, functions, and the modules in\n the standard library. The standard Python distribution includes a\n *lot* of additional code. There are modules to read Unix mailboxes,\n retrieve documents via HTTP, generate random numbers, parse command-\n line options, compress data, and many other tasks. Skimming through\n the Library Reference will give you an idea of what's available.\n\n* Installing Python Modules explains how to install additional modules\n written by other Python users.\n\n* The Python Language Reference: A detailed explanation of Python's\n syntax and semantics. It's heavy reading, but is useful as a\n complete guide to the language itself.\n\nMore Python resources:\n\n* https://www.python.org: The major Python website. It contains\n code, documentation, and pointers to Python-related pages around the\n web.\n\n* https://docs.python.org: Fast access to Python's documentation.\n\n* https://pypi.org: The Python Package Index, previously also\n nicknamed the Cheese Shop [1], is an index of user-created Python\n modules that are available for download. Once you begin releasing\n code, you can register it here so that others can find it.\n\n* https://code.activestate.com/recipes/langs/python/: The Python\n Cookbook is a sizable collection of code examples, larger modules,\n and useful scripts. Particularly notable contributions are collected\n in a book also titled Python Cookbook (O'Reilly & Associates, ISBN\n 0-596-00797-3.)\n\n* https://pyvideo.org collects links to Python-related videos from\n conferences and user-group meetings.\n\n* https://scipy.org: The Scientific Python project includes modules\n for fast array computations and manipulations plus a host of\n packages for such things as linear algebra, Fourier transforms, non-\n linear solvers, random number distributions, statistical analysis\n and the like.\n\nFor Python-related questions and problem reports, you can post to the\nnewsgroup *comp.lang.python*, or send them to the mailing list at\npython-list@python.org. The newsgroup and mailing list are gatewayed,\nso messages posted to one will automatically be forwarded to the\nother. There are hundreds of postings a day, asking (and answering)\nquestions, suggesting new features, and announcing new modules.\nMailing list archives are available at\nhttps://mail.python.org/pipermail/.\n\nBefore posting, be sure to check the list of Frequently Asked\nQuestions (also called the FAQ). The FAQ answers many of the\nquestions that come up again and again, and may already contain the\nsolution for your problem.\n\n-[ Footnotes ]-\n\n[1] \"Cheese Shop\" is a Monty Python sketch: a customer enters a cheese\n shop, but whatever cheese he asks for, the clerk says it's\n missing.", "source": "python_docs:python-3.14-docs-text/tutorial/whatnow.txt", "domain": "software" }, { "text": "16. Appendix\n************\n\n16.1. Interactive Mode\n======================\n\nThere are two variants of the interactive *REPL*. The classic basic\ninterpreter is supported on all platforms with minimal line control\ncapabilities.\n\nSince Python 3.13, a new interactive shell is used by default. This\none supports color, multiline editing, history browsing, and paste\nmode. To disable color, see Controlling color for details. Function\nkeys provide some additional functionality. \"F1\" enters the\ninteractive help browser \"pydoc\". \"F2\" allows for browsing command-\nline history with neither output nor the *>>>* and *...* prompts. \"F3\"\nenters \"paste mode\", which makes pasting larger blocks of code easier.\nPress \"F3\" to return to the regular prompt.\n\nWhen using the new interactive shell, exit the shell by typing \"exit\"\nor \"quit\". Adding call parentheses after those commands is not\nrequired.\n\nIf the new interactive shell is not desired, it can be disabled via\nthe \"PYTHON_BASIC_REPL\" environment variable.\n\n16.1.1. Error Handling\n----------------------\n\nWhen an error occurs, the interpreter prints an error message and a\nstack trace. In interactive mode, it then returns to the primary\nprompt; when input came from a file, it exits with a nonzero exit\nstatus after printing the stack trace. (Exceptions handled by an\n\"except\" clause in a \"try\" statement are not errors in this context.)\nSome errors are unconditionally fatal and cause an exit with a nonzero\nexit status; this applies to internal inconsistencies and some cases\nof running out of memory. All error messages are written to the\nstandard error stream; normal output from executed commands is written\nto standard output.\n\nTyping the interrupt character (usually \"Control\"-\"C\" or \"Delete\") to\nthe primary or secondary prompt cancels the input and returns to the\nprimary prompt. [1] Typing an interrupt while a command is executing\nraises the \"KeyboardInterrupt\" exception, which may be handled by a\n\"try\" statement.\n\n16.1.2. Executable Python Scripts\n---------------------------------\n\nOn BSD'ish Unix systems, Python scripts can be made directly\nexecutable, like shell scripts, by putting the line\n\n #!/usr/bin/env python3\n\n(assuming that the interpreter is on the user's \"PATH\") at the\nbeginning of the script and giving the file an executable mode. The\n\"#!\" must be the first two characters of the file. On some platforms,\nthis first line must end with a Unix-style line ending (\"'\\n'\"), not a\nWindows (\"'\\r\\n'\") line ending. Note that the hash, or pound,\ncharacter, \"'#'\", is used to start a comment in Python.\n\nThe script can be given an executable mode, or permission, using the\n**chmod** command.\n\n $ chmod +x myscript.py\n\nOn Windows systems, there is no notion of an \"executable mode\". The\nPython installer automatically associates \".py\" files with\n\"python.exe\" so that a double-click on a Python file will run it as a\nscript. The extension can also be \".pyw\", in that case, the console\nwindow that normally appears is suppressed.\n\n16.1.3. The Interactive Startup File\n------------------------------------\n\nWhen you use Python interactively, it is frequently handy to have some\nstandard commands executed every time the interpreter is started. You\ncan do this by setting an environment variable named \"PYTHONSTARTUP\"\nto the name of a file containing your start-up commands. This is\nsimilar to the \".profile\" feature of the Unix shells.\n\nThis file is only read in interactive sessions, not when Python reads\ncommands from a script, and not when \"/dev/tty\" is given as the\nexplicit source of commands (which otherwise behaves like an\ninteractive session). It is executed in the same namespace where\ninteractive commands are executed, so that objects that it defines or\nimports can be used without qualification in the interactive session.\nYou can also change the prompts \"sys.ps1\" and \"sys.ps2\" in this file.\n\nIf you want to read an additional start-up file from the current\ndirectory, you can program this in the global start-up file using code\nlike \"if os.path.isfile('.pythonrc.py'):\nexec(open('.pythonrc.py').read())\". If you want to use the startup\nfile in a script, you must do this explicitly in the script:\n\n import os\n filename = os.environ.get('PYTHONSTARTUP')\n if filename and os.path.isfile(filename):\n with open(filename) as fobj:\n startup_file = fobj.read()\n exec(startup_file)\n\n16.1.4. The Customization Modules\n---------------------------------\n\nPython provides two hooks to let you customize it: sitecustomize and\nusercustomize. To see how it works, you need first to find the\nlocation of your user site-packages directory. Start Python and run\nthis code:\n\n >>> import site\n >>> site.getusersitepackages()\n '/home/user/.local/lib/python3.x/site-packages'\n\nNow you can create a file named \"usercustomize.py\" in that directory\nand put anything you want in it. It will affect every invocation of\nPython, unless it is started with the \"-s\" option to disable the\nautomatic import.\n\nsitecustomize works in the same way, but is typically created by an\nadministrator of the computer in the global site-packages directory,\nand is imported before usercustomize. See the documentation of the\n\"site\" module for more details.\n\n-[ Footnotes ]-\n\n[1] A problem with the GNU Readline package may prevent this.", "source": "python_docs:python-3.14-docs-text/tutorial/appendix.txt", "domain": "software" }, { "text": "6. Modules\n**********\n\nIf you quit from the Python interpreter and enter it again, the\ndefinitions you have made (functions and variables) are lost.\nTherefore, if you want to write a somewhat longer program, you are\nbetter off using a text editor to prepare the input for the\ninterpreter and running it with that file as input instead. This is\nknown as creating a *script*. As your program gets longer, you may\nwant to split it into several files for easier maintenance. You may\nalso want to use a handy function that you've written in several\nprograms without copying its definition into each program.\n\nTo support this, Python has a way to put definitions in a file and use\nthem in a script or in an interactive instance of the interpreter.\nSuch a file is called a *module*; definitions from a module can be\n*imported* into other modules or into the *main* module (the\ncollection of variables that you have access to in a script executed\nat the top level and in calculator mode).\n\nA module is a file containing Python definitions and statements. The\nfile name is the module name with the suffix \".py\" appended. Within a\nmodule, the module's name (as a string) is available as the value of\nthe global variable \"__name__\". For instance, use your favorite text\neditor to create a file called \"fibo.py\" in the current directory with\nthe following contents:\n\n # Fibonacci numbers module\n\n def fib(n):\n \"\"\"Write Fibonacci series up to n.\"\"\"\n a, b = 0, 1\n while a < n:\n print(a, end=' ')\n a, b = b, a+b\n print()\n\n def fib2(n):\n \"\"\"Return Fibonacci series up to n.\"\"\"\n result = []\n a, b = 0, 1\n while a < n:\n result.append(a)\n a, b = b, a+b\n return result\n\nNow enter the Python interpreter and import this module with the\nfollowing command:\n\n >>> import fibo\n\nThis does not add the names of the functions defined in \"fibo\"\ndirectly to the current *namespace* (see Python Scopes and Namespaces\nfor more details); it only adds the module name \"fibo\" there. Using\nthe module name you can access the functions:\n\n >>> fibo.fib(1000)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987\n >>> fibo.fib2(100)\n [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]\n >>> fibo.__name__\n 'fibo'\n\nIf you intend to use a function often you can assign it to a local\nname:\n\n >>> fib = fibo.fib\n >>> fib(500)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377\n\n6.1. More on Modules\n====================\n\nA module can contain executable statements as well as function\ndefinitions. These statements are intended to initialize the module.\nThey are executed only the *first* time the module name is encountered\nin an import statement. [1] (They are also run if the file is executed\nas a script.)\n\nEach module has its own private namespace, which is used as the global\nnamespace by all functions defined in the module. Thus, the author of\na module can use global variables in the module without worrying about\naccidental clashes with a user's global variables. On the other hand,\nif you know what you are doing you can touch a module's global\nvariables with the same notation used to refer to its functions,\n\"modname.itemname\".\n\nModules can import other modules. It is customary but not required to\nplace all \"import\" statements at the beginning of a module (or script,\nfor that matter). The imported module names, if placed at the top\nlevel of a module (outside any functions or classes), are added to the\nmodule's global namespace.\n\nThere is a variant of the \"import\" statement that imports names from a\nmodule directly into the importing module's namespace. For example:\n\n >>> from fibo import fib, fib2\n >>> fib(500)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377\n\nThis does not introduce the module name from which the imports are\ntaken in the local namespace (so in the example, \"fibo\" is not\ndefined).\n\nThere is even a variant to import all names that a module defines:\n\n >>> from fibo import *\n >>> fib(500)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377\n\nThis imports all names except those beginning with an underscore\n(\"_\"). In most cases Python programmers do not use this facility since\nit introduces an unknown set of names into the interpreter, possibly\nhiding some things you have already defined.\n\nNote that in general the practice of importing \"*\" from a module or\npackage is frowned upon, since it often causes poorly readable code.\nHowever, it is okay to use it to save typing in interactive sessions.\n\nIf the module name is followed by \"as\", then the name following \"as\"\nis bound directly to the imported module.\n\n >>> import fibo as fib\n >>> fib.fib(500)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377\n\nThis is effectively importing the module in the same way that \"import\nfibo\" will do, with the only difference of it being available as\n\"fib\".\n\nIt can also be used when utilising \"from\" with similar effects:\n\n >>> from fibo import fib as fibonacci\n >>> fibonacci(500)\n 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377\n\nNote:\n\n For efficiency reasons, each module is only imported once per\n interpreter session. Therefore, if you change your modules, you\n must restart the interpreter -- or, if it's just one module you want\n to test interactively, use \"importlib.reload()\", e.g. \"import\n importlib; importlib.reload(modulename)\".\n\n6.1.1. Executing modules as scripts\n-----------------------------------\n\nWhen you run a Python module with\n\n python fibo.py \n\nthe code in the module will be executed, just as if you imported it,\nbut with the \"__name__\" set to \"\"__main__\"\". That means that by\nadding this code at the end of your module:\n\n if __name__ == \"__main__\":\n import sys\n fib(int(sys.argv[1]))\n\nyou can make the file usable as a script as well as an importable\nmodule, because the code that parses the command line only runs if the\nmodule is executed as the \"main\" file:\n\n $ python fibo.py 50\n 0 1 1 2 3 5 8 13 21 34\n\nIf the module is imported, the code is not run:\n\n >>> import fibo\n >>>\n\nThis is often used either to provide a convenient user interface to a\nmodule, or for testing purposes (running the module as a script\nexecutes a test suite).\n\n6.1.2. The Module Search Path\n-----------------------------\n\nWhen a module named \"spam\" is imported, the interpreter first searches\nfor a built-in module with that name. These module names are listed in\n\"sys.builtin_module_names\". If not found, it then searches for a file\nnamed \"spam.py\" in a list of directories given by the variable\n\"sys.path\". \"sys.path\" is initialized from these locations:\n\n* The directory containing the input script (or the current directory\n when no file is specified).\n\n* \"PYTHONPATH\" (a list of directory names, with the same syntax as the\n shell variable \"PATH\").\n\n* The installation-dependent default (by convention including a \"site-\n packages\" directory, handled by the \"site\" module).\n\nMore details are at The initialization of the sys.path module search\npath.\n\nNote:\n\n On file systems which support symlinks, the directory containing the\n input script is calculated after the symlink is followed. In other\n words the directory containing the symlink is **not** added to the\n module search path.\n\nAfter initialization, Python programs can modify \"sys.path\". The\ndirectory containing the script being run is placed at the beginning\nof the search path, ahead of the standard library path. This means\nthat scripts in that directory will be loaded instead of modules of\nthe same name in the library directory. This is an error unless the\nreplacement is intended. See section Standard Modules for more\ninformation.\n\n6.1.3. \"Compiled\" Python files\n------------------------------\n\nTo speed up loading modules, Python caches the compiled version of\neach module in the \"__pycache__\" directory under the name\n\"module.*version*.pyc\", where the version encodes the format of the\ncompiled file; it generally contains the Python version number. For\nexample, in CPython release 3.3 the compiled version of spam.py would\nbe cached as \"__pycache__/spam.cpython-33.pyc\". This naming\nconvention allows compiled modules from different releases and\ndifferent versions of Python to coexist.\n\nPython checks the modification date of the source against the compiled\nversion to see if it's out of date and needs to be recompiled. This\nis a completely automatic process. Also, the compiled modules are\nplatform-independent, so the same library can be shared among systems\nwith different architectures.\n\nPython does not check the cache in two circumstances. First, it\nalways recompiles and does not store the result for the module that's\nloaded directly from the command line. Second, it does not check the\ncache if there is no source module. To support a non-source (compiled\nonly) distribution, the compiled module must be in the source\ndirectory, and there must not be a source module.\n\nSome tips for experts:\n\n* You can use the \"-O\" or \"-OO\" switches on the Python command to\n reduce the size of a compiled module. The \"-O\" switch removes\n assert statements, the \"-OO\" switch removes both assert statements\n and __doc__ strings. Since some programs may rely on having these\n available, you should only use this option if you know what you're\n doing. \"Optimized\" modules have an \"opt-\" tag and are usually\n smaller. Future releases may change the effects of optimization.\n\n* A program doesn't run any faster when it is read from a \".pyc\" file\n than when it is read from a \".py\" file; the only thing that's faster\n about \".pyc\" files is the speed with which they are loaded.\n\n* The module \"compileall\" can create .pyc files for all modules in a\n directory.\n\n* There is more detail on this process, including a flow chart of the\n decisions, in **PEP 3147**.\n\n6.2. Standard Modules\n=====================\n\nPython comes with a library of standard modules, described in a\nseparate document, the Python Library Reference (\"Library Reference\"\nhereafter). Some modules are built into the interpreter; these\nprovide access to operations that are not part of the core of the\nlanguage but are nevertheless built in, either for efficiency or to\nprovide access to operating system primitives such as system calls.\nThe set of such modules is a configuration option which also depends\non the underlying platform. For example, the \"winreg\" module is only\nprovided on Windows systems. One particular module deserves some\nattention: \"sys\", which is built into every Python interpreter. The\nvariables \"sys.ps1\" and \"sys.ps2\" define the strings used as primary\nand secondary prompts:\n\n >>> import sys\n >>> sys.ps1\n '>>> '\n >>> sys.ps2\n '... '\n >>> sys.ps1 = 'C> '\n C> print('Yuck!')\n Yuck!\n C>\n\nThese two variables are only defined if the interpreter is in\ninteractive mode.\n\nThe variable \"sys.path\" is a list of strings that determines the\ninterpreter's search path for modules. It is initialized to a default\npath taken from the environment variable \"PYTHONPATH\", or from a\nbuilt-in default if \"PYTHONPATH\" is not set. You can modify it using\nstandard list operations:\n\n >>> import sys\n >>> sys.path.append('/ufs/guido/lib/python')\n\n6.3. The \"dir()\" Function\n=========================\n\nThe built-in function \"dir()\" is used to find out which names a module\ndefines. It returns a sorted list of strings:\n\n >>> import fibo, sys\n >>> dir(fibo)\n ['__name__', 'fib', 'fib2']\n >>> dir(sys)\n ['__breakpointhook__', '__displayhook__', '__doc__', '__excepthook__',\n '__interactivehook__', '__loader__', '__name__', '__package__', '__spec__',\n '__stderr__', '__stdin__', '__stdout__', '__unraisablehook__',\n '_clear_type_cache', '_current_frames', '_debugmallocstats', '_framework',\n '_getframe', '_git', '_home', '_xoptions', 'abiflags', 'addaudithook',\n 'api_version', 'argv', 'audit', 'base_exec_prefix', 'base_prefix',\n 'breakpointhook', 'builtin_module_names', 'byteorder', 'call_tracing',\n 'callstats', 'copyright', 'displayhook', 'dont_write_bytecode', 'exc_info',\n 'excepthook', 'exec_prefix', 'executable', 'exit', 'flags', 'float_info',\n 'float_repr_style', 'get_asyncgen_hooks', 'get_coroutine_origin_tracking_depth',\n 'getallocatedblocks', 'getdefaultencoding', 'getdlopenflags',\n 'getfilesystemencodeerrors', 'getfilesystemencoding', 'getprofile',\n 'getrecursionlimit', 'getrefcount', 'getsizeof', 'getswitchinterval',\n 'gettrace', 'hash_info', 'hexversion', 'implementation', 'int_info',\n 'intern', 'is_finalizing', 'last_traceback', 'last_type', 'last_value',\n 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks',\n 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'pycache_prefix',\n 'set_asyncgen_hooks', 'set_coroutine_origin_tracking_depth', 'setdlopenflags',\n 'setprofile', 'setrecursionlimit', 'setswitchinterval', 'settrace', 'stderr',\n 'stdin', 'stdout', 'thread_info', 'unraisablehook', 'version', 'version_info',\n 'warnoptions']\n\nWithout arguments, \"dir()\" lists the names you have defined currently:\n\n >>> a = [1, 2, 3, 4, 5]\n >>> import fibo\n >>> fib = fibo.fib\n >>> dir()\n ['__builtins__', '__name__', 'a', 'fib', 'fibo', 'sys']\n\nNote that it lists all types of names: variables, modules, functions,\netc.\n\n\"dir()\" does not list the names of built-in functions and variables.\nIf you want a list of those, they are defined in the standard module\n\"builtins\":\n\n >>> import builtins\n >>> dir(builtins)\n ['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException',\n 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning',\n 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError',\n 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning',\n 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False',\n 'FileExistsError', 'FileNotFoundError', 'FloatingPointError',\n 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError',\n 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError',\n 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError',\n 'MemoryError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented',\n 'NotImplementedError', 'OSError', 'OverflowError',\n 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError',\n 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning',\n 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError',\n 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError',\n 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError',\n 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning',\n 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__',\n '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs',\n 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable',\n 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits',\n 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit',\n 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr',\n 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass',\n 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview',\n 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property',\n 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice',\n 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars',\n 'zip']\n\n6.4. Packages\n=============\n\nPackages are a way of structuring Python's module namespace by using\n\"dotted module names\". For example, the module name \"A.B\" designates\na submodule named \"B\" in a package named \"A\". Just like the use of\nmodules saves the authors of different modules from having to worry\nabout each other's global variable names, the use of dotted module\nnames saves the authors of multi-module packages like NumPy or Pillow\nfrom having to worry about each other's module names.\n\nSuppose you want to design a collection of modules (a \"package\") for\nthe uniform handling of sound files and sound data. There are many\ndifferent sound file formats (usually recognized by their extension,\nfor example: \".wav\", \".aiff\", \".au\"), so you may need to create and\nmaintain a growing collection of modules for the conversion between\nthe various file formats. There are also many different operations you\nmight want to perform on sound data (such as mixing, adding echo,\napplying an equalizer function, creating an artificial stereo effect),\nso in addition you will be writing a never-ending stream of modules to\nperform these operations. Here's a possible structure for your\npackage (expressed in terms of a hierarchical filesystem):\n\n sound/ Top-level package\n __init__.py Initialize the sound package\n formats/ Subpackage for file format conversions\n __init__.py\n wavread.py\n wavwrite.py\n aiffread.py\n aiffwrite.py\n auread.py\n auwrite.py\n ...\n effects/ Subpackage for sound effects\n __init__.py\n echo.py\n surround.py\n reverse.py\n ...\n filters/ Subpackage for filters\n __init__.py\n equalizer.py\n vocoder.py\n karaoke.py\n ...\n\nWhen importing the package, Python searches through the directories on\n\"sys.path\" looking for the package subdirectory.\n\nThe \"__init__.py\" files are required to make Python treat directories\ncontaining the file as packages (unless using a *namespace package*, a\nrelatively advanced feature). This prevents directories with a common\nname, such as \"string\", from unintentionally hiding valid modules that\noccur later on the module search path. In the simplest case,\n\"__init__.py\" can just be an empty file, but it can also execute\ninitialization code for the package or set the \"__all__\" variable,\ndescribed later.\n\nUsers of the package can import individual modules from the package,\nfor example:\n\n import sound.effects.echo\n\nThis loads the submodule \"sound.effects.echo\". It must be referenced\nwith its full name.\n\n sound.effects.echo.echofilter(input, output, delay=0.7, atten=4)\n\nAn alternative way of importing the submodule is:\n\n from sound.effects import echo\n\nThis also loads the submodule \"echo\", and makes it available without\nits package prefix, so it can be used as follows:\n\n echo.echofilter(input, output, delay=0.7, atten=4)\n\nYet another variation is to import the desired function or variable\ndirectly:\n\n from sound.effects.echo import echofilter\n\nAgain, this loads the submodule \"echo\", but this makes its function\n\"echofilter()\" directly available:\n\n echofilter(input, output, delay=0.7, atten=4)\n\nNote that when using \"from package import item\", the item can be\neither a submodule (or subpackage) of the package, or some other name\ndefined in the package, like a function, class or variable. The\n\"import\" statement first tests whether the item is defined in the\npackage; if not, it assumes it is a module and attempts to load it.\nIf it fails to find it, an \"ImportError\" exception is raised.\n\nContrarily, when using syntax like \"import item.subitem.subsubitem\",\neach item except for the last must be a package; the last item can be\na module or a package but can't be a class or function or variable\ndefined in the previous item.\n\n6.4.1. Importing * From a Package\n---------------------------------\n\nNow what happens when the user writes \"from sound.effects import *\"?\nIdeally, one would hope that this somehow goes out to the filesystem,\nfinds which submodules are present in the package, and imports them\nall. This could take a long time and importing sub-modules might have\nunwanted side-effects that should only happen when the sub-module is\nexplicitly imported.\n\nThe only solution is for the package author to provide an explicit\nindex of the package. The \"import\" statement uses the following\nconvention: if a package's \"__init__.py\" code defines a list named\n\"__all__\", it is taken to be the list of module names that should be\nimported when \"from package import *\" is encountered. It is up to the\npackage author to keep this list up-to-date when a new version of the\npackage is released. Package authors may also decide not to support\nit, if they don't see a use for importing * from their package. For\nexample, the file \"sound/effects/__init__.py\" could contain the\nfollowing code:\n\n __all__ = [\"echo\", \"surround\", \"reverse\"]\n\nThis would mean that \"from sound.effects import *\" would import the\nthree named submodules of the \"sound.effects\" package.\n\nBe aware that submodules might become shadowed by locally defined\nnames. For example, if you added a \"reverse\" function to the\n\"sound/effects/__init__.py\" file, the \"from sound.effects import *\"\nwould only import the two submodules \"echo\" and \"surround\", but *not*\nthe \"reverse\" submodule, because it is shadowed by the locally defined\n\"reverse\" function:\n\n __all__ = [\n \"echo\", # refers to the 'echo.py' file\n \"surround\", # refers to the 'surround.py' file\n \"reverse\", # !!! refers to the 'reverse' function now !!!\n ]\n\n def reverse(msg: str): # <-- this name shadows the 'reverse.py' submodule\n return msg[::-1] # in the case of a 'from sound.effects import *'\n\nIf \"__all__\" is not defined, the statement \"from sound.effects import\n*\" does *not* import all submodules from the package \"sound.effects\"\ninto the current namespace; it only ensures that the package\n\"sound.effects\" has been imported (possibly running any initialization\ncode in \"__init__.py\") and then imports whatever names are defined in\nthe package. This includes any names defined (and submodules\nexplicitly loaded) by \"__init__.py\". It also includes any submodules\nof the package that were explicitly loaded by previous \"import\"\nstatements. Consider this code:\n\n import sound.effects.echo\n import sound.effects.surround\n from sound.effects import *\n\nIn this example, the \"echo\" and \"surround\" modules are imported in the\ncurrent namespace because they are defined in the \"sound.effects\"\npackage when the \"from...import\" statement is executed. (This also\nworks when \"__all__\" is defined.)\n\nAlthough certain modules are designed to export only names that follow\ncertain patterns when you use \"import *\", it is still considered bad\npractice in production code.\n\nRemember, there is nothing wrong with using \"from package import\nspecific_submodule\"! In fact, this is the recommended notation unless\nthe importing module needs to use submodules with the same name from\ndifferent packages.\n\n6.4.2. Intra-package References\n-------------------------------\n\nWhen packages are structured into subpackages (as with the \"sound\"\npackage in the example), you can use absolute imports to refer to\nsubmodules of siblings packages. For example, if the module\n\"sound.filters.vocoder\" needs to use the \"echo\" module in the\n\"sound.effects\" package, it can use \"from sound.effects import echo\".\n\nYou can also write relative imports, with the \"from module import\nname\" form of import statement. These imports use leading dots to\nindicate the current and parent packages involved in the relative\nimport. From the \"surround\" module for example, you might use:\n\n from . import echo\n from .. import formats\n from ..filters import equalizer\n\nNote that relative imports are based on the name of the current\nmodule's package. Since the main module does not have a package,\nmodules intended for use as the main module of a Python application\nmust always use absolute imports.\n\n6.4.3. Packages in Multiple Directories\n---------------------------------------\n\nPackages support one more special attribute, \"__path__\". This is\ninitialized to be a *sequence* of strings containing the name of the\ndirectory holding the package's \"__init__.py\" before the code in that\nfile is executed. This variable can be modified; doing so affects\nfuture searches for modules and subpackages contained in the package.\n\nWhile this feature is not often needed, it can be used to extend the\nset of modules found in a package.\n\n-[ Footnotes ]-\n\n[1] In fact function definitions are also 'statements' that are\n 'executed'; the execution of a module-level function definition\n adds the function name to the module's global namespace.", "source": "python_docs:python-3.14-docs-text/tutorial/modules.txt", "domain": "software" }, { "text": "15. Floating-Point Arithmetic: Issues and Limitations\n******************************************************\n\nFloating-point numbers are represented in computer hardware as base 2\n(binary) fractions. For example, the **decimal** fraction \"0.625\" has\nvalue 6/10 + 2/100 + 5/1000, and in the same way the **binary**\nfraction \"0.101\" has value 1/2 + 0/4 + 1/8. These two fractions have\nidentical values, the only real difference being that the first is\nwritten in base 10 fractional notation, and the second in base 2.\n\nUnfortunately, most decimal fractions cannot be represented exactly as\nbinary fractions. A consequence is that, in general, the decimal\nfloating-point numbers you enter are only approximated by the binary\nfloating-point numbers actually stored in the machine.\n\nThe problem is easier to understand at first in base 10. Consider the\nfraction 1/3. You can approximate that as a base 10 fraction:\n\n 0.3\n\nor, better,\n\n 0.33\n\nor, better,\n\n 0.333\n\nand so on. No matter how many digits you're willing to write down,\nthe result will never be exactly 1/3, but will be an increasingly\nbetter approximation of 1/3.\n\nIn the same way, no matter how many base 2 digits you're willing to\nuse, the decimal value 0.1 cannot be represented exactly as a base 2\nfraction. In base 2, 1/10 is the infinitely repeating fraction\n\n 0.0001100110011001100110011001100110011001100110011...\n\nStop at any finite number of bits, and you get an approximation. On\nmost machines today, floats are approximated using a binary fraction\nwith the numerator using the first 53 bits starting with the most\nsignificant bit and with the denominator as a power of two. In the\ncase of 1/10, the binary fraction is \"3602879701896397 / 2 ** 55\"\nwhich is close to but not exactly equal to the true value of 1/10.\n\nMany users are not aware of the approximation because of the way\nvalues are displayed. Python only prints a decimal approximation to\nthe true decimal value of the binary approximation stored by the\nmachine. On most machines, if Python were to print the true decimal\nvalue of the binary approximation stored for 0.1, it would have to\ndisplay:\n\n >>> 0.1\n 0.1000000000000000055511151231257827021181583404541015625\n\nThat is more digits than most people find useful, so Python keeps the\nnumber of digits manageable by displaying a rounded value instead:\n\n >>> 1 / 10\n 0.1\n\nJust remember, even though the printed result looks like the exact\nvalue of 1/10, the actual stored value is the nearest representable\nbinary fraction.\n\nInterestingly, there are many different decimal numbers that share the\nsame nearest approximate binary fraction. For example, the numbers\n\"0.1\" and \"0.10000000000000001\" and\n\"0.1000000000000000055511151231257827021181583404541015625\" are all\napproximated by \"3602879701896397 / 2 ** 55\". Since all of these\ndecimal values share the same approximation, any one of them could be\ndisplayed while still preserving the invariant \"eval(repr(x)) == x\".\n\nHistorically, the Python prompt and built-in \"repr()\" function would\nchoose the one with 17 significant digits, \"0.10000000000000001\".\nStarting with Python 3.1, Python (on most systems) is now able to\nchoose the shortest of these and simply display \"0.1\".\n\nNote that this is in the very nature of binary floating point: this is\nnot a bug in Python, and it is not a bug in your code either. You'll\nsee the same kind of thing in all languages that support your\nhardware's floating-point arithmetic (although some languages may not\n*display* the difference by default, or in all output modes).\n\nFor more pleasant output, you may wish to use string formatting to\nproduce a limited number of significant digits:\n\n >>> format(math.pi, '.12g') # give 12 significant digits\n '3.14159265359'\n\n >>> format(math.pi, '.2f') # give 2 digits after the point\n '3.14'\n\n >>> repr(math.pi)\n '3.141592653589793'\n\nIt's important to realize that this is, in a real sense, an illusion:\nyou're simply rounding the *display* of the true machine value.\n\nOne illusion may beget another. For example, since 0.1 is not exactly\n1/10, summing three values of 0.1 may not yield exactly 0.3, either:\n\n >>> 0.1 + 0.1 + 0.1 == 0.3\n False\n\nAlso, since the 0.1 cannot get any closer to the exact value of 1/10\nand 0.3 cannot get any closer to the exact value of 3/10, then pre-\nrounding with \"round()\" function cannot help:\n\n >>> round(0.1, 1) + round(0.1, 1) + round(0.1, 1) == round(0.3, 1)\n False\n\nThough the numbers cannot be made closer to their intended exact\nvalues, the \"math.isclose()\" function can be useful for comparing\ninexact values:\n\n >>> math.isclose(0.1 + 0.1 + 0.1, 0.3)\n True\n\nAlternatively, the \"round()\" function can be used to compare rough\napproximations:\n\n >>> round(math.pi, ndigits=2) == round(22 / 7, ndigits=2)\n True\n\nBinary floating-point arithmetic holds many surprises like this. The\nproblem with \"0.1\" is explained in precise detail below, in the\n\"Representation Error\" section. See Examples of Floating Point\nProblems for a pleasant summary of how binary floating point works and\nthe kinds of problems commonly encountered in practice. Also see The\nPerils of Floating Point for a more complete account of other common\nsurprises.\n\nAs that says near the end, \"there are no easy answers.\" Still, don't\nbe unduly wary of floating point! The errors in Python float\noperations are inherited from the floating-point hardware, and on most\nmachines are on the order of no more than 1 part in 2**53 per\noperation. That's more than adequate for most tasks, but you do need\nto keep in mind that it's not decimal arithmetic and that every float\noperation can suffer a new rounding error.\n\nWhile pathological cases do exist, for most casual use of floating-\npoint arithmetic you'll see the result you expect in the end if you\nsimply round the display of your final results to the number of\ndecimal digits you expect. \"str()\" usually suffices, and for finer\ncontrol see the \"str.format()\" method's format specifiers in Format\nString Syntax.\n\nFor use cases which require exact decimal representation, try using\nthe \"decimal\" module which implements decimal arithmetic suitable for\naccounting applications and high-precision applications.\n\nAnother form of exact arithmetic is supported by the \"fractions\"\nmodule which implements arithmetic based on rational numbers (so the\nnumbers like 1/3 can be represented exactly).\n\nIf you are a heavy user of floating-point operations you should take a\nlook at the NumPy package and many other packages for mathematical and\nstatistical operations supplied by the SciPy project. See\n.\n\nPython provides tools that may help on those rare occasions when you\nreally *do* want to know the exact value of a float. The\n\"float.as_integer_ratio()\" method expresses the value of a float as a\nfraction:\n\n >>> x = 3.14159\n >>> x.as_integer_ratio()\n (3537115888337719, 1125899906842624)\n\nSince the ratio is exact, it can be used to losslessly recreate the\noriginal value:\n\n >>> x == 3537115888337719 / 1125899906842624\n True\n\nThe \"float.hex()\" method expresses a float in hexadecimal (base 16),\nagain giving the exact value stored by your computer:\n\n >>> x.hex()\n '0x1.921f9f01b866ep+1'\n\nThis precise hexadecimal representation can be used to reconstruct the\nfloat value exactly:\n\n >>> x == float.fromhex('0x1.921f9f01b866ep+1')\n True\n\nSince the representation is exact, it is useful for reliably porting\nvalues across different versions of Python (platform independence) and\nexchanging data with other languages that support the same format\n(such as Java and C99).\n\nAnother helpful tool is the \"sum()\" function which helps mitigate\nloss-of-precision during summation. It uses extended precision for\nintermediate rounding steps as values are added onto a running total.\nThat can make a difference in overall accuracy so that the errors do\nnot accumulate to the point where they affect the final total:\n\n >>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 1.0\n False\n >>> sum([0.1] * 10) == 1.0\n True\n\nThe \"math.fsum()\" goes further and tracks all of the \"lost digits\" as\nvalues are added onto a running total so that the result has only a\nsingle rounding. This is slower than \"sum()\" but will be more\naccurate in uncommon cases where large magnitude inputs mostly cancel\neach other out leaving a final sum near zero:\n\n >>> arr = [-0.10430216751806065, -266310978.67179024, 143401161448607.16,\n ... -143401161400469.7, 266262841.31058735, -0.003244936839808227]\n >>> float(sum(map(Fraction, arr))) # Exact summation with single rounding\n 8.042173697819788e-13\n >>> math.fsum(arr) # Single rounding\n 8.042173697819788e-13\n >>> sum(arr) # Multiple roundings in extended precision\n 8.042178034628478e-13\n >>> total = 0.0\n >>> for x in arr:\n ... total += x # Multiple roundings in standard precision\n ...\n >>> total # Straight addition has no correct digits!\n -0.0051575902860057365\n\n15.1. Representation Error\n==========================\n\nThis section explains the \"0.1\" example in detail, and shows how you\ncan perform an exact analysis of cases like this yourself. Basic\nfamiliarity with binary floating-point representation is assumed.\n\n*Representation error* refers to the fact that some (most, actually)\ndecimal fractions cannot be represented exactly as binary (base 2)\nfractions. This is the chief reason why Python (or Perl, C, C++, Java,\nFortran, and many others) often won't display the exact decimal number\nyou expect.\n\nWhy is that? 1/10 is not exactly representable as a binary fraction.\nSince at least 2000, almost all machines use IEEE 754 binary floating-\npoint arithmetic, and almost all platforms map Python floats to IEEE\n754 binary64 \"double precision\" values. IEEE 754 binary64 values\ncontain 53 bits of precision, so on input the computer strives to\nconvert 0.1 to the closest fraction it can of the form *J*/2***N*\nwhere *J* is an integer containing exactly 53 bits. Rewriting\n\n 1 / 10 ~= J / (2**N)\n\nas\n\n J ~= 2**N / 10\n\nand recalling that *J* has exactly 53 bits (is \">= 2**52\" but \"<\n2**53\"), the best value for *N* is 56:\n\n >>> 2**52 <= 2**56 // 10 < 2**53\n True\n\nThat is, 56 is the only value for *N* that leaves *J* with exactly 53\nbits. The best possible value for *J* is then that quotient rounded:\n\n >>> q, r = divmod(2**56, 10)\n >>> r\n 6\n\nSince the remainder is more than half of 10, the best approximation is\nobtained by rounding up:\n\n >>> q+1\n 7205759403792794\n\nTherefore the best possible approximation to 1/10 in IEEE 754 double\nprecision is:\n\n 7205759403792794 / 2 ** 56\n\nDividing both the numerator and denominator by two reduces the\nfraction to:\n\n 3602879701896397 / 2 ** 55\n\nNote that since we rounded up, this is actually a little bit larger\nthan 1/10; if we had not rounded up, the quotient would have been a\nlittle bit smaller than 1/10. But in no case can it be *exactly*\n1/10!\n\nSo the computer never \"sees\" 1/10: what it sees is the exact fraction\ngiven above, the best IEEE 754 double approximation it can get:\n\n >>> 0.1 * 2 ** 55\n 3602879701896397.0\n\nIf we multiply that fraction by 10**55, we can see the value out to 55\ndecimal digits:\n\n >>> 3602879701896397 * 10 ** 55 // 2 ** 55\n 1000000000000000055511151231257827021181583404541015625\n\nmeaning that the exact number stored in the computer is equal to the\ndecimal value\n0.1000000000000000055511151231257827021181583404541015625. Instead of\ndisplaying the full decimal value, many languages (including older\nversions of Python), round the result to 17 significant digits:\n\n >>> format(0.1, '.17f')\n '0.10000000000000001'\n\nThe \"fractions\" and \"decimal\" modules make these calculations easy:\n\n >>> from decimal import Decimal\n >>> from fractions import Fraction\n\n >>> Fraction.from_float(0.1)\n Fraction(3602879701896397, 36028797018963968)\n\n >>> (0.1).as_integer_ratio()\n (3602879701896397, 36028797018963968)\n\n >>> Decimal.from_float(0.1)\n Decimal('0.1000000000000000055511151231257827021181583404541015625')\n\n >>> format(Decimal.from_float(0.1), '.17')\n '0.10000000000000001'", "source": "python_docs:python-3.14-docs-text/tutorial/floatingpoint.txt", "domain": "software" }, { "text": "8. Errors and Exceptions\n************************\n\nUntil now error messages haven't been more than mentioned, but if you\nhave tried out the examples you have probably seen some. There are\n(at least) two distinguishable kinds of errors: *syntax errors* and\n*exceptions*.\n\n8.1. Syntax Errors\n==================\n\nSyntax errors, also known as parsing errors, are perhaps the most\ncommon kind of complaint you get while you are still learning Python:\n\n >>> while True print('Hello world')\n File \"\", line 1\n while True print('Hello world')\n ^^^^^\n SyntaxError: invalid syntax\n\nThe parser repeats the offending line and displays little arrows\npointing at the place where the error was detected. Note that this is\nnot always the place that needs to be fixed. In the example, the\nerror is detected at the function \"print()\", since a colon (\"':'\") is\nmissing just before it.\n\nThe file name (\"\" in our example) and line number are printed\nso you know where to look in case the input came from a file.\n\n8.2. Exceptions\n===============\n\nEven if a statement or expression is syntactically correct, it may\ncause an error when an attempt is made to execute it. Errors detected\nduring execution are called *exceptions* and are not unconditionally\nfatal: you will soon learn how to handle them in Python programs.\nMost exceptions are not handled by programs, however, and result in\nerror messages as shown here:\n\n >>> 10 * (1/0)\n Traceback (most recent call last):\n File \"\", line 1, in \n 10 * (1/0)\n ~^~\n ZeroDivisionError: division by zero\n >>> 4 + spam*3\n Traceback (most recent call last):\n File \"\", line 1, in \n 4 + spam*3\n ^^^^\n NameError: name 'spam' is not defined\n >>> '2' + 2\n Traceback (most recent call last):\n File \"\", line 1, in \n '2' + 2\n ~~~~^~~\n TypeError: can only concatenate str (not \"int\") to str\n\nThe last line of the error message indicates what happened. Exceptions\ncome in different types, and the type is printed as part of the\nmessage: the types in the example are \"ZeroDivisionError\", \"NameError\"\nand \"TypeError\". The string printed as the exception type is the name\nof the built-in exception that occurred. This is true for all built-\nin exceptions, but need not be true for user-defined exceptions\n(although it is a useful convention). Standard exception names are\nbuilt-in identifiers (not reserved keywords).\n\nThe rest of the line provides detail based on the type of exception\nand what caused it.\n\nThe preceding part of the error message shows the context where the\nexception occurred, in the form of a stack traceback. In general it\ncontains a stack traceback listing source lines; however, it will not\ndisplay lines read from standard input.\n\nBuilt-in Exceptions lists the built-in exceptions and their meanings.\n\n8.3. Handling Exceptions\n========================\n\nIt is possible to write programs that handle selected exceptions. Look\nat the following example, which asks the user for input until a valid\ninteger has been entered, but allows the user to interrupt the program\n(using \"Control\"-\"C\" or whatever the operating system supports); note\nthat a user-generated interruption is signalled by raising the\n\"KeyboardInterrupt\" exception.\n\n >>> while True:\n ... try:\n ... x = int(input(\"Please enter a number: \"))\n ... break\n ... except ValueError:\n ... print(\"Oops! That was no valid number. Try again...\")\n ...\n\nThe \"try\" statement works as follows.\n\n* First, the *try clause* (the statement(s) between the \"try\" and\n \"except\" keywords) is executed.\n\n* If no exception occurs, the *except clause* is skipped and execution\n of the \"try\" statement is finished.\n\n* If an exception occurs during execution of the \"try\" clause, the\n rest of the clause is skipped. Then, if its type matches the\n exception named after the \"except\" keyword, the *except clause* is\n executed, and then execution continues after the try/except block.\n\n* If an exception occurs which does not match the exception named in\n the *except clause*, it is passed on to outer \"try\" statements; if\n no handler is found, it is an *unhandled exception* and execution\n stops with an error message.\n\nA \"try\" statement may have more than one *except clause*, to specify\nhandlers for different exceptions. At most one handler will be\nexecuted. Handlers only handle exceptions that occur in the\ncorresponding *try clause*, not in other handlers of the same \"try\"\nstatement. An *except clause* may name multiple exceptions, for\nexample:\n\n ... except RuntimeError, TypeError, NameError:\n ... pass\n\nA class in an \"except\" clause matches exceptions which are instances\nof the class itself or one of its derived classes (but not the other\nway around --- an *except clause* listing a derived class does not\nmatch instances of its base classes). For example, the following code\nwill print B, C, D in that order:\n\n class B(Exception):\n pass\n\n class C(B):\n pass\n\n class D(C):\n pass\n\n for cls in [B, C, D]:\n try:\n raise cls()\n except D:\n print(\"D\")\n except C:\n print(\"C\")\n except B:\n print(\"B\")\n\nNote that if the *except clauses* were reversed (with \"except B\"\nfirst), it would have printed B, B, B --- the first matching *except\nclause* is triggered.\n\nWhen an exception occurs, it may have associated values, also known as\nthe exception's *arguments*. The presence and types of the arguments\ndepend on the exception type.\n\nThe *except clause* may specify a variable after the exception name.\nThe variable is bound to the exception instance which typically has an\n\"args\" attribute that stores the arguments. For convenience, builtin\nexception types define \"__str__()\" to print all the arguments without\nexplicitly accessing \".args\".\n\n >>> try:\n ... raise Exception('spam', 'eggs')\n ... except Exception as inst:\n ... print(type(inst)) # the exception type\n ... print(inst.args) # arguments stored in .args\n ... print(inst) # __str__ allows args to be printed directly,\n ... # but may be overridden in exception subclasses\n ... x, y = inst.args # unpack args\n ... print('x =', x)\n ... print('y =', y)\n ...\n \n ('spam', 'eggs')\n ('spam', 'eggs')\n x = spam\n y = eggs\n\nThe exception's \"__str__()\" output is printed as the last part\n('detail') of the message for unhandled exceptions.\n\n\"BaseException\" is the common base class of all exceptions. One of its\nsubclasses, \"Exception\", is the base class of all the non-fatal\nexceptions. Exceptions which are not subclasses of \"Exception\" are not\ntypically handled, because they are used to indicate that the program\nshould terminate. They include \"SystemExit\" which is raised by\n\"sys.exit()\" and \"KeyboardInterrupt\" which is raised when a user\nwishes to interrupt the program.\n\n\"Exception\" can be used as a wildcard that catches (almost)\neverything. However, it is good practice to be as specific as possible\nwith the types of exceptions that we intend to handle, and to allow\nany unexpected exceptions to propagate on.\n\nThe most common pattern for handling \"Exception\" is to print or log\nthe exception and then re-raise it (allowing a caller to handle the\nexception as well):\n\n import sys\n\n try:\n f = open('myfile.txt')\n s = f.readline()\n i = int(s.strip())\n except OSError as err:\n print(\"OS error:\", err)\n except ValueError:\n print(\"Could not convert data to an integer.\")\n except Exception as err:\n print(f\"Unexpected {err=}, {type(err)=}\")\n raise\n\nThe \"try\" ... \"except\" statement has an optional *else clause*, which,\nwhen present, must follow all *except clauses*. It is useful for code\nthat must be executed if the *try clause* does not raise an exception.\nFor example:\n\n for arg in sys.argv[1:]:\n try:\n f = open(arg, 'r')\n except OSError:\n print('cannot open', arg)\n else:\n print(arg, 'has', len(f.readlines()), 'lines')\n f.close()\n\nThe use of the \"else\" clause is better than adding additional code to\nthe \"try\" clause because it avoids accidentally catching an exception\nthat wasn't raised by the code being protected by the \"try\" ...\n\"except\" statement.\n\nException handlers do not handle only exceptions that occur\nimmediately in the *try clause*, but also those that occur inside\nfunctions that are called (even indirectly) in the *try clause*. For\nexample:\n\n >>> def this_fails():\n ... x = 1/0\n ...\n >>> try:\n ... this_fails()\n ... except ZeroDivisionError as err:\n ... print('Handling run-time error:', err)\n ...\n Handling run-time error: division by zero\n\n8.4. Raising Exceptions\n=======================\n\nThe \"raise\" statement allows the programmer to force a specified\nexception to occur. For example:\n\n >>> raise NameError('HiThere')\n Traceback (most recent call last):\n File \"\", line 1, in \n raise NameError('HiThere')\n NameError: HiThere\n\nThe sole argument to \"raise\" indicates the exception to be raised.\nThis must be either an exception instance or an exception class (a\nclass that derives from \"BaseException\", such as \"Exception\" or one of\nits subclasses). If an exception class is passed, it will be\nimplicitly instantiated by calling its constructor with no arguments:\n\n raise ValueError # shorthand for 'raise ValueError()'\n\nIf you need to determine whether an exception was raised but don't\nintend to handle it, a simpler form of the \"raise\" statement allows\nyou to re-raise the exception:\n\n >>> try:\n ... raise NameError('HiThere')\n ... except NameError:\n ... print('An exception flew by!')\n ... raise\n ...\n An exception flew by!\n Traceback (most recent call last):\n File \"\", line 2, in \n raise NameError('HiThere')\n NameError: HiThere\n\n8.5. Exception Chaining\n=======================\n\nIf an unhandled exception occurs inside an \"except\" section, it will\nhave the exception being handled attached to it and included in the\nerror message:\n\n >>> try:\n ... open(\"database.sqlite\")\n ... except OSError:\n ... raise RuntimeError(\"unable to handle error\")\n ...\n Traceback (most recent call last):\n File \"\", line 2, in \n open(\"database.sqlite\")\n ~~~~^^^^^^^^^^^^^^^^^^^\n FileNotFoundError: [Errno 2] No such file or directory: 'database.sqlite'\n\n During handling of the above exception, another exception occurred:\n\n Traceback (most recent call last):\n File \"\", line 4, in \n raise RuntimeError(\"unable to handle error\")\n RuntimeError: unable to handle error\n\nTo indicate that an exception is a direct consequence of another, the\n\"raise\" statement allows an optional \"from\" clause:\n\n # exc must be exception instance or None.\n raise RuntimeError from exc\n\nThis can be useful when you are transforming exceptions. For example:\n\n >>> def func():\n ... raise ConnectionError\n ...\n >>> try:\n ... func()\n ... except ConnectionError as exc:\n ... raise RuntimeError('Failed to open database') from exc\n ...\n Traceback (most recent call last):\n File \"\", line 2, in \n func()\n ~~~~^^\n File \"\", line 2, in func\n ConnectionError\n\n The above exception was the direct cause of the following exception:\n\n Traceback (most recent call last):\n File \"\", line 4, in \n raise RuntimeError('Failed to open database') from exc\n RuntimeError: Failed to open database\n\nIt also allows disabling automatic exception chaining using the \"from\nNone\" idiom:\n\n >>> try:\n ... open('database.sqlite')\n ... except OSError:\n ... raise RuntimeError from None\n ...\n Traceback (most recent call last):\n File \"\", line 4, in \n raise RuntimeError from None\n RuntimeError\n\nFor more information about chaining mechanics, see Built-in\nExceptions.\n\n8.6. User-defined Exceptions\n============================\n\nPrograms may name their own exceptions by creating a new exception\nclass (see Classes for more about Python classes). Exceptions should\ntypically be derived from the \"Exception\" class, either directly or\nindirectly.\n\nException classes can be defined which do anything any other class can\ndo, but are usually kept simple, often only offering a number of\nattributes that allow information about the error to be extracted by\nhandlers for the exception.\n\nMost exceptions are defined with names that end in \"Error\", similar to\nthe naming of the standard exceptions.\n\nMany standard modules define their own exceptions to report errors\nthat may occur in functions they define.\n\n8.7. Defining Clean-up Actions\n==============================\n\nThe \"try\" statement has another optional clause which is intended to\ndefine clean-up actions that must be executed under all circumstances.\nFor example:\n\n >>> try:\n ... raise KeyboardInterrupt\n ... finally:\n ... print('Goodbye, world!')\n ...\n Goodbye, world!\n Traceback (most recent call last):\n File \"\", line 2, in \n raise KeyboardInterrupt\n KeyboardInterrupt\n\nIf a \"finally\" clause is present, the \"finally\" clause will execute as\nthe last task before the \"try\" statement completes. The \"finally\"\nclause runs whether or not the \"try\" statement produces an exception.\nThe following points discuss more complex cases when an exception\noccurs:\n\n* If an exception occurs during execution of the \"try\" clause, the\n exception may be handled by an \"except\" clause. If the exception is\n not handled by an \"except\" clause, the exception is re-raised after\n the \"finally\" clause has been executed.\n\n* An exception could occur during execution of an \"except\" or \"else\"\n clause. Again, the exception is re-raised after the \"finally\" clause\n has been executed.\n\n* If the \"finally\" clause executes a \"break\", \"continue\" or \"return\"\n statement, exceptions are not re-raised. This can be confusing and\n is therefore discouraged. From version 3.14 the compiler emits a\n \"SyntaxWarning\" for it (see **PEP 765**).\n\n* If the \"try\" statement reaches a \"break\", \"continue\" or \"return\"\n statement, the \"finally\" clause will execute just prior to the\n \"break\", \"continue\" or \"return\" statement's execution.\n\n* If a \"finally\" clause includes a \"return\" statement, the returned\n value will be the one from the \"finally\" clause's \"return\"\n statement, not the value from the \"try\" clause's \"return\" statement.\n This can be confusing and is therefore discouraged. From version\n 3.14 the compiler emits a \"SyntaxWarning\" for it (see **PEP 765**).\n\nFor example:\n\n >>> def bool_return():\n ... try:\n ... return True\n ... finally:\n ... return False\n ...\n >>> bool_return()\n False\n\nA more complicated example:\n\n >>> def divide(x, y):\n ... try:\n ... result = x / y\n ... except ZeroDivisionError:\n ... print(\"division by zero!\")\n ... else:\n ... print(\"result is\", result)\n ... finally:\n ... print(\"executing finally clause\")\n ...\n >>> divide(2, 1)\n result is 2.0\n executing finally clause\n >>> divide(2, 0)\n division by zero!\n executing finally clause\n >>> divide(\"2\", \"1\")\n executing finally clause\n Traceback (most recent call last):\n File \"\", line 1, in \n divide(\"2\", \"1\")\n ~~~~~~^^^^^^^^^^\n File \"\", line 3, in divide\n result = x / y\n ~~^~~\n TypeError: unsupported operand type(s) for /: 'str' and 'str'\n\nAs you can see, the \"finally\" clause is executed in any event. The\n\"TypeError\" raised by dividing two strings is not handled by the\n\"except\" clause and therefore re-raised after the \"finally\" clause has\nbeen executed.\n\nIn real world applications, the \"finally\" clause is useful for\nreleasing external resources (such as files or network connections),\nregardless of whether the use of the resource was successful.\n\n8.8. Predefined Clean-up Actions\n================================\n\nSome objects define standard clean-up actions to be undertaken when\nthe object is no longer needed, regardless of whether or not the\noperation using the object succeeded or failed. Look at the following\nexample, which tries to open a file and print its contents to the\nscreen.\n\n for line in open(\"myfile.txt\"):\n print(line, end=\"\")\n\nThe problem with this code is that it leaves the file open for an\nindeterminate amount of time after this part of the code has finished\nexecuting. This is not an issue in simple scripts, but can be a\nproblem for larger applications. The \"with\" statement allows objects\nlike files to be used in a way that ensures they are always cleaned up\npromptly and correctly.\n\n with open(\"myfile.txt\") as f:\n for line in f:\n print(line, end=\"\")\n\nAfter the statement is executed, the file *f* is always closed, even\nif a problem was encountered while processing the lines. Objects\nwhich, like files, provide predefined clean-up actions will indicate\nthis in their documentation.\n\n8.9. Raising and Handling Multiple Unrelated Exceptions\n=======================================================\n\nThere are situations where it is necessary to report several\nexceptions that have occurred. This is often the case in concurrency\nframeworks, when several tasks may have failed in parallel, but there\nare also other use cases where it is desirable to continue execution\nand collect multiple errors rather than raise the first exception.\n\nThe builtin \"ExceptionGroup\" wraps a list of exception instances so\nthat they can be raised together. It is an exception itself, so it can\nbe caught like any other exception.\n\n >>> def f():\n ... excs = [OSError('error 1'), SystemError('error 2')]\n ... raise ExceptionGroup('there were problems', excs)\n ...\n >>> f()\n + Exception Group Traceback (most recent call last):\n | File \"\", line 1, in \n | f()\n | ~^^\n | File \"\", line 3, in f\n | raise ExceptionGroup('there were problems', excs)\n | ExceptionGroup: there were problems (2 sub-exceptions)\n +-+---------------- 1 ----------------\n | OSError: error 1\n +---------------- 2 ----------------\n | SystemError: error 2\n +------------------------------------\n >>> try:\n ... f()\n ... except Exception as e:\n ... print(f'caught {type(e)}: e')\n ...\n caught : e\n >>>\n\nBy using \"except*\" instead of \"except\", we can selectively handle only\nthe exceptions in the group that match a certain type. In the\nfollowing example, which shows a nested exception group, each\n\"except*\" clause extracts from the group exceptions of a certain type\nwhile letting all other exceptions propagate to other clauses and\neventually to be reraised.\n\n >>> def f():\n ... raise ExceptionGroup(\n ... \"group1\",\n ... [\n ... OSError(1),\n ... SystemError(2),\n ... ExceptionGroup(\n ... \"group2\",\n ... [\n ... OSError(3),\n ... RecursionError(4)\n ... ]\n ... )\n ... ]\n ... )\n ...\n >>> try:\n ... f()\n ... except* OSError as e:\n ... print(\"There were OSErrors\")\n ... except* SystemError as e:\n ... print(\"There were SystemErrors\")\n ...\n There were OSErrors\n There were SystemErrors\n + Exception Group Traceback (most recent call last):\n | File \"\", line 2, in \n | f()\n | ~^^\n | File \"\", line 2, in f\n | raise ExceptionGroup(\n | ...<12 lines>...\n | )\n | ExceptionGroup: group1 (1 sub-exception)\n +-+---------------- 1 ----------------\n | ExceptionGroup: group2 (1 sub-exception)\n +-+---------------- 1 ----------------\n | RecursionError: 4\n +------------------------------------\n >>>\n\nNote that the exceptions nested in an exception group must be\ninstances, not types. This is because in practice the exceptions would\ntypically be ones that have already been raised and caught by the\nprogram, along the following pattern:\n\n >>> excs = []\n ... for test in tests:\n ... try:\n ... test.run()\n ... except Exception as e:\n ... excs.append(e)\n ...\n >>> if excs:\n ... raise ExceptionGroup(\"Test Failures\", excs)\n ...\n\n8.10. Enriching Exceptions with Notes\n=====================================\n\nWhen an exception is created in order to be raised, it is usually\ninitialized with information that describes the error that has\noccurred. There are cases where it is useful to add information after\nthe exception was caught. For this purpose, exceptions have a method\n\"add_note(note)\" that accepts a string and adds it to the exception's\nnotes list. The standard traceback rendering includes all notes, in\nthe order they were added, after the exception.\n\n >>> try:\n ... raise TypeError('bad type')\n ... except Exception as e:\n ... e.add_note('Add some information')\n ... e.add_note('Add some more information')\n ... raise\n ...\n Traceback (most recent call last):\n File \"\", line 2, in \n raise TypeError('bad type')\n TypeError: bad type\n Add some information\n Add some more information\n >>>\n\nFor example, when collecting exceptions into an exception group, we\nmay want to add context information for the individual errors. In the\nfollowing each exception in the group has a note indicating when this\nerror has occurred.\n\n >>> def f():\n ... raise OSError('operation failed')\n ...\n >>> excs = []\n >>> for i in range(3):\n ... try:\n ... f()\n ... except Exception as e:\n ... e.add_note(f'Happened in Iteration {i+1}')\n ... excs.append(e)\n ...\n >>> raise ExceptionGroup('We have some problems', excs)\n + Exception Group Traceback (most recent call last):\n | File \"\", line 1, in \n | raise ExceptionGroup('We have some problems', excs)\n | ExceptionGroup: We have some problems (3 sub-exceptions)\n +-+---------------- 1 ----------------\n | Traceback (most recent call last):\n | File \"\", line 3, in \n | f()\n | ~^^\n | File \"\", line 2, in f\n | raise OSError('operation failed')\n | OSError: operation failed\n | Happened in Iteration 1\n +---------------- 2 ----------------\n | Traceback (most recent call last):\n | File \"\", line 3, in \n | f()\n | ~^^\n | File \"\", line 2, in f\n | raise OSError('operation failed')\n | OSError: operation failed\n | Happened in Iteration 2\n +---------------- 3 ----------------\n | Traceback (most recent call last):\n | File \"\", line 3, in \n | f()\n | ~^^\n | File \"\", line 2, in f\n | raise OSError('operation failed')\n | OSError: operation failed\n | Happened in Iteration 3\n +------------------------------------\n >>>", "source": "python_docs:python-3.14-docs-text/tutorial/errors.txt", "domain": "software" }, { "text": "7. Input and Output\n*******************\n\nThere are several ways to present the output of a program; data can be\nprinted in a human-readable form, or written to a file for future use.\nThis chapter will discuss some of the possibilities.\n\n7.1. Fancier Output Formatting\n==============================\n\nSo far we've encountered two ways of writing values: *expression\nstatements* and the \"print()\" function. (A third way is using the\n\"write()\" method of file objects; the standard output file can be\nreferenced as \"sys.stdout\". See the Library Reference for more\ninformation on this.)\n\nOften you'll want more control over the formatting of your output than\nsimply printing space-separated values. There are several ways to\nformat output.\n\n* To use formatted string literals, begin a string with \"f\" or \"F\"\n before the opening quotation mark or triple quotation mark. Inside\n this string, you can write a Python expression between \"{\" and \"}\"\n characters that can refer to variables or literal values.\n\n >>> year = 2016\n >>> event = 'Referendum'\n >>> f'Results of the {year} {event}'\n 'Results of the 2016 Referendum'\n\n* The \"str.format()\" method of strings requires more manual effort.\n You'll still use \"{\" and \"}\" to mark where a variable will be\n substituted and can provide detailed formatting directives, but\n you'll also need to provide the information to be formatted. In the\n following code block there are two examples of how to format\n variables:\n\n >>> yes_votes = 42_572_654\n >>> total_votes = 85_705_149\n >>> percentage = yes_votes / total_votes\n >>> '{:-9} YES votes {:2.2%}'.format(yes_votes, percentage)\n ' 42572654 YES votes 49.67%'\n\n Notice how the \"yes_votes\" are padded with spaces and a negative\n sign only for negative numbers. The example also prints \"percentage\"\n multiplied by 100, with 2 decimal places and followed by a percent\n sign (see Format Specification Mini-Language for details).\n\n* Finally, you can do all the string handling yourself by using string\n slicing and concatenation operations to create any layout you can\n imagine. The string type has some methods that perform useful\n operations for padding strings to a given column width.\n\nWhen you don't need fancy output but just want a quick display of some\nvariables for debugging purposes, you can convert any value to a\nstring with the \"repr()\" or \"str()\" functions.\n\nThe \"str()\" function is meant to return representations of values\nwhich are fairly human-readable, while \"repr()\" is meant to generate\nrepresentations which can be read by the interpreter (or will force a\n\"SyntaxError\" if there is no equivalent syntax). For objects which\ndon't have a particular representation for human consumption, \"str()\"\nwill return the same value as \"repr()\". Many values, such as numbers\nor structures like lists and dictionaries, have the same\nrepresentation using either function. Strings, in particular, have\ntwo distinct representations.\n\nSome examples:\n\n >>> s = 'Hello, world.'\n >>> str(s)\n 'Hello, world.'\n >>> repr(s)\n \"'Hello, world.'\"\n >>> str(1/7)\n '0.14285714285714285'\n >>> x = 10 * 3.25\n >>> y = 200 * 200\n >>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'\n >>> print(s)\n The value of x is 32.5, and y is 40000...\n >>> # The repr() of a string adds string quotes and backslashes:\n >>> hello = 'hello, world\\n'\n >>> hellos = repr(hello)\n >>> print(hellos)\n 'hello, world\\n'\n >>> # The argument to repr() may be any Python object:\n >>> repr((x, y, ('spam', 'eggs')))\n \"(32.5, 40000, ('spam', 'eggs'))\"\n\nThe \"string\" module contains support for a simple templating approach\nbased upon regular expressions, via \"string.Template\". This offers yet\nanother way to substitute values into strings, using placeholders like\n\"$x\" and replacing them with values from a dictionary. This syntax is\neasy to use, although it offers much less control for formatting.\n\n7.1.1. Formatted String Literals\n--------------------------------\n\nFormatted string literals (also called f-strings for short) let you\ninclude the value of Python expressions inside a string by prefixing\nthe string with \"f\" or \"F\" and writing expressions as \"{expression}\".\n\nAn optional format specifier can follow the expression. This allows\ngreater control over how the value is formatted. The following example\nrounds pi to three places after the decimal:\n\n >>> import math\n >>> print(f'The value of pi is approximately {math.pi:.3f}.')\n The value of pi is approximately 3.142.\n\nPassing an integer after the \"':'\" will cause that field to be a\nminimum number of characters wide. This is useful for making columns\nline up.\n\n >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}\n >>> for name, phone in table.items():\n ... print(f'{name:10} ==> {phone:10d}')\n ...\n Sjoerd ==> 4127\n Jack ==> 4098\n Dcab ==> 7678\n\nOther modifiers can be used to convert the value before it is\nformatted. \"'!a'\" applies \"ascii()\", \"'!s'\" applies \"str()\", and\n\"'!r'\" applies \"repr()\":\n\n >>> animals = 'eels'\n >>> print(f'My hovercraft is full of {animals}.')\n My hovercraft is full of eels.\n >>> print(f'My hovercraft is full of {animals!r}.')\n My hovercraft is full of 'eels'.\n\nThe \"=\" specifier can be used to expand an expression to the text of\nthe expression, an equal sign, then the representation of the\nevaluated expression:\n\n>>> bugs = 'roaches'\n>>> count = 13\n>>> area = 'living room'\n>>> print(f'Debugging {bugs=} {count=} {area=}')\nDebugging bugs='roaches' count=13 area='living room'\n\nSee self-documenting expressions for more information on the \"=\"\nspecifier. For a reference on these format specifications, see the\nreference guide for the Format Specification Mini-Language.\n\n7.1.2. The String format() Method\n---------------------------------\n\nBasic usage of the \"str.format()\" method looks like this:\n\n >>> print('We are the {} who say \"{}!\"'.format('knights', 'Ni'))\n We are the knights who say \"Ni!\"\n\nThe brackets and characters within them (called format fields) are\nreplaced with the objects passed into the \"str.format()\" method. A\nnumber in the brackets can be used to refer to the position of the\nobject passed into the \"str.format()\" method.\n\n >>> print('{0} and {1}'.format('spam', 'eggs'))\n spam and eggs\n >>> print('{1} and {0}'.format('spam', 'eggs'))\n eggs and spam\n\nIf keyword arguments are used in the \"str.format()\" method, their\nvalues are referred to by using the name of the argument.\n\n >>> print('This {food} is {adjective}.'.format(\n ... food='spam', adjective='absolutely horrible'))\n This spam is absolutely horrible.\n\nPositional and keyword arguments can be arbitrarily combined:\n\n >>> print('The story of {0}, {1}, and {other}.'.format('Bill', 'Manfred',\n ... other='Georg'))\n The story of Bill, Manfred, and Georg.\n\nIf you have a really long format string that you don't want to split\nup, it would be nice if you could reference the variables to be\nformatted by name instead of by position. This can be done by simply\npassing the dict and using square brackets \"'[]'\" to access the keys.\n\n >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}\n >>> print('Jack: {0[Jack]:d}; Sjoerd: {0[Sjoerd]:d}; '\n ... 'Dcab: {0[Dcab]:d}'.format(table))\n Jack: 4098; Sjoerd: 4127; Dcab: 8637678\n\nThis could also be done by passing the \"table\" dictionary as keyword\narguments with the \"**\" notation.\n\n >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}\n >>> print('Jack: {Jack:d}; Sjoerd: {Sjoerd:d}; Dcab: {Dcab:d}'.format(**table))\n Jack: 4098; Sjoerd: 4127; Dcab: 8637678\n\nThis is particularly useful in combination with the built-in function\n\"vars()\", which returns a dictionary containing all local variables:\n\n >>> table = {k: str(v) for k, v in vars().items()}\n >>> message = \" \".join([f'{k}: ' + '{' + k +'};' for k in table.keys()])\n >>> print(message.format(**table))\n __name__: __main__; __doc__: None; __package__: None; __loader__: ...\n\nAs an example, the following lines produce a tidily aligned set of\ncolumns giving integers and their squares and cubes:\n\n >>> for x in range(1, 11):\n ... print('{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x))\n ...\n 1 1 1\n 2 4 8\n 3 9 27\n 4 16 64\n 5 25 125\n 6 36 216\n 7 49 343\n 8 64 512\n 9 81 729\n 10 100 1000\n\nFor a complete overview of string formatting with \"str.format()\", see\nFormat String Syntax.\n\n7.1.3. Manual String Formatting\n-------------------------------\n\nHere's the same table of squares and cubes, formatted manually:\n\n >>> for x in range(1, 11):\n ... print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ')\n ... # Note use of 'end' on previous line\n ... print(repr(x*x*x).rjust(4))\n ...\n 1 1 1\n 2 4 8\n 3 9 27\n 4 16 64\n 5 25 125\n 6 36 216\n 7 49 343\n 8 64 512\n 9 81 729\n 10 100 1000\n\n(Note that the one space between each column was added by the way\n\"print()\" works: it always adds spaces between its arguments.)\n\nThe \"str.rjust()\" method of string objects right-justifies a string in\na field of a given width by padding it with spaces on the left. There\nare similar methods \"str.ljust()\" and \"str.center()\". These methods do\nnot write anything, they just return a new string. If the input string\nis too long, they don't truncate it, but return it unchanged; this\nwill mess up your column lay-out but that's usually better than the\nalternative, which would be lying about a value. (If you really want\ntruncation you can always add a slice operation, as in\n\"x.ljust(n)[:n]\".)\n\nThere is another method, \"str.zfill()\", which pads a numeric string on\nthe left with zeros. It understands about plus and minus signs:\n\n >>> '12'.zfill(5)\n '00012'\n >>> '-3.14'.zfill(7)\n '-003.14'\n >>> '3.14159265359'.zfill(5)\n '3.14159265359'\n\n7.1.4. Old string formatting\n----------------------------\n\nThe % operator (modulo) can also be used for string formatting. Given\n\"format % values\" (where *format* is a string), \"%\" conversion\nspecifications in *format* are replaced with zero or more elements of\n*values*. This operation is commonly known as string interpolation.\nFor example:\n\n >>> import math\n >>> print('The value of pi is approximately %5.3f.' % math.pi)\n The value of pi is approximately 3.142.\n\nMore information can be found in the printf-style String Formatting\nsection.\n\n7.2. Reading and Writing Files\n==============================\n\n\"open()\" returns a *file object*, and is most commonly used with two\npositional arguments and one keyword argument: \"open(filename, mode,\nencoding=None)\"\n\n >>> f = open('workfile', 'w', encoding=\"utf-8\")\n\nThe first argument is a string containing the filename. The second\nargument is another string containing a few characters describing the\nway in which the file will be used. *mode* can be \"'r'\" when the file\nwill only be read, \"'w'\" for only writing (an existing file with the\nsame name will be erased), and \"'a'\" opens the file for appending; any\ndata written to the file is automatically added to the end. \"'r+'\"\nopens the file for both reading and writing. The *mode* argument is\noptional; \"'r'\" will be assumed if it's omitted.\n\nNormally, files are opened in *text mode*, that means, you read and\nwrite strings from and to the file, which are encoded in a specific\n*encoding*. If *encoding* is not specified, the default is platform\ndependent (see \"open()\"). Because UTF-8 is the modern de-facto\nstandard, \"encoding=\"utf-8\"\" is recommended unless you know that you\nneed to use a different encoding. Appending a \"'b'\" to the mode opens\nthe file in *binary mode*. Binary mode data is read and written as\n\"bytes\" objects. You can not specify *encoding* when opening file in\nbinary mode.\n\nIn text mode, the default when reading is to convert platform-specific\nline endings (\"\\n\" on Unix, \"\\r\\n\" on Windows) to just \"\\n\". When\nwriting in text mode, the default is to convert occurrences of \"\\n\"\nback to platform-specific line endings. This behind-the-scenes\nmodification to file data is fine for text files, but will corrupt\nbinary data like that in \"JPEG\" or \"EXE\" files. Be very careful to\nuse binary mode when reading and writing such files.\n\nIt is good practice to use the \"with\" keyword when dealing with file\nobjects. The advantage is that the file is properly closed after its\nsuite finishes, even if an exception is raised at some point. Using\n\"with\" is also much shorter than writing equivalent \"try\"-\"finally\"\nblocks:\n\n >>> with open('workfile', encoding=\"utf-8\") as f:\n ... read_data = f.read()\n\n >>> # We can check that the file has been automatically closed.\n >>> f.closed\n True\n\nIf you're not using the \"with\" keyword, then you should call\n\"f.close()\" to close the file and immediately free up any system\nresources used by it.\n\nWarning:\n\n Calling \"f.write()\" without using the \"with\" keyword or calling\n \"f.close()\" **might** result in the arguments of \"f.write()\" not\n being completely written to the disk, even if the program exits\n successfully.\n\nAfter a file object is closed, either by a \"with\" statement or by\ncalling \"f.close()\", attempts to use the file object will\nautomatically fail.\n\n >>> f.close()\n >>> f.read()\n Traceback (most recent call last):\n File \"\", line 1, in \n ValueError: I/O operation on closed file.\n\n7.2.1. Methods of File Objects\n------------------------------\n\nThe rest of the examples in this section will assume that a file\nobject called \"f\" has already been created.\n\nTo read a file's contents, call \"f.read(size)\", which reads some\nquantity of data and returns it as a string (in text mode) or bytes\nobject (in binary mode). *size* is an optional numeric argument. When\n*size* is omitted or negative, the entire contents of the file will be\nread and returned; it's your problem if the file is twice as large as\nyour machine's memory. Otherwise, at most *size* characters (in text\nmode) or *size* bytes (in binary mode) are read and returned. If the\nend of the file has been reached, \"f.read()\" will return an empty\nstring (\"''\").\n\n >>> f.read()\n 'This is the entire file.\\n'\n >>> f.read()\n ''\n\n\"f.readline()\" reads a single line from the file; a newline character\n(\"\\n\") is left at the end of the string, and is only omitted on the\nlast line of the file if the file doesn't end in a newline. This\nmakes the return value unambiguous; if \"f.readline()\" returns an empty\nstring, the end of the file has been reached, while a blank line is\nrepresented by \"'\\n'\", a string containing only a single newline.\n\n >>> f.readline()\n 'This is the first line of the file.\\n'\n >>> f.readline()\n 'Second line of the file\\n'\n >>> f.readline()\n ''\n\nFor reading lines from a file, you can loop over the file object. This\nis memory efficient, fast, and leads to simple code:\n\n >>> for line in f:\n ... print(line, end='')\n ...\n This is the first line of the file.\n Second line of the file\n\nIf you want to read all the lines of a file in a list you can also use\n\"list(f)\" or \"f.readlines()\".\n\n\"f.write(string)\" writes the contents of *string* to the file,\nreturning the number of characters written.\n\n >>> f.write('This is a test\\n')\n 15\n\nOther types of objects need to be converted -- either to a string (in\ntext mode) or a bytes object (in binary mode) -- before writing them:\n\n >>> value = ('the answer', 42)\n >>> s = str(value) # convert the tuple to string\n >>> f.write(s)\n 18\n\n\"f.tell()\" returns an integer giving the file object's current\nposition in the file represented as number of bytes from the beginning\nof the file when in binary mode and an opaque number when in text\nmode.\n\nTo change the file object's position, use \"f.seek(offset, whence)\".\nThe position is computed from adding *offset* to a reference point;\nthe reference point is selected by the *whence* argument. A *whence*\nvalue of 0 measures from the beginning of the file, 1 uses the current\nfile position, and 2 uses the end of the file as the reference point.\n*whence* can be omitted and defaults to 0, using the beginning of the\nfile as the reference point.\n\n >>> f = open('workfile', 'rb+')\n >>> f.write(b'0123456789abcdef')\n 16\n >>> f.seek(5) # Go to the 6th byte in the file\n 5\n >>> f.read(1)\n b'5'\n >>> f.seek(-3, 2) # Go to the 3rd byte before the end\n 13\n >>> f.read(1)\n b'd'\n\nIn text files (those opened without a \"b\" in the mode string), only\nseeks relative to the beginning of the file are allowed (the exception\nbeing seeking to the very file end with \"seek(0, 2)\") and the only\nvalid *offset* values are those returned from the \"f.tell()\", or zero.\nAny other *offset* value produces undefined behaviour.\n\nFile objects have some additional methods, such as \"isatty()\" and\n\"truncate()\" which are less frequently used; consult the Library\nReference for a complete guide to file objects.\n\n7.2.2. Saving structured data with \"json\"\n-----------------------------------------\n\nStrings can easily be written to and read from a file. Numbers take a\nbit more effort, since the \"read()\" method only returns strings, which\nwill have to be passed to a function like \"int()\", which takes a\nstring like \"'123'\" and returns its numeric value 123. When you want\nto save more complex data types like nested lists and dictionaries,\nparsing and serializing by hand becomes complicated.\n\nRather than having users constantly writing and debugging code to save\ncomplicated data types to files, Python allows you to use the popular\ndata interchange format called JSON (JavaScript Object Notation). The\nstandard module called \"json\" can take Python data hierarchies, and\nconvert them to string representations; this process is called\n*serializing*. Reconstructing the data from the string representation\nis called *deserializing*. Between serializing and deserializing, the\nstring representing the object may have been stored in a file or data,\nor sent over a network connection to some distant machine.\n\nNote:\n\n The JSON format is commonly used by modern applications to allow for\n data exchange. Many programmers are already familiar with it, which\n makes it a good choice for interoperability.\n\nIf you have an object \"x\", you can view its JSON string representation\nwith a simple line of code:\n\n >>> import json\n >>> x = [1, 'simple', 'list']\n >>> json.dumps(x)\n '[1, \"simple\", \"list\"]'\n\nAnother variant of the \"dumps()\" function, called \"dump()\", simply\nserializes the object to a *text file*. So if \"f\" is a *text file*\nobject opened for writing, we can do this:\n\n json.dump(x, f)\n\nTo decode the object again, if \"f\" is a *binary file* or *text file*\nobject which has been opened for reading:\n\n x = json.load(f)\n\nNote:\n\n JSON files must be encoded in UTF-8. Use \"encoding=\"utf-8\"\" when\n opening JSON file as a *text file* for both of reading and writing.\n\nThis simple serialization technique can handle lists and dictionaries,\nbut serializing arbitrary class instances in JSON requires a bit of\nextra effort. The reference for the \"json\" module contains an\nexplanation of this.\n\nSee also:\n\n \"pickle\" - the pickle module\n\n Contrary to JSON, *pickle* is a protocol which allows the\n serialization of arbitrarily complex Python objects. As such, it is\n specific to Python and cannot be used to communicate with\n applications written in other languages. It is also insecure by\n default: deserializing pickle data coming from an untrusted source\n can execute arbitrary code, if the data was crafted by a skilled\n attacker.", "source": "python_docs:python-3.14-docs-text/tutorial/inputoutput.txt", "domain": "software" }, { "text": "5. Data Structures\n******************\n\nThis chapter describes some things you've learned about already in\nmore detail, and adds some new things as well.\n\n5.1. More on Lists\n==================\n\nThe list data type has some more methods. Here are all of the methods\nof list objects:\n\nlist.append(x)\n\n Add an item to the end of the list. Similar to \"a[len(a):] = [x]\".\n\nlist.extend(iterable)\n\n Extend the list by appending all the items from the iterable.\n Similar to \"a[len(a):] = iterable\".\n\nlist.insert(i, x)\n\n Insert an item at a given position. The first argument is the\n index of the element before which to insert, so \"a.insert(0, x)\"\n inserts at the front of the list, and \"a.insert(len(a), x)\" is\n equivalent to \"a.append(x)\".\n\nlist.remove(x)\n\n Remove the first item from the list whose value is equal to *x*.\n It raises a \"ValueError\" if there is no such item.\n\nlist.pop([i])\n\n Remove the item at the given position in the list, and return it.\n If no index is specified, \"a.pop()\" removes and returns the last\n item in the list. It raises an \"IndexError\" if the list is empty or\n the index is outside the list range.\n\nlist.clear()\n\n Remove all items from the list. Similar to \"del a[:]\".\n\nlist.index(x[, start[, end]])\n\n Return zero-based index of the first occurrence of *x* in the list.\n Raises a \"ValueError\" if there is no such item.\n\n The optional arguments *start* and *end* are interpreted as in the\n slice notation and are used to limit the search to a particular\n subsequence of the list. The returned index is computed relative\n to the beginning of the full sequence rather than the *start*\n argument.\n\nlist.count(x)\n\n Return the number of times *x* appears in the list.\n\nlist.sort(*, key=None, reverse=False)\n\n Sort the items of the list in place (the arguments can be used for\n sort customization, see \"sorted()\" for their explanation).\n\nlist.reverse()\n\n Reverse the elements of the list in place.\n\nlist.copy()\n\n Return a shallow copy of the list. Similar to \"a[:]\".\n\nAn example that uses most of the list methods:\n\n >>> fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']\n >>> fruits.count('apple')\n 2\n >>> fruits.count('tangerine')\n 0\n >>> fruits.index('banana')\n 3\n >>> fruits.index('banana', 4) # Find next banana starting at position 4\n 6\n >>> fruits.reverse()\n >>> fruits\n ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']\n >>> fruits.append('grape')\n >>> fruits\n ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange', 'grape']\n >>> fruits.sort()\n >>> fruits\n ['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear']\n >>> fruits.pop()\n 'pear'\n\nYou might have noticed that methods like \"insert\", \"remove\" or \"sort\"\nthat only modify the list have no return value printed -- they return\nthe default \"None\". [1] This is a design principle for all mutable\ndata structures in Python.\n\nAnother thing you might notice is that not all data can be sorted or\ncompared. For instance, \"[None, 'hello', 10]\" doesn't sort because\nintegers can't be compared to strings and \"None\" can't be compared to\nother types. Also, there are some types that don't have a defined\nordering relation. For example, \"3+4j < 5+7j\" isn't a valid\ncomparison.\n\n5.1.1. Using Lists as Stacks\n----------------------------\n\nThe list methods make it very easy to use a list as a stack, where the\nlast element added is the first element retrieved (\"last-in, first-\nout\"). To add an item to the top of the stack, use \"append()\". To\nretrieve an item from the top of the stack, use \"pop()\" without an\nexplicit index. For example:\n\n >>> stack = [3, 4, 5]\n >>> stack.append(6)\n >>> stack.append(7)\n >>> stack\n [3, 4, 5, 6, 7]\n >>> stack.pop()\n 7\n >>> stack\n [3, 4, 5, 6]\n >>> stack.pop()\n 6\n >>> stack.pop()\n 5\n >>> stack\n [3, 4]\n\n5.1.2. Using Lists as Queues\n----------------------------\n\nIt is also possible to use a list as a queue, where the first element\nadded is the first element retrieved (\"first-in, first-out\"); however,\nlists are not efficient for this purpose. While appends and pops from\nthe end of list are fast, doing inserts or pops from the beginning of\na list is slow (because all of the other elements have to be shifted\nby one).\n\nTo implement a queue, use \"collections.deque\" which was designed to\nhave fast appends and pops from both ends. For example:\n\n >>> from collections import deque\n >>> queue = deque([\"Eric\", \"John\", \"Michael\"])\n >>> queue.append(\"Terry\") # Terry arrives\n >>> queue.append(\"Graham\") # Graham arrives\n >>> queue.popleft() # The first to arrive now leaves\n 'Eric'\n >>> queue.popleft() # The second to arrive now leaves\n 'John'\n >>> queue # Remaining queue in order of arrival\n deque(['Michael', 'Terry', 'Graham'])\n\n5.1.3. List Comprehensions\n--------------------------\n\nList comprehensions provide a concise way to create lists. Common\napplications are to make new lists where each element is the result of\nsome operations applied to each member of another sequence or\niterable, or to create a subsequence of those elements that satisfy a\ncertain condition.\n\nFor example, assume we want to create a list of squares, like:\n\n >>> squares = []\n >>> for x in range(10):\n ... squares.append(x**2)\n ...\n >>> squares\n [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n\nNote that this creates (or overwrites) a variable named \"x\" that still\nexists after the loop completes. We can calculate the list of squares\nwithout any side effects using:\n\n squares = list(map(lambda x: x**2, range(10)))\n\nor, equivalently:\n\n squares = [x**2 for x in range(10)]\n\nwhich is more concise and readable.\n\nA list comprehension consists of brackets containing an expression\nfollowed by a \"for\" clause, then zero or more \"for\" or \"if\" clauses.\nThe result will be a new list resulting from evaluating the expression\nin the context of the \"for\" and \"if\" clauses which follow it. For\nexample, this listcomp combines the elements of two lists if they are\nnot equal:\n\n >>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]\n [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]\n\nand it's equivalent to:\n\n >>> combs = []\n >>> for x in [1,2,3]:\n ... for y in [3,1,4]:\n ... if x != y:\n ... combs.append((x, y))\n ...\n >>> combs\n [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]\n\nNote how the order of the \"for\" and \"if\" statements is the same in\nboth these snippets.\n\nIf the expression is a tuple (e.g. the \"(x, y)\" in the previous\nexample), it must be parenthesized.\n\n >>> vec = [-4, -2, 0, 2, 4]\n >>> # create a new list with the values doubled\n >>> [x*2 for x in vec]\n [-8, -4, 0, 4, 8]\n >>> # filter the list to exclude negative numbers\n >>> [x for x in vec if x >= 0]\n [0, 2, 4]\n >>> # apply a function to all the elements\n >>> [abs(x) for x in vec]\n [4, 2, 0, 2, 4]\n >>> # call a method on each element\n >>> freshfruit = [' banana', ' loganberry ', 'passion fruit ']\n >>> [weapon.strip() for weapon in freshfruit]\n ['banana', 'loganberry', 'passion fruit']\n >>> # create a list of 2-tuples like (number, square)\n >>> [(x, x**2) for x in range(6)]\n [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]\n >>> # the tuple must be parenthesized, otherwise an error is raised\n >>> [x, x**2 for x in range(6)]\n File \"\", line 1\n [x, x**2 for x in range(6)]\n ^^^^^^^\n SyntaxError: did you forget parentheses around the comprehension target?\n >>> # flatten a list using a listcomp with two 'for'\n >>> vec = [[1,2,3], [4,5,6], [7,8,9]]\n >>> [num for elem in vec for num in elem]\n [1, 2, 3, 4, 5, 6, 7, 8, 9]\n\nList comprehensions can contain complex expressions and nested\nfunctions:\n\n >>> from math import pi\n >>> [str(round(pi, i)) for i in range(1, 6)]\n ['3.1', '3.14', '3.142', '3.1416', '3.14159']\n\n5.1.4. Nested List Comprehensions\n---------------------------------\n\nThe initial expression in a list comprehension can be any arbitrary\nexpression, including another list comprehension.\n\nConsider the following example of a 3x4 matrix implemented as a list\nof 3 lists of length 4:\n\n >>> matrix = [\n ... [1, 2, 3, 4],\n ... [5, 6, 7, 8],\n ... [9, 10, 11, 12],\n ... ]\n\nThe following list comprehension will transpose rows and columns:\n\n >>> [[row[i] for row in matrix] for i in range(4)]\n [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]\n\nAs we saw in the previous section, the inner list comprehension is\nevaluated in the context of the \"for\" that follows it, so this example\nis equivalent to:\n\n >>> transposed = []\n >>> for i in range(4):\n ... transposed.append([row[i] for row in matrix])\n ...\n >>> transposed\n [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]\n\nwhich, in turn, is the same as:\n\n >>> transposed = []\n >>> for i in range(4):\n ... # the following 3 lines implement the nested listcomp\n ... transposed_row = []\n ... for row in matrix:\n ... transposed_row.append(row[i])\n ... transposed.append(transposed_row)\n ...\n >>> transposed\n [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]\n\nIn the real world, you should prefer built-in functions to complex\nflow statements. The \"zip()\" function would do a great job for this\nuse case:\n\n >>> list(zip(*matrix))\n [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]\n\nSee Unpacking Argument Lists for details on the asterisk in this line.\n\n5.2. The \"del\" statement\n========================\n\nThere is a way to remove an item from a list given its index instead\nof its value: the \"del\" statement. This differs from the \"pop()\"\nmethod which returns a value. The \"del\" statement can also be used to\nremove slices from a list or clear the entire list (which we did\nearlier by assignment of an empty list to the slice). For example:\n\n >>> a = [-1, 1, 66.25, 333, 333, 1234.5]\n >>> del a[0]\n >>> a\n [1, 66.25, 333, 333, 1234.5]\n >>> del a[2:4]\n >>> a\n [1, 66.25, 1234.5]\n >>> del a[:]\n >>> a\n []\n\n\"del\" can also be used to delete entire variables:\n\n >>> del a\n\nReferencing the name \"a\" hereafter is an error (at least until another\nvalue is assigned to it). We'll find other uses for \"del\" later.\n\n5.3. Tuples and Sequences\n=========================\n\nWe saw that lists and strings have many common properties, such as\nindexing and slicing operations. They are two examples of *sequence*\ndata types (see Sequence Types --- list, tuple, range). Since Python\nis an evolving language, other sequence data types may be added.\nThere is also another standard sequence data type: the *tuple*.\n\nA tuple consists of a number of values separated by commas, for\ninstance:\n\n >>> t = 12345, 54321, 'hello!'\n >>> t[0]\n 12345\n >>> t\n (12345, 54321, 'hello!')\n >>> # Tuples may be nested:\n >>> u = t, (1, 2, 3, 4, 5)\n >>> u\n ((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))\n >>> # Tuples are immutable:\n >>> t[0] = 88888\n Traceback (most recent call last):\n File \"\", line 1, in \n TypeError: 'tuple' object does not support item assignment\n >>> # but they can contain mutable objects:\n >>> v = ([1, 2, 3], [3, 2, 1])\n >>> v\n ([1, 2, 3], [3, 2, 1])\n\nAs you see, on output tuples are always enclosed in parentheses, so\nthat nested tuples are interpreted correctly; they may be input with\nor without surrounding parentheses, although often parentheses are\nnecessary anyway (if the tuple is part of a larger expression). It is\nnot possible to assign to the individual items of a tuple, however it\nis possible to create tuples which contain mutable objects, such as\nlists.\n\nThough tuples may seem similar to lists, they are often used in\ndifferent situations and for different purposes. Tuples are\n*immutable*, and usually contain a heterogeneous sequence of elements\nthat are accessed via unpacking (see later in this section) or\nindexing (or even by attribute in the case of \"namedtuples\"). Lists\nare *mutable*, and their elements are usually homogeneous and are\naccessed by iterating over the list.\n\nA special problem is the construction of tuples containing 0 or 1\nitems: the syntax has some extra quirks to accommodate these. Empty\ntuples are constructed by an empty pair of parentheses; a tuple with\none item is constructed by following a value with a comma (it is not\nsufficient to enclose a single value in parentheses). Ugly, but\neffective. For example:\n\n >>> empty = ()\n >>> singleton = 'hello', # <-- note trailing comma\n >>> len(empty)\n 0\n >>> len(singleton)\n 1\n >>> singleton\n ('hello',)\n\nThe statement \"t = 12345, 54321, 'hello!'\" is an example of *tuple\npacking*: the values \"12345\", \"54321\" and \"'hello!'\" are packed\ntogether in a tuple. The reverse operation is also possible:\n\n >>> x, y, z = t\n\nThis is called, appropriately enough, *sequence unpacking* and works\nfor any sequence on the right-hand side. Sequence unpacking requires\nthat there are as many variables on the left side of the equals sign\nas there are elements in the sequence. Note that multiple assignment\nis really just a combination of tuple packing and sequence unpacking.\n\n5.4. Sets\n=========\n\nPython also includes a data type for sets. A set is an unordered\ncollection with no duplicate elements. Basic uses include membership\ntesting and eliminating duplicate entries. Set objects also support\nmathematical operations like union, intersection, difference, and\nsymmetric difference.\n\nCurly braces or the \"set()\" function can be used to create sets.\nNote: to create an empty set you have to use \"set()\", not \"{}\"; the\nlatter creates an empty dictionary, a data structure that we discuss\nin the next section.\n\nBecause sets are unordered, iterating over them or printing them can\nproduce the elements in a different order than you expect.\n\nHere is a brief demonstration:\n\n >>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}\n >>> print(basket) # show that duplicates have been removed\n {'orange', 'banana', 'pear', 'apple'}\n >>> 'orange' in basket # fast membership testing\n True\n >>> 'crabgrass' in basket\n False\n\n >>> # Demonstrate set operations on unique letters from two words\n >>>\n >>> a = set('abracadabra')\n >>> b = set('alacazam')\n >>> a # unique letters in a\n {'a', 'r', 'b', 'c', 'd'}\n >>> a - b # letters in a but not in b\n {'r', 'd', 'b'}\n >>> a | b # letters in a or b or both\n {'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}\n >>> a & b # letters in both a and b\n {'a', 'c'}\n >>> a ^ b # letters in a or b but not both\n {'r', 'd', 'b', 'm', 'z', 'l'}\n\nSimilarly to list comprehensions, set comprehensions are also\nsupported:\n\n >>> a = {x for x in 'abracadabra' if x not in 'abc'}\n >>> a\n {'r', 'd'}\n\n5.5. Dictionaries\n=================\n\nAnother useful data type built into Python is the *dictionary* (see\nMapping Types --- dict). Dictionaries are sometimes found in other\nlanguages as \"associative memories\" or \"associative arrays\". Unlike\nsequences, which are indexed by a range of numbers, dictionaries are\nindexed by *keys*, which can be any immutable type; strings and\nnumbers can always be keys. Tuples can be used as keys if they\ncontain only strings, numbers, or tuples; if a tuple contains any\nmutable object either directly or indirectly, it cannot be used as a\nkey. You can't use lists as keys, since lists can be modified in place\nusing index assignments, slice assignments, or methods like \"append()\"\nand \"extend()\".\n\nIt is best to think of a dictionary as a set of *key: value* pairs,\nwith the requirement that the keys are unique (within one dictionary).\nA pair of braces creates an empty dictionary: \"{}\". Placing a comma-\nseparated list of key:value pairs within the braces adds initial\nkey:value pairs to the dictionary; this is also the way dictionaries\nare written on output.\n\nThe main operations on a dictionary are storing a value with some key\nand extracting the value given the key. It is also possible to delete\na key:value pair with \"del\". If you store using a key that is already\nin use, the old value associated with that key is forgotten.\n\nExtracting a value for a non-existent key by subscripting (\"d[key]\")\nraises a \"KeyError\". To avoid getting this error when trying to access\na possibly non-existent key, use the \"get()\" method instead, which\nreturns \"None\" (or a specified default value) if the key is not in the\ndictionary.\n\nPerforming \"list(d)\" on a dictionary returns a list of all the keys\nused in the dictionary, in insertion order (if you want it sorted,\njust use \"sorted(d)\" instead). To check whether a single key is in the\ndictionary, use the \"in\" keyword.\n\nHere is a small example using a dictionary:\n\n >>> tel = {'jack': 4098, 'sape': 4139}\n >>> tel['guido'] = 4127\n >>> tel\n {'jack': 4098, 'sape': 4139, 'guido': 4127}\n >>> tel['jack']\n 4098\n >>> tel['irv']\n Traceback (most recent call last):\n File \"\", line 1, in \n KeyError: 'irv'\n >>> print(tel.get('irv'))\n None\n >>> del tel['sape']\n >>> tel['irv'] = 4127\n >>> tel\n {'jack': 4098, 'guido': 4127, 'irv': 4127}\n >>> list(tel)\n ['jack', 'guido', 'irv']\n >>> sorted(tel)\n ['guido', 'irv', 'jack']\n >>> 'guido' in tel\n True\n >>> 'jack' not in tel\n False\n\nThe \"dict()\" constructor builds dictionaries directly from sequences\nof key-value pairs:\n\n >>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])\n {'sape': 4139, 'guido': 4127, 'jack': 4098}\n\nIn addition, dict comprehensions can be used to create dictionaries\nfrom arbitrary key and value expressions:\n\n >>> {x: x**2 for x in (2, 4, 6)}\n {2: 4, 4: 16, 6: 36}\n\nWhen the keys are simple strings, it is sometimes easier to specify\npairs using keyword arguments:\n\n >>> dict(sape=4139, guido=4127, jack=4098)\n {'sape': 4139, 'guido': 4127, 'jack': 4098}\n\n5.6. Looping Techniques\n=======================\n\nWhen looping through dictionaries, the key and corresponding value can\nbe retrieved at the same time using the \"items()\" method.\n\n >>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}\n >>> for k, v in knights.items():\n ... print(k, v)\n ...\n gallahad the pure\n robin the brave\n\nWhen looping through a sequence, the position index and corresponding\nvalue can be retrieved at the same time using the \"enumerate()\"\nfunction.\n\n >>> for i, v in enumerate(['tic', 'tac', 'toe']):\n ... print(i, v)\n ...\n 0 tic\n 1 tac\n 2 toe\n\nTo loop over two or more sequences at the same time, the entries can\nbe paired with the \"zip()\" function.\n\n >>> questions = ['name', 'quest', 'favorite color']\n >>> answers = ['lancelot', 'the holy grail', 'blue']\n >>> for q, a in zip(questions, answers):\n ... print('What is your {0}? It is {1}.'.format(q, a))\n ...\n What is your name? It is lancelot.\n What is your quest? It is the holy grail.\n What is your favorite color? It is blue.\n\nTo loop over a sequence in reverse, first specify the sequence in a\nforward direction and then call the \"reversed()\" function.\n\n >>> for i in reversed(range(1, 10, 2)):\n ... print(i)\n ...\n 9\n 7\n 5\n 3\n 1\n\nTo loop over a sequence in sorted order, use the \"sorted()\" function\nwhich returns a new sorted list while leaving the source unaltered.\n\n >>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']\n >>> for i in sorted(basket):\n ... print(i)\n ...\n apple\n apple\n banana\n orange\n orange\n pear\n\nUsing \"set()\" on a sequence eliminates duplicate elements. The use of\n\"sorted()\" in combination with \"set()\" over a sequence is an idiomatic\nway to loop over unique elements of the sequence in sorted order.\n\n >>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']\n >>> for f in sorted(set(basket)):\n ... print(f)\n ...\n apple\n banana\n orange\n pear\n\nIt is sometimes tempting to change a list while you are looping over\nit; however, it is often simpler and safer to create a new list\ninstead.\n\n >>> import math\n >>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]\n >>> filtered_data = []\n >>> for value in raw_data:\n ... if not math.isnan(value):\n ... filtered_data.append(value)\n ...\n >>> filtered_data\n [56.2, 51.7, 55.3, 52.5, 47.8]\n\n5.7. More on Conditions\n=======================\n\nThe conditions used in \"while\" and \"if\" statements can contain any\noperators, not just comparisons.\n\nThe comparison operators \"in\" and \"not in\" are membership tests that\ndetermine whether a value is in (or not in) a container. The\noperators \"is\" and \"is not\" compare whether two objects are really the\nsame object. All comparison operators have the same priority, which\nis lower than that of all numerical operators.\n\nComparisons can be chained. For example, \"a < b == c\" tests whether\n\"a\" is less than \"b\" and moreover \"b\" equals \"c\".\n\nComparisons may be combined using the Boolean operators \"and\" and\n\"or\", and the outcome of a comparison (or of any other Boolean\nexpression) may be negated with \"not\". These have lower priorities\nthan comparison operators; between them, \"not\" has the highest\npriority and \"or\" the lowest, so that \"A and not B or C\" is equivalent\nto \"(A and (not B)) or C\". As always, parentheses can be used to\nexpress the desired composition.\n\nThe Boolean operators \"and\" and \"or\" are so-called *short-circuit*\noperators: their arguments are evaluated from left to right, and\nevaluation stops as soon as the outcome is determined. For example,\nif \"A\" and \"C\" are true but \"B\" is false, \"A and B and C\" does not\nevaluate the expression \"C\". When used as a general value and not as\na Boolean, the return value of a short-circuit operator is the last\nevaluated argument.\n\nIt is possible to assign the result of a comparison or other Boolean\nexpression to a variable. For example,\n\n >>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'\n >>> non_null = string1 or string2 or string3\n >>> non_null\n 'Trondheim'\n\nNote that in Python, unlike C, assignment inside expressions must be\ndone explicitly with the walrus operator \":=\". This avoids a common\nclass of problems encountered in C programs: typing \"=\" in an\nexpression when \"==\" was intended.\n\n5.8. Comparing Sequences and Other Types\n========================================\n\nSequence objects typically may be compared to other objects with the\nsame sequence type. The comparison uses *lexicographical* ordering:\nfirst the first two items are compared, and if they differ this\ndetermines the outcome of the comparison; if they are equal, the next\ntwo items are compared, and so on, until either sequence is exhausted.\nIf two items to be compared are themselves sequences of the same type,\nthe lexicographical comparison is carried out recursively. If all\nitems of two sequences compare equal, the sequences are considered\nequal. If one sequence is an initial sub-sequence of the other, the\nshorter sequence is the smaller (lesser) one. Lexicographical\nordering for strings uses the Unicode code point number to order\nindividual characters. Some examples of comparisons between sequences\nof the same type:\n\n (1, 2, 3) < (1, 2, 4)\n [1, 2, 3] < [1, 2, 4]\n 'ABC' < 'C' < 'Pascal' < 'Python'\n (1, 2, 3, 4) < (1, 2, 4)\n (1, 2) < (1, 2, -1)\n (1, 2, 3) == (1.0, 2.0, 3.0)\n (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)\n\nNote that comparing objects of different types with \"<\" or \">\" is\nlegal provided that the objects have appropriate comparison methods.\nFor example, mixed numeric types are compared according to their\nnumeric value, so 0 equals 0.0, etc. Otherwise, rather than providing\nan arbitrary ordering, the interpreter will raise a \"TypeError\"\nexception.\n\n-[ Footnotes ]-\n\n[1] Other languages may return the mutated object, which allows method\n chaining, such as \"d->insert(\"a\")->remove(\"b\")->sort();\".", "source": "python_docs:python-3.14-docs-text/tutorial/datastructures.txt", "domain": "software" }, { "text": "12. Virtual Environments and Packages\n*************************************\n\n12.1. Introduction\n==================\n\nPython applications will often use packages and modules that don't\ncome as part of the standard library. Applications will sometimes\nneed a specific version of a library, because the application may\nrequire that a particular bug has been fixed or the application may be\nwritten using an obsolete version of the library's interface.\n\nThis means it may not be possible for one Python installation to meet\nthe requirements of every application. If application A needs version\n1.0 of a particular module but application B needs version 2.0, then\nthe requirements are in conflict and installing either version 1.0 or\n2.0 will leave one application unable to run.\n\nThe solution for this problem is to create a *virtual environment*, a\nself-contained directory tree that contains a Python installation for\na particular version of Python, plus a number of additional packages.\n\nDifferent applications can then use different virtual environments. To\nresolve the earlier example of conflicting requirements, application A\ncan have its own virtual environment with version 1.0 installed while\napplication B has another virtual environment with version 2.0. If\napplication B requires a library be upgraded to version 3.0, this will\nnot affect application A's environment.\n\n12.2. Creating Virtual Environments\n===================================\n\nThe module used to create and manage virtual environments is called\n\"venv\". \"venv\" will install the Python version from which the command\nwas run (as reported by the \"--version\" option). For instance,\nexecuting the command with \"python3.12\" will install version 3.12.\n\nTo create a virtual environment, decide upon a directory where you\nwant to place it, and run the \"venv\" module as a script with the\ndirectory path:\n\n python -m venv tutorial-env\n\nThis will create the \"tutorial-env\" directory if it doesn't exist, and\nalso create directories inside it containing a copy of the Python\ninterpreter and various supporting files.\n\nA common directory location for a virtual environment is \".venv\". This\nname keeps the directory typically hidden in your shell and thus out\nof the way while giving it a name that explains why the directory\nexists. It also prevents clashing with \".env\" environment variable\ndefinition files that some tooling supports.\n\nOnce you've created a virtual environment, you may activate it.\n\nOn Windows, run:\n\n tutorial-env\\Scripts\\activate\n\nOn Unix or MacOS, run:\n\n source tutorial-env/bin/activate\n\n(This script is written for the bash shell. If you use the **csh** or\n**fish** shells, there are alternate \"activate.csh\" and\n\"activate.fish\" scripts you should use instead.)\n\nActivating the virtual environment will change your shell's prompt to\nshow what virtual environment you're using, and modify the environment\nso that running \"python\" will get you that particular version and\ninstallation of Python. For example:\n\n $ source ~/envs/tutorial-env/bin/activate\n (tutorial-env) $ python\n Python 3.5.1 (default, May 6 2016, 10:59:36)\n ...\n >>> import sys\n >>> sys.path\n ['', '/usr/local/lib/python35.zip', ...,\n '~/envs/tutorial-env/lib/python3.5/site-packages']\n >>>\n\nTo deactivate a virtual environment, type:\n\n deactivate\n\ninto the terminal.\n\n12.3. Managing Packages with pip\n================================\n\nYou can install, upgrade, and remove packages using a program called\n**pip**. By default \"pip\" will install packages from the Python\nPackage Index. You can browse the Python Package Index by going to it\nin your web browser.\n\n\"pip\" has a number of subcommands: \"install\", \"uninstall\", \"freeze\",\netc. (Consult the Installing Python Modules guide for complete\ndocumentation for \"pip\".)\n\nYou can install the latest version of a package by specifying a\npackage's name:\n\n (tutorial-env) $ python -m pip install novas\n Collecting novas\n Downloading novas-3.1.1.3.tar.gz (136kB)\n Installing collected packages: novas\n Running setup.py install for novas\n Successfully installed novas-3.1.1.3\n\nYou can also install a specific version of a package by giving the\npackage name followed by \"==\" and the version number:\n\n (tutorial-env) $ python -m pip install requests==2.6.0\n Collecting requests==2.6.0\n Using cached requests-2.6.0-py2.py3-none-any.whl\n Installing collected packages: requests\n Successfully installed requests-2.6.0\n\nIf you re-run this command, \"pip\" will notice that the requested\nversion is already installed and do nothing. You can supply a\ndifferent version number to get that version, or you can run \"python\n-m pip install --upgrade\" to upgrade the package to the latest\nversion:\n\n (tutorial-env) $ python -m pip install --upgrade requests\n Collecting requests\n Installing collected packages: requests\n Found existing installation: requests 2.6.0\n Uninstalling requests-2.6.0:\n Successfully uninstalled requests-2.6.0\n Successfully installed requests-2.7.0\n\n\"python -m pip uninstall\" followed by one or more package names will\nremove the packages from the virtual environment.\n\n\"python -m pip show\" will display information about a particular\npackage:\n\n (tutorial-env) $ python -m pip show requests\n ---\n Metadata-Version: 2.0\n Name: requests\n Version: 2.7.0\n Summary: Python HTTP for Humans.\n Home-page: http://python-requests.org\n Author: Kenneth Reitz\n Author-email: me@kennethreitz.com\n License: Apache 2.0\n Location: /Users/akuchling/envs/tutorial-env/lib/python3.4/site-packages\n Requires:\n\n\"python -m pip list\" will display all of the packages installed in the\nvirtual environment:\n\n (tutorial-env) $ python -m pip list\n novas (3.1.1.3)\n numpy (1.9.2)\n pip (7.0.3)\n requests (2.7.0)\n setuptools (16.0)\n\n\"python -m pip freeze\" will produce a similar list of the installed\npackages, but the output uses the format that \"python -m pip install\"\nexpects. A common convention is to put this list in a\n\"requirements.txt\" file:\n\n (tutorial-env) $ python -m pip freeze > requirements.txt\n (tutorial-env) $ cat requirements.txt\n novas==3.1.1.3\n numpy==1.9.2\n requests==2.7.0\n\nThe \"requirements.txt\" can then be committed to version control and\nshipped as part of an application. Users can then install all the\nnecessary packages with \"install -r\":\n\n (tutorial-env) $ python -m pip install -r requirements.txt\n Collecting novas==3.1.1.3 (from -r requirements.txt (line 1))\n ...\n Collecting numpy==1.9.2 (from -r requirements.txt (line 2))\n ...\n Collecting requests==2.7.0 (from -r requirements.txt (line 3))\n ...\n Installing collected packages: novas, numpy, requests\n Running setup.py install for novas\n Successfully installed novas-3.1.1.3 numpy-1.9.2 requests-2.7.0\n\n\"pip\" has many more options. Consult the Installing Python Modules\nguide for complete documentation for \"pip\". When you've written a\npackage and want to make it available on the Python Package Index,\nconsult the Python packaging user guide.", "source": "python_docs:python-3.14-docs-text/tutorial/venv.txt", "domain": "software" }, { "text": "4. Execution model\n******************\n\n4.1. Structure of a program\n===========================\n\nA Python program is constructed from code blocks. A *block* is a piece\nof Python program text that is executed as a unit. The following are\nblocks: a module, a function body, and a class definition. Each\ncommand typed interactively is a block. A script file (a file given\nas standard input to the interpreter or specified as a command line\nargument to the interpreter) is a code block. A script command (a\ncommand specified on the interpreter command line with the \"-c\"\noption) is a code block. A module run as a top level script (as module\n\"__main__\") from the command line using a \"-m\" argument is also a code\nblock. The string argument passed to the built-in functions \"eval()\"\nand \"exec()\" is a code block.\n\nA code block is executed in an *execution frame*. A frame contains\nsome administrative information (used for debugging) and determines\nwhere and how execution continues after the code block's execution has\ncompleted.\n\n4.2. Naming and binding\n=======================\n\n4.2.1. Binding of names\n-----------------------\n\n*Names* refer to objects. Names are introduced by name binding\noperations.\n\nThe following constructs bind names:\n\n* formal parameters to functions,\n\n* class definitions,\n\n* function definitions,\n\n* assignment expressions,\n\n* targets that are identifiers if occurring in an assignment:\n\n * \"for\" loop header,\n\n * after \"as\" in a \"with\" statement, \"except\" clause, \"except*\"\n clause, or in the as-pattern in structural pattern matching,\n\n * in a capture pattern in structural pattern matching\n\n* \"import\" statements.\n\n* \"type\" statements.\n\n* type parameter lists.\n\nThe \"import\" statement of the form \"from ... import *\" binds all names\ndefined in the imported module, except those beginning with an\nunderscore. This form may only be used at the module level.\n\nA target occurring in a \"del\" statement is also considered bound for\nthis purpose (though the actual semantics are to unbind the name).\n\nEach assignment or import statement occurs within a block defined by a\nclass or function definition or at the module level (the top-level\ncode block).\n\nIf a name is bound in a block, it is a local variable of that block,\nunless declared as \"nonlocal\" or \"global\". If a name is bound at the\nmodule level, it is a global variable. (The variables of the module\ncode block are local and global.) If a variable is used in a code\nblock but not defined there, it is a *free variable*.\n\nEach occurrence of a name in the program text refers to the *binding*\nof that name established by the following name resolution rules.\n\n4.2.2. Resolution of names\n--------------------------\n\nA *scope* defines the visibility of a name within a block. If a local\nvariable is defined in a block, its scope includes that block. If the\ndefinition occurs in a function block, the scope extends to any blocks\ncontained within the defining one, unless a contained block introduces\na different binding for the name.\n\nWhen a name is used in a code block, it is resolved using the nearest\nenclosing scope. The set of all such scopes visible to a code block\nis called the block's *environment*.\n\nWhen a name is not found at all, a \"NameError\" exception is raised. If\nthe current scope is a function scope, and the name refers to a local\nvariable that has not yet been bound to a value at the point where the\nname is used, an \"UnboundLocalError\" exception is raised.\n\"UnboundLocalError\" is a subclass of \"NameError\".\n\nIf a name binding operation occurs anywhere within a code block, all\nuses of the name within the block are treated as references to the\ncurrent block. This can lead to errors when a name is used within a\nblock before it is bound. This rule is subtle. Python lacks\ndeclarations and allows name binding operations to occur anywhere\nwithin a code block. The local variables of a code block can be\ndetermined by scanning the entire text of the block for name binding\noperations. See the FAQ entry on UnboundLocalError for examples.\n\nIf the \"global\" statement occurs within a block, all uses of the names\nspecified in the statement refer to the bindings of those names in the\ntop-level namespace. Names are resolved in the top-level namespace by\nsearching the global namespace, i.e. the namespace of the module\ncontaining the code block, and the builtins namespace, the namespace\nof the module \"builtins\". The global namespace is searched first. If\nthe names are not found there, the builtins namespace is searched\nnext. If the names are also not found in the builtins namespace, new\nvariables are created in the global namespace. The global statement\nmust precede all uses of the listed names.\n\nThe \"global\" statement has the same scope as a name binding operation\nin the same block. If the nearest enclosing scope for a free variable\ncontains a global statement, the free variable is treated as a global.\n\nThe \"nonlocal\" statement causes corresponding names to refer to\npreviously bound variables in the nearest enclosing function scope.\n\"SyntaxError\" is raised at compile time if the given name does not\nexist in any enclosing function scope. Type parameters cannot be\nrebound with the \"nonlocal\" statement.\n\nThe namespace for a module is automatically created the first time a\nmodule is imported. The main module for a script is always called\n\"__main__\".\n\nClass definition blocks and arguments to \"exec()\" and \"eval()\" are\nspecial in the context of name resolution. A class definition is an\nexecutable statement that may use and define names. These references\nfollow the normal rules for name resolution with an exception that\nunbound local variables are looked up in the global namespace. The\nnamespace of the class definition becomes the attribute dictionary of\nthe class. The scope of names defined in a class block is limited to\nthe class block; it does not extend to the code blocks of methods.\nThis includes comprehensions and generator expressions, but it does\nnot include annotation scopes, which have access to their enclosing\nclass scopes. This means that the following will fail:\n\n class A:\n a = 42\n b = list(a + i for i in range(10))\n\nHowever, the following will succeed:\n\n class A:\n type Alias = Nested\n class Nested: pass\n\n print(A.Alias.__value__) # \n\n4.2.3. Annotation scopes\n------------------------\n\n*Annotations*, type parameter lists and \"type\" statements introduce\n*annotation scopes*, which behave mostly like function scopes, but\nwith some exceptions discussed below.\n\nAnnotation scopes are used in the following contexts:\n\n* *Function annotations*.\n\n* *Variable annotations*.\n\n* Type parameter lists for generic type aliases.\n\n* Type parameter lists for generic functions. A generic function's\n annotations are executed within the annotation scope, but its\n defaults and decorators are not.\n\n* Type parameter lists for generic classes. A generic class's base\n classes and keyword arguments are executed within the annotation\n scope, but its decorators are not.\n\n* The bounds, constraints, and default values for type parameters\n (lazily evaluated).\n\n* The value of type aliases (lazily evaluated).\n\nAnnotation scopes differ from function scopes in the following ways:\n\n* Annotation scopes have access to their enclosing class namespace. If\n an annotation scope is immediately within a class scope, or within\n another annotation scope that is immediately within a class scope,\n the code in the annotation scope can use names defined in the class\n scope as if it were executed directly within the class body. This\n contrasts with regular functions defined within classes, which\n cannot access names defined in the class scope.\n\n* Expressions in annotation scopes cannot contain \"yield\", \"yield\n from\", \"await\", or \":=\" expressions. (These expressions are allowed\n in other scopes contained within the annotation scope.)\n\n* Names defined in annotation scopes cannot be rebound with \"nonlocal\"\n statements in inner scopes. This includes only type parameters, as\n no other syntactic elements that can appear within annotation scopes\n can introduce new names.\n\n* While annotation scopes have an internal name, that name is not\n reflected in the *qualified name* of objects defined within the\n scope. Instead, the \"__qualname__\" of such objects is as if the\n object were defined in the enclosing scope.\n\nAdded in version 3.12: Annotation scopes were introduced in Python\n3.12 as part of **PEP 695**.\n\nChanged in version 3.13: Annotation scopes are also used for type\nparameter defaults, as introduced by **PEP 696**.\n\nChanged in version 3.14: Annotation scopes are now also used for\nannotations, as specified in **PEP 649** and **PEP 749**.\n\n4.2.4. Lazy evaluation\n----------------------\n\nMost annotation scopes are *lazily evaluated*. This includes\nannotations, the values of type aliases created through the \"type\"\nstatement, and the bounds, constraints, and default values of type\nvariables created through the type parameter syntax. This means that\nthey are not evaluated when the type alias or type variable is\ncreated, or when the object carrying annotations is created. Instead,\nthey are only evaluated when necessary, for example when the\n\"__value__\" attribute on a type alias is accessed.\n\nExample:\n\n >>> type Alias = 1/0\n >>> Alias.__value__\n Traceback (most recent call last):\n ...\n ZeroDivisionError: division by zero\n >>> def func[T: 1/0](): pass\n >>> T = func.__type_params__[0]\n >>> T.__bound__\n Traceback (most recent call last):\n ...\n ZeroDivisionError: division by zero\n\nHere the exception is raised only when the \"__value__\" attribute of\nthe type alias or the \"__bound__\" attribute of the type variable is\naccessed.\n\nThis behavior is primarily useful for references to types that have\nnot yet been defined when the type alias or type variable is created.\nFor example, lazy evaluation enables creation of mutually recursive\ntype aliases:\n\n from typing import Literal\n\n type SimpleExpr = int | Parenthesized\n type Parenthesized = tuple[Literal[\"(\"], Expr, Literal[\")\"]]\n type Expr = SimpleExpr | tuple[SimpleExpr, Literal[\"+\", \"-\"], Expr]\n\nLazily evaluated values are evaluated in annotation scope, which means\nthat names that appear inside the lazily evaluated value are looked up\nas if they were used in the immediately enclosing scope.\n\nAdded in version 3.12.\n\n4.2.5. Builtins and restricted execution\n----------------------------------------\n\n**CPython implementation detail:** Users should not touch\n\"__builtins__\"; it is strictly an implementation detail. Users\nwanting to override values in the builtins namespace should \"import\"\nthe \"builtins\" module and modify its attributes appropriately.\n\nThe builtins namespace associated with the execution of a code block\nis actually found by looking up the name \"__builtins__\" in its global\nnamespace; this should be a dictionary or a module (in the latter case\nthe module's dictionary is used). By default, when in the \"__main__\"\nmodule, \"__builtins__\" is the built-in module \"builtins\"; when in any\nother module, \"__builtins__\" is an alias for the dictionary of the\n\"builtins\" module itself.\n\n4.2.6. Interaction with dynamic features\n----------------------------------------\n\nName resolution of free variables occurs at runtime, not at compile\ntime. This means that the following code will print 42:\n\n i = 10\n def f():\n print(i)\n i = 42\n f()\n\nThe \"eval()\" and \"exec()\" functions do not have access to the full\nenvironment for resolving names. Names may be resolved in the local\nand global namespaces of the caller. Free variables are not resolved\nin the nearest enclosing namespace, but in the global namespace. [1]\nThe \"exec()\" and \"eval()\" functions have optional arguments to\noverride the global and local namespace. If only one namespace is\nspecified, it is used for both.\n\n4.3. Exceptions\n===============\n\nExceptions are a means of breaking out of the normal flow of control\nof a code block in order to handle errors or other exceptional\nconditions. An exception is *raised* at the point where the error is\ndetected; it may be *handled* by the surrounding code block or by any\ncode block that directly or indirectly invoked the code block where\nthe error occurred.\n\nThe Python interpreter raises an exception when it detects a run-time\nerror (such as division by zero). A Python program can also\nexplicitly raise an exception with the \"raise\" statement. Exception\nhandlers are specified with the \"try\" ... \"except\" statement. The\n\"finally\" clause of such a statement can be used to specify cleanup\ncode which does not handle the exception, but is executed whether an\nexception occurred or not in the preceding code.\n\nPython uses the \"termination\" model of error handling: an exception\nhandler can find out what happened and continue execution at an outer\nlevel, but it cannot repair the cause of the error and retry the\nfailing operation (except by re-entering the offending piece of code\nfrom the top).\n\nWhen an exception is not handled at all, the interpreter terminates\nexecution of the program, or returns to its interactive main loop. In\neither case, it prints a stack traceback, except when the exception is\n\"SystemExit\".\n\nExceptions are identified by class instances. The \"except\" clause is\nselected depending on the class of the instance: it must reference the\nclass of the instance or a *non-virtual base class* thereof. The\ninstance can be received by the handler and can carry additional\ninformation about the exceptional condition.\n\nNote:\n\n Exception messages are not part of the Python API. Their contents\n may change from one version of Python to the next without warning\n and should not be relied on by code which will run under multiple\n versions of the interpreter.\n\nSee also the description of the \"try\" statement in section The try\nstatement and \"raise\" statement in section The raise statement.\n\n4.4. Runtime Components\n=======================\n\n4.4.1. General Computing Model\n------------------------------\n\nPython's execution model does not operate in a vacuum. It runs on a\nhost machine and through that host's runtime environment, including\nits operating system (OS), if there is one. When a program runs, the\nconceptual layers of how it runs on the host look something like this:\n\n **host machine**\n **process** (global resources)\n **thread** (runs machine code)\n\nEach process represents a program running on the host. Think of each\nprocess itself as the data part of its program. Think of the process'\nthreads as the execution part of the program. This distinction will\nbe important to understand the conceptual Python runtime.\n\nThe process, as the data part, is the execution context in which the\nprogram runs. It mostly consists of the set of resources assigned to\nthe program by the host, including memory, signals, file handles,\nsockets, and environment variables.\n\nProcesses are isolated and independent from one another. (The same is\ntrue for hosts.) The host manages the process' access to its assigned\nresources, in addition to coordinating between processes.\n\nEach thread represents the actual execution of the program's machine\ncode, running relative to the resources assigned to the program's\nprocess. It's strictly up to the host how and when that execution\ntakes place.\n\nFrom the point of view of Python, a program always starts with exactly\none thread. However, the program may grow to run in multiple\nsimultaneous threads. Not all hosts support multiple threads per\nprocess, but most do. Unlike processes, threads in a process are not\nisolated and independent from one another. Specifically, all threads\nin a process share all of the process' resources.\n\nThe fundamental point of threads is that each one does *run*\nindependently, at the same time as the others. That may be only\nconceptually at the same time (\"concurrently\") or physically (\"in\nparallel\"). Either way, the threads effectively run at a non-\nsynchronized rate.\n\nNote:\n\n That non-synchronized rate means none of the process' memory is\n guaranteed to stay consistent for the code running in any given\n thread. Thus multi-threaded programs must take care to coordinate\n access to intentionally shared resources. Likewise, they must take\n care to be absolutely diligent about not accessing any *other*\n resources in multiple threads; otherwise two threads running at the\n same time might accidentally interfere with each other's use of some\n shared data. All this is true for both Python programs and the\n Python runtime.The cost of this broad, unstructured requirement is\n the tradeoff for the kind of raw concurrency that threads provide.\n The alternative to the required discipline generally means dealing\n with non-deterministic bugs and data corruption.\n\n4.4.2. Python Runtime Model\n---------------------------\n\nThe same conceptual layers apply to each Python program, with some\nextra data layers specific to Python:\n\n **host machine**\n **process** (global resources)\n Python global runtime (*state*)\n Python interpreter (*state*)\n **thread** (runs Python bytecode and \"C-API\")\n Python thread *state*\n\nAt the conceptual level: when a Python program starts, it looks\nexactly like that diagram, with one of each. The runtime may grow to\ninclude multiple interpreters, and each interpreter may grow to\ninclude multiple thread states.\n\nNote:\n\n A Python implementation won't necessarily implement the runtime\n layers distinctly or even concretely. The only exception is places\n where distinct layers are directly specified or exposed to users,\n like through the \"threading\" module.\n\nNote:\n\n The initial interpreter is typically called the \"main\" interpreter.\n Some Python implementations, like CPython, assign special roles to\n the main interpreter.Likewise, the host thread where the runtime was\n initialized is known as the \"main\" thread. It may be different from\n the process' initial thread, though they are often the same. In\n some cases \"main thread\" may be even more specific and refer to the\n initial thread state. A Python runtime might assign specific\n responsibilities to the main thread, such as handling signals.\n\nAs a whole, the Python runtime consists of the global runtime state,\ninterpreters, and thread states. The runtime ensures all that state\nstays consistent over its lifetime, particularly when used with\nmultiple host threads.\n\nThe global runtime, at the conceptual level, is just a set of\ninterpreters. While those interpreters are otherwise isolated and\nindependent from one another, they may share some data or other\nresources. The runtime is responsible for managing these global\nresources safely. The actual nature and management of these resources\nis implementation-specific. Ultimately, the external utility of the\nglobal runtime is limited to managing interpreters.\n\nIn contrast, an \"interpreter\" is conceptually what we would normally\nthink of as the (full-featured) \"Python runtime\". When machine code\nexecuting in a host thread interacts with the Python runtime, it calls\ninto Python in the context of a specific interpreter.\n\nNote:\n\n The term \"interpreter\" here is not the same as the \"bytecode\n interpreter\", which is what regularly runs in threads, executing\n compiled Python code.In an ideal world, \"Python runtime\" would refer\n to what we currently call \"interpreter\". However, it's been called\n \"interpreter\" at least since introduced in 1997 (CPython:a027efa5b).\n\nEach interpreter completely encapsulates all of the non-process-\nglobal, non-thread-specific state needed for the Python runtime to\nwork. Notably, the interpreter's state persists between uses. It\nincludes fundamental data like \"sys.modules\". The runtime ensures\nmultiple threads using the same interpreter will safely share it\nbetween them.\n\nA Python implementation may support using multiple interpreters at the\nsame time in the same process. They are independent and isolated from\none another. For example, each interpreter has its own \"sys.modules\".\n\nFor thread-specific runtime state, each interpreter has a set of\nthread states, which it manages, in the same way the global runtime\ncontains a set of interpreters. It can have thread states for as many\nhost threads as it needs. It may even have multiple thread states for\nthe same host thread, though that isn't as common.\n\nEach thread state, conceptually, has all the thread-specific runtime\ndata an interpreter needs to operate in one host thread. The thread\nstate includes the current raised exception and the thread's Python\ncall stack. It may include other thread-specific resources.\n\nNote:\n\n The term \"Python thread\" can sometimes refer to a thread state, but\n normally it means a thread created using the \"threading\" module.\n\nEach thread state, over its lifetime, is always tied to exactly one\ninterpreter and exactly one host thread. It will only ever be used in\nthat thread and with that interpreter.\n\nMultiple thread states may be tied to the same host thread, whether\nfor different interpreters or even the same interpreter. However, for\nany given host thread, only one of the thread states tied to it can be\nused by the thread at a time.\n\nThread states are isolated and independent from one another and don't\nshare any data, except for possibly sharing an interpreter and objects\nor other resources belonging to that interpreter.\n\nOnce a program is running, new Python threads can be created using the\n\"threading\" module (on platforms and Python implementations that\nsupport threads). Additional processes can be created using the \"os\",\n\"subprocess\", and \"multiprocessing\" modules. Interpreters can be\ncreated and used with the \"interpreters\" module. Coroutines (async)\ncan be run using \"asyncio\" in each interpreter, typically only in a\nsingle thread (often the main thread).\n\n-[ Footnotes ]-\n\n[1] This limitation occurs because the code that is executed by these\n operations is not available at the time the module is compiled.", "source": "python_docs:python-3.14-docs-text/reference/executionmodel.txt", "domain": "software" }, { "text": "9. Top-level components\n***********************\n\nThe Python interpreter can get its input from a number of sources:\nfrom a script passed to it as standard input or as program argument,\ntyped in interactively, from a module source file, etc. This chapter\ngives the syntax used in these cases.\n\n9.1. Complete Python programs\n=============================\n\nWhile a language specification need not prescribe how the language\ninterpreter is invoked, it is useful to have a notion of a complete\nPython program. A complete Python program is executed in a minimally\ninitialized environment: all built-in and standard modules are\navailable, but none have been initialized, except for \"sys\" (various\nsystem services), \"builtins\" (built-in functions, exceptions and\n\"None\") and \"__main__\". The latter is used to provide the local and\nglobal namespace for execution of the complete program.\n\nThe syntax for a complete Python program is that for file input,\ndescribed in the next section.\n\nThe interpreter may also be invoked in interactive mode; in this case,\nit does not read and execute a complete program but reads and executes\none statement (possibly compound) at a time. The initial environment\nis identical to that of a complete program; each statement is executed\nin the namespace of \"__main__\".\n\nA complete program can be passed to the interpreter in three forms:\nwith the \"-c\" *string* command line option, as a file passed as the\nfirst command line argument, or as standard input. If the file or\nstandard input is a tty device, the interpreter enters interactive\nmode; otherwise, it executes the file as a complete program.\n\n9.2. File input\n===============\n\nAll input read from non-interactive files has the same form:\n\n file_input: (NEWLINE | statement)* ENDMARKER\n\nThis syntax is used in the following situations:\n\n* when parsing a complete Python program (from a file or from a\n string);\n\n* when parsing a module;\n\n* when parsing a string passed to the \"exec()\" function;\n\n9.3. Interactive input\n======================\n\nInput in interactive mode is parsed using the following grammar:\n\n interactive_input: [stmt_list] NEWLINE | compound_stmt NEWLINE | ENDMARKER\n\nNote that a (top-level) compound statement must be followed by a blank\nline in interactive mode; this is needed to help the parser detect the\nend of the input.\n\n9.4. Expression input\n=====================\n\n\"eval()\" is used for expression input. It ignores leading whitespace.\nThe string argument to \"eval()\" must have the following form:\n\n eval_input: expression_list NEWLINE* ENDMARKER", "source": "python_docs:python-3.14-docs-text/reference/toplevel_components.txt", "domain": "software" }, { "text": "1. Introduction\n***************\n\nThis reference manual describes the Python programming language. It is\nnot intended as a tutorial.\n\nWhile I am trying to be as precise as possible, I chose to use English\nrather than formal specifications for everything except syntax and\nlexical analysis. This should make the document more understandable to\nthe average reader, but will leave room for ambiguities. Consequently,\nif you were coming from Mars and tried to re-implement Python from\nthis document alone, you might have to guess things and in fact you\nwould probably end up implementing quite a different language. On the\nother hand, if you are using Python and wonder what the precise rules\nabout a particular area of the language are, you should definitely be\nable to find them here. If you would like to see a more formal\ndefinition of the language, maybe you could volunteer your time --- or\ninvent a cloning machine :-).\n\nIt is dangerous to add too many implementation details to a language\nreference document --- the implementation may change, and other\nimplementations of the same language may work differently. On the\nother hand, CPython is the one Python implementation in widespread use\n(although alternate implementations continue to gain support), and its\nparticular quirks are sometimes worth being mentioned, especially\nwhere the implementation imposes additional limitations. Therefore,\nyou'll find short \"implementation notes\" sprinkled throughout the\ntext.\n\nEvery Python implementation comes with a number of built-in and\nstandard modules. These are documented in The Python Standard\nLibrary. A few built-in modules are mentioned when they interact in a\nsignificant way with the language definition.\n\n1.1. Alternate Implementations\n==============================\n\nThough there is one Python implementation which is by far the most\npopular, there are some alternate implementations which are of\nparticular interest to different audiences.\n\nKnown implementations include:\n\nCPython\n This is the original and most-maintained implementation of Python,\n written in C. New language features generally appear here first.\n\nJython\n Python implemented in Java. This implementation can be used as a\n scripting language for Java applications, or can be used to create\n applications using the Java class libraries. It is also often used\n to create tests for Java libraries. More information can be found\n at the Jython website.\n\nPython for .NET\n This implementation actually uses the CPython implementation, but\n is a managed .NET application and makes .NET libraries available.\n It was created by Brian Lloyd. For more information, see the\n Python for .NET home page.\n\nIronPython\n An alternate Python for .NET. Unlike Python.NET, this is a\n complete Python implementation that generates IL, and compiles\n Python code directly to .NET assemblies. It was created by Jim\n Hugunin, the original creator of Jython. For more information, see\n the IronPython website.\n\nPyPy\n An implementation of Python written completely in Python. It\n supports several advanced features not found in other\n implementations like stackless support and a Just in Time compiler.\n One of the goals of the project is to encourage experimentation\n with the language itself by making it easier to modify the\n interpreter (since it is written in Python). Additional\n information is available on the PyPy project's home page.\n\nEach of these implementations varies in some way from the language as\ndocumented in this manual, or introduces specific information beyond\nwhat's covered in the standard Python documentation. Please refer to\nthe implementation-specific documentation to determine what else you\nneed to know about the specific implementation you're using.\n\n1.2. Notation\n=============\n\nThe descriptions of lexical analysis and syntax use a grammar notation\nthat is a mixture of EBNF and PEG. For example:\n\n name: letter (letter | digit | \"_\")*\n letter: \"a\"...\"z\" | \"A\"...\"Z\"\n digit: \"0\"...\"9\"\n\nIn this example, the first line says that a \"name\" is a \"letter\"\nfollowed by a sequence of zero or more \"letter\"s, \"digit\"s, and\nunderscores. A \"letter\" in turn is any of the single characters \"'a'\"\nthrough \"'z'\" and \"A\" through \"Z\"; a \"digit\" is a single character\nfrom \"0\" to \"9\".\n\nEach rule begins with a name (which identifies the rule that's being\ndefined) followed by a colon, \":\". The definition to the right of the\ncolon uses the following syntax elements:\n\n* \"name\": A name refers to another rule. Where possible, it is a link\n to the rule's definition.\n\n * \"TOKEN\": An uppercase name refers to a *token*. For the purposes\n of grammar definitions, tokens are the same as rules.\n\n* \"\"text\"\", \"'text'\": Text in single or double quotes must match\n literally (without the quotes). The type of quote is chosen\n according to the meaning of \"text\":\n\n * \"'if'\": A name in single quotes denotes a keyword.\n\n * \"\"case\"\": A name in double quotes denotes a soft-keyword.\n\n * \"'@'\": A non-letter symbol in single quotes denotes an \"OP\" token,\n that is, a delimiter or operator.\n\n* \"e1 e2\": Items separated only by whitespace denote a sequence. Here,\n \"e1\" must be followed by \"e2\".\n\n* \"e1 | e2\": A vertical bar is used to separate alternatives. It\n denotes PEG's \"ordered choice\": if \"e1\" matches, \"e2\" is not\n considered. In traditional PEG grammars, this is written as a slash,\n \"/\", rather than a vertical bar. See **PEP 617** for more background\n and details.\n\n* \"e*\": A star means zero or more repetitions of the preceding item.\n\n* \"e+\": Likewise, a plus means one or more repetitions.\n\n* \"[e]\": A phrase enclosed in square brackets means zero or one\n occurrences. In other words, the enclosed phrase is optional.\n\n* \"e?\": A question mark has exactly the same meaning as square\n brackets: the preceding item is optional.\n\n* \"(e)\": Parentheses are used for grouping.\n\nThe following notation is only used in lexical definitions.\n\n* \"\"a\"...\"z\"\": Two literal characters separated by three dots mean a\n choice of any single character in the given (inclusive) range of\n ASCII characters.\n\n* \"<...>\": A phrase between angular brackets gives an informal\n description of the matched symbol (for example, \"\"), or an abbreviation that is defined in\n nearby text (for example, \"\").\n\nSome definitions also use *lookaheads*, which indicate that an element\nmust (or must not) match at a given position, but without consuming\nany input:\n\n* \"&e\": a positive lookahead (that is, \"e\" is required to match)\n\n* \"!e\": a negative lookahead (that is, \"e\" is required *not* to match)\n\nThe unary operators (\"*\", \"+\", \"?\") bind as tightly as possible; the\nvertical bar (\"|\") binds most loosely.\n\nWhite space is only meaningful to separate tokens.\n\nRules are normally contained on a single line, but rules that are too\nlong may be wrapped:\n\n literal: stringliteral | bytesliteral\n | integer | floatnumber | imagnumber\n\nAlternatively, rules may be formatted with the first line ending at\nthe colon, and each alternative beginning with a vertical bar on a new\nline. For example:\n\n literal:\n | stringliteral\n | bytesliteral\n | integer\n | floatnumber\n | imagnumber\n\nThis does *not* mean that there is an empty first alternative.\n\n1.2.1. Lexical and Syntactic definitions\n----------------------------------------\n\nThere is some difference between *lexical* and *syntactic* analysis:\nthe *lexical analyzer* operates on the individual characters of the\ninput source, while the *parser* (syntactic analyzer) operates on the\nstream of *tokens* generated by the lexical analysis. However, in some\ncases the exact boundary between the two phases is a CPython\nimplementation detail.\n\nThe practical difference between the two is that in *lexical*\ndefinitions, all whitespace is significant. The lexical analyzer\ndiscards all whitespace that is not converted to tokens like\n\"token.INDENT\" or \"NEWLINE\". *Syntactic* definitions then use these\ntokens, rather than source characters.\n\nThis documentation uses the same BNF grammar for both styles of\ndefinitions. All uses of BNF in the next chapter (Lexical analysis)\nare lexical definitions; uses in subsequent chapters are syntactic\ndefinitions.", "source": "python_docs:python-3.14-docs-text/reference/introduction.txt", "domain": "software" }, { "text": "\"getpass\" --- Portable password input\n*************************************\n\n**Source code:** Lib/getpass.py\n\n======================================================================\n\nAvailability: not WASI.\n\nThis module does not work or is not available on WebAssembly. See\nWebAssembly platforms for more information.\n\nThe \"getpass\" module provides two functions:\n\ngetpass.getpass(prompt='Password: ', stream=None, *, echo_char=None)\n\n Prompt the user for a password without echoing. The user is\n prompted using the string *prompt*, which defaults to \"'Password:\n '\". On Unix, the prompt is written to the file-like object\n *stream* using the replace error handler if needed. *stream*\n defaults to the controlling terminal (\"/dev/tty\") or if that is\n unavailable to \"sys.stderr\" (this argument is ignored on Windows).\n\n The *echo_char* argument controls how user input is displayed while\n typing. If *echo_char* is \"None\" (default), input remains hidden.\n Otherwise, *echo_char* must be a single printable ASCII character\n and each typed character is replaced by it. For example,\n \"echo_char='*'\" will display asterisks instead of the actual input.\n\n If echo free input is unavailable getpass() falls back to printing\n a warning message to *stream* and reading from \"sys.stdin\" and\n issuing a \"GetPassWarning\".\n\n Note:\n\n If you call getpass from within IDLE, the input may be done in\n the terminal you launched IDLE from rather than the idle window\n itself.\n\n Note:\n\n On Unix systems, when *echo_char* is set, the terminal will be\n configured to operate in *noncanonical mode*. In particular, this\n means that line editing shortcuts such as \"Ctrl\"+\"U\" will not\n work and may insert unexpected characters into the input.\n\n Changed in version 3.14: Added the *echo_char* parameter for\n keyboard feedback.\n\nexception getpass.GetPassWarning\n\n A \"UserWarning\" subclass issued when password input may be echoed.\n\ngetpass.getuser()\n\n Return the \"login name\" of the user.\n\n This function checks the environment variables \"LOGNAME\", \"USER\",\n \"LNAME\" and \"USERNAME\", in order, and returns the value of the\n first one which is set to a non-empty string. If none are set, the\n login name from the password database is returned on systems which\n support the \"pwd\" module, otherwise, an \"OSError\" is raised.\n\n In general, this function should be preferred over \"os.getlogin()\".\n\n Changed in version 3.13: Previously, various exceptions beyond just\n \"OSError\" were raised.", "source": "python_docs:python-3.14-docs-text/library/getpass.txt", "domain": "software" }, { "text": "\"dataclasses\" --- Data Classes\n******************************\n\n**Source code:** Lib/dataclasses.py\n\n======================================================================\n\nThis module provides a decorator and functions for automatically\nadding generated *special methods* such as \"__init__()\" and\n\"__repr__()\" to user-defined classes. It was originally described in\n**PEP 557**.\n\nThe member variables to use in these generated methods are defined\nusing **PEP 526** type annotations. For example, this code:\n\n from dataclasses import dataclass\n\n @dataclass\n class InventoryItem:\n \"\"\"Class for keeping track of an item in inventory.\"\"\"\n name: str\n unit_price: float\n quantity_on_hand: int = 0\n\n def total_cost(self) -> float:\n return self.unit_price * self.quantity_on_hand\n\nwill add, among other things, a \"__init__()\" that looks like:\n\n def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):\n self.name = name\n self.unit_price = unit_price\n self.quantity_on_hand = quantity_on_hand\n\nNote that this method is automatically added to the class: it is not\ndirectly specified in the \"InventoryItem\" definition shown above.\n\nAdded in version 3.7.\n\nModule contents\n===============\n\n@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)\n\n This function is a *decorator* that is used to add generated\n *special methods* to classes, as described below.\n\n The \"@dataclass\" decorator examines the class to find \"field\"s. A\n \"field\" is defined as a class variable that has a *type\n annotation*. With two exceptions described below, nothing in\n \"@dataclass\" examines the type specified in the variable\n annotation.\n\n The order of the fields in all of the generated methods is the\n order in which they appear in the class definition.\n\n The \"@dataclass\" decorator will add various \"dunder\" methods to the\n class, described below. If any of the added methods already exist\n in the class, the behavior depends on the parameter, as documented\n below. The decorator returns the same class that it is called on;\n no new class is created.\n\n If \"@dataclass\" is used just as a simple decorator with no\n parameters, it acts as if it has the default values documented in\n this signature. That is, these three uses of \"@dataclass\" are\n equivalent:\n\n @dataclass\n class C:\n ...\n\n @dataclass()\n class C:\n ...\n\n @dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False,\n match_args=True, kw_only=False, slots=False, weakref_slot=False)\n class C:\n ...\n\n The parameters to \"@dataclass\" are:\n\n * *init*: If true (the default), a \"__init__()\" method will be\n generated.\n\n If the class already defines \"__init__()\", this parameter is\n ignored.\n\n * *repr*: If true (the default), a \"__repr__()\" method will be\n generated. The generated repr string will have the class name\n and the name and repr of each field, in the order they are\n defined in the class. Fields that are marked as being excluded\n from the repr are not included. For example:\n \"InventoryItem(name='widget', unit_price=3.0,\n quantity_on_hand=10)\".\n\n If the class already defines \"__repr__()\", this parameter is\n ignored.\n\n * *eq*: If true (the default), an \"__eq__()\" method will be\n generated. This method compares the class as if it were a tuple\n of its fields, in order. Both instances in the comparison must\n be of the identical type.\n\n If the class already defines \"__eq__()\", this parameter is\n ignored.\n\n * *order*: If true (the default is \"False\"), \"__lt__()\",\n \"__le__()\", \"__gt__()\", and \"__ge__()\" methods will be generated.\n These compare the class as if it were a tuple of its fields, in\n order. Both instances in the comparison must be of the identical\n type. If *order* is true and *eq* is false, a \"ValueError\" is\n raised.\n\n If the class already defines any of \"__lt__()\", \"__le__()\",\n \"__gt__()\", or \"__ge__()\", then \"TypeError\" is raised.\n\n * *unsafe_hash*: If true, force \"dataclasses\" to create a\n \"__hash__()\" method, even though it may not be safe to do so.\n Otherwise, generate a \"__hash__()\" method according to how *eq*\n and *frozen* are set. The default value is \"False\".\n\n \"__hash__()\" is used by built-in \"hash()\", and when objects are\n added to hashed collections such as dictionaries and sets.\n Having a \"__hash__()\" implies that instances of the class are\n immutable. Mutability is a complicated property that depends on\n the programmer's intent, the existence and behavior of\n \"__eq__()\", and the values of the *eq* and *frozen* flags in the\n \"@dataclass\" decorator.\n\n By default, \"@dataclass\" will not implicitly add a \"__hash__()\"\n method unless it is safe to do so. Neither will it add or change\n an existing explicitly defined \"__hash__()\" method. Setting the\n class attribute \"__hash__ = None\" has a specific meaning to\n Python, as described in the \"__hash__()\" documentation.\n\n If \"__hash__()\" is not explicitly defined, or if it is set to\n \"None\", then \"@dataclass\" *may* add an implicit \"__hash__()\"\n method. Although not recommended, you can force \"@dataclass\" to\n create a \"__hash__()\" method with \"unsafe_hash=True\". This might\n be the case if your class is logically immutable but can still be\n mutated. This is a specialized use case and should be considered\n carefully.\n\n Here are the rules governing implicit creation of a \"__hash__()\"\n method. Note that you cannot both have an explicit \"__hash__()\"\n method in your dataclass and set \"unsafe_hash=True\"; this will\n result in a \"TypeError\".\n\n If *eq* and *frozen* are both true, by default \"@dataclass\" will\n generate a \"__hash__()\" method for you. If *eq* is true and\n *frozen* is false, \"__hash__()\" will be set to \"None\", marking it\n unhashable (which it is, since it is mutable). If *eq* is false,\n \"__hash__()\" will be left untouched meaning the \"__hash__()\"\n method of the superclass will be used (if the superclass is\n \"object\", this means it will fall back to id-based hashing).\n\n * *frozen*: If true (the default is \"False\"), assigning to fields\n will generate an exception. This emulates read-only frozen\n instances. See the discussion below.\n\n If \"__setattr__()\" or \"__delattr__()\" is defined in the class and\n *frozen* is true, then \"TypeError\" is raised.\n\n * *match_args*: If true (the default is \"True\"), the\n \"__match_args__\" tuple will be created from the list of non\n keyword-only parameters to the generated \"__init__()\" method\n (even if \"__init__()\" is not generated, see above). If false, or\n if \"__match_args__\" is already defined in the class, then\n \"__match_args__\" will not be generated.\n\n Added in version 3.10.\n\n * *kw_only*: If true (the default value is \"False\"), then all\n fields will be marked as keyword-only. If a field is marked as\n keyword-only, then the only effect is that the \"__init__()\"\n parameter generated from a keyword-only field must be specified\n with a keyword when \"__init__()\" is called. See the *parameter*\n glossary entry for details. Also see the \"KW_ONLY\" section.\n\n Keyword-only fields are not included in \"__match_args__\".\n\n Added in version 3.10.\n\n * *slots*: If true (the default is \"False\"), \"__slots__\" attribute\n will be generated and new class will be returned instead of the\n original one. If \"__slots__\" is already defined in the class,\n then \"TypeError\" is raised.\n\n Warning:\n\n Passing parameters to a base class \"__init_subclass__()\" when\n using \"slots=True\" will result in a \"TypeError\". Either use\n \"__init_subclass__\" with no parameters or use default values\n as a workaround. See gh-91126 for full details.\n\n Added in version 3.10.\n\n Changed in version 3.11: If a field name is already included in\n the \"__slots__\" of a base class, it will not be included in the\n generated \"__slots__\" to prevent overriding them. Therefore, do\n not use \"__slots__\" to retrieve the field names of a dataclass.\n Use \"fields()\" instead. To be able to determine inherited slots,\n base class \"__slots__\" may be any iterable, but *not* an\n iterator.\n\n * *weakref_slot*: If true (the default is \"False\"), add a slot\n named \"__weakref__\", which is required to make an instance\n \"weakref-able\". It is an error to specify \"weakref_slot=True\"\n without also specifying \"slots=True\".\n\n Added in version 3.11.\n\n \"field\"s may optionally specify a default value, using normal\n Python syntax:\n\n @dataclass\n class C:\n a: int # 'a' has no default value\n b: int = 0 # assign a default value for 'b'\n\n In this example, both \"a\" and \"b\" will be included in the added\n \"__init__()\" method, which will be defined as:\n\n def __init__(self, a: int, b: int = 0):\n\n \"TypeError\" will be raised if a field without a default value\n follows a field with a default value. This is true whether this\n occurs in a single class, or as a result of class inheritance.\n\ndataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING, doc=None)\n\n For common and simple use cases, no other functionality is\n required. There are, however, some dataclass features that require\n additional per-field information. To satisfy this need for\n additional information, you can replace the default field value\n with a call to the provided \"field()\" function. For example:\n\n @dataclass\n class C:\n mylist: list[int] = field(default_factory=list)\n\n c = C()\n c.mylist += [1, 2, 3]\n\n As shown above, the \"MISSING\" value is a sentinel object used to\n detect if some parameters are provided by the user. This sentinel\n is used because \"None\" is a valid value for some parameters with a\n distinct meaning. No code should directly use the \"MISSING\" value.\n\n The parameters to \"field()\" are:\n\n * *default*: If provided, this will be the default value for this\n field. This is needed because the \"field()\" call itself replaces\n the normal position of the default value.\n\n * *default_factory*: If provided, it must be a zero-argument\n callable that will be called when a default value is needed for\n this field. Among other purposes, this can be used to specify\n fields with mutable default values, as discussed below. It is an\n error to specify both *default* and *default_factory*.\n\n * *init*: If true (the default), this field is included as a\n parameter to the generated \"__init__()\" method.\n\n * *repr*: If true (the default), this field is included in the\n string returned by the generated \"__repr__()\" method.\n\n * *hash*: This can be a bool or \"None\". If true, this field is\n included in the generated \"__hash__()\" method. If false, this\n field is excluded from the generated \"__hash__()\". If \"None\" (the\n default), use the value of *compare*: this would normally be the\n expected behavior, since a field should be included in the hash\n if it's used for comparisons. Setting this value to anything\n other than \"None\" is discouraged.\n\n One possible reason to set \"hash=False\" but \"compare=True\" would\n be if a field is expensive to compute a hash value for, that\n field is needed for equality testing, and there are other fields\n that contribute to the type's hash value. Even if a field is\n excluded from the hash, it will still be used for comparisons.\n\n * *compare*: If true (the default), this field is included in the\n generated equality and comparison methods (\"__eq__()\",\n \"__gt__()\", et al.).\n\n * *metadata*: This can be a mapping or \"None\". \"None\" is treated as\n an empty dict. This value is wrapped in \"MappingProxyType()\" to\n make it read-only, and exposed on the \"Field\" object. It is not\n used at all by Data Classes, and is provided as a third-party\n extension mechanism. Multiple third-parties can each have their\n own key, to use as a namespace in the metadata.\n\n * *kw_only*: If true, this field will be marked as keyword-only.\n This is used when the generated \"__init__()\" method's parameters\n are computed.\n\n Keyword-only fields are also not included in \"__match_args__\".\n\n Added in version 3.10.\n\n * *doc*: optional docstring for this field.\n\n Added in version 3.14.\n\n If the default value of a field is specified by a call to\n \"field()\", then the class attribute for this field will be replaced\n by the specified *default* value. If *default* is not provided,\n then the class attribute will be deleted. The intent is that after\n the \"@dataclass\" decorator runs, the class attributes will all\n contain the default values for the fields, just as if the default\n value itself were specified. For example, after:\n\n @dataclass\n class C:\n x: int\n y: int = field(repr=False)\n z: int = field(repr=False, default=10)\n t: int = 20\n\n The class attribute \"C.z\" will be \"10\", the class attribute \"C.t\"\n will be \"20\", and the class attributes \"C.x\" and \"C.y\" will not be\n set.\n\nclass dataclasses.Field\n\n \"Field\" objects describe each defined field. These objects are\n created internally, and are returned by the \"fields()\" module-level\n method (see below). Users should never instantiate a \"Field\"\n object directly. Its documented attributes are:\n\n * \"name\": The name of the field.\n\n * \"type\": The type of the field.\n\n * \"default\", \"default_factory\", \"init\", \"repr\", \"hash\", \"compare\",\n \"metadata\", and \"kw_only\" have the identical meaning and values\n as they do in the \"field()\" function.\n\n Other attributes may exist, but they are private and must not be\n inspected or relied on.\n\nclass dataclasses.InitVar\n\n \"InitVar[T]\" type annotations describe variables that are init-\n only. Fields annotated with \"InitVar\" are considered pseudo-fields,\n and thus are neither returned by the \"fields()\" function nor used\n in any way except adding them as parameters to \"__init__()\" and an\n optional \"__post_init__()\".\n\ndataclasses.fields(class_or_instance)\n\n Returns a tuple of \"Field\" objects that define the fields for this\n dataclass. Accepts either a dataclass, or an instance of a\n dataclass. Raises \"TypeError\" if not passed a dataclass or instance\n of one. Does not return pseudo-fields which are \"ClassVar\" or\n \"InitVar\".\n\ndataclasses.asdict(obj, *, dict_factory=dict)\n\n Converts the dataclass *obj* to a dict (by using the factory\n function *dict_factory*). Each dataclass is converted to a dict of\n its fields, as \"name: value\" pairs. dataclasses, dicts, lists, and\n tuples are recursed into. Other objects are copied with\n \"copy.deepcopy()\".\n\n Example of using \"asdict()\" on nested dataclasses:\n\n @dataclass\n class Point:\n x: int\n y: int\n\n @dataclass\n class C:\n mylist: list[Point]\n\n p = Point(10, 20)\n assert asdict(p) == {'x': 10, 'y': 20}\n\n c = C([Point(0, 0), Point(10, 4)])\n assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}\n\n To create a shallow copy, the following workaround may be used:\n\n {field.name: getattr(obj, field.name) for field in fields(obj)}\n\n \"asdict()\" raises \"TypeError\" if *obj* is not a dataclass instance.\n\ndataclasses.astuple(obj, *, tuple_factory=tuple)\n\n Converts the dataclass *obj* to a tuple (by using the factory\n function *tuple_factory*). Each dataclass is converted to a tuple\n of its field values. dataclasses, dicts, lists, and tuples are\n recursed into. Other objects are copied with \"copy.deepcopy()\".\n\n Continuing from the previous example:\n\n assert astuple(p) == (10, 20)\n assert astuple(c) == ([(0, 0), (10, 4)],)\n\n To create a shallow copy, the following workaround may be used:\n\n tuple(getattr(obj, field.name) for field in dataclasses.fields(obj))\n\n \"astuple()\" raises \"TypeError\" if *obj* is not a dataclass\n instance.\n\ndataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False, module=None, decorator=dataclass)\n\n Creates a new dataclass with name *cls_name*, fields as defined in\n *fields*, base classes as given in *bases*, and initialized with a\n namespace as given in *namespace*. *fields* is an iterable whose\n elements are each either \"name\", \"(name, type)\", or \"(name, type,\n Field)\". If just \"name\" is supplied, \"typing.Any\" is used for\n \"type\". The values of *init*, *repr*, *eq*, *order*,\n *unsafe_hash*, *frozen*, *match_args*, *kw_only*, *slots*, and\n *weakref_slot* have the same meaning as they do in \"@dataclass\".\n\n If *module* is defined, the \"__module__\" attribute of the dataclass\n is set to that value. By default, it is set to the module name of\n the caller.\n\n The *decorator* parameter is a callable that will be used to create\n the dataclass. It should take the class object as a first argument\n and the same keyword arguments as \"@dataclass\". By default, the\n \"@dataclass\" function is used.\n\n This function is not strictly required, because any Python\n mechanism for creating a new class with \"__annotations__\" can then\n apply the \"@dataclass\" function to convert that class to a\n dataclass. This function is provided as a convenience. For\n example:\n\n C = make_dataclass('C',\n [('x', int),\n 'y',\n ('z', int, field(default=5))],\n namespace={'add_one': lambda self: self.x + 1})\n\n Is equivalent to:\n\n @dataclass\n class C:\n x: int\n y: 'typing.Any'\n z: int = 5\n\n def add_one(self):\n return self.x + 1\n\n Added in version 3.14: Added the *decorator* parameter.\n\ndataclasses.replace(obj, /, **changes)\n\n Creates a new object of the same type as *obj*, replacing fields\n with values from *changes*. If *obj* is not a Data Class, raises\n \"TypeError\". If keys in *changes* are not field names of the given\n dataclass, raises \"TypeError\".\n\n The newly returned object is created by calling the \"__init__()\"\n method of the dataclass. This ensures that \"__post_init__()\", if\n present, is also called.\n\n Init-only variables without default values, if any exist, must be\n specified on the call to \"replace()\" so that they can be passed to\n \"__init__()\" and \"__post_init__()\".\n\n It is an error for *changes* to contain any fields that are defined\n as having \"init=False\". A \"ValueError\" will be raised in this\n case.\n\n Be forewarned about how \"init=False\" fields work during a call to\n \"replace()\". They are not copied from the source object, but\n rather are initialized in \"__post_init__()\", if they're initialized\n at all. It is expected that \"init=False\" fields will be rarely and\n judiciously used. If they are used, it might be wise to have\n alternate class constructors, or perhaps a custom \"replace()\" (or\n similarly named) method which handles instance copying.\n\n Dataclass instances are also supported by generic function\n \"copy.replace()\".\n\ndataclasses.is_dataclass(obj)\n\n Return \"True\" if its parameter is a dataclass (including subclasses\n of a dataclass) or an instance of one, otherwise return \"False\".\n\n If you need to know if a class is an instance of a dataclass (and\n not a dataclass itself), then add a further check for \"not\n isinstance(obj, type)\":\n\n def is_dataclass_instance(obj):\n return is_dataclass(obj) and not isinstance(obj, type)\n\ndataclasses.MISSING\n\n A sentinel value signifying a missing default or default_factory.\n\ndataclasses.KW_ONLY\n\n A sentinel value used as a type annotation. Any fields after a\n pseudo-field with the type of \"KW_ONLY\" are marked as keyword-only\n fields. Note that a pseudo-field of type \"KW_ONLY\" is otherwise\n completely ignored. This includes the name of such a field. By\n convention, a name of \"_\" is used for a \"KW_ONLY\" field. Keyword-\n only fields signify \"__init__()\" parameters that must be specified\n as keywords when the class is instantiated.\n\n In this example, the fields \"y\" and \"z\" will be marked as keyword-\n only fields:\n\n @dataclass\n class Point:\n x: float\n _: KW_ONLY\n y: float\n z: float\n\n p = Point(0, y=1.5, z=2.0)\n\n In a single dataclass, it is an error to specify more than one\n field whose type is \"KW_ONLY\".\n\n Added in version 3.10.\n\nexception dataclasses.FrozenInstanceError\n\n Raised when an implicitly defined \"__setattr__()\" or\n \"__delattr__()\" is called on a dataclass which was defined with\n \"frozen=True\". It is a subclass of \"AttributeError\".\n\nPost-init processing\n====================\n\ndataclasses.__post_init__()\n\n When defined on the class, it will be called by the generated\n \"__init__()\", normally as \"self.__post_init__()\". However, if any\n \"InitVar\" fields are defined, they will also be passed to\n \"__post_init__()\" in the order they were defined in the class. If\n no \"__init__()\" method is generated, then \"__post_init__()\" will\n not automatically be called.\n\n Among other uses, this allows for initializing field values that\n depend on one or more other fields. For example:\n\n @dataclass\n class C:\n a: float\n b: float\n c: float = field(init=False)\n\n def __post_init__(self):\n self.c = self.a + self.b\n\nThe \"__init__()\" method generated by \"@dataclass\" does not call base\nclass \"__init__()\" methods. If the base class has an \"__init__()\"\nmethod that has to be called, it is common to call this method in a\n\"__post_init__()\" method:\n\n class Rectangle:\n def __init__(self, height, width):\n self.height = height\n self.width = width\n\n @dataclass\n class Square(Rectangle):\n side: float\n\n def __post_init__(self):\n super().__init__(self.side, self.side)\n\nNote, however, that in general the dataclass-generated \"__init__()\"\nmethods don't need to be called, since the derived dataclass will take\ncare of initializing all fields of any base class that is a dataclass\nitself.\n\nSee the section below on init-only variables for ways to pass\nparameters to \"__post_init__()\". Also see the warning about how\n\"replace()\" handles \"init=False\" fields.\n\nClass variables\n===============\n\nOne of the few places where \"@dataclass\" actually inspects the type of\na field is to determine if a field is a class variable as defined in\n**PEP 526**. It does this by checking if the type of the field is\n\"typing.ClassVar\". If a field is a \"ClassVar\", it is excluded from\nconsideration as a field and is ignored by the dataclass mechanisms.\nSuch \"ClassVar\" pseudo-fields are not returned by the module-level\n\"fields()\" function.\n\nInit-only variables\n===================\n\nAnother place where \"@dataclass\" inspects a type annotation is to\ndetermine if a field is an init-only variable. It does this by seeing\nif the type of a field is of type \"InitVar\". If a field is an\n\"InitVar\", it is considered a pseudo-field called an init-only field.\nAs it is not a true field, it is not returned by the module-level\n\"fields()\" function. Init-only fields are added as parameters to the\ngenerated \"__init__()\" method, and are passed to the optional\n\"__post_init__()\" method. They are not otherwise used by dataclasses.\n\nFor example, suppose a field will be initialized from a database, if a\nvalue is not provided when creating the class:\n\n @dataclass\n class C:\n i: int\n j: int | None = None\n database: InitVar[DatabaseType | None] = None\n\n def __post_init__(self, database):\n if self.j is None and database is not None:\n self.j = database.lookup('j')\n\n c = C(10, database=my_database)\n\nIn this case, \"fields()\" will return \"Field\" objects for \"i\" and \"j\",\nbut not for \"database\".\n\nFrozen instances\n================\n\nIt is not possible to create truly immutable Python objects. However,\nby passing \"frozen=True\" to the \"@dataclass\" decorator you can emulate\nimmutability. In that case, dataclasses will add \"__setattr__()\" and\n\"__delattr__()\" methods to the class. These methods will raise a\n\"FrozenInstanceError\" when invoked.\n\nThere is a tiny performance penalty when using \"frozen=True\":\n\"__init__()\" cannot use simple assignment to initialize fields, and\nmust use \"object.__setattr__()\".\n\nInheritance\n===========\n\nWhen the dataclass is being created by the \"@dataclass\" decorator, it\nlooks through all of the class's base classes in reverse MRO (that is,\nstarting at \"object\") and, for each dataclass that it finds, adds the\nfields from that base class to an ordered mapping of fields. After all\nof the base class fields are added, it adds its own fields to the\nordered mapping. All of the generated methods will use this combined,\ncalculated ordered mapping of fields. Because the fields are in\ninsertion order, derived classes override base classes. An example:\n\n @dataclass\n class Base:\n x: Any = 15.0\n y: int = 0\n\n @dataclass\n class C(Base):\n z: int = 10\n x: int = 15\n\nThe final list of fields is, in order, \"x\", \"y\", \"z\". The final type\nof \"x\" is \"int\", as specified in class \"C\".\n\nThe generated \"__init__()\" method for \"C\" will look like:\n\n def __init__(self, x: int = 15, y: int = 0, z: int = 10):\n\nRe-ordering of keyword-only parameters in \"__init__()\"\n======================================================\n\nAfter the parameters needed for \"__init__()\" are computed, any\nkeyword-only parameters are moved to come after all regular (non-\nkeyword-only) parameters. This is a requirement of how keyword-only\nparameters are implemented in Python: they must come after non-\nkeyword-only parameters.\n\nIn this example, \"Base.y\", \"Base.w\", and \"D.t\" are keyword-only\nfields, and \"Base.x\" and \"D.z\" are regular fields:\n\n @dataclass\n class Base:\n x: Any = 15.0\n _: KW_ONLY\n y: int = 0\n w: int = 1\n\n @dataclass\n class D(Base):\n z: int = 10\n t: int = field(kw_only=True, default=0)\n\nThe generated \"__init__()\" method for \"D\" will look like:\n\n def __init__(self, x: Any = 15.0, z: int = 10, *, y: int = 0, w: int = 1, t: int = 0):\n\nNote that the parameters have been re-ordered from how they appear in\nthe list of fields: parameters derived from regular fields are\nfollowed by parameters derived from keyword-only fields.\n\nThe relative ordering of keyword-only parameters is maintained in the\nre-ordered \"__init__()\" parameter list.\n\nDefault factory functions\n=========================\n\nIf a \"field()\" specifies a *default_factory*, it is called with zero\narguments when a default value for the field is needed. For example,\nto create a new instance of a list, use:\n\n mylist: list = field(default_factory=list)\n\nIf a field is excluded from \"__init__()\" (using \"init=False\") and the\nfield also specifies *default_factory*, then the default factory\nfunction will always be called from the generated \"__init__()\"\nfunction. This happens because there is no other way to give the\nfield an initial value.\n\nMutable default values\n======================\n\nPython stores default member variable values in class attributes.\nConsider this example, not using dataclasses:\n\n class C:\n x = []\n def add(self, element):\n self.x.append(element)\n\n o1 = C()\n o2 = C()\n o1.add(1)\n o2.add(2)\n assert o1.x == [1, 2]\n assert o1.x is o2.x\n\nNote that the two instances of class \"C\" share the same class variable\n\"x\", as expected.\n\nUsing dataclasses, *if* this code was valid:\n\n @dataclass\n class D:\n x: list = [] # This code raises ValueError\n def add(self, element):\n self.x.append(element)\n\nit would generate code similar to:\n\n class D:\n x = []\n def __init__(self, x=x):\n self.x = x\n def add(self, element):\n self.x.append(element)\n\n assert D().x is D().x\n\nThis has the same issue as the original example using class \"C\". That\nis, two instances of class \"D\" that do not specify a value for \"x\"\nwhen creating a class instance will share the same copy of \"x\".\nBecause dataclasses just use normal Python class creation they also\nshare this behavior. There is no general way for Data Classes to\ndetect this condition. Instead, the \"@dataclass\" decorator will raise\na \"ValueError\" if it detects an unhashable default parameter. The\nassumption is that if a value is unhashable, it is mutable. This is a\npartial solution, but it does protect against many common errors.\n\nUsing default factory functions is a way to create new instances of\nmutable types as default values for fields:\n\n @dataclass\n class D:\n x: list = field(default_factory=list)\n\n assert D().x is not D().x\n\nChanged in version 3.11: Instead of looking for and disallowing\nobjects of type \"list\", \"dict\", or \"set\", unhashable objects are now\nnot allowed as default values. Unhashability is used to approximate\nmutability.\n\nDescriptor-typed fields\n=======================\n\nFields that are assigned descriptor objects as their default value\nhave the following special behaviors:\n\n* The value for the field passed to the dataclass's \"__init__()\"\n method is passed to the descriptor's \"__set__()\" method rather than\n overwriting the descriptor object.\n\n* Similarly, when getting or setting the field, the descriptor's\n \"__get__()\" or \"__set__()\" method is called rather than returning or\n overwriting the descriptor object.\n\n* To determine whether a field contains a default value, \"@dataclass\"\n will call the descriptor's \"__get__()\" method using its class access\n form: \"descriptor.__get__(obj=None, type=cls)\". If the descriptor\n returns a value in this case, it will be used as the field's\n default. On the other hand, if the descriptor raises\n \"AttributeError\" in this situation, no default value will be\n provided for the field.\n\n class IntConversionDescriptor:\n def __init__(self, *, default):\n self._default = default\n\n def __set_name__(self, owner, name):\n self._name = \"_\" + name\n\n def __get__(self, obj, type):\n if obj is None:\n return self._default\n\n return getattr(obj, self._name, self._default)\n\n def __set__(self, obj, value):\n setattr(obj, self._name, int(value))\n\n @dataclass\n class InventoryItem:\n quantity_on_hand: IntConversionDescriptor = IntConversionDescriptor(default=100)\n\n i = InventoryItem()\n print(i.quantity_on_hand) # 100\n i.quantity_on_hand = 2.5 # calls __set__ with 2.5\n print(i.quantity_on_hand) # 2\n\nNote that if a field is annotated with a descriptor type, but is not\nassigned a descriptor object as its default value, the field will act\nlike a normal field.", "source": "python_docs:python-3.14-docs-text/library/dataclasses.txt", "domain": "software" }, { "text": "\"tkinter.font\" --- Tkinter font wrapper\n***************************************\n\n**Source code:** Lib/tkinter/font.py\n\n======================================================================\n\nThe \"tkinter.font\" module provides the \"Font\" class for creating and\nusing named fonts.\n\nThe different font weights and slants are:\n\ntkinter.font.NORMAL\ntkinter.font.BOLD\ntkinter.font.ITALIC\ntkinter.font.ROMAN\n\nclass tkinter.font.Font(root=None, font=None, name=None, exists=False, **options)\n\n The \"Font\" class represents a named font. *Font* instances are\n given unique names and can be specified by their family, size, and\n style configuration. Named fonts are Tk's method of creating and\n identifying fonts as a single object, rather than specifying a font\n by its attributes with each occurrence.\n\n arguments:\n\n *font* - font specifier tuple (family, size, options)\n *name* - unique font name\n *exists* - self points to existing named font if true\n\n additional keyword options (ignored if *font* is specified):\n\n *family* - font family i.e. Courier, Times\n *size* - font size\n If *size* is positive it is interpreted as size in points.\n If *size* is a negative number its absolute value is treated\n as size in pixels.\n *weight* - font emphasis (NORMAL, BOLD)\n *slant* - ROMAN, ITALIC\n *underline* - font underlining (0 - none, 1 - underline)\n *overstrike* - font strikeout (0 - none, 1 - strikeout)\n\n actual(option=None, displayof=None)\n\n Return the attributes of the font.\n\n cget(option)\n\n Retrieve an attribute of the font.\n\n config(**options)\n\n Modify attributes of the font.\n\n copy()\n\n Return new instance of the current font.\n\n measure(text, displayof=None)\n\n Return amount of space the text would occupy on the specified\n display when formatted in the current font. If no display is\n specified then the main application window is assumed.\n\n metrics(*options, **kw)\n\n Return font-specific data. Options include:\n\n *ascent* - distance between baseline and highest point that a\n character of the font can occupy\n\n *descent* - distance between baseline and lowest point that a\n character of the font can occupy\n\n *linespace* - minimum vertical separation necessary between any\n two\n characters of the font that ensures no vertical overlap\n between lines.\n\n *fixed* - 1 if font is fixed-width else 0\n\ntkinter.font.families(root=None, displayof=None)\n\n Return the different font families.\n\ntkinter.font.names(root=None)\n\n Return the names of defined fonts.\n\ntkinter.font.nametofont(name, root=None)\n\n Return a \"Font\" representation of a tk named font.\n\n Changed in version 3.10: The *root* parameter was added.", "source": "python_docs:python-3.14-docs-text/library/tkinter.font.txt", "domain": "software" }, { "text": "\"xml.dom.pulldom\" --- Support for building partial DOM trees\n************************************************************\n\n**Source code:** Lib/xml/dom/pulldom.py\n\n======================================================================\n\nThe \"xml.dom.pulldom\" module provides a \"pull parser\" which can also\nbe asked to produce DOM-accessible fragments of the document where\nnecessary. The basic concept involves pulling \"events\" from a stream\nof incoming XML and processing them. In contrast to SAX which also\nemploys an event-driven processing model together with callbacks, the\nuser of a pull parser is responsible for explicitly pulling events\nfrom the stream, looping over those events until either processing is\nfinished or an error condition occurs.\n\nNote:\n\n If you need to parse untrusted or unauthenticated data, see XML\n security.\n\nChanged in version 3.7.1: The SAX parser no longer processes general\nexternal entities by default to increase security by default. To\nenable processing of external entities, pass a custom parser instance\nin:\n\n from xml.dom.pulldom import parse\n from xml.sax import make_parser\n from xml.sax.handler import feature_external_ges\n\n parser = make_parser()\n parser.setFeature(feature_external_ges, True)\n parse(filename, parser=parser)\n\nExample:\n\n from xml.dom import pulldom\n\n doc = pulldom.parse('sales_items.xml')\n for event, node in doc:\n if event == pulldom.START_ELEMENT and node.tagName == 'item':\n if int(node.getAttribute('price')) > 50:\n doc.expandNode(node)\n print(node.toxml())\n\n\"event\" is a constant and can be one of:\n\n* \"START_ELEMENT\"\n\n* \"END_ELEMENT\"\n\n* \"COMMENT\"\n\n* \"START_DOCUMENT\"\n\n* \"END_DOCUMENT\"\n\n* \"CHARACTERS\"\n\n* \"PROCESSING_INSTRUCTION\"\n\n* \"IGNORABLE_WHITESPACE\"\n\n\"node\" is an object of type \"xml.dom.minidom.Document\",\n\"xml.dom.minidom.Element\" or \"xml.dom.minidom.Text\".\n\nSince the document is treated as a \"flat\" stream of events, the\ndocument \"tree\" is implicitly traversed and the desired elements are\nfound regardless of their depth in the tree. In other words, one does\nnot need to consider hierarchical issues such as recursive searching\nof the document nodes, although if the context of elements were\nimportant, one would either need to maintain some context-related\nstate (i.e. remembering where one is in the document at any given\npoint) or to make use of the \"DOMEventStream.expandNode()\" method and\nswitch to DOM-related processing.\n\nclass xml.dom.pulldom.PullDOM(documentFactory=None)\n\n Subclass of \"xml.sax.handler.ContentHandler\".\n\nclass xml.dom.pulldom.SAX2DOM(documentFactory=None)\n\n Subclass of \"xml.sax.handler.ContentHandler\".\n\nxml.dom.pulldom.parse(stream_or_string, parser=None, bufsize=None)\n\n Return a \"DOMEventStream\" from the given input. *stream_or_string*\n may be either a file name, or a file-like object. *parser*, if\n given, must be an \"XMLReader\" object. This function will change the\n document handler of the parser and activate namespace support;\n other parser configuration (like setting an entity resolver) must\n have been done in advance.\n\nIf you have XML in a string, you can use the \"parseString()\" function\ninstead:\n\nxml.dom.pulldom.parseString(string, parser=None)\n\n Return a \"DOMEventStream\" that represents the (Unicode) *string*.\n\nxml.dom.pulldom.default_bufsize\n\n Default value for the *bufsize* parameter to \"parse()\".\n\n The value of this variable can be changed before calling \"parse()\"\n and the new value will take effect.\n\nDOMEventStream Objects\n======================\n\nclass xml.dom.pulldom.DOMEventStream(stream, parser, bufsize)\n\n Changed in version 3.11: Support for \"__getitem__()\" method has\n been removed.\n\n getEvent()\n\n Return a tuple containing *event* and the current *node* as\n \"xml.dom.minidom.Document\" if event equals \"START_DOCUMENT\",\n \"xml.dom.minidom.Element\" if event equals \"START_ELEMENT\" or\n \"END_ELEMENT\" or \"xml.dom.minidom.Text\" if event equals\n \"CHARACTERS\". The current node does not contain information\n about its children, unless \"expandNode()\" is called.\n\n expandNode(node)\n\n Expands all children of *node* into *node*. Example:\n\n from xml.dom import pulldom\n\n xml = 'Foo

Some text

and more

'\n doc = pulldom.parseString(xml)\n for event, node in doc:\n if event == pulldom.START_ELEMENT and node.tagName == 'p':\n # Following statement only prints '

'\n print(node.toxml())\n doc.expandNode(node)\n # Following statement prints node with all its children '

Some text

and more

'\n print(node.toxml())\n\n reset()", "source": "python_docs:python-3.14-docs-text/library/xml.dom.pulldom.txt", "domain": "software" }, { "text": "\"stat\" --- Interpreting \"stat()\" results\n****************************************\n\n**Source code:** Lib/stat.py\n\n======================================================================\n\nThe \"stat\" module defines constants and functions for interpreting the\nresults of \"os.stat()\", \"os.fstat()\" and \"os.lstat()\" (if they exist).\nFor complete details about the \"stat()\", \"fstat()\" and \"lstat()\"\ncalls, consult the documentation for your system.\n\nChanged in version 3.4: The stat module is backed by a C\nimplementation.\n\nThe \"stat\" module defines the following functions to test for specific\nfile types:\n\nstat.S_ISDIR(mode)\n\n Return non-zero if the mode is from a directory.\n\nstat.S_ISCHR(mode)\n\n Return non-zero if the mode is from a character special device\n file.\n\nstat.S_ISBLK(mode)\n\n Return non-zero if the mode is from a block special device file.\n\nstat.S_ISREG(mode)\n\n Return non-zero if the mode is from a regular file.\n\nstat.S_ISFIFO(mode)\n\n Return non-zero if the mode is from a FIFO (named pipe).\n\nstat.S_ISLNK(mode)\n\n Return non-zero if the mode is from a symbolic link.\n\nstat.S_ISSOCK(mode)\n\n Return non-zero if the mode is from a socket.\n\nstat.S_ISDOOR(mode)\n\n Return non-zero if the mode is from a door.\n\n Added in version 3.4.\n\nstat.S_ISPORT(mode)\n\n Return non-zero if the mode is from an event port.\n\n Added in version 3.4.\n\nstat.S_ISWHT(mode)\n\n Return non-zero if the mode is from a whiteout.\n\n Added in version 3.4.\n\nTwo additional functions are defined for more general manipulation of\nthe file's mode:\n\nstat.S_IMODE(mode)\n\n Return the portion of the file's mode that can be set by\n \"os.chmod()\"---that is, the file's permission bits, plus the sticky\n bit, set-group-id, and set-user-id bits (on systems that support\n them).\n\nstat.S_IFMT(mode)\n\n Return the portion of the file's mode that describes the file type\n (used by the \"S_IS*()\" functions above).\n\nNormally, you would use the \"os.path.is*()\" functions for testing the\ntype of a file; the functions here are useful when you are doing\nmultiple tests of the same file and wish to avoid the overhead of the\n\"stat()\" system call for each test. These are also useful when\nchecking for information about a file that isn't handled by \"os.path\",\nlike the tests for block and character devices.\n\nExample:\n\n import os, sys\n from stat import *\n\n def walktree(top, callback):\n '''recursively descend the directory tree rooted at top,\n calling the callback function for each regular file'''\n\n for f in os.listdir(top):\n pathname = os.path.join(top, f)\n mode = os.lstat(pathname).st_mode\n if S_ISDIR(mode):\n # It's a directory, recurse into it\n walktree(pathname, callback)\n elif S_ISREG(mode):\n # It's a file, call the callback function\n callback(pathname)\n else:\n # Unknown file type, print a message\n print('Skipping %s' % pathname)\n\n def visitfile(file):\n print('visiting', file)\n\n if __name__ == '__main__':\n walktree(sys.argv[1], visitfile)\n\nAn additional utility function is provided to convert a file's mode in\na human readable string:\n\nstat.filemode(mode)\n\n Convert a file's mode to a string of the form '-rwxrwxrwx'.\n\n Added in version 3.3.\n\n Changed in version 3.4: The function supports \"S_IFDOOR\",\n \"S_IFPORT\" and \"S_IFWHT\".\n\nAll the variables below are simply symbolic indexes into the 10-tuple\nreturned by \"os.stat()\", \"os.fstat()\" or \"os.lstat()\".\n\nstat.ST_MODE\n\n Inode protection mode.\n\nstat.ST_INO\n\n Inode number.\n\nstat.ST_DEV\n\n Device inode resides on.\n\nstat.ST_NLINK\n\n Number of links to the inode.\n\nstat.ST_UID\n\n User id of the owner.\n\nstat.ST_GID\n\n Group id of the owner.\n\nstat.ST_SIZE\n\n Size in bytes of a plain file; amount of data waiting on some\n special files.\n\nstat.ST_ATIME\n\n Time of last access.\n\nstat.ST_MTIME\n\n Time of last modification.\n\nstat.ST_CTIME\n\n The \"ctime\" as reported by the operating system. On some systems\n (like Unix) is the time of the last metadata change, and, on others\n (like Windows), is the creation time (see platform documentation\n for details).\n\nThe interpretation of \"file size\" changes according to the file type.\nFor plain files this is the size of the file in bytes. For FIFOs and\nsockets under most flavors of Unix (including Linux in particular),\nthe \"size\" is the number of bytes waiting to be read at the time of\nthe call to \"os.stat()\", \"os.fstat()\", or \"os.lstat()\"; this can\nsometimes be useful, especially for polling one of these special files\nafter a non-blocking open. The meaning of the size field for other\ncharacter and block devices varies more, depending on the\nimplementation of the underlying system call.\n\nThe variables below define the flags used in the \"ST_MODE\" field.\n\nUse of the functions above is more portable than use of the first set\nof flags:\n\nstat.S_IFSOCK\n\n Socket.\n\nstat.S_IFLNK\n\n Symbolic link.\n\nstat.S_IFREG\n\n Regular file.\n\nstat.S_IFBLK\n\n Block device.\n\nstat.S_IFDIR\n\n Directory.\n\nstat.S_IFCHR\n\n Character device.\n\nstat.S_IFIFO\n\n FIFO.\n\nstat.S_IFDOOR\n\n Door.\n\n Added in version 3.4.\n\nstat.S_IFPORT\n\n Event port.\n\n Added in version 3.4.\n\nstat.S_IFWHT\n\n Whiteout.\n\n Added in version 3.4.\n\nNote:\n\n \"S_IFDOOR\", \"S_IFPORT\" or \"S_IFWHT\" are defined as 0 when the\n platform does not have support for the file types.\n\nThe following flags can also be used in the *mode* argument of\n\"os.chmod()\":\n\nstat.S_ISUID\n\n Set UID bit.\n\nstat.S_ISGID\n\n Set-group-ID bit. This bit has several special uses. For a\n directory it indicates that BSD semantics is to be used for that\n directory: files created there inherit their group ID from the\n directory, not from the effective group ID of the creating process,\n and directories created there will also get the \"S_ISGID\" bit set.\n For a file that does not have the group execution bit (\"S_IXGRP\")\n set, the set-group-ID bit indicates mandatory file/record locking\n (see also \"S_ENFMT\").\n\nstat.S_ISVTX\n\n Sticky bit. When this bit is set on a directory it means that a\n file in that directory can be renamed or deleted only by the owner\n of the file, by the owner of the directory, or by a privileged\n process.\n\nstat.S_IRWXU\n\n Mask for file owner permissions.\n\nstat.S_IRUSR\n\n Owner has read permission.\n\nstat.S_IWUSR\n\n Owner has write permission.\n\nstat.S_IXUSR\n\n Owner has execute permission.\n\nstat.S_IRWXG\n\n Mask for group permissions.\n\nstat.S_IRGRP\n\n Group has read permission.\n\nstat.S_IWGRP\n\n Group has write permission.\n\nstat.S_IXGRP\n\n Group has execute permission.\n\nstat.S_IRWXO\n\n Mask for permissions for others (not in group).\n\nstat.S_IROTH\n\n Others have read permission.\n\nstat.S_IWOTH\n\n Others have write permission.\n\nstat.S_IXOTH\n\n Others have execute permission.\n\nstat.S_ENFMT\n\n System V file locking enforcement. This flag is shared with\n \"S_ISGID\": file/record locking is enforced on files that do not\n have the group execution bit (\"S_IXGRP\") set.\n\nstat.S_IREAD\n\n Unix V7 synonym for \"S_IRUSR\".\n\nstat.S_IWRITE\n\n Unix V7 synonym for \"S_IWUSR\".\n\nstat.S_IEXEC\n\n Unix V7 synonym for \"S_IXUSR\".\n\nThe following flags can be used in the *flags* argument of\n\"os.chflags()\":\n\nstat.UF_SETTABLE\n\n All user settable flags.\n\n Added in version 3.13.\n\nstat.UF_NODUMP\n\n Do not dump the file.\n\nstat.UF_IMMUTABLE\n\n The file may not be changed.\n\nstat.UF_APPEND\n\n The file may only be appended to.\n\nstat.UF_OPAQUE\n\n The directory is opaque when viewed through a union stack.\n\nstat.UF_NOUNLINK\n\n The file may not be renamed or deleted.\n\nstat.UF_COMPRESSED\n\n The file is stored compressed (macOS 10.6+).\n\nstat.UF_TRACKED\n\n Used for handling document IDs (macOS)\n\n Added in version 3.13.\n\nstat.UF_DATAVAULT\n\n The file needs an entitlement for reading or writing (macOS 10.13+)\n\n Added in version 3.13.\n\nstat.UF_HIDDEN\n\n The file should not be displayed in a GUI (macOS 10.5+).\n\nstat.SF_SETTABLE\n\n All super-user changeable flags\n\n Added in version 3.13.\n\nstat.SF_SUPPORTED\n\n All super-user supported flags\n\n Availability: macOS\n\n Added in version 3.13.\n\nstat.SF_SYNTHETIC\n\n All super-user read-only synthetic flags\n\n Availability: macOS\n\n Added in version 3.13.\n\nstat.SF_ARCHIVED\n\n The file may be archived.\n\nstat.SF_IMMUTABLE\n\n The file may not be changed.\n\nstat.SF_APPEND\n\n The file may only be appended to.\n\nstat.SF_RESTRICTED\n\n The file needs an entitlement to write to (macOS 10.13+)\n\n Added in version 3.13.\n\nstat.SF_NOUNLINK\n\n The file may not be renamed or deleted.\n\nstat.SF_SNAPSHOT\n\n The file is a snapshot file.\n\nstat.SF_FIRMLINK\n\n The file is a firmlink (macOS 10.15+)\n\n Added in version 3.13.\n\nstat.SF_DATALESS\n\n The file is a dataless object (macOS 10.15+)\n\n Added in version 3.13.\n\nSee the *BSD or macOS systems man page *chflags(2)* for more\ninformation.\n\nOn Windows, the following file attribute constants are available for\nuse when testing bits in the \"st_file_attributes\" member returned by\n\"os.stat()\". See the Windows API documentation for more detail on the\nmeaning of these constants.\n\nstat.FILE_ATTRIBUTE_ARCHIVE\nstat.FILE_ATTRIBUTE_COMPRESSED\nstat.FILE_ATTRIBUTE_DEVICE\nstat.FILE_ATTRIBUTE_DIRECTORY\nstat.FILE_ATTRIBUTE_ENCRYPTED\nstat.FILE_ATTRIBUTE_HIDDEN\nstat.FILE_ATTRIBUTE_INTEGRITY_STREAM\nstat.FILE_ATTRIBUTE_NORMAL\nstat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED\nstat.FILE_ATTRIBUTE_NO_SCRUB_DATA\nstat.FILE_ATTRIBUTE_OFFLINE\nstat.FILE_ATTRIBUTE_READONLY\nstat.FILE_ATTRIBUTE_REPARSE_POINT\nstat.FILE_ATTRIBUTE_SPARSE_FILE\nstat.FILE_ATTRIBUTE_SYSTEM\nstat.FILE_ATTRIBUTE_TEMPORARY\nstat.FILE_ATTRIBUTE_VIRTUAL\n\n Added in version 3.5.\n\nOn Windows, the following constants are available for comparing\nagainst the \"st_reparse_tag\" member returned by \"os.lstat()\". These\nare well-known constants, but are not an exhaustive list.\n\nstat.IO_REPARSE_TAG_SYMLINK\nstat.IO_REPARSE_TAG_MOUNT_POINT\nstat.IO_REPARSE_TAG_APPEXECLINK\n\n Added in version 3.8.", "source": "python_docs:python-3.14-docs-text/library/stat.txt", "domain": "software" }, { "text": "\"filecmp\" --- File and Directory Comparisons\n********************************************\n\n**Source code:** Lib/filecmp.py\n\n======================================================================\n\nThe \"filecmp\" module defines functions to compare files and\ndirectories, with various optional time/correctness trade-offs. For\ncomparing files, see also the \"difflib\" module.\n\nThe \"filecmp\" module defines the following functions:\n\nfilecmp.cmp(f1, f2, shallow=True)\n\n Compare the files named *f1* and *f2*, returning \"True\" if they\n seem equal, \"False\" otherwise.\n\n If *shallow* is true and the \"os.stat()\" signatures (file type,\n size, and modification time) of both files are identical, the files\n are taken to be equal.\n\n Otherwise, the files are treated as different if their sizes or\n contents differ.\n\n Note that no external programs are called from this function,\n giving it portability and efficiency.\n\n This function uses a cache for past comparisons and the results,\n with cache entries invalidated if the \"os.stat()\" information for\n the file changes. The entire cache may be cleared using\n \"clear_cache()\".\n\nfilecmp.cmpfiles(dir1, dir2, common, shallow=True)\n\n Compare the files in the two directories *dir1* and *dir2* whose\n names are given by *common*.\n\n Returns three lists of file names: *match*, *mismatch*, *errors*.\n *match* contains the list of files that match, *mismatch* contains\n the names of those that don't, and *errors* lists the names of\n files which could not be compared. Files are listed in *errors* if\n they don't exist in one of the directories, the user lacks\n permission to read them or if the comparison could not be done for\n some other reason.\n\n The *shallow* parameter has the same meaning and default value as\n for \"filecmp.cmp()\".\n\n For example, \"cmpfiles('a', 'b', ['c', 'd/e'])\" will compare \"a/c\"\n with \"b/c\" and \"a/d/e\" with \"b/d/e\". \"'c'\" and \"'d/e'\" will each\n be in one of the three returned lists.\n\nfilecmp.clear_cache()\n\n Clear the filecmp cache. This may be useful if a file is compared\n so quickly after it is modified that it is within the mtime\n resolution of the underlying filesystem.\n\n Added in version 3.4.\n\nThe \"dircmp\" class\n==================\n\nclass filecmp.dircmp(a, b, ignore=None, hide=None, *, shallow=True)\n\n Construct a new directory comparison object, to compare the\n directories *a* and *b*. *ignore* is a list of names to ignore,\n and defaults to \"filecmp.DEFAULT_IGNORES\". *hide* is a list of\n names to hide, and defaults to \"[os.curdir, os.pardir]\".\n\n The \"dircmp\" class compares files by doing *shallow* comparisons as\n described for \"filecmp.cmp()\" by default using the *shallow*\n parameter.\n\n Changed in version 3.13: Added the *shallow* parameter.\n\n The \"dircmp\" class provides the following methods:\n\n report()\n\n Print (to \"sys.stdout\") a comparison between *a* and *b*.\n\n report_partial_closure()\n\n Print a comparison between *a* and *b* and common immediate\n subdirectories.\n\n report_full_closure()\n\n Print a comparison between *a* and *b* and common subdirectories\n (recursively).\n\n The \"dircmp\" class offers a number of interesting attributes that\n may be used to get various bits of information about the directory\n trees being compared.\n\n Note that via \"__getattr__()\" hooks, all attributes are computed\n lazily, so there is no speed penalty if only those attributes which\n are lightweight to compute are used.\n\n left\n\n The directory *a*.\n\n right\n\n The directory *b*.\n\n left_list\n\n Files and subdirectories in *a*, filtered by *hide* and\n *ignore*.\n\n right_list\n\n Files and subdirectories in *b*, filtered by *hide* and\n *ignore*.\n\n common\n\n Files and subdirectories in both *a* and *b*.\n\n left_only\n\n Files and subdirectories only in *a*.\n\n right_only\n\n Files and subdirectories only in *b*.\n\n common_dirs\n\n Subdirectories in both *a* and *b*.\n\n common_files\n\n Files in both *a* and *b*.\n\n common_funny\n\n Names in both *a* and *b*, such that the type differs between\n the directories, or names for which \"os.stat()\" reports an\n error.\n\n same_files\n\n Files which are identical in both *a* and *b*, using the class's\n file comparison operator.\n\n diff_files\n\n Files which are in both *a* and *b*, whose contents differ\n according to the class's file comparison operator.\n\n funny_files\n\n Files which are in both *a* and *b*, but could not be compared.\n\n subdirs\n\n A dictionary mapping names in \"common_dirs\" to \"dircmp\"\n instances (or MyDirCmp instances if this instance is of type\n MyDirCmp, a subclass of \"dircmp\").\n\n Changed in version 3.10: Previously entries were always \"dircmp\"\n instances. Now entries are the same type as *self*, if *self* is\n a subclass of \"dircmp\".\n\nfilecmp.DEFAULT_IGNORES\n\n Added in version 3.4.\n\n List of directories ignored by \"dircmp\" by default.\n\nHere is a simplified example of using the \"subdirs\" attribute to\nsearch recursively through two directories to show common different\nfiles:\n\n >>> from filecmp import dircmp\n >>> def print_diff_files(dcmp):\n ... for name in dcmp.diff_files:\n ... print(\"diff_file %s found in %s and %s\" % (name, dcmp.left,\n ... dcmp.right))\n ... for sub_dcmp in dcmp.subdirs.values():\n ... print_diff_files(sub_dcmp)\n ...\n >>> dcmp = dircmp('dir1', 'dir2')\n >>> print_diff_files(dcmp)", "source": "python_docs:python-3.14-docs-text/library/filecmp.txt", "domain": "software" }, { "text": "\"code\" --- Interpreter base classes\n***********************************\n\n**Source code:** Lib/code.py\n\n======================================================================\n\nThe \"code\" module provides facilities to implement read-eval-print\nloops in Python. Two classes and convenience functions are included\nwhich can be used to build applications which provide an interactive\ninterpreter prompt.\n\nclass code.InteractiveInterpreter(locals=None)\n\n This class deals with parsing and interpreter state (the user's\n namespace); it does not deal with input buffering or prompting or\n input file naming (the filename is always passed in explicitly).\n The optional *locals* argument specifies a mapping to use as the\n namespace in which code will be executed; it defaults to a newly\n created dictionary with key \"'__name__'\" set to \"'__console__'\" and\n key \"'__doc__'\" set to \"None\".\n\n Note that functions and classes objects created under an\n \"InteractiveInterpreter\" instance will belong to the namespace\n specified by *locals*. They are only pickleable if *locals* is the\n namespace of an existing module.\n\nclass code.InteractiveConsole(locals=None, filename='', local_exit=False)\n\n Closely emulate the behavior of the interactive Python interpreter.\n This class builds on \"InteractiveInterpreter\" and adds prompting\n using the familiar \"sys.ps1\" and \"sys.ps2\", and input buffering. If\n *local_exit* is true, \"exit()\" and \"quit()\" in the console will not\n raise \"SystemExit\", but instead return to the calling code.\n\n Changed in version 3.13: Added *local_exit* parameter.\n\ncode.interact(banner=None, readfunc=None, local=None, exitmsg=None, local_exit=False)\n\n Convenience function to run a read-eval-print loop. This creates a\n new instance of \"InteractiveConsole\" and sets *readfunc* to be used\n as the \"InteractiveConsole.raw_input()\" method, if provided. If\n *local* is provided, it is passed to the \"InteractiveConsole\"\n constructor for use as the default namespace for the interpreter\n loop. If *local_exit* is provided, it is passed to the\n \"InteractiveConsole\" constructor. The \"interact()\" method of the\n instance is then run with *banner* and *exitmsg* passed as the\n banner and exit message to use, if provided. The console object is\n discarded after use.\n\n Changed in version 3.6: Added *exitmsg* parameter.\n\n Changed in version 3.13: Added *local_exit* parameter.\n\ncode.compile_command(source, filename='', symbol='single')\n\n This function is useful for programs that want to emulate Python's\n interpreter main loop (a.k.a. the read-eval-print loop). The\n tricky part is to determine when the user has entered an incomplete\n command that can be completed by entering more text (as opposed to\n a complete command or a syntax error). This function *almost*\n always makes the same decision as the real interpreter main loop.\n\n *source* is the source string; *filename* is the optional filename\n from which source was read, defaulting to \"''\"; and *symbol*\n is the optional grammar start symbol, which should be \"'single'\"\n (the default), \"'eval'\" or \"'exec'\".\n\n Returns a code object (the same as \"compile(source, filename,\n symbol)\") if the command is complete and valid; \"None\" if the\n command is incomplete; raises \"SyntaxError\" if the command is\n complete and contains a syntax error, or raises \"OverflowError\" or\n \"ValueError\" if the command contains an invalid literal.\n\nInteractive Interpreter Objects\n===============================\n\nInteractiveInterpreter.runsource(source, filename='', symbol='single')\n\n Compile and run some source in the interpreter. Arguments are the\n same as for \"compile_command()\"; the default for *filename* is\n \"''\", and for *symbol* is \"'single'\". One of several things\n can happen:\n\n * The input is incorrect; \"compile_command()\" raised an exception\n (\"SyntaxError\" or \"OverflowError\"). A syntax traceback will be\n printed by calling the \"showsyntaxerror()\" method. \"runsource()\"\n returns \"False\".\n\n * The input is incomplete, and more input is required;\n \"compile_command()\" returned \"None\". \"runsource()\" returns\n \"True\".\n\n * The input is complete; \"compile_command()\" returned a code\n object. The code is executed by calling the \"runcode()\" (which\n also handles run-time exceptions, except for \"SystemExit\").\n \"runsource()\" returns \"False\".\n\n The return value can be used to decide whether to use \"sys.ps1\" or\n \"sys.ps2\" to prompt the next line.\n\nInteractiveInterpreter.runcode(code)\n\n Execute a code object. When an exception occurs, \"showtraceback()\"\n is called to display a traceback. All exceptions are caught except\n \"SystemExit\", which is allowed to propagate.\n\n A note about \"KeyboardInterrupt\": this exception may occur\n elsewhere in this code, and may not always be caught. The caller\n should be prepared to deal with it.\n\nInteractiveInterpreter.showsyntaxerror(filename=None)\n\n Display the syntax error that just occurred. This does not display\n a stack trace because there isn't one for syntax errors. If\n *filename* is given, it is stuffed into the exception instead of\n the default filename provided by Python's parser, because it always\n uses \"''\" when reading from a string. The output is written\n by the \"write()\" method.\n\nInteractiveInterpreter.showtraceback()\n\n Display the exception that just occurred. We remove the first\n stack item because it is within the interpreter object\n implementation. The output is written by the \"write()\" method.\n\n Changed in version 3.5: The full chained traceback is displayed\n instead of just the primary traceback.\n\nInteractiveInterpreter.write(data)\n\n Write a string to the standard error stream (\"sys.stderr\"). Derived\n classes should override this to provide the appropriate output\n handling as needed.\n\nInteractive Console Objects\n===========================\n\nThe \"InteractiveConsole\" class is a subclass of\n\"InteractiveInterpreter\", and so offers all the methods of the\ninterpreter objects as well as the following additions.\n\nInteractiveConsole.interact(banner=None, exitmsg=None)\n\n Closely emulate the interactive Python console. The optional\n *banner* argument specify the banner to print before the first\n interaction; by default it prints a banner similar to the one\n printed by the standard Python interpreter, followed by the class\n name of the console object in parentheses (so as not to confuse\n this with the real interpreter -- since it's so close!).\n\n The optional *exitmsg* argument specifies an exit message printed\n when exiting. Pass the empty string to suppress the exit message.\n If *exitmsg* is not given or \"None\", a default message is printed.\n\n Changed in version 3.4: To suppress printing any banner, pass an\n empty string.\n\n Changed in version 3.6: Print an exit message when exiting.\n\nInteractiveConsole.push(line)\n\n Push a line of source text to the interpreter. The line should not\n have a trailing newline; it may have internal newlines. The line\n is appended to a buffer and the interpreter's \"runsource()\" method\n is called with the concatenated contents of the buffer as source.\n If this indicates that the command was executed or invalid, the\n buffer is reset; otherwise, the command is incomplete, and the\n buffer is left as it was after the line was appended. The return\n value is \"True\" if more input is required, \"False\" if the line was\n dealt with in some way (this is the same as \"runsource()\").\n\nInteractiveConsole.resetbuffer()\n\n Remove any unhandled source text from the input buffer.\n\nInteractiveConsole.raw_input(prompt='')\n\n Write a prompt and read a line. The returned line does not include\n the trailing newline. When the user enters the EOF key sequence,\n \"EOFError\" is raised. The base implementation reads from\n \"sys.stdin\"; a subclass may replace this with a different\n implementation.", "source": "python_docs:python-3.14-docs-text/library/code.txt", "domain": "software" }, { "text": "\"email.message.Message\": Representing an email message using the \"compat32\" API\n*******************************************************************************\n\nThe \"Message\" class is very similar to the \"EmailMessage\" class,\nwithout the methods added by that class, and with the default behavior\nof certain other methods being slightly different. We also document\nhere some methods that, while supported by the \"EmailMessage\" class,\nare not recommended unless you are dealing with legacy code.\n\nThe philosophy and structure of the two classes is otherwise the same.\n\nThis document describes the behavior under the default (for \"Message\")\npolicy \"Compat32\". If you are going to use another policy, you should\nbe using the \"EmailMessage\" class instead.\n\nAn email message consists of *headers* and a *payload*. Headers must\nbe **RFC 5322** style names and values, where the field name and value\nare separated by a colon. The colon is not part of either the field\nname or the field value. The payload may be a simple text message, or\na binary object, or a structured sequence of sub-messages each with\ntheir own set of headers and their own payload. The latter type of\npayload is indicated by the message having a MIME type such as\n*multipart/** or *message/rfc822*.\n\nThe conceptual model provided by a \"Message\" object is that of an\nordered dictionary of headers with additional methods for accessing\nboth specialized information from the headers, for accessing the\npayload, for generating a serialized version of the message, and for\nrecursively walking over the object tree. Note that duplicate headers\nare supported but special methods must be used to access them.\n\nThe \"Message\" pseudo-dictionary is indexed by the header names, which\nmust be ASCII values. The values of the dictionary are strings that\nare supposed to contain only ASCII characters; there is some special\nhandling for non-ASCII input, but it doesn't always produce the\ncorrect results. Headers are stored and returned in case-preserving\nform, but field names are matched case-insensitively. There may also\nbe a single envelope header, also known as the *Unix-From* header or\nthe \"From_\" header. The *payload* is either a string or bytes, in the\ncase of simple message objects, or a list of \"Message\" objects, for\nMIME container documents (e.g. *multipart/** and *message/rfc822*).\n\nHere are the methods of the \"Message\" class:\n\nclass email.message.Message(policy=compat32)\n\n If *policy* is specified (it must be an instance of a \"policy\"\n class) use the rules it specifies to update and serialize the\n representation of the message. If *policy* is not set, use the\n \"compat32\" policy, which maintains backward compatibility with the\n Python 3.2 version of the email package. For more information see\n the \"policy\" documentation.\n\n Changed in version 3.3: The *policy* keyword argument was added.\n\n as_string(unixfrom=False, maxheaderlen=0, policy=None)\n\n Return the entire message flattened as a string. When optional\n *unixfrom* is true, the envelope header is included in the\n returned string. *unixfrom* defaults to \"False\". For backward\n compatibility reasons, *maxheaderlen* defaults to \"0\", so if you\n want a different value you must override it explicitly (the\n value specified for *max_line_length* in the policy will be\n ignored by this method). The *policy* argument may be used to\n override the default policy obtained from the message instance.\n This can be used to control some of the formatting produced by\n the method, since the specified *policy* will be passed to the\n \"Generator\".\n\n Flattening the message may trigger changes to the \"Message\" if\n defaults need to be filled in to complete the transformation to\n a string (for example, MIME boundaries may be generated or\n modified).\n\n Note that this method is provided as a convenience and may not\n always format the message the way you want. For example, by\n default it does not do the mangling of lines that begin with\n \"From\" that is required by the Unix mbox format. For more\n flexibility, instantiate a \"Generator\" instance and use its\n \"flatten()\" method directly. For example:\n\n from io import StringIO\n from email.generator import Generator\n fp = StringIO()\n g = Generator(fp, mangle_from_=True, maxheaderlen=60)\n g.flatten(msg)\n text = fp.getvalue()\n\n If the message object contains binary data that is not encoded\n according to RFC standards, the non-compliant data will be\n replaced by unicode \"unknown character\" code points. (See also\n \"as_bytes()\" and \"BytesGenerator\".)\n\n Changed in version 3.4: the *policy* keyword argument was added.\n\n __str__()\n\n Equivalent to \"as_string()\". Allows \"str(msg)\" to produce a\n string containing the formatted message.\n\n as_bytes(unixfrom=False, policy=None)\n\n Return the entire message flattened as a bytes object. When\n optional *unixfrom* is true, the envelope header is included in\n the returned string. *unixfrom* defaults to \"False\". The\n *policy* argument may be used to override the default policy\n obtained from the message instance. This can be used to control\n some of the formatting produced by the method, since the\n specified *policy* will be passed to the \"BytesGenerator\".\n\n Flattening the message may trigger changes to the \"Message\" if\n defaults need to be filled in to complete the transformation to\n a string (for example, MIME boundaries may be generated or\n modified).\n\n Note that this method is provided as a convenience and may not\n always format the message the way you want. For example, by\n default it does not do the mangling of lines that begin with\n \"From\" that is required by the Unix mbox format. For more\n flexibility, instantiate a \"BytesGenerator\" instance and use its\n \"flatten()\" method directly. For example:\n\n from io import BytesIO\n from email.generator import BytesGenerator\n fp = BytesIO()\n g = BytesGenerator(fp, mangle_from_=True, maxheaderlen=60)\n g.flatten(msg)\n text = fp.getvalue()\n\n Added in version 3.4.\n\n __bytes__()\n\n Equivalent to \"as_bytes()\". Allows \"bytes(msg)\" to produce a\n bytes object containing the formatted message.\n\n Added in version 3.4.\n\n is_multipart()\n\n Return \"True\" if the message's payload is a list of\n sub-\"Message\" objects, otherwise return \"False\". When\n \"is_multipart()\" returns \"False\", the payload should be a string\n object (which might be a CTE encoded binary payload). (Note\n that \"is_multipart()\" returning \"True\" does not necessarily mean\n that \"msg.get_content_maintype() == 'multipart'\" will return the\n \"True\". For example, \"is_multipart\" will return \"True\" when the\n \"Message\" is of type \"message/rfc822\".)\n\n set_unixfrom(unixfrom)\n\n Set the message's envelope header to *unixfrom*, which should be\n a string.\n\n get_unixfrom()\n\n Return the message's envelope header. Defaults to \"None\" if the\n envelope header was never set.\n\n attach(payload)\n\n Add the given *payload* to the current payload, which must be\n \"None\" or a list of \"Message\" objects before the call. After the\n call, the payload will always be a list of \"Message\" objects.\n If you want to set the payload to a scalar object (e.g. a\n string), use \"set_payload()\" instead.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by \"set_content()\" and the related\n \"make\" and \"add\" methods.\n\n get_payload(i=None, decode=False)\n\n Return the current payload, which will be a list of \"Message\"\n objects when \"is_multipart()\" is \"True\", or a string when\n \"is_multipart()\" is \"False\". If the payload is a list and you\n mutate the list object, you modify the message's payload in\n place.\n\n With optional argument *i*, \"get_payload()\" will return the\n *i*-th element of the payload, counting from zero, if\n \"is_multipart()\" is \"True\". An \"IndexError\" will be raised if\n *i* is less than 0 or greater than or equal to the number of\n items in the payload. If the payload is a string (i.e.\n \"is_multipart()\" is \"False\") and *i* is given, a \"TypeError\" is\n raised.\n\n Optional *decode* is a flag indicating whether the payload\n should be decoded or not, according to the *Content-Transfer-\n Encoding* header. When \"True\" and the message is not a\n multipart, the payload will be decoded if this header's value is\n \"quoted-printable\" or \"base64\". If some other encoding is used,\n or *Content-Transfer-Encoding* header is missing, the payload is\n returned as-is (undecoded). In all cases the returned value is\n binary data. If the message is a multipart and the *decode*\n flag is \"True\", then \"None\" is returned. If the payload is\n base64 and it was not perfectly formed (missing padding,\n characters outside the base64 alphabet), then an appropriate\n defect will be added to the message's defect property\n (\"InvalidBase64PaddingDefect\" or\n \"InvalidBase64CharactersDefect\", respectively).\n\n When *decode* is \"False\" (the default) the body is returned as a\n string without decoding the *Content-Transfer-Encoding*.\n However, for a *Content-Transfer-Encoding* of 8bit, an attempt\n is made to decode the original bytes using the \"charset\"\n specified by the *Content-Type* header, using the \"replace\"\n error handler. If no \"charset\" is specified, or if the \"charset\"\n given is not recognized by the email package, the body is\n decoded using the default ASCII charset.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by \"get_content()\" and \"iter_parts()\".\n\n set_payload(payload, charset=None)\n\n Set the entire message object's payload to *payload*. It is the\n client's responsibility to ensure the payload invariants.\n Optional *charset* sets the message's default character set; see\n \"set_charset()\" for details.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by \"set_content()\".\n\n set_charset(charset)\n\n Set the character set of the payload to *charset*, which can\n either be a \"Charset\" instance (see \"email.charset\"), a string\n naming a character set, or \"None\". If it is a string, it will\n be converted to a \"Charset\" instance. If *charset* is \"None\",\n the \"charset\" parameter will be removed from the *Content-Type*\n header (the message will not be otherwise modified). Anything\n else will generate a \"TypeError\".\n\n If there is no existing *MIME-Version* header one will be added.\n If there is no existing *Content-Type* header, one will be added\n with a value of *text/plain*. Whether the *Content-Type* header\n already exists or not, its \"charset\" parameter will be set to\n *charset.output_charset*. If *charset.input_charset* and\n *charset.output_charset* differ, the payload will be re-encoded\n to the *output_charset*. If there is no existing *Content-\n Transfer-Encoding* header, then the payload will be transfer-\n encoded, if needed, using the specified \"Charset\", and a header\n with the appropriate value will be added. If a *Content-\n Transfer-Encoding* header already exists, the payload is assumed\n to already be correctly encoded using that *Content-Transfer-\n Encoding* and is not modified.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by the *charset* parameter of the\n \"email.message.EmailMessage.set_content()\" method.\n\n get_charset()\n\n Return the \"Charset\" instance associated with the message's\n payload.\n\n This is a legacy method. On the \"EmailMessage\" class it always\n returns \"None\".\n\n The following methods implement a mapping-like interface for\n accessing the message's **RFC 2822** headers. Note that there are\n some semantic differences between these methods and a normal\n mapping (i.e. dictionary) interface. For example, in a dictionary\n there are no duplicate keys, but here there may be duplicate\n message headers. Also, in dictionaries there is no guaranteed\n order to the keys returned by \"keys()\", but in a \"Message\" object,\n headers are always returned in the order they appeared in the\n original message, or were added to the message later. Any header\n deleted and then re-added are always appended to the end of the\n header list.\n\n These semantic differences are intentional and are biased toward\n maximal convenience.\n\n Note that in all cases, any envelope header present in the message\n is not included in the mapping interface.\n\n In a model generated from bytes, any header values that (in\n contravention of the RFCs) contain non-ASCII bytes will, when\n retrieved through this interface, be represented as \"Header\"\n objects with a charset of \"unknown-8bit\".\n\n __len__()\n\n Return the total number of headers, including duplicates.\n\n __contains__(name)\n\n Return \"True\" if the message object has a field named *name*.\n Matching is done case-insensitively and *name* should not\n include the trailing colon. Used for the \"in\" operator, e.g.:\n\n if 'message-id' in myMessage:\n print('Message-ID:', myMessage['message-id'])\n\n __getitem__(name)\n\n Return the value of the named header field. *name* should not\n include the colon field separator. If the header is missing,\n \"None\" is returned; a \"KeyError\" is never raised.\n\n Note that if the named field appears more than once in the\n message's headers, exactly which of those field values will be\n returned is undefined. Use the \"get_all()\" method to get the\n values of all the extant named headers.\n\n __setitem__(name, val)\n\n Add a header to the message with field name *name* and value\n *val*. The field is appended to the end of the message's\n existing fields.\n\n Note that this does *not* overwrite or delete any existing\n header with the same name. If you want to ensure that the new\n header is the only one present in the message with field name\n *name*, delete the field first, e.g.:\n\n del msg['subject']\n msg['subject'] = 'Python roolz!'\n\n __delitem__(name)\n\n Delete all occurrences of the field with name *name* from the\n message's headers. No exception is raised if the named field\n isn't present in the headers.\n\n keys()\n\n Return a list of all the message's header field names.\n\n values()\n\n Return a list of all the message's field values.\n\n items()\n\n Return a list of 2-tuples containing all the message's field\n headers and values.\n\n get(name, failobj=None)\n\n Return the value of the named header field. This is identical\n to \"__getitem__()\" except that optional *failobj* is returned if\n the named header is missing (defaults to \"None\").\n\n Here are some additional useful methods:\n\n get_all(name, failobj=None)\n\n Return a list of all the values for the field named *name*. If\n there are no such named headers in the message, *failobj* is\n returned (defaults to \"None\").\n\n add_header(_name, _value, **_params)\n\n Extended header setting. This method is similar to\n \"__setitem__()\" except that additional header parameters can be\n provided as keyword arguments. *_name* is the header field to\n add and *_value* is the *primary* value for the header.\n\n For each item in the keyword argument dictionary *_params*, the\n key is taken as the parameter name, with underscores converted\n to dashes (since dashes are illegal in Python identifiers).\n Normally, the parameter will be added as \"key=\"value\"\" unless\n the value is \"None\", in which case only the key will be added.\n If the value contains non-ASCII characters, it can be specified\n as a three tuple in the format \"(CHARSET, LANGUAGE, VALUE)\",\n where \"CHARSET\" is a string naming the charset to be used to\n encode the value, \"LANGUAGE\" can usually be set to \"None\" or the\n empty string (see **RFC 2231** for other possibilities), and\n \"VALUE\" is the string value containing non-ASCII code points.\n If a three tuple is not passed and the value contains non-ASCII\n characters, it is automatically encoded in **RFC 2231** format\n using a \"CHARSET\" of \"utf-8\" and a \"LANGUAGE\" of \"None\".\n\n Here's an example:\n\n msg.add_header('Content-Disposition', 'attachment', filename='bud.gif')\n\n This will add a header that looks like\n\n Content-Disposition: attachment; filename=\"bud.gif\"\n\n An example with non-ASCII characters:\n\n msg.add_header('Content-Disposition', 'attachment',\n filename=('iso-8859-1', '', 'Fußballer.ppt'))\n\n Which produces\n\n Content-Disposition: attachment; filename*=\"iso-8859-1''Fu%DFballer.ppt\"\n\n replace_header(_name, _value)\n\n Replace a header. Replace the first header found in the message\n that matches *_name*, retaining header order and field name\n case. If no matching header was found, a \"KeyError\" is raised.\n\n get_content_type()\n\n Return the message's content type. The returned string is\n coerced to lower case of the form *maintype/subtype*. If there\n was no *Content-Type* header in the message the default type as\n given by \"get_default_type()\" will be returned. Since according\n to **RFC 2045**, messages always have a default type,\n \"get_content_type()\" will always return a value.\n\n **RFC 2045** defines a message's default type to be *text/plain*\n unless it appears inside a *multipart/digest* container, in\n which case it would be *message/rfc822*. If the *Content-Type*\n header has an invalid type specification, **RFC 2045** mandates\n that the default type be *text/plain*.\n\n get_content_maintype()\n\n Return the message's main content type. This is the *maintype*\n part of the string returned by \"get_content_type()\".\n\n get_content_subtype()\n\n Return the message's sub-content type. This is the *subtype*\n part of the string returned by \"get_content_type()\".\n\n get_default_type()\n\n Return the default content type. Most messages have a default\n content type of *text/plain*, except for messages that are\n subparts of *multipart/digest* containers. Such subparts have a\n default content type of *message/rfc822*.\n\n set_default_type(ctype)\n\n Set the default content type. *ctype* should either be\n *text/plain* or *message/rfc822*, although this is not enforced.\n The default content type is not stored in the *Content-Type*\n header.\n\n get_params(failobj=None, header='content-type', unquote=True)\n\n Return the message's *Content-Type* parameters, as a list. The\n elements of the returned list are 2-tuples of key/value pairs,\n as split on the \"'='\" sign. The left hand side of the \"'='\" is\n the key, while the right hand side is the value. If there is no\n \"'='\" sign in the parameter the value is the empty string,\n otherwise the value is as described in \"get_param()\" and is\n unquoted if optional *unquote* is \"True\" (the default).\n\n Optional *failobj* is the object to return if there is no\n *Content-Type* header. Optional *header* is the header to\n search instead of *Content-Type*.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by the *params* property of the\n individual header objects returned by the header access methods.\n\n get_param(param, failobj=None, header='content-type', unquote=True)\n\n Return the value of the *Content-Type* header's parameter\n *param* as a string. If the message has no *Content-Type*\n header or if there is no such parameter, then *failobj* is\n returned (defaults to \"None\").\n\n Optional *header* if given, specifies the message header to use\n instead of *Content-Type*.\n\n Parameter keys are always compared case insensitively. The\n return value can either be a string, or a 3-tuple if the\n parameter was **RFC 2231** encoded. When it's a 3-tuple, the\n elements of the value are of the form \"(CHARSET, LANGUAGE,\n VALUE)\". Note that both \"CHARSET\" and \"LANGUAGE\" can be \"None\",\n in which case you should consider \"VALUE\" to be encoded in the\n \"us-ascii\" charset. You can usually ignore \"LANGUAGE\".\n\n If your application doesn't care whether the parameter was\n encoded as in **RFC 2231**, you can collapse the parameter value\n by calling \"email.utils.collapse_rfc2231_value()\", passing in\n the return value from \"get_param()\". This will return a\n suitably decoded Unicode string when the value is a tuple, or\n the original string unquoted if it isn't. For example:\n\n rawparam = msg.get_param('foo')\n param = email.utils.collapse_rfc2231_value(rawparam)\n\n In any case, the parameter value (either the returned string, or\n the \"VALUE\" item in the 3-tuple) is always unquoted, unless\n *unquote* is set to \"False\".\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by the *params* property of the\n individual header objects returned by the header access methods.\n\n set_param(param, value, header='Content-Type', requote=True, charset=None, language='', replace=False)\n\n Set a parameter in the *Content-Type* header. If the parameter\n already exists in the header, its value will be replaced with\n *value*. If the *Content-Type* header as not yet been defined\n for this message, it will be set to *text/plain* and the new\n parameter value will be appended as per **RFC 2045**.\n\n Optional *header* specifies an alternative header to *Content-\n Type*, and all parameters will be quoted as necessary unless\n optional *requote* is \"False\" (the default is \"True\").\n\n If optional *charset* is specified, the parameter will be\n encoded according to **RFC 2231**. Optional *language* specifies\n the RFC 2231 language, defaulting to the empty string. Both\n *charset* and *language* should be strings.\n\n If *replace* is \"False\" (the default) the header is moved to the\n end of the list of headers. If *replace* is \"True\", the header\n will be updated in place.\n\n Changed in version 3.4: \"replace\" keyword was added.\n\n del_param(param, header='content-type', requote=True)\n\n Remove the given parameter completely from the *Content-Type*\n header. The header will be re-written in place without the\n parameter or its value. All values will be quoted as necessary\n unless *requote* is \"False\" (the default is \"True\"). Optional\n *header* specifies an alternative to *Content-Type*.\n\n set_type(type, header='Content-Type', requote=True)\n\n Set the main type and subtype for the *Content-Type* header.\n *type* must be a string in the form *maintype/subtype*,\n otherwise a \"ValueError\" is raised.\n\n This method replaces the *Content-Type* header, keeping all the\n parameters in place. If *requote* is \"False\", this leaves the\n existing header's quoting as is, otherwise the parameters will\n be quoted (the default).\n\n An alternative header can be specified in the *header* argument.\n When the *Content-Type* header is set a *MIME-Version* header is\n also added.\n\n This is a legacy method. On the \"EmailMessage\" class its\n functionality is replaced by the \"make_\" and \"add_\" methods.\n\n get_filename(failobj=None)\n\n Return the value of the \"filename\" parameter of the *Content-\n Disposition* header of the message. If the header does not have\n a \"filename\" parameter, this method falls back to looking for\n the \"name\" parameter on the *Content-Type* header. If neither\n is found, or the header is missing, then *failobj* is returned.\n The returned string will always be unquoted as per\n \"email.utils.unquote()\".\n\n get_boundary(failobj=None)\n\n Return the value of the \"boundary\" parameter of the *Content-\n Type* header of the message, or *failobj* if either the header\n is missing, or has no \"boundary\" parameter. The returned string\n will always be unquoted as per \"email.utils.unquote()\".\n\n set_boundary(boundary)\n\n Set the \"boundary\" parameter of the *Content-Type* header to\n *boundary*. \"set_boundary()\" will always quote *boundary* if\n necessary. A \"HeaderParseError\" is raised if the message object\n has no *Content-Type* header.\n\n Note that using this method is subtly different than deleting\n the old *Content-Type* header and adding a new one with the new\n boundary via \"add_header()\", because \"set_boundary()\" preserves\n the order of the *Content-Type* header in the list of headers.\n However, it does *not* preserve any continuation lines which may\n have been present in the original *Content-Type* header.\n\n get_content_charset(failobj=None)\n\n Return the \"charset\" parameter of the *Content-Type* header,\n coerced to lower case. If there is no *Content-Type* header, or\n if that header has no \"charset\" parameter, *failobj* is\n returned.\n\n Note that this method differs from \"get_charset()\" which returns\n the \"Charset\" instance for the default encoding of the message\n body.\n\n get_charsets(failobj=None)\n\n Return a list containing the character set names in the message.\n If the message is a *multipart*, then the list will contain one\n element for each subpart in the payload, otherwise, it will be a\n list of length 1.\n\n Each item in the list will be a string which is the value of the\n \"charset\" parameter in the *Content-Type* header for the\n represented subpart. However, if the subpart has no *Content-\n Type* header, no \"charset\" parameter, or is not of the *text*\n main MIME type, then that item in the returned list will be\n *failobj*.\n\n get_content_disposition()\n\n Return the lowercased value (without parameters) of the\n message's *Content-Disposition* header if it has one, or \"None\".\n The possible values for this method are *inline*, *attachment*\n or \"None\" if the message follows **RFC 2183**.\n\n Added in version 3.5.\n\n walk()\n\n The \"walk()\" method is an all-purpose generator which can be\n used to iterate over all the parts and subparts of a message\n object tree, in depth-first traversal order. You will typically\n use \"walk()\" as the iterator in a \"for\" loop; each iteration\n returns the next subpart.\n\n Here's an example that prints the MIME type of every part of a\n multipart message structure:\n\n >>> for part in msg.walk():\n ... print(part.get_content_type())\n multipart/report\n text/plain\n message/delivery-status\n text/plain\n text/plain\n message/rfc822\n text/plain\n\n \"walk\" iterates over the subparts of any part where\n \"is_multipart()\" returns \"True\", even though\n \"msg.get_content_maintype() == 'multipart'\" may return \"False\".\n We can see this in our example by making use of the \"_structure\"\n debug helper function:\n\n >>> for part in msg.walk():\n ... print(part.get_content_maintype() == 'multipart',\n ... part.is_multipart())\n True True\n False False\n False True\n False False\n False False\n False True\n False False\n >>> _structure(msg)\n multipart/report\n text/plain\n message/delivery-status\n text/plain\n text/plain\n message/rfc822\n text/plain\n\n Here the \"message\" parts are not \"multiparts\", but they do\n contain subparts. \"is_multipart()\" returns \"True\" and \"walk\"\n descends into the subparts.\n\n \"Message\" objects can also optionally contain two instance\n attributes, which can be used when generating the plain text of a\n MIME message.\n\n preamble\n\n The format of a MIME document allows for some text between the\n blank line following the headers, and the first multipart\n boundary string. Normally, this text is never visible in a MIME-\n aware mail reader because it falls outside the standard MIME\n armor. However, when viewing the raw text of the message, or\n when viewing the message in a non-MIME aware reader, this text\n can become visible.\n\n The *preamble* attribute contains this leading extra-armor text\n for MIME documents. When the \"Parser\" discovers some text after\n the headers but before the first boundary string, it assigns\n this text to the message's *preamble* attribute. When the\n \"Generator\" is writing out the plain text representation of a\n MIME message, and it finds the message has a *preamble*\n attribute, it will write this text in the area between the\n headers and the first boundary. See \"email.parser\" and\n \"email.generator\" for details.\n\n Note that if the message object has no preamble, the *preamble*\n attribute will be \"None\".\n\n epilogue\n\n The *epilogue* attribute acts the same way as the *preamble*\n attribute, except that it contains text that appears between the\n last boundary and the end of the message.\n\n You do not need to set the epilogue to the empty string in order\n for the \"Generator\" to print a newline at the end of the file.\n\n defects\n\n The *defects* attribute contains a list of all the problems\n found when parsing this message. See \"email.errors\" for a\n detailed description of the possible parsing defects.", "source": "python_docs:python-3.14-docs-text/library/email.compat32-message.txt", "domain": "software" }, { "text": "\"signal\" --- Set handlers for asynchronous events\n*************************************************\n\n**Source code:** Lib/signal.py\n\n======================================================================\n\nThis module provides mechanisms to use signal handlers in Python.\n\nGeneral rules\n=============\n\nThe \"signal.signal()\" function allows defining custom handlers to be\nexecuted when a signal is received. A small number of default\nhandlers are installed: \"SIGPIPE\" is ignored (so write errors on pipes\nand sockets can be reported as ordinary Python exceptions) and\n\"SIGINT\" is translated into a \"KeyboardInterrupt\" exception if the\nparent process has not changed it.\n\nA handler for a particular signal, once set, remains installed until\nit is explicitly reset (Python emulates the BSD style interface\nregardless of the underlying implementation), with the exception of\nthe handler for \"SIGCHLD\", which follows the underlying\nimplementation.\n\nOn WebAssembly platforms, signals are emulated and therefore behave\ndifferently. Several functions and signals are not available on these\nplatforms.\n\nExecution of Python signal handlers\n-----------------------------------\n\nA Python signal handler does not get executed inside the low-level (C)\nsignal handler. Instead, the low-level signal handler sets a flag\nwhich tells the *virtual machine* to execute the corresponding Python\nsignal handler at a later point (for example, at the next *bytecode*\ninstruction). This has consequences:\n\n* It makes little sense to catch synchronous errors like \"SIGFPE\" or\n \"SIGSEGV\" that are caused by an invalid operation in C code. Python\n will return from the signal handler to the C code, which is likely\n to raise the same signal again, causing Python to apparently hang.\n From Python 3.3 onwards, you can use the \"faulthandler\" module to\n report on synchronous errors.\n\n* A long-running calculation implemented purely in C (such as regular\n expression matching on a large body of text) may run uninterrupted\n for an arbitrary amount of time, regardless of any signals received.\n The Python signal handlers will be called when the calculation\n finishes.\n\n* If the handler raises an exception, it will be raised \"out of thin\n air\" in the main thread. See the note below for a discussion.\n\nSignals and threads\n-------------------\n\nPython signal handlers are always executed in the main Python thread\nof the main interpreter, even if the signal was received in another\nthread. This means that signals can't be used as a means of inter-\nthread communication. You can use the synchronization primitives from\nthe \"threading\" module instead.\n\nBesides, only the main thread of the main interpreter is allowed to\nset a new signal handler.\n\nWarning:\n\n Synchronization primitives such as \"threading.Lock\" should not be\n used within signal handlers. Doing so can lead to unexpected\n deadlocks.\n\nModule contents\n===============\n\nChanged in version 3.5: signal (SIG*), handler (\"SIG_DFL\", \"SIG_IGN\")\nand sigmask (\"SIG_BLOCK\", \"SIG_UNBLOCK\", \"SIG_SETMASK\") related\nconstants listed below were turned into \"enums\" (\"Signals\", \"Handlers\"\nand \"Sigmasks\" respectively). \"getsignal()\", \"pthread_sigmask()\",\n\"sigpending()\" and \"sigwait()\" functions return human-readable \"enums\"\nas \"Signals\" objects.\n\nThe signal module defines three enums:\n\nclass signal.Signals\n\n \"enum.IntEnum\" collection of SIG* constants and the CTRL_*\n constants.\n\n Added in version 3.5.\n\nclass signal.Handlers\n\n \"enum.IntEnum\" collection of the constants \"SIG_DFL\" and \"SIG_IGN\".\n\n Added in version 3.5.\n\nclass signal.Sigmasks\n\n \"enum.IntEnum\" collection of the constants \"SIG_BLOCK\",\n \"SIG_UNBLOCK\" and \"SIG_SETMASK\".\n\n Availability: Unix.\n\n See the man page *sigprocmask(2)* and *pthread_sigmask(3)* for\n further information.\n\n Added in version 3.5.\n\nThe variables defined in the \"signal\" module are:\n\nsignal.SIG_DFL\n\n This is one of two standard signal handling options; it will simply\n perform the default function for the signal. For example, on most\n systems the default action for \"SIGQUIT\" is to dump core and exit,\n while the default action for \"SIGCHLD\" is to simply ignore it.\n\nsignal.SIG_IGN\n\n This is another standard signal handler, which will simply ignore\n the given signal.\n\nsignal.SIGABRT\n\n Abort signal from *abort(3)*.\n\nsignal.SIGALRM\n\n Timer signal from *alarm(2)*.\n\n Availability: Unix.\n\nsignal.SIGBREAK\n\n Interrupt from keyboard (CTRL + BREAK).\n\n Availability: Windows.\n\nsignal.SIGBUS\n\n Bus error (bad memory access).\n\n Availability: Unix.\n\nsignal.SIGCHLD\n\n Child process stopped or terminated.\n\n Availability: Unix.\n\nsignal.SIGCLD\n\n Alias to \"SIGCHLD\".\n\n Availability: not macOS.\n\nsignal.SIGCONT\n\n Continue the process if it is currently stopped\n\n Availability: Unix.\n\nsignal.SIGFPE\n\n Floating-point exception. For example, division by zero.\n\n See also:\n\n \"ZeroDivisionError\" is raised when the second argument of a\n division or modulo operation is zero.\n\nsignal.SIGHUP\n\n Hangup detected on controlling terminal or death of controlling\n process.\n\n Availability: Unix.\n\nsignal.SIGILL\n\n Illegal instruction.\n\nsignal.SIGINT\n\n Interrupt from keyboard (CTRL + C).\n\n Default action is to raise \"KeyboardInterrupt\".\n\nsignal.SIGKILL\n\n Kill signal.\n\n It cannot be caught, blocked, or ignored.\n\n Availability: Unix.\n\nsignal.SIGPIPE\n\n Broken pipe: write to pipe with no readers.\n\n Default action is to ignore the signal.\n\n Availability: Unix.\n\nsignal.SIGPROF\n\n Profiling timer expired.\n\n Availability: Unix.\n\nsignal.SIGQUIT\n\n Terminal quit signal.\n\n Availability: Unix.\n\nsignal.SIGSEGV\n\n Segmentation fault: invalid memory reference.\n\nsignal.SIGSTOP\n\n Stop executing (cannot be caught or ignored).\n\nsignal.SIGSTKFLT\n\n Stack fault on coprocessor. The Linux kernel does not raise this\n signal: it can only be raised in user space.\n\n Availability: Linux.\n\n On architectures where the signal is available. See the man page\n *signal(7)* for further information.\n\n Added in version 3.11.\n\nsignal.SIGTERM\n\n Termination signal.\n\nsignal.SIGUSR1\n\n User-defined signal 1.\n\n Availability: Unix.\n\nsignal.SIGUSR2\n\n User-defined signal 2.\n\n Availability: Unix.\n\nsignal.SIGVTALRM\n\n Virtual timer expired.\n\n Availability: Unix.\n\nsignal.SIGWINCH\n\n Window resize signal.\n\n Availability: Unix.\n\nsignal.SIGXCPU\n\n CPU time limit exceeded.\n\n Availability: Unix.\n\nSIG*\n\n All the signal numbers are defined symbolically. For example, the\n hangup signal is defined as \"signal.SIGHUP\"; the variable names are\n identical to the names used in C programs, as found in\n \"\". The Unix man page for '\"signal\"' lists the existing\n signals (on some systems this is *signal(2)*, on others the list is\n in *signal(7)*). Note that not all systems define the same set of\n signal names; only those names defined by the system are defined by\n this module.\n\nsignal.CTRL_C_EVENT\n\n The signal corresponding to the \"Ctrl\"+\"C\" keystroke event. This\n signal can only be used with \"os.kill()\".\n\n Availability: Windows.\n\n Added in version 3.2.\n\nsignal.CTRL_BREAK_EVENT\n\n The signal corresponding to the \"Ctrl\"+\"Break\" keystroke event.\n This signal can only be used with \"os.kill()\".\n\n Availability: Windows.\n\n Added in version 3.2.\n\nsignal.NSIG\n\n One more than the number of the highest signal number. Use\n \"valid_signals()\" to get valid signal numbers.\n\nsignal.ITIMER_REAL\n\n Decrements interval timer in real time, and delivers \"SIGALRM\" upon\n expiration.\n\nsignal.ITIMER_VIRTUAL\n\n Decrements interval timer only when the process is executing, and\n delivers SIGVTALRM upon expiration.\n\nsignal.ITIMER_PROF\n\n Decrements interval timer both when the process executes and when\n the system is executing on behalf of the process. Coupled with\n ITIMER_VIRTUAL, this timer is usually used to profile the time\n spent by the application in user and kernel space. SIGPROF is\n delivered upon expiration.\n\nsignal.SIG_BLOCK\n\n A possible value for the *how* parameter to \"pthread_sigmask()\"\n indicating that signals are to be blocked.\n\n Added in version 3.3.\n\nsignal.SIG_UNBLOCK\n\n A possible value for the *how* parameter to \"pthread_sigmask()\"\n indicating that signals are to be unblocked.\n\n Added in version 3.3.\n\nsignal.SIG_SETMASK\n\n A possible value for the *how* parameter to \"pthread_sigmask()\"\n indicating that the signal mask is to be replaced.\n\n Added in version 3.3.\n\nThe \"signal\" module defines one exception:\n\nexception signal.ItimerError\n\n Raised to signal an error from the underlying \"setitimer()\" or\n \"getitimer()\" implementation. Expect this error if an invalid\n interval timer or a negative time is passed to \"setitimer()\". This\n error is a subtype of \"OSError\".\n\n Added in version 3.3: This error used to be a subtype of \"IOError\",\n which is now an alias of \"OSError\".\n\nThe \"signal\" module defines the following functions:\n\nsignal.alarm(time)\n\n If *time* is non-zero, this function requests that a \"SIGALRM\"\n signal be sent to the process in *time* seconds. Any previously\n scheduled alarm is canceled (only one alarm can be scheduled at any\n time). The returned value is then the number of seconds before any\n previously set alarm was to have been delivered. If *time* is zero,\n no alarm is scheduled, and any scheduled alarm is canceled. If the\n return value is zero, no alarm is currently scheduled.\n\n Availability: Unix.\n\n See the man page *alarm(2)* for further information.\n\nsignal.getsignal(signalnum)\n\n Return the current signal handler for the signal *signalnum*. The\n returned value may be a callable Python object, or one of the\n special values \"signal.SIG_IGN\", \"signal.SIG_DFL\" or \"None\". Here,\n \"signal.SIG_IGN\" means that the signal was previously ignored,\n \"signal.SIG_DFL\" means that the default way of handling the signal\n was previously in use, and \"None\" means that the previous signal\n handler was not installed from Python.\n\nsignal.strsignal(signalnum)\n\n Returns the description of signal *signalnum*, such as \"Interrupt\"\n for \"SIGINT\". Returns \"None\" if *signalnum* has no description.\n Raises \"ValueError\" if *signalnum* is invalid.\n\n Added in version 3.8.\n\nsignal.valid_signals()\n\n Return the set of valid signal numbers on this platform. This can\n be less than \"range(1, NSIG)\" if some signals are reserved by the\n system for internal use.\n\n Added in version 3.8.\n\nsignal.pause()\n\n Cause the process to sleep until a signal is received; the\n appropriate handler will then be called. Returns nothing.\n\n Availability: Unix.\n\n See the man page *signal(2)* for further information.\n\n See also \"sigwait()\", \"sigwaitinfo()\", \"sigtimedwait()\" and\n \"sigpending()\".\n\nsignal.raise_signal(signum)\n\n Sends a signal to the calling process. Returns nothing.\n\n Added in version 3.8.\n\nsignal.pidfd_send_signal(pidfd, sig, siginfo=None, flags=0)\n\n Send signal *sig* to the process referred to by file descriptor\n *pidfd*. Python does not currently support the *siginfo* parameter;\n it must be \"None\". The *flags* argument is provided for future\n extensions; no flag values are currently defined.\n\n See the *pidfd_send_signal(2)* man page for more information.\n\n Availability: Linux >= 5.1, Android >= \"build-time\" API level 31\n\n Added in version 3.9.\n\nsignal.pthread_kill(thread_id, signalnum)\n\n Send the signal *signalnum* to the thread *thread_id*, another\n thread in the same process as the caller. The target thread can be\n executing any code (Python or not). However, if the target thread\n is executing the Python interpreter, the Python signal handlers\n will be executed by the main thread of the main interpreter.\n Therefore, the only point of sending a signal to a particular\n Python thread would be to force a running system call to fail with\n \"InterruptedError\".\n\n Use \"threading.get_ident()\" or the \"ident\" attribute of\n \"threading.Thread\" objects to get a suitable value for *thread_id*.\n\n If *signalnum* is 0, then no signal is sent, but error checking is\n still performed; this can be used to check if the target thread is\n still running.\n\n Raises an auditing event \"signal.pthread_kill\" with arguments\n \"thread_id\", \"signalnum\".\n\n Availability: Unix.\n\n See the man page *pthread_kill(3)* for further information.\n\n See also \"os.kill()\".\n\n Added in version 3.3.\n\nsignal.pthread_sigmask(how, mask)\n\n Fetch and/or change the signal mask of the calling thread. The\n signal mask is the set of signals whose delivery is currently\n blocked for the caller. Return the old signal mask as a set of\n signals.\n\n The behavior of the call is dependent on the value of *how*, as\n follows.\n\n * \"SIG_BLOCK\": The set of blocked signals is the union of the\n current set and the *mask* argument.\n\n * \"SIG_UNBLOCK\": The signals in *mask* are removed from the current\n set of blocked signals. It is permissible to attempt to unblock\n a signal which is not blocked.\n\n * \"SIG_SETMASK\": The set of blocked signals is set to the *mask*\n argument.\n\n *mask* is a set of signal numbers (e.g. {\"signal.SIGINT\",\n \"signal.SIGTERM\"}). Use \"valid_signals()\" for a full mask including\n all signals.\n\n For example, \"signal.pthread_sigmask(signal.SIG_BLOCK, [])\" reads\n the signal mask of the calling thread.\n\n \"SIGKILL\" and \"SIGSTOP\" cannot be blocked.\n\n Availability: Unix.\n\n See the man page *sigprocmask(2)* and *pthread_sigmask(3)* for\n further information.\n\n See also \"pause()\", \"sigpending()\" and \"sigwait()\".\n\n Added in version 3.3.\n\nsignal.setitimer(which, seconds, interval=0.0)\n\n Sets given interval timer (one of \"signal.ITIMER_REAL\",\n \"signal.ITIMER_VIRTUAL\" or \"signal.ITIMER_PROF\") specified by\n *which* to fire after *seconds* (float is accepted, different from\n \"alarm()\") and after that every *interval* seconds (if *interval*\n is non-zero). The interval timer specified by *which* can be\n cleared by setting *seconds* to zero.\n\n When an interval timer fires, a signal is sent to the process. The\n signal sent is dependent on the timer being used;\n \"signal.ITIMER_REAL\" will deliver \"SIGALRM\",\n \"signal.ITIMER_VIRTUAL\" sends \"SIGVTALRM\", and \"signal.ITIMER_PROF\"\n will deliver \"SIGPROF\".\n\n The old values are returned as a tuple: (delay, interval).\n\n Attempting to pass an invalid interval timer will cause an\n \"ItimerError\".\n\n Availability: Unix.\n\nsignal.getitimer(which)\n\n Returns current value of a given interval timer specified by\n *which*.\n\n Availability: Unix.\n\nsignal.set_wakeup_fd(fd, *, warn_on_full_buffer=True)\n\n Set the wakeup file descriptor to *fd*. When a signal your program\n has registered a signal handler for is received, the signal number\n is written as a single byte into the fd. If you haven't registered\n a signal handler for the signals you care about, then nothing will\n be written to the wakeup fd. This can be used by a library to\n wakeup a poll or select call, allowing the signal to be fully\n processed.\n\n The old wakeup fd is returned (or -1 if file descriptor wakeup was\n not enabled). If *fd* is -1, file descriptor wakeup is disabled.\n If not -1, *fd* must be non-blocking. It is up to the library to\n remove any bytes from *fd* before calling poll or select again.\n\n When threads are enabled, this function can only be called from the\n main thread of the main interpreter; attempting to call it from\n other threads will cause a \"ValueError\" exception to be raised.\n\n There are two common ways to use this function. In both approaches,\n you use the fd to wake up when a signal arrives, but then they\n differ in how they determine *which* signal or signals have\n arrived.\n\n In the first approach, we read the data out of the fd's buffer, and\n the byte values give you the signal numbers. This is simple, but in\n rare cases it can run into a problem: generally the fd will have a\n limited amount of buffer space, and if too many signals arrive too\n quickly, then the buffer may become full, and some signals may be\n lost. If you use this approach, then you should set\n \"warn_on_full_buffer=True\", which will at least cause a warning to\n be printed to stderr when signals are lost.\n\n In the second approach, we use the wakeup fd *only* for wakeups,\n and ignore the actual byte values. In this case, all we care about\n is whether the fd's buffer is empty or non-empty; a full buffer\n doesn't indicate a problem at all. If you use this approach, then\n you should set \"warn_on_full_buffer=False\", so that your users are\n not confused by spurious warning messages.\n\n Changed in version 3.5: On Windows, the function now also supports\n socket handles.\n\n Changed in version 3.7: Added \"warn_on_full_buffer\" parameter.\n\nsignal.siginterrupt(signalnum, flag)\n\n Change system call restart behaviour: if *flag* is \"False\", system\n calls will be restarted when interrupted by signal *signalnum*,\n otherwise system calls will be interrupted. Returns nothing.\n\n Availability: Unix.\n\n See the man page *siginterrupt(3)* for further information.\n\n Note that installing a signal handler with \"signal()\" will reset\n the restart behaviour to interruptible by implicitly calling\n \"siginterrupt()\" with a true *flag* value for the given signal.\n\nsignal.signal(signalnum, handler)\n\n Set the handler for signal *signalnum* to the function *handler*.\n *handler* can be a callable Python object taking two arguments (see\n below), or one of the special values \"signal.SIG_IGN\" or\n \"signal.SIG_DFL\". The previous signal handler will be returned\n (see the description of \"getsignal()\" above). (See the Unix man\n page *signal(2)* for further information.)\n\n When threads are enabled, this function can only be called from the\n main thread of the main interpreter; attempting to call it from\n other threads will cause a \"ValueError\" exception to be raised.\n\n The *handler* is called with two arguments: the signal number and\n the current stack frame (\"None\" or a frame object; for a\n description of frame objects, see the description in the type\n hierarchy or see the attribute descriptions in the \"inspect\"\n module).\n\n On Windows, \"signal()\" can only be called with \"SIGABRT\", \"SIGFPE\",\n \"SIGILL\", \"SIGINT\", \"SIGSEGV\", \"SIGTERM\", or \"SIGBREAK\". A\n \"ValueError\" will be raised in any other case. Note that not all\n systems define the same set of signal names; an \"AttributeError\"\n will be raised if a signal name is not defined as \"SIG*\" module\n level constant.\n\nsignal.sigpending()\n\n Examine the set of signals that are pending for delivery to the\n calling thread (i.e., the signals which have been raised while\n blocked). Return the set of the pending signals.\n\n Availability: Unix.\n\n See the man page *sigpending(2)* for further information.\n\n See also \"pause()\", \"pthread_sigmask()\" and \"sigwait()\".\n\n Added in version 3.3.\n\nsignal.sigwait(sigset)\n\n Suspend execution of the calling thread until the delivery of one\n of the signals specified in the signal set *sigset*. The function\n accepts the signal (removes it from the pending list of signals),\n and returns the signal number.\n\n Availability: Unix.\n\n See the man page *sigwait(3)* for further information.\n\n See also \"pause()\", \"pthread_sigmask()\", \"sigpending()\",\n \"sigwaitinfo()\" and \"sigtimedwait()\".\n\n Added in version 3.3.\n\nsignal.sigwaitinfo(sigset)\n\n Suspend execution of the calling thread until the delivery of one\n of the signals specified in the signal set *sigset*. The function\n accepts the signal and removes it from the pending list of signals.\n If one of the signals in *sigset* is already pending for the\n calling thread, the function will return immediately with\n information about that signal. The signal handler is not called for\n the delivered signal. The function raises an \"InterruptedError\" if\n it is interrupted by a signal that is not in *sigset*.\n\n The return value is an object representing the data contained in\n the \"siginfo_t\" structure, namely: \"si_signo\", \"si_code\",\n \"si_errno\", \"si_pid\", \"si_uid\", \"si_status\", \"si_band\".\n\n Availability: Unix.\n\n See the man page *sigwaitinfo(2)* for further information.\n\n See also \"pause()\", \"sigwait()\" and \"sigtimedwait()\".\n\n Added in version 3.3.\n\n Changed in version 3.5: The function is now retried if interrupted\n by a signal not in *sigset* and the signal handler does not raise\n an exception (see **PEP 475** for the rationale).\n\nsignal.sigtimedwait(sigset, timeout)\n\n Like \"sigwaitinfo()\", but takes an additional *timeout* argument\n specifying a timeout. If *timeout* is specified as \"0\", a poll is\n performed. Returns \"None\" if a timeout occurs.\n\n Availability: Unix.\n\n See the man page *sigtimedwait(2)* for further information.\n\n See also \"pause()\", \"sigwait()\" and \"sigwaitinfo()\".\n\n Added in version 3.3.\n\n Changed in version 3.5: The function is now retried with the\n recomputed *timeout* if interrupted by a signal not in *sigset* and\n the signal handler does not raise an exception (see **PEP 475** for\n the rationale).\n\nExamples\n========\n\nHere is a minimal example program. It uses the \"alarm()\" function to\nlimit the time spent waiting to open a file; this is useful if the\nfile is for a serial device that may not be turned on, which would\nnormally cause the \"os.open()\" to hang indefinitely. The solution is\nto set a 5-second alarm before opening the file; if the operation\ntakes too long, the alarm signal will be sent, and the handler raises\nan exception.\n\n import signal, os\n\n def handler(signum, frame):\n signame = signal.Signals(signum).name\n print(f'Signal handler called with signal {signame} ({signum})')\n raise OSError(\"Couldn't open device!\")\n\n # Set the signal handler and a 5-second alarm\n signal.signal(signal.SIGALRM, handler)\n signal.alarm(5)\n\n # This open() may hang indefinitely\n fd = os.open('/dev/ttyS0', os.O_RDWR)\n\n signal.alarm(0) # Disable the alarm\n\nNote on SIGPIPE\n===============\n\nPiping output of your program to tools like *head(1)* will cause a\n\"SIGPIPE\" signal to be sent to your process when the receiver of its\nstandard output closes early. This results in an exception like\n\"BrokenPipeError: [Errno 32] Broken pipe\". To handle this case, wrap\nyour entry point to catch this exception as follows:\n\n import os\n import sys\n\n def main():\n try:\n # simulate large output (your code replaces this loop)\n for x in range(10000):\n print(\"y\")\n # flush output here to force SIGPIPE to be triggered\n # while inside this try block.\n sys.stdout.flush()\n except BrokenPipeError:\n # Python flushes standard streams on exit; redirect remaining output\n # to devnull to avoid another BrokenPipeError at shutdown\n devnull = os.open(os.devnull, os.O_WRONLY)\n os.dup2(devnull, sys.stdout.fileno())\n sys.exit(1) # Python exits with error code 1 on EPIPE\n\n if __name__ == '__main__':\n main()\n\nDo not set \"SIGPIPE\"'s disposition to \"SIG_DFL\" in order to avoid\n\"BrokenPipeError\". Doing that would cause your program to exit\nunexpectedly whenever any socket connection is interrupted while your\nprogram is still writing to it.\n\nNote on Signal Handlers and Exceptions\n======================================\n\nIf a signal handler raises an exception, the exception will be\npropagated to the main thread and may be raised after any *bytecode*\ninstruction. Most notably, a \"KeyboardInterrupt\" may appear at any\npoint during execution. Most Python code, including the standard\nlibrary, cannot be made robust against this, and so a\n\"KeyboardInterrupt\" (or any other exception resulting from a signal\nhandler) may on rare occasions put the program in an unexpected state.\n\nTo illustrate this issue, consider the following code:\n\n class SpamContext:\n def __init__(self):\n self.lock = threading.Lock()\n\n def __enter__(self):\n # If KeyboardInterrupt occurs here, everything is fine\n self.lock.acquire()\n # If KeyboardInterrupt occurs here, __exit__ will not be called\n ...\n # KeyboardInterrupt could occur just before the function returns\n\n def __exit__(self, exc_type, exc_val, exc_tb):\n ...\n self.lock.release()\n\nFor many programs, especially those that merely want to exit on\n\"KeyboardInterrupt\", this is not a problem, but applications that are\ncomplex or require high reliability should avoid raising exceptions\nfrom signal handlers. They should also avoid catching\n\"KeyboardInterrupt\" as a means of gracefully shutting down. Instead,\nthey should install their own \"SIGINT\" handler. Below is an example of\nan HTTP server that avoids \"KeyboardInterrupt\":\n\n import signal\n import socket\n from selectors import DefaultSelector, EVENT_READ\n from http.server import HTTPServer, SimpleHTTPRequestHandler\n\n interrupt_read, interrupt_write = socket.socketpair()\n\n def handler(signum, frame):\n print('Signal handler called with signal', signum)\n interrupt_write.send(b'\\0')\n signal.signal(signal.SIGINT, handler)\n\n def serve_forever(httpd):\n sel = DefaultSelector()\n sel.register(interrupt_read, EVENT_READ)\n sel.register(httpd, EVENT_READ)\n\n while True:\n for key, _ in sel.select():\n if key.fileobj == interrupt_read:\n interrupt_read.recv(1)\n return\n if key.fileobj == httpd:\n httpd.handle_request()\n\n print(\"Serving on port 8000\")\n httpd = HTTPServer(('', 8000), SimpleHTTPRequestHandler)\n serve_forever(httpd)\n print(\"Shutdown...\")", "source": "python_docs:python-3.14-docs-text/library/signal.txt", "domain": "software" }, { "text": "Choosing a metric for your task\n\n**So you've trained your model and want to see how well it’s doing on a dataset of your choice. Where do you start?**\n\nThere is no ā€œone size fits allā€ approach to choosing an evaluation metric, but some good guidelines to keep in mind are:\n\n## Categories of metrics\n\nThere are 3 high-level categories of metrics:\n\n1. *Generic metrics*, which can be applied to a variety of situations and datasets, such as precision and accuracy.\n2. *Task-specific metrics*, which are limited to a given task, such as Machine Translation (often evaluated using metrics [BLEU](https://huggingface.co/metrics/bleu) or [ROUGE](https://huggingface.co/metrics/rouge)) or Named Entity Recognition (often evaluated with [seqeval](https://huggingface.co/metrics/seqeval)).\n3. *Dataset-specific metrics*, which aim to measure model performance on specific benchmarks: for instance, the [GLUE benchmark](https://huggingface.co/datasets/glue) has a dedicated [evaluation metric](https://huggingface.co/metrics/glue).\n\nLet's look at each of these three cases:\n\n### Generic metrics\n\nMany of the metrics used in the Machine Learning community are quite generic and can be applied in a variety of tasks and datasets.\n\nThis is the case for metrics like [accuracy](https://huggingface.co/metrics/accuracy) and [precision](https://huggingface.co/metrics/precision), which can be used for evaluating labeled (supervised) datasets, as well as [perplexity](https://huggingface.co/metrics/perplexity), which can be used for evaluating different kinds of (unsupervised) generative tasks.\n\nTo see the input structure of a given metric, you can look at its metric card. For example, in the case of [precision](https://huggingface.co/metrics/precision), the format is:\n```\n>>> precision_metric = evaluate.load(\"precision\")\n>>> results = precision_metric.compute(references=[0, 1], predictions=[0, 1])\n>>> print(results)\n{'precision': 1.0}\n```\n\n### Task-specific metrics\n\nPopular ML tasks like Machine Translation and Named Entity Recognition have specific metrics that can be used to compare models. For example, a series of different metrics have been proposed for text generation, ranging from [BLEU](https://huggingface.co/metrics/bleu) and its derivatives such as [GoogleBLEU](https://huggingface.co/metrics/google_bleu) and [GLEU](https://huggingface.co/metrics/gleu), but also [ROUGE](https://huggingface.co/metrics/rouge), [MAUVE](https://huggingface.co/metrics/mauve), etc.\n\nYou can find the right metric for your task by:\n\n- **Looking at the [Task pages](https://huggingface.co/tasks)** to see what metrics can be used for evaluating models for a given task.\n- **Checking out leaderboards** on sites like [Papers With Code](https://paperswithcode.com/) (you can search by task and by dataset).\n- **Reading the metric cards** for the relevant metrics and see which ones are a good fit for your use case. For example, see the [BLEU metric card](https://github.com/huggingface/evaluate/tree/main/metrics/bleu) or [SQuaD metric card](https://github.com/huggingface/evaluate/tree/main/metrics/squad).\n- **Looking at papers and blog posts** published on the topic and see what metrics they report. This can change over time, so try to pick papers from the last couple of years!\n\n### Dataset-specific metrics\n\nSome datasets have specific metrics associated with them -- this is especially in the case of popular benchmarks like [GLUE](https://huggingface.co/metrics/glue) and [SQuAD](https://huggingface.co/metrics/squad).\n\n\nšŸ’”\nGLUE is actually a collection of different subsets on different tasks, so first you need to choose the one that corresponds to the NLI task, such as mnli, which is described as ā€œcrowdsourced collection of sentence pairs with textual entailment annotationsā€\n\n\nIf you are evaluating your model on a benchmark dataset like the ones mentioned above, you can use its dedicated evaluation metric. Make sure you respect the format that they require. For example, to evaluate your model on the [SQuAD](https://huggingface.co/datasets/squad) dataset, you need to feed the `question` and `context` into your model and return the `prediction_text`, which should be compared with the `references` (based on matching the `id` of the question) :\n\n```\n>>> from evaluate import load\n>>> squad_metric = load(\"squad\")\n>>> predictions = [{'prediction_text': '1976', 'id': '56e10a3be3433e1400422b22'}]\n>>> references = [{'answers': {'answer_start': [97], 'text': ['1976']}, 'id': '56e10a3be3433e1400422b22'}]\n>>> results = squad_metric.compute(predictions=predictions, references=references)\n>>> results\n{'exact_match': 100.0, 'f1': 100.0}\n```\n\nYou can find examples of dataset structures by consulting the \"Dataset Preview\" function or the dataset card for a given dataset, and you can see how to use its dedicated evaluation function based on the metric card.", "source": "huggingface_doc", "domain": "software" }, { "text": "主要特点\n\nč®©ęˆ‘ä»¬ę„ä»‹ē»äø€äø‹ Gradio ęœ€å—ę¬¢čæŽēš„äø€äŗ›åŠŸčƒ½ļ¼čæ™é‡Œę˜Æ Gradio ēš„äø»č¦ē‰¹ē‚¹ļ¼š\n\n1. [ę·»åŠ ē¤ŗä¾‹č¾“å…„](#example-inputs)\n2. [ä¼ é€’č‡Ŗå®šä¹‰é”™čÆÆę¶ˆęÆ](#errors)\n3. [ę·»åŠ ęčæ°å†…å®¹](#descriptive-content)\n4. [设置旗标](#flagging)\n5. [é¢„å¤„ē†å’ŒåŽå¤„ē†](#preprocessing-and-postprocessing)\n6. [ę ·å¼åŒ–ę¼”ē¤ŗ](#styling)\n7. [ęŽ’é˜Ÿē”Øęˆ·](#queuing)\n8. [迭代输出](#iterative-outputs)\n9. [čæ›åŗ¦ę”](#progress-bars)\n10. [批处理函数](#batch-functions)\n11. [åœØåä½œē¬”č®°ęœ¬äøŠčæč”Œ](#colab-notebooks)\n\n## 示例输兄\n\nę‚ØåÆä»„ęä¾›ē”Øęˆ·åÆä»„č½»ę¾åŠ č½½åˆ° \"Interface\" äø­ēš„ē¤ŗä¾‹ę•°ę®ć€‚čæ™åÆ¹äŗŽę¼”ē¤ŗęØ”åž‹ęœŸęœ›ēš„č¾“å…„ē±»åž‹ä»„åŠę¼”ē¤ŗę•°ę®é›†å’ŒęØ”åž‹äø€čµ·ęŽ¢ē“¢ēš„ę–¹å¼éžåøøęœ‰åø®åŠ©ć€‚č¦åŠ č½½ē¤ŗä¾‹ę•°ę®ļ¼Œę‚ØåÆä»„å°†åµŒå„—åˆ—č”Øęä¾›ē»™ Interface ęž„é€ å‡½ę•°ēš„ `examples=` å…³é”®å­—å‚ę•°ć€‚å¤–éƒØåˆ—č”Øäø­ēš„ęÆäøŖå­åˆ—č”Øč”Øē¤ŗäø€äøŖę•°ę®ę ·ęœ¬ļ¼Œå­åˆ—č”Øäø­ēš„ęÆäøŖå…ƒē“ č”Øē¤ŗęÆäøŖč¾“å…„ē»„ä»¶ēš„č¾“å…„ć€‚ęœ‰å…³ęÆäøŖē»„ä»¶ēš„ē¤ŗä¾‹ę•°ę®ę ¼å¼åœØ[Docs](https://gradio.app/docs#components)äø­ęœ‰čÆ“ę˜Žć€‚\n\n$code_calculator\n$demo_calculator\n\nę‚ØåÆä»„å°†å¤§åž‹ę•°ę®é›†åŠ č½½åˆ°ē¤ŗä¾‹äø­ļ¼Œé€ščæ‡ Gradio ęµč§ˆå’ŒäøŽę•°ę®é›†čæ›č”Œäŗ¤äŗ’ć€‚ē¤ŗä¾‹å°†č‡ŖåŠØåˆ†é”µļ¼ˆåÆä»„é€ščæ‡ Interface ēš„ `examples_per_page` å‚ę•°čæ›č”Œé…ē½®ļ¼‰ć€‚\n\nē»§ē»­äŗ†č§£ē¤ŗä¾‹ļ¼ŒčÆ·å‚é˜…[ę›“å¤šē¤ŗä¾‹](https://gradio.app/more-on-examples)ęŒ‡å—ć€‚\n\n## 错误\n\nę‚ØåøŒęœ›å‘ē”Øęˆ·ä¼ é€’č‡Ŗå®šä¹‰é”™čÆÆę¶ˆęÆć€‚äøŗę­¤ļ¼Œwith `gr.Error(\"custom message\")` ę„ę˜¾ē¤ŗé”™čÆÆę¶ˆęÆć€‚å¦‚ęžœåœØäøŠé¢ēš„č®”ē®—å™Øē¤ŗä¾‹äø­å°čÆ•é™¤ä»„é›¶ļ¼Œå°†ę˜¾ē¤ŗč‡Ŗå®šä¹‰é”™čÆÆę¶ˆęÆēš„å¼¹å‡ŗęØ”ę€ēŖ—å£ć€‚äŗ†č§£ęœ‰å…³é”™čÆÆēš„ę›“å¤šäæ”ęÆļ¼ŒčÆ·å‚é˜…[文攣](https://gradio.app/docs#error)怂\n\n## ęčæ°ę€§å†…å®¹\n\nåœØå‰é¢ēš„ē¤ŗä¾‹äø­ļ¼Œę‚ØåÆčƒ½å·²ē»ę³Øę„åˆ° Interface ęž„é€ å‡½ę•°äø­ēš„ `title=` 和 `description=` å…³é”®å­—å‚ę•°ļ¼Œåø®åŠ©ē”Øęˆ·äŗ†č§£ę‚Øēš„åŗ”ē”ØēØ‹åŗć€‚\n\nInterface ęž„é€ å‡½ę•°äø­ęœ‰äø‰äøŖå‚ę•°ē”ØäŗŽęŒ‡å®šę­¤å†…å®¹åŗ”ę”¾ē½®åœØå“Ŗé‡Œļ¼š\n\n- `title`ļ¼šęŽ„å—ę–‡ęœ¬ļ¼Œå¹¶åÆä»„å°†å…¶ę˜¾ē¤ŗåœØē•Œé¢ēš„é”¶éƒØļ¼Œä¹Ÿå°†ęˆäøŗé”µé¢ę ‡é¢˜ć€‚\n- `description`ļ¼šęŽ„å—ę–‡ęœ¬ć€Markdown ꈖ HTMLļ¼Œå¹¶å°†å…¶ę”¾ē½®åœØę ‡é¢˜ę­£äø‹ę–¹ć€‚\n- `article`ļ¼šä¹ŸęŽ„å—ę–‡ęœ¬ć€Markdown ꈖ HTMLļ¼Œå¹¶å°†å…¶ę”¾ē½®åœØē•Œé¢äø‹ę–¹ć€‚\n\n![annotated](/assets/guides/annotated.png)\n\nå¦‚ęžœę‚Øä½æē”Øēš„ę˜Æ `Blocks` APIļ¼Œåˆ™åÆä»„ with `gr.Markdown(...)` ꈖ `gr.HTML(...)` ē»„ä»¶åœØä»»ä½•ä½ē½®ę’å…„ę–‡ęœ¬ć€Markdown ꈖ HTMLļ¼Œå…¶äø­ęčæ°ę€§å†…å®¹ä½äŗŽ `Component` ęž„é€ å‡½ę•°å†…éƒØć€‚\n\nå¦äø€äøŖęœ‰ē”Øēš„å…³é”®å­—å‚ę•°ę˜Æ `label=`ļ¼Œå®ƒå­˜åœØäŗŽęÆäøŖ `Component` äø­ć€‚čæ™äæ®ę”¹äŗ†ęÆäøŖ `Component` é”¶éƒØēš„ę ‡ē­¾ę–‡ęœ¬ć€‚čæ˜åÆä»„äøŗčÆøå¦‚ `Textbox` ꈖ `Radio` ä¹‹ē±»ēš„č”Øå•å…ƒē“ ę·»åŠ  `info=` å…³é”®å­—å‚ę•°ļ¼Œä»„ęä¾›ęœ‰å…³å…¶ē”Øę³•ēš„čæ›äø€ę­„äæ”ęÆć€‚\n\n```python\ngr.Number(label='幓龄', info='ä»„å¹“äøŗå•ä½ļ¼Œåæ…é”»å¤§äŗŽ0')\n```\n\n## ꗗꠇ\n\né»˜č®¤ęƒ…å†µäø‹ļ¼Œ\"Interface\" å°†ęœ‰äø€äøŖ \"Flag\" ęŒ‰é’®ć€‚å½“ē”Øęˆ·ęµ‹čÆ•ę‚Øēš„ `Interface` ę—¶ļ¼Œå¦‚ęžœēœ‹åˆ°ęœ‰č¶£ēš„č¾“å‡ŗļ¼Œä¾‹å¦‚é”™čÆÆęˆ–ę„å¤–ēš„ęØ”åž‹č”Œäøŗļ¼Œä»–ä»¬åÆä»„å°†č¾“å…„ę ‡č®°äøŗę‚Øčæ›č”ŒęŸ„ēœ‹ć€‚åœØē”± `Interface` ęž„é€ å‡½ę•°ēš„ `flagging_dir=` å‚ę•°ęä¾›ēš„ē›®å½•äø­ļ¼Œå°†č®°å½•ę ‡č®°ēš„č¾“å…„åˆ°äø€äøŖ CSV ę–‡ä»¶äø­ć€‚å¦‚ęžœē•Œé¢ę¶‰åŠę–‡ä»¶ę•°ę®ļ¼Œä¾‹å¦‚å›¾åƒå’ŒéŸ³é¢‘ē»„ä»¶ļ¼Œå°†åˆ›å»ŗę–‡ä»¶å¤¹ę„å­˜å‚Øčæ™äŗ›ę ‡č®°ēš„ę•°ę®ć€‚\n\nä¾‹å¦‚ļ¼ŒåÆ¹äŗŽäøŠé¢ę˜¾ē¤ŗēš„č®”ē®—å™Øē•Œé¢ļ¼Œęˆ‘ä»¬å°†åœØäø‹é¢ēš„ę——ę ‡ē›®å½•äø­å­˜å‚Øę ‡č®°ēš„ę•°ę®ļ¼š\n\n```directory\n+-- calculator.py\n+-- flagged/\n| +-- logs.csv\n```\n\n_flagged/logs.csv_\n\n```csv\nnum1,operation,num2,Output\n5,add,7,12\n6,subtract,1.5,4.5\n```\n\näøŽę—©ęœŸę˜¾ē¤ŗēš„å†·č‰²ē•Œé¢ē›øåÆ¹åŗ”ļ¼Œęˆ‘ä»¬å°†åœØäø‹é¢ēš„ę——ę ‡ē›®å½•äø­å­˜å‚Øę ‡č®°ēš„ę•°ę®ļ¼š\n\n```directory\n+-- sepia.py\n+-- flagged/\n| +-- logs.csv\n| +-- im/\n| | +-- 0.png\n| | +-- 1.png\n| +-- Output/\n| | +-- 0.png\n| | +-- 1.png\n```\n\n_flagged/logs.csv_\n\n```csv\nim,Output\nim/0.png,Output/0.png\nim/1.png,Output/1.png\n```\n\nå¦‚ęžœę‚ØåøŒęœ›ē”Øęˆ·ęä¾›ę——ę ‡åŽŸå› ļ¼ŒåÆä»„å°†å­—ē¬¦äø²åˆ—č”Øä¼ é€’ē»™ Interface ēš„ `flagging_options` å‚ę•°ć€‚ē”Øęˆ·åœØčæ›č”Œę——ę ‡ę—¶åæ…é”»é€‰ę‹©å…¶äø­äø€äøŖå­—ē¬¦äø²ļ¼Œčæ™å°†ä½œäøŗé™„åŠ åˆ—äæå­˜åˆ° CSV 中。\n\n## é¢„å¤„ē†å’ŒåŽå¤„ē† (Preprocessing and Postprocessing)\n\n![annotated](/assets/img/dataflow.svg)\n\nå¦‚ę‚Øę‰€č§ļ¼ŒGradio åŒ…ę‹¬åÆä»„å¤„ē†å„ē§äøåŒę•°ę®ē±»åž‹ēš„ē»„ä»¶ļ¼Œä¾‹å¦‚å›¾åƒć€éŸ³é¢‘å’Œč§†é¢‘ć€‚å¤§å¤šę•°ē»„ä»¶éƒ½åÆä»„ē”Øä½œč¾“å…„ęˆ–č¾“å‡ŗć€‚\n\nå½“ē»„ä»¶ē”Øä½œč¾“å…„ę—¶ļ¼ŒGradio č‡ŖåŠØå¤„ē†*预处理*ļ¼Œå°†ę•°ę®ä»Žē”Øęˆ·ęµč§ˆå™Øå‘é€ēš„ē±»åž‹ļ¼ˆä¾‹å¦‚ē½‘ē»œę‘„åƒå¤“åæ«ē…§ēš„ base64 č”Øē¤ŗļ¼‰č½¬ę¢äøŗę‚Øēš„å‡½ę•°åÆä»„ęŽ„å—ēš„å½¢å¼ļ¼ˆä¾‹å¦‚ `numpy` 数组)。\n\nåŒę ·ļ¼Œå½“ē»„ä»¶ē”Øä½œč¾“å‡ŗę—¶ļ¼ŒGradio č‡ŖåŠØå¤„ē†*åŽå¤„ē†*ļ¼Œå°†ę•°ę®ä»Žå‡½ę•°čæ”å›žēš„å½¢å¼ļ¼ˆä¾‹å¦‚å›¾åƒč·Æå¾„åˆ—č”Øļ¼‰č½¬ę¢äøŗåÆä»„åœØē”Øęˆ·ęµč§ˆå™Øäø­ę˜¾ē¤ŗēš„å½¢å¼ļ¼ˆä¾‹å¦‚ä»„ base64 ę ¼å¼ę˜¾ē¤ŗå›¾åƒēš„ `Gallery`)。\n\nę‚ØåÆä»„ä½æē”Øęž„å»ŗå›¾åƒē»„ä»¶ę—¶ēš„å‚ę•°ęŽ§åˆ¶*预处理*ć€‚ä¾‹å¦‚ļ¼Œå¦‚ęžœę‚Øä½æē”Øä»„äø‹å‚ę•°å®žä¾‹åŒ– `Image` ē»„ä»¶ļ¼Œå®ƒå°†å°†å›¾åƒč½¬ę¢äøŗ `PIL` ē±»åž‹ļ¼Œå¹¶å°†å…¶é‡å”‘äøŗ`(100, 100)`ļ¼Œč€Œäøē®”ęäŗ¤ę—¶ēš„åŽŸå§‹å¤§å°å¦‚ä½•ļ¼š\n\n```py\nimg = gr.Image(shape=(100, 100), type=\"pil\")\n```\n\nē›øåļ¼Œčæ™é‡Œęˆ‘ä»¬äæē•™å›¾åƒēš„åŽŸå§‹å¤§å°ļ¼Œä½†åœØå°†å…¶č½¬ę¢äøŗ numpy ę•°ē»„ä¹‹å‰åč½¬é¢œč‰²ļ¼š\n\n```py\nimg = gr.Image(invert_colors=True, type=\"numpy\")\n```\n\nåŽå¤„ē†č¦å®¹ę˜“å¾—å¤šļ¼Gradio č‡ŖåŠØčÆ†åˆ«čæ”å›žę•°ę®ēš„ę ¼å¼ļ¼ˆä¾‹å¦‚ `Image` 是 `numpy` ę•°ē»„čæ˜ę˜Æ `str` ę–‡ä»¶č·Æå¾„ļ¼Ÿļ¼‰ļ¼Œå¹¶å°†å…¶åŽå¤„ē†äøŗåÆä»„ē”±ęµč§ˆå™Øę˜¾ē¤ŗēš„ę ¼å¼ć€‚\n\nčÆ·ęŸ„ēœ‹[文攣](https://gradio.app/docs)ļ¼Œäŗ†č§£ęÆäøŖē»„ä»¶ēš„ę‰€ęœ‰äøŽé¢„å¤„ē†ē›øå…³ēš„å‚ę•°ć€‚\n\n## ę ·å¼ (Styling)\n\nGradio äø»é¢˜ę˜Æč‡Ŗå®šä¹‰åŗ”ē”ØēØ‹åŗå¤–č§‚å’Œę„Ÿč§‰ēš„ęœ€ē®€å•ę–¹ę³•ć€‚ę‚ØåÆä»„é€‰ę‹©å¤šē§äø»é¢˜ęˆ–åˆ›å»ŗč‡Ŗå·±ēš„äø»é¢˜ć€‚č¦čæ™ę ·åšļ¼ŒčÆ·å°† `theme=` å‚ę•°ä¼ é€’ē»™ `Interface` ęž„é€ å‡½ę•°ć€‚ä¾‹å¦‚ļ¼š\n\n```python\ndemo = gr.Interface(..., theme=gr.themes.Monochrome())\n```\n\nGradio åø¦ęœ‰äø€ē»„é¢„å…ˆęž„å»ŗēš„äø»é¢˜ļ¼Œę‚ØåÆä»„ä»Ž `gr.themes.*` åŠ č½½ć€‚ę‚ØåÆä»„ę‰©å±•čæ™äŗ›äø»é¢˜ęˆ–ä»Žå¤“å¼€å§‹åˆ›å»ŗč‡Ŗå·±ēš„äø»é¢˜ - ęœ‰å…³ę›“å¤ščÆ¦ē»†äæ”ęÆļ¼ŒčÆ·å‚é˜…[äø»é¢˜ęŒ‡å—](https://gradio.app/theming-guide)怂\n\nč¦å¢žåŠ é¢å¤–ēš„ę ·å¼čƒ½åŠ›ļ¼Œę‚ØåÆä»„ with `css=` 关键字将任何 CSS ä¼ é€’ē»™ę‚Øēš„åŗ”ē”ØēØ‹åŗć€‚\nGradio åŗ”ē”ØēØ‹åŗēš„åŸŗē±»ę˜Æ `gradio-container`ļ¼Œå› ę­¤ä»„äø‹ę˜Æäø€äøŖę›“ę”¹ Gradio åŗ”ē”ØēØ‹åŗčƒŒę™Æé¢œč‰²ēš„ē¤ŗä¾‹ļ¼š\n\n```python\nwith `gr.Interface(css=\".gradio-container {background-color: red}\") as demo:\n ...\n```\n\n## 队列 (Queuing)\n\nå¦‚ęžœę‚Øēš„åŗ”ē”ØēØ‹åŗé¢„č®”ä¼šęœ‰å¤§é‡ęµé‡ļ¼ŒčÆ· with `queue()` ę–¹ę³•ę„ęŽ§åˆ¶å¤„ē†é€ŸēŽ‡ć€‚čæ™å°†ęŽ’é˜Ÿå¤„ē†č°ƒē”Øļ¼Œå› ę­¤äø€ę¬”åŖå¤„ē†äø€å®šę•°é‡ēš„čÆ·ę±‚ć€‚é˜Ÿåˆ—ä½æē”Ø Websocketsļ¼Œčæ˜åÆä»„é˜²ę­¢ē½‘ē»œč¶…ę—¶ļ¼Œå› ę­¤å¦‚ęžœę‚Øēš„å‡½ę•°ēš„ęŽØē†ę—¶é—“å¾ˆé•æļ¼ˆ> 1 åˆ†é’Ÿļ¼‰ļ¼Œåŗ”ä½æē”Øé˜Ÿåˆ—ć€‚\n\nwith `Interface`:\n\n```python\ndemo = gr.Interface(...).queue()\ndemo.launch()\n```\n\nwith `Blocks`:\n\n```python\nwith gr.Blocks() as demo:\n #...\ndemo.queue()\ndemo.launch()\n```\n\nę‚ØåÆä»„é€ščæ‡ä»„äø‹ę–¹å¼ęŽ§åˆ¶äø€ę¬”å¤„ē†ēš„čÆ·ę±‚ę•°é‡ļ¼š\n\n```python\ndemo.queue(concurrency_count=3)\n```\n\nęŸ„ēœ‹ęœ‰å…³é…ē½®å…¶ä»–é˜Ÿåˆ—å‚ę•°ēš„[é˜Ÿåˆ—ę–‡ę”£](/docs/#queue)怂\n\n在 Blocks äø­ęŒ‡å®šä»…åÆ¹ęŸäŗ›å‡½ę•°čæ›č”ŒęŽ’é˜Ÿļ¼š\n\n```python\nwith gr.Blocks() as demo2:\n num1 = gr.Number()\n num2 = gr.Number()\n output = gr.Number()\n gr.Button(\"Add\").click(\n lambda a, b: a + b, [num1, num2], output)\n gr.Button(\"Multiply\").click(\n lambda a, b: a * b, [num1, num2], output, queue=True)\ndemo2.launch()\n```\n\n## 迭代输出 (Iterative Outputs)\n\nåœØęŸäŗ›ęƒ…å†µäø‹ļ¼Œę‚ØåÆčƒ½éœ€č¦ä¼ č¾“äø€ē³»åˆ—č¾“å‡ŗč€Œäøę˜Æäø€ę¬”ę˜¾ē¤ŗå•äøŖč¾“å‡ŗć€‚ä¾‹å¦‚ļ¼Œę‚ØåÆčƒ½ęœ‰äø€äøŖå›¾åƒē”ŸęˆęØ”åž‹ļ¼ŒåøŒęœ›ę˜¾ē¤ŗē”Ÿęˆēš„ęÆäøŖę­„éŖ¤ēš„å›¾åƒļ¼Œē›“åˆ°ęœ€ē»ˆå›¾åƒć€‚ęˆ–č€…ę‚ØåÆčƒ½ęœ‰äø€äøŖčŠå¤©ęœŗå™Øäŗŗļ¼Œå®ƒé€å­—é€å„åœ°ęµå¼ä¼ č¾“å“åŗ”ļ¼Œč€Œäøę˜Æäø€ę¬”čæ”å›žå…ØéƒØå“åŗ”ć€‚\n\nåœØčæ™ē§ęƒ…å†µäø‹ļ¼Œę‚ØåÆä»„å°†**ē”Ÿęˆå™Ø**å‡½ę•°ęä¾›ē»™ Gradioļ¼Œč€Œäøę˜Æåøøč§„å‡½ę•°ć€‚åœØ Python äø­åˆ›å»ŗē”Ÿęˆå™Øéžåøøē®€å•ļ¼šå‡½ę•°äøåŗ”čÆ„ęœ‰äø€äøŖå•ē‹¬ēš„ `return` å€¼ļ¼Œč€Œę˜Æåŗ”čÆ„ with `yield` čæžē»­čæ”å›žäø€ē³»åˆ—å€¼ć€‚é€šåøøļ¼Œ`yield` čÆ­å„ę”¾ē½®åœØęŸē§å¾ŖēŽÆäø­ć€‚äø‹é¢ę˜Æäø€äøŖē®€å•ē¤ŗä¾‹ļ¼Œē”Ÿęˆå™ØåŖę˜Æē®€å•č®”ę•°åˆ°ē»™å®šę•°å­—ļ¼š\n\n```python\ndef my_generator(x):\n for i in range(x):\n yield i\n```\n\nę‚Øä»„äøŽåøøč§„å‡½ę•°ē›øåŒēš„ę–¹å¼å°†ē”Ÿęˆå™Øęä¾›ē»™ Gradioć€‚ä¾‹å¦‚ļ¼Œčæ™ę˜Æäø€äøŖļ¼ˆč™šę‹Ÿēš„ļ¼‰å›¾åƒē”ŸęˆęØ”åž‹ļ¼Œå®ƒåœØč¾“å‡ŗå›¾åƒä¹‹å‰ē”Ÿęˆę•°äøŖę­„éŖ¤ēš„å™ŖéŸ³ļ¼š\n\n$code_fake_diffusion\n$demo_fake_diffusion\n\nčÆ·ę³Øę„ļ¼Œęˆ‘ä»¬åœØčæ­ä»£å™Øäø­ę·»åŠ äŗ† `time.sleep(1)`ļ¼Œä»„åˆ›å»ŗę­„éŖ¤ä¹‹é—“ēš„äŗŗå·„ęš‚åœļ¼Œä»„ä¾æę‚ØåÆä»„č§‚åÆŸčæ­ä»£å™Øēš„ę­„éŖ¤ļ¼ˆåœØēœŸå®žēš„å›¾åƒē”ŸęˆęØ”åž‹äø­ļ¼Œčæ™åÆčƒ½ę˜Æäøåæ…č¦ēš„ļ¼‰ć€‚\n\nå°†ē”Ÿęˆå™Øęä¾›ē»™ Gradio **éœ€č¦**åœØåŗ•å±‚ Interface ꈖ Blocks äø­åÆē”Øé˜Ÿåˆ—ļ¼ˆčÆ·å‚é˜…äøŠé¢ēš„é˜Ÿåˆ—éƒØåˆ†ļ¼‰ć€‚\n\n## čæ›åŗ¦ę”\n\nGradio ę”ÆęŒåˆ›å»ŗč‡Ŗå®šä¹‰čæ›åŗ¦ę”ļ¼Œä»„ä¾æę‚ØåÆä»„č‡Ŗå®šä¹‰å’ŒęŽ§åˆ¶å‘ē”Øęˆ·ę˜¾ē¤ŗēš„čæ›åŗ¦ę›“ę–°ć€‚č¦åÆē”Øę­¤åŠŸčƒ½ļ¼ŒåŖéœ€äøŗę–¹ę³•ę·»åŠ äø€äøŖé»˜č®¤å€¼äøŗ `gr.Progress` å®žä¾‹ēš„å‚ę•°å³åÆć€‚ē„¶åŽļ¼Œę‚ØåÆä»„ē›“ęŽ„č°ƒē”Øę­¤å®žä¾‹å¹¶ä¼ å…„ 0 到 1 ä¹‹é—“ēš„ęµ®ē‚¹ę•°ę„ę›“ę–°čæ›åŗ¦ēŗ§åˆ«ļ¼Œęˆ–č€… with `Progress` å®žä¾‹ēš„ `tqdm()` ę–¹ę³•ę„č·ŸčøŖåÆčæ­ä»£åÆ¹č±”äøŠēš„čæ›åŗ¦ļ¼Œå¦‚äø‹ę‰€ē¤ŗć€‚åæ…é”»åÆē”Øé˜Ÿåˆ—ä»„čæ›č”Œčæ›åŗ¦ę›“ę–°ć€‚\n\n$code_progress_simple\n$demo_progress_simple\n\nå¦‚ęžœę‚Ø with `tqdm` åŗ“ļ¼Œå¹¶äø”åøŒęœ›ä»Žå‡½ę•°å†…éƒØēš„ä»»ä½• `tqdm.tqdm` č‡ŖåŠØęŠ„å‘Ščæ›åŗ¦ę›“ę–°ļ¼ŒčÆ·å°†é»˜č®¤å‚ę•°č®¾ē½®äøŗ `gr.Progress(track_tqdm=True)`!\n\n## 批处理函数 (Batch Functions)\n\nGradio ę”ÆęŒä¼ é€’*批处理*å‡½ę•°ć€‚ę‰¹å¤„ē†å‡½ę•°åŖę˜ÆęŽ„å—č¾“å…„åˆ—č”Øå¹¶čæ”å›žé¢„ęµ‹åˆ—č”Øēš„å‡½ę•°ć€‚\n\nä¾‹å¦‚ļ¼Œčæ™ę˜Æäø€äøŖę‰¹å¤„ē†å‡½ę•°ļ¼Œå®ƒęŽ„å—äø¤äøŖč¾“å…„åˆ—č”Øļ¼ˆäø€äøŖå•čÆåˆ—č”Øå’Œäø€äøŖę•“ę•°åˆ—č”Øļ¼‰ļ¼Œå¹¶čæ”å›žäæ®å‰Ŗčæ‡ēš„å•čÆåˆ—č”Øä½œäøŗč¾“å‡ŗļ¼š\n\n```python\nimport time\n\ndef trim_words(words, lens):\n trimmed_words = []\n time.sleep(5)\n for w, l in zip(words, lens):\n trimmed_words.append(w[:int(l)])\n return [trimmed_words]\n for w, l in zip(words, lens):\n```\n\nä½æē”Øę‰¹å¤„ē†å‡½ę•°ēš„ä¼˜ē‚¹ę˜Æļ¼Œå¦‚ęžœåÆē”Øäŗ†é˜Ÿåˆ—ļ¼ŒGradio ęœåŠ”å™ØåÆä»„č‡ŖåŠØ*批处理*ä¼ å…„ēš„čÆ·ę±‚å¹¶å¹¶č”Œå¤„ē†å®ƒä»¬ļ¼Œä»Žč€ŒåÆčƒ½åŠ åæ«ę¼”ē¤ŗé€Ÿåŗ¦ć€‚ä»„äø‹ę˜Æ Gradio ä»£ē ēš„ē¤ŗä¾‹ļ¼ˆčÆ·ę³Øę„ `batch=True` 和 `max_batch_size=16` - čæ™äø¤äøŖå‚ę•°éƒ½åÆä»„ä¼ é€’ē»™äŗ‹ä»¶č§¦å‘å™Øęˆ– `Interface` 类)\n\nwith `Interface`:\n\n```python\ndemo = gr.Interface(trim_words, [\"textbox\", \"number\"], [\"output\"],\n batch=True, max_batch_size=16)\ndemo.queue()\ndemo.launch()\n```\n\nwith `Blocks`:\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n with gr.Row():\n word = gr.Textbox(label=\"word\")\n leng = gr.Number(label=\"leng\")\n output = gr.Textbox(label=\"Output\")\n with gr.Row():\n run = gr.Button()\n\n event = run.click(trim_words, [word, leng], output, batch=True, max_batch_size=16)\n\ndemo.queue()\ndemo.launch()\n```\n\nåœØäøŠé¢ēš„ē¤ŗä¾‹äø­ļ¼ŒåÆä»„å¹¶č”Œå¤„ē† 16 äøŖčÆ·ę±‚ļ¼ˆę€»ęŽØē†ę—¶é—“äøŗ 5 ē§’ļ¼‰ļ¼Œč€Œäøę˜Æåˆ†åˆ«å¤„ē†ęÆäøŖčÆ·ę±‚ļ¼ˆę€»ęŽØē†ę—¶é—“äøŗ 80 ē§’ļ¼‰ć€‚č®øå¤š Hugging Face ēš„ `transformers` 和 `diffusers` ęØ”åž‹åœØ Gradio ēš„ę‰¹å¤„ē†ęØ”å¼äø‹č‡Ŗē„¶å·„ä½œļ¼ščæ™ę˜Æ[ä½æē”Øę‰¹å¤„ē†ē”Ÿęˆå›¾åƒēš„ē¤ŗä¾‹ę¼”ē¤ŗ](https://github.com/gradio-app/gradio/blob/main/demo/diffusers_with_batching/run.py)\n\nę³Øę„ļ¼šä½æē”Ø Gradio ēš„ę‰¹å¤„ē†å‡½ę•° **requires** åœØåŗ•å±‚ Interface ꈖ Blocks äø­åÆē”Øé˜Ÿåˆ—ļ¼ˆčÆ·å‚é˜…äøŠé¢ēš„é˜Ÿåˆ—éƒØåˆ†ļ¼‰ć€‚\n\n## Gradio ē¬”č®°ęœ¬ (Colab Notebooks)\n\nGradio åÆä»„åœØä»»ä½•čæč”Œ Python ēš„åœ°ę–¹čæč”Œļ¼ŒåŒ…ę‹¬ęœ¬åœ° Jupyter ē¬”č®°ęœ¬å’Œåä½œē¬”č®°ęœ¬ļ¼Œå¦‚[Google Colab](https://colab.research.google.com/)ć€‚åÆ¹äŗŽęœ¬åœ° Jupyter ē¬”č®°ęœ¬å’Œ Google Colab ē¬”č®°ęœ¬ļ¼ŒGradio åœØęœ¬åœ°ęœåŠ”å™ØäøŠčæč”Œļ¼Œę‚ØåÆä»„åœØęµč§ˆå™Øäø­äøŽä¹‹äŗ¤äŗ’ć€‚ļ¼ˆę³Øę„ļ¼šåÆ¹äŗŽ Google Colabļ¼Œčæ™ę˜Æé€ščæ‡[ęœåŠ”å·„ä½œå™Øéš§é“](https://github.com/tensorflow/tensorboard/blob/master/docs/design/colab_integration.md)å®žēŽ°ēš„ļ¼Œę‚Øēš„ęµč§ˆå™Øéœ€č¦åÆē”Ø cookiesć€‚ļ¼‰åÆ¹äŗŽå…¶ä»–čæœēØ‹ē¬”č®°ęœ¬ļ¼ŒGradio ä¹Ÿå°†åœØęœåŠ”å™ØäøŠčæč”Œļ¼Œä½†ę‚Øéœ€č¦ä½æē”Ø[SSH 隧道](https://coderwall.com/p/ohk6cg/remote-access-to-ipython-notebooks-via-ssh)åœØęœ¬åœ°ęµč§ˆå™Øäø­ęŸ„ēœ‹åŗ”ē”ØēØ‹åŗć€‚é€šåøøļ¼Œę›“ē®€å•ēš„é€‰ę‹©ę˜Æä½æē”Ø Gradio å†…ē½®ēš„å…¬å…±é“¾ęŽ„ļ¼Œ[åœØäø‹äø€ēÆ‡ęŒ‡å—äø­č®Øč®ŗ](/sharing-your-app/#sharing-demos)怂", "source": "huggingface_doc", "domain": "software" }, { "text": "!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\n\nāš ļø Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be\nrendered properly in your Markdown viewer.\n\n-->\n\n# Training on TPU with TensorFlow\n\n\n\nIf you don't need long explanations and just want TPU code samples to get started with, check out [our TPU example notebook!](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb)\n\n\n\n### What is a TPU?\n\nA TPU is a **Tensor Processing Unit.** They are hardware designed by Google, which are used to greatly speed up the tensor computations within neural networks, much like GPUs. They can be used for both network training and inference. They are generally accessed through Google’s cloud services, but small TPUs can also be accessed directly for free through Google Colab and Kaggle Kernels.\n\nBecause [all TensorFlow models in šŸ¤— Transformers are Keras models](https://huggingface.co/blog/tensorflow-philosophy), most of the methods in this document are generally applicable to TPU training for any Keras model! However, there are a few points that are specific to the HuggingFace ecosystem (hug-o-system?) of Transformers and Datasets, and we’ll make sure to flag them up when we get to them.\n\n### What kinds of TPU are available?\n\nNew users are often very confused by the range of TPUs, and the different ways to access them. The first key distinction to understand is the difference between **TPU Nodes** and **TPU VMs.**\n\nWhen you use a **TPU Node**, you are effectively indirectly accessing a remote TPU. You will need a separate VM, which will initialize your network and data pipeline and then forward them to the remote node. When you use a TPU on Google Colab, you are accessing it in the **TPU Node** style.\n\nUsing TPU Nodes can have some quite unexpected behaviour for people who aren’t used to them! In particular, because the TPU is located on a physically different system to the machine you’re running your Python code on, your data cannot be local to your machine - any data pipeline that loads from your machine’s internal storage will totally fail! Instead, data must be stored in Google Cloud Storage where your data pipeline can still access it, even when the pipeline is running on the remote TPU node.\n\n\n\nIf you can fit all your data in memory as `np.ndarray` or `tf.Tensor`, then you can `fit()` on that data even when using Colab or a TPU Node, without needing to upload it to Google Cloud Storage.\n\n\n\n\n\n**šŸ¤—Specific Hugging Face TipšŸ¤—:** The methods `Dataset.to_tf_dataset()` and its higher-level wrapper `model.prepare_tf_dataset()` , which you will see throughout our TF code examples, will both fail on a TPU Node. The reason for this is that even though they create a `tf.data.Dataset` it is not a ā€œpureā€ `tf.data` pipeline and uses `tf.numpy_function` or `Dataset.from_generator()` to stream data from the underlying HuggingFace `Dataset`. This HuggingFace `Dataset` is backed by data that is on a local disc and which the remote TPU Node will not be able to read.\n\n\n\nThe second way to access a TPU is via a **TPU VM.** When using a TPU VM, you connect directly to the machine that the TPU is attached to, much like training on a GPU VM. TPU VMs are generally easier to work with, particularly when it comes to your data pipeline. All of the above warnings do not apply to TPU VMs!\n\nThis is an opinionated document, so here’s our opinion: **Avoid using TPU Node if possible.** It is more confusing and more difficult to debug than TPU VMs. It is also likely to be unsupported in future - Google’s latest TPU, TPUv4, can only be accessed as a TPU VM, which suggests that TPU Nodes are increasingly going to become a ā€œlegacyā€ access method. However, we understand that the only free TPU access is on Colab and Kaggle Kernels, which uses TPU Node - so we’ll try to explain how to handle it if you have to! Check the [TPU example notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb) for code samples that explain this in more detail.\n\n### What sizes of TPU are available?\n\nA single TPU (a v2-8/v3-8/v4-8) runs 8 replicas. TPUs exist in **pods** that can run hundreds or thousands of replicas simultaneously. When you use more than a single TPU but less than a whole pod (for example, a v3-32), your TPU fleet is referred to as a **pod slice.**\n\nWhen you access a free TPU via Colab, you generally get a single v2-8 TPU.\n\n### I keep hearing about this XLA thing. What’s XLA, and how does it relate to TPUs?\n\nXLA is an optimizing compiler, used by both TensorFlow and JAX. In JAX it is the only compiler, whereas in TensorFlow it is optional (but mandatory on TPU!). The easiest way to enable it when training a Keras model is to pass the argument `jit_compile=True` to `model.compile()`. If you don’t get any errors and performance is good, that’s a great sign that you’re ready to move to TPU!\n\nDebugging on TPU is generally a bit harder than on CPU/GPU, so we recommend getting your code running on CPU/GPU with XLA first before trying it on TPU. You don’t have to train for long, of course - just for a few steps to make sure that your model and data pipeline are working like you expect them to.\n\n\n\nXLA compiled code is usually faster - so even if you’re not planning to run on TPU, adding `jit_compile=True` can improve your performance. Be sure to note the caveats below about XLA compatibility, though!\n\n\n\n\n\n**Tip born of painful experience:** Although using `jit_compile=True` is a good way to get a speed boost and test if your CPU/GPU code is XLA-compatible, it can actually cause a lot of problems if you leave it in when actually training on TPU. XLA compilation will happen implicitly on TPU, so remember to remove that line before actually running your code on a TPU!\n\n\n\n### How do I make my model XLA compatible?\n\nIn many cases, your code is probably XLA-compatible already! However, there are a few things that work in normal TensorFlow that don’t work in XLA. We’ve distilled them into three core rules below:\n\n\n\n**šŸ¤—Specific HuggingFace TipšŸ¤—:** We’ve put a lot of effort into rewriting our TensorFlow models and loss functions to be XLA-compatible. Our models and loss functions generally obey rule #1 and #2 by default, so you can skip over them if you’re using `transformers` models. Don’t forget about these rules when writing your own models and loss functions, though!\n\n\n\n#### XLA Rule #1: Your code cannot have ā€œdata-dependent conditionalsā€\n\nWhat that means is that any `if` statement cannot depend on values inside a `tf.Tensor`. For example, this code block cannot be compiled with XLA!\n\n```python\nif tf.reduce_sum(tensor) > 10:\n tensor = tensor / 2.0\n```\n\nThis might seem very restrictive at first, but most neural net code doesn’t need to do this. You can often get around this restriction by using `tf.cond` (see the documentation [here](https://www.tensorflow.org/api_docs/python/tf/cond)) or by removing the conditional and finding a clever math trick with indicator variables instead, like so:\n\n```python\nsum_over_10 = tf.cast(tf.reduce_sum(tensor) > 10, tf.float32)\ntensor = tensor / (1.0 + sum_over_10)\n```\n\nThis code has exactly the same effect as the code above, but by avoiding a conditional, we ensure it will compile with XLA without problems!\n\n#### XLA Rule #2: Your code cannot have ā€œdata-dependent shapesā€\n\nWhat this means is that the shape of all of the `tf.Tensor` objects in your code cannot depend on their values. For example, the function `tf.unique` cannot be compiled with XLA, because it returns a `tensor` containing one instance of each unique value in the input. The shape of this output will obviously be different depending on how repetitive the input `Tensor` was, and so XLA refuses to handle it!\n\nIn general, most neural network code obeys rule #2 by default. However, there are a few common cases where it becomes a problem. One very common one is when you use **label masking**, setting your labels to a negative value to indicate that those positions should be ignored when computing the loss. If you look at NumPy or PyTorch loss functions that support label masking, you will often see code like this that uses [boolean indexing](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing):\n\n```python\nlabel_mask = labels >= 0\nmasked_outputs = outputs[label_mask]\nmasked_labels = labels[label_mask]\nloss = compute_loss(masked_outputs, masked_labels)\nmean_loss = torch.mean(loss)\n```\n\nThis code is totally fine in NumPy or PyTorch, but it breaks in XLA! Why? Because the shape of `masked_outputs` and `masked_labels` depends on how many positions are masked - that makes it a **data-dependent shape.** However, just like for rule #1, we can often rewrite this code to yield exactly the same output without any data-dependent shapes.\n\n```python\nlabel_mask = tf.cast(labels >= 0, tf.float32)\nloss = compute_loss(outputs, labels)\nloss = loss * label_mask # Set negative label positions to 0\nmean_loss = tf.reduce_sum(loss) / tf.reduce_sum(label_mask)\n```\n\nHere, we avoid data-dependent shapes by computing the loss for every position, but zeroing out the masked positions in both the numerator and denominator when we calculate the mean, which yields exactly the same result as the first block while maintaining XLA compatibility. Note that we use the same trick as in rule #1 - converting a `tf.bool` to `tf.float32` and using it as an indicator variable. This is a really useful trick, so remember it if you need to convert your own code to XLA!\n\n#### XLA Rule #3: XLA will need to recompile your model for every different input shape it sees\n\nThis is the big one. What this means is that if your input shapes are very variable, XLA will have to recompile your model over and over, which will create huge performance problems. This commonly arises in NLP models, where input texts have variable lengths after tokenization. In other modalities, static shapes are more common and this rule is much less of a problem.\n\nHow can you get around rule #3? The key is **padding** - if you pad all your inputs to the same length, and then use an `attention_mask`, you can get the same results as you’d get from variable shapes, but without any XLA issues. However, excessive padding can cause severe slowdown too - if you pad all your samples to the maximum length in the whole dataset, you might end up with batches consisting endless padding tokens, which will waste a lot of compute and memory!\n\nThere isn’t a perfect solution to this problem. However, you can try some tricks. One very useful trick is to **pad batches of samples up to a multiple of a number like 32 or 64 tokens.** This often only increases the number of tokens by a small amount, but it hugely reduces the number of unique input shapes, because every input shape now has to be a multiple of 32 or 64. Fewer unique input shapes means fewer XLA compilations!\n\n\n\n**šŸ¤—Specific HuggingFace TipšŸ¤—:** Our tokenizers and data collators have methods that can help you here. You can use `padding=\"max_length\"` or `padding=\"longest\"` when calling tokenizers to get them to output padded data. Our tokenizers and data collators also have a `pad_to_multiple_of` argument that you can use to reduce the number of unique input shapes you see!\n\n\n\n### How do I actually train my model on TPU?\n\nOnce your training is XLA-compatible and (if you’re using TPU Node / Colab) your dataset has been prepared appropriately, running on TPU is surprisingly easy! All you really need to change in your code is to add a few lines to initialize your TPU, and to ensure that your model and dataset are created inside a `TPUStrategy` scope. Take a look at [our TPU example notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb) to see this in action!\n\n### Summary\n\nThere was a lot in here, so let’s summarize with a quick checklist you can follow when you want to get your model ready for TPU training:\n\n- Make sure your code follows the three rules of XLA\n- Compile your model with `jit_compile=True` on CPU/GPU and confirm that you can train it with XLA\n- Either load your dataset into memory or use a TPU-compatible dataset loading approach (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb))\n- Migrate your code either to Colab (with accelerator set to ā€œTPUā€) or a TPU VM on Google Cloud\n- Add TPU initializer code (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb))\n- Create your `TPUStrategy` and make sure dataset loading and model creation are inside the `strategy.scope()` (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb))\n- Don’t forget to take `jit_compile=True` out again when you move to TPU!\n- šŸ™šŸ™šŸ™šŸ„ŗšŸ„ŗšŸ„ŗ\n- Call model.fit()\n- You did it!", "source": "huggingface_doc", "domain": "software" }, { "text": "Git over SSH\n\nYou can access and write data in repositories on huggingface.co using SSH (Secure Shell Protocol). When you connect via SSH, you authenticate using a private key file on your local machine.\n\nSome actions, such as pushing changes, or cloning private repositories, will require you to upload your SSH public key to your account on huggingface.co.\n\nYou can use a pre-existing SSH key, or generate a new one specifically for huggingface.co.\n\n## Checking for existing SSH keys\n\nIf you have an existing SSH key, you can use that key to authenticate Git operations over SSH.\n\nSSH keys are usually located under `~/.ssh` on Mac & Linux, and under `C:\\\\Users\\\\\\\\.ssh` on Windows. List files under that directory and look for files of the form:\n\n- id_rsa.pub\n- id_ecdsa.pub\n- id_ed25519.pub\n\nThose files contain your SSH public key.\n\nIf you don't have such file under `~/.ssh`, you will have to [generate a new key](#generating-a-new-ssh-keypair). Otherwise, you can [add your existing SSH public key(s) to your huggingface.co account](#add-a-ssh-key-to-your-account).\n\n## Generating a new SSH keypair\n\nIf you don't have any SSH keys on your machine, you can use `ssh-keygen` to generate a new SSH key pair (public + private keys):\n\n```\n$ ssh-keygen -t ed25519 -C \"your.email@example.co\"\n```\n\nWe recommend entering a passphrase when you are prompted to. A passphrase is an extra layer of security: it is a password that will be prompted whenever you use your SSH key.\n\nOnce your new key is generated, add it to your SSH agent with `ssh-add`:\n\n```\n$ ssh-add ~/.ssh/id_ed25519\n```\n\nIf you chose a different location than the default to store your SSH key, you would have to replace `~/.ssh/id_ed25519` with the file location you used.\n\n## Add a SSH key to your account\n\nTo access private repositories with SSH, or to push changes via SSH, you will need to add your SSH public key to your huggingface.co account. You can manage your SSH keys [in your user settings](https://huggingface.co/settings/keys).\n\nTo add a SSH key to your account, click on the \"Add SSH key\" button.\n\nThen, enter a name for this key (for example, \"Personal computer\"), and copy and paste the content of your **public** SSH key in the area below. The public key is located in the `~/.ssh/id_XXXX.pub` file you found or generated in the previous steps.\n\nClick on \"Add key\", and voilĆ ! You have added a SSH key to your huggingface.co account.\n\n## Testing your SSH authentication\n\nOnce you have added your SSH key to your huggingface.co account, you can test that the connection works as expected.\n\nIn a terminal, run:\n```\n$ ssh -T git@hf.co\n```\n\nIf you see a message with your username, congrats! Everything went well, you are ready to use git over SSH.\n\nOtherwise, if the message states something like the following, make sure your SSH key is actually used by your SSH agent.\n```\nHi anonymous, welcome to Hugging Face.\n```", "source": "huggingface_doc", "domain": "software" }, { "text": "!---\nCopyright 2022 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n-->\n\n# Token classification with LayoutLMv3 (PyTorch version)\n\nThis directory contains a script, `run_funsd_cord.py`, that can be used to fine-tune (or evaluate) LayoutLMv3 on form understanding datasets, such as [FUNSD](https://guillaumejaume.github.io/FUNSD/) and [CORD](https://github.com/clovaai/cord).\n\nThe script `run_funsd_cord.py` leverages the šŸ¤— Datasets library and the Trainer API. You can easily customize it to your needs.\n\n## Fine-tuning on FUNSD\n\nFine-tuning LayoutLMv3 for token classification on [FUNSD](https://guillaumejaume.github.io/FUNSD/) can be done as follows:\n\n```bash\npython run_funsd_cord.py \\\n --model_name_or_path microsoft/layoutlmv3-base \\\n --dataset_name funsd \\\n --output_dir layoutlmv3-test \\\n --do_train \\\n --do_eval \\\n --max_steps 1000 \\\n --evaluation_strategy steps \\\n --eval_steps 100 \\\n --learning_rate 1e-5 \\\n --load_best_model_at_end \\\n --metric_for_best_model \"eval_f1\" \\\n --push_to_hub \\\n --push_to_hub°model_id layoutlmv3-finetuned-funsd\n```\n\nšŸ‘€ The resulting model can be found here: https://huggingface.co/nielsr/layoutlmv3-finetuned-funsd. By specifying the `push_to_hub` flag, the model gets uploaded automatically to the hub (regularly), together with a model card, which includes metrics such as precision, recall and F1. Note that you can easily update the model card, as it's just a README file of the respective repo on the hub.\n\nThere's also the \"Training metrics\" [tab](https://huggingface.co/nielsr/layoutlmv3-finetuned-funsd/tensorboard), which shows Tensorboard logs over the course of training. Pretty neat, huh?\n\n## Fine-tuning on CORD\n\nFine-tuning LayoutLMv3 for token classification on [CORD](https://github.com/clovaai/cord) can be done as follows:\n\n```bash\npython run_funsd_cord.py \\\n --model_name_or_path microsoft/layoutlmv3-base \\\n --dataset_name cord \\\n --output_dir layoutlmv3-test \\\n --do_train \\\n --do_eval \\\n --max_steps 1000 \\\n --evaluation_strategy steps \\\n --eval_steps 100 \\\n --learning_rate 5e-5 \\\n --load_best_model_at_end \\\n --metric_for_best_model \"eval_f1\" \\\n --push_to_hub \\\n --push_to_hub°model_id layoutlmv3-finetuned-cord\n```\n\nšŸ‘€ The resulting model can be found here: https://huggingface.co/nielsr/layoutlmv3-finetuned-cord. Note that a model card gets generated automatically in case you specify the `push_to_hub` flag.", "source": "huggingface_doc", "domain": "software" }, { "text": "SE-ResNet\n\n**SE ResNet** is a variant of a [ResNet](https://www.paperswithcode.com/method/resnet) that employs [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) to enable the network to perform dynamic channel-wise feature recalibration.\n\n## How do I use this model on an image?\n\nTo load a pretrained model:\n\n```py\n>>> import timm\n>>> model = timm.create_model('seresnet152d', pretrained=True)\n>>> model.eval()\n```\n\nTo load and preprocess the image:\n\n```py \n>>> import urllib\n>>> from PIL import Image\n>>> from timm.data import resolve_data_config\n>>> from timm.data.transforms_factory import create_transform\n\n>>> config = resolve_data_config({}, model=model)\n>>> transform = create_transform(**config)\n\n>>> url, filename = (\"https://github.com/pytorch/hub/raw/master/images/dog.jpg\", \"dog.jpg\")\n>>> urllib.request.urlretrieve(url, filename)\n>>> img = Image.open(filename).convert('RGB')\n>>> tensor = transform(img).unsqueeze(0) # transform and add batch dimension\n```\n\nTo get the model predictions:\n\n```py\n>>> import torch\n>>> with torch.no_grad():\n... out = model(tensor)\n>>> probabilities = torch.nn.functional.softmax(out[0], dim=0)\n>>> print(probabilities.shape)\n>>> # prints: torch.Size([1000])\n```\n\nTo get the top-5 predictions class names:\n\n```py\n>>> # Get imagenet class mappings\n>>> url, filename = (\"https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt\", \"imagenet_classes.txt\")\n>>> urllib.request.urlretrieve(url, filename) \n>>> with open(\"imagenet_classes.txt\", \"r\") as f:\n... categories = [s.strip() for s in f.readlines()]\n\n>>> # Print top categories per image\n>>> top5_prob, top5_catid = torch.topk(probabilities, 5)\n>>> for i in range(top5_prob.size(0)):\n... print(categories[top5_catid[i]], top5_prob[i].item())\n>>> # prints class names and probabilities like:\n>>> # [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]\n```\n\nReplace the model name with the variant you want to use, e.g. `seresnet152d`. You can find the IDs in the model summaries at the top of this page.\n\nTo extract image features with this model, follow the [timm feature extraction examples](../feature_extraction), just change the name of the model you want to use.\n\n## How do I finetune this model?\n\nYou can finetune any of the pre-trained models just by changing the classifier (the last layer).\n\n```py\n>>> model = timm.create_model('seresnet152d', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)\n```\nTo finetune on your own dataset, you have to write a training loop or adapt [timm's training\nscript](https://github.com/rwightman/pytorch-image-models/blob/master/train.py) to use your dataset.\n\n## How do I train this model?\n\nYou can follow the [timm recipe scripts](../scripts) for training a new model afresh.\n\n## Citation\n\n```BibTeX\n@misc{hu2019squeezeandexcitation,\n title={Squeeze-and-Excitation Networks}, \n author={Jie Hu and Li Shen and Samuel Albanie and Gang Sun and Enhua Wu},\n year={2019},\n eprint={1709.01507},\n archivePrefix={arXiv},\n primaryClass={cs.CV}\n}\n```\n\n", "source": "huggingface_doc", "domain": "software" }, { "text": "--\ntitle: poseval\nemoji: šŸ¤— \ncolorFrom: blue\ncolorTo: red\nsdk: gradio\nsdk_version: 3.19.1\napp_file: app.py\npinned: false\ntags:\n- evaluate\n- metric\ndescription: >-\n The poseval metric can be used to evaluate POS taggers. Since seqeval does not work well with POS data \n that is not in IOB format the poseval is an alternative. It treats each token in the dataset as independant \n observation and computes the precision, recall and F1-score irrespective of sentences. It uses scikit-learns's\n classification report to compute the scores.\n---\n\n# Metric Card for peqeval\n\n## Metric description\n\nThe poseval metric can be used to evaluate POS taggers. Since seqeval does not work well with POS data (see e.g. [here](https://stackoverflow.com/questions/71327693/how-to-disable-seqeval-label-formatting-for-pos-tagging)) that is not in IOB format the poseval is an alternative. It treats each token in the dataset as independant observation and computes the precision, recall and F1-score irrespective of sentences. It uses scikit-learns's [classification report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) to compute the scores.\n\n## How to use \n\nPoseval produces labelling scores along with its sufficient statistics from a source against references.\n\nIt takes two mandatory arguments:\n\n`predictions`: a list of lists of predicted labels, i.e. estimated targets as returned by a tagger.\n\n`references`: a list of lists of reference labels, i.e. the ground truth/target values.\n\nIt can also take several optional arguments:\n\n`zero_division`: Which value to substitute as a metric value when encountering zero division. Should be one of [`0`,`1`,`\"warn\"`]. `\"warn\"` acts as `0`, but the warning is raised.\n\n```python\n>>> predictions = [['INTJ', 'ADP', 'PROPN', 'NOUN', 'PUNCT', 'INTJ', 'ADP', 'PROPN', 'VERB', 'SYM']]\n>>> references = [['INTJ', 'ADP', 'PROPN', 'PROPN', 'PUNCT', 'INTJ', 'ADP', 'PROPN', 'PROPN', 'SYM']]\n>>> poseval = evaluate.load(\"poseval\")\n>>> results = poseval.compute(predictions=predictions, references=references)\n>>> print(list(results.keys()))\n['ADP', 'INTJ', 'NOUN', 'PROPN', 'PUNCT', 'SYM', 'VERB', 'accuracy', 'macro avg', 'weighted avg']\n>>> print(results[\"accuracy\"])\n0.8\n>>> print(results[\"PROPN\"][\"recall\"])\n0.5\n```\n\n## Output values\n\nThis metric returns a a classification report as a dictionary with a summary of scores for overall and per type:\n\nOverall (weighted and macro avg):\n\n`accuracy`: the average [accuracy](https://huggingface.co/metrics/accuracy), on a scale between 0.0 and 1.0.\n \n`precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.\n \n`recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.\n\n`f1`: the average [F1 score](https://huggingface.co/metrics/f1), which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.\n\nPer type (e.g. `MISC`, `PER`, `LOC`,...):\n\n`precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.\n\n`recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.\n\n`f1`: the average [F1 score](https://huggingface.co/metrics/f1), on a scale between 0.0 and 1.0.\n\n## Examples \n\n```python\n>>> predictions = [['INTJ', 'ADP', 'PROPN', 'NOUN', 'PUNCT', 'INTJ', 'ADP', 'PROPN', 'VERB', 'SYM']]\n>>> references = [['INTJ', 'ADP', 'PROPN', 'PROPN', 'PUNCT', 'INTJ', 'ADP', 'PROPN', 'PROPN', 'SYM']]\n>>> poseval = evaluate.load(\"poseval\")\n>>> results = poseval.compute(predictions=predictions, references=references)\n>>> print(list(results.keys()))\n['ADP', 'INTJ', 'NOUN', 'PROPN', 'PUNCT', 'SYM', 'VERB', 'accuracy', 'macro avg', 'weighted avg']\n>>> print(results[\"accuracy\"])\n0.8\n>>> print(results[\"PROPN\"][\"recall\"])\n0.5\n```\n\n## Limitations and bias\n\nIn contrast to [seqeval](https://github.com/chakki-works/seqeval), the poseval metric treats each token independently and computes the classification report over all concatenated sequences..\n\n## Citation\n\n```bibtex\n@article{scikit-learn,\n title={Scikit-learn: Machine Learning in {P}ython},\n author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.\n and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.\n and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and\n Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},\n journal={Journal of Machine Learning Research},\n volume={12},\n pages={2825--2830},\n year={2011}\n}\n```\n \n## Further References \n- [README for seqeval at GitHub](https://github.com/chakki-works/seqeval)\n- [Classification report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) \n- [Issues with seqeval](https://stackoverflow.com/questions/71327693/how-to-disable-seqeval-label-formatting-for-pos-tagging)", "source": "huggingface_doc", "domain": "software" }, { "text": "--\ntitle: \"Large Language Models: A New Moore's Law?\"\nthumbnail: /blog/assets/33_large_language_models/01_model_size.jpg\nauthors:\n- user: juliensimon\n---\n\n# Large Language Models: A New Moore's Law?\n\nA few days ago, Microsoft and NVIDIA [introduced](https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/) Megatron-Turing NLG 530B, a Transformer-based model hailed as \"*the world’s largest and most powerful generative language model*.\"\n \nThis is an impressive show of Machine Learning engineering, no doubt about it. Yet, should we be excited about this mega-model trend? I, for one, am not. Here's why.\n\n\n \n\n\n### This is your Brain on Deep Learning\n\nResearchers estimate that the human brain contains an average of [86 billion neurons](https://pubmed.ncbi.nlm.nih.gov/19226510/) and 100 trillion synapses. It's safe to assume that not all of them are dedicated to language either. Interestingly, GPT-4 is [expected](https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/) to have about 100 trillion parameters... As crude as this analogy is, shouldn't we wonder whether building language models that are about the size of the human brain is the best long-term approach?\n\nOf course, our brain is a marvelous device, produced by millions of years of evolution, while Deep Learning models are only a few decades old. Still, our intuition should tell us that something doesn't compute (pun intended).\n\n### Deep Learning, Deep Pockets?\n\nAs you would expect, training a 530-billion parameter model on humongous text datasets requires a fair bit of infrastructure. In fact, Microsoft and NVIDIA used hundreds of DGX A100 multi-GPU servers. At $199,000 a piece, and factoring in networking equipment, hosting costs, etc., anyone looking to replicate this experiment would have to spend close to $100 million dollars. Want fries with that?\n\nSeriously, which organizations have business use cases that would justify spending $100 million on Deep Learning infrastructure? Or even $10 million? Very few. So who are these models for, really?\n\n### That Warm Feeling is your GPU Cluster\n\nFor all its engineering brilliance, training Deep Learning models on GPUs is a brute force technique. According to the spec sheet, each DGX server can consume up to 6.5 kilowatts. Of course, you'll need at least as much cooling power in your datacenter (or your server closet). Unless you're the Starks and need to keep Winterfell warm in winter, that's another problem you'll have to deal with. \n\nIn addition, as public awareness grows on climate and social responsibility issues, organizations need to account for their carbon footprint. According to this 2019 [study](https://arxiv.org/pdf/1906.02243.pdf) from the University of Massachusetts, \"*training BERT on GPU is roughly equivalent to a trans-American flight*\".\n\nBERT-Large has 340 million parameters. One can only extrapolate what the footprint of Megatron-Turing could be... People who know me wouldn't call me a bleeding-heart environmentalist. Still, some numbers are hard to ignore.\n\n### So?\n\nAm I excited by Megatron-Turing NLG 530B and whatever beast is coming next? No. Do I think that the (relatively small) benchmark improvement is worth the added cost, complexity and carbon footprint? No. Do I think that building and promoting these huge models is helping organizations understand and adopt Machine Learning ? No.\n\nI'm left wondering what's the point of it all. Science for the sake of science? Good old marketing? Technological supremacy? Probably a bit of each. I'll leave them to it, then.\n\nInstead, let me focus on pragmatic and actionable techniques that you can all use to build high quality Machine Learning solutions.\n\n### Use Pretrained Models\n\nIn the vast majority of cases, you won't need a custom model architecture. Maybe you'll *want* a custom one (which is a different thing), but there be dragons. Experts only!\n\nA good starting point is to look for [models](https://huggingface.co/models) that have been pretrained for the task you're trying to solve (say, [summarizing English text](https://huggingface.co/models?language=en&pipeline_tag=summarization&sort=downloads)).\n\nThen, you should quickly try out a few models to predict your own data. If metrics tell you that one works well enough, you're done! If you need a little more accuracy, you should consider fine-tuning the model (more on this in a minute).\n\n### Use Smaller Models\n\nWhen evaluating models, you should pick the smallest one that can deliver the accuracy you need. It will predict faster and require fewer hardware resources for training and inference. Frugality goes a long way.\n\nIt's nothing new either. Computer Vision practitioners will remember when [SqueezeNet](https://arxiv.org/abs/1602.07360) came out in 2017, achieving a 50x reduction in model size compared to [AlexNet](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html), while meeting or exceeding its accuracy. How clever that was!\n\nDownsizing efforts are also under way in the Natural Language Processing community, using transfer learning techniques such as [knowledge distillation](https://en.wikipedia.org/wiki/Knowledge_distillation). [DistilBERT](https://arxiv.org/abs/1910.01108) is perhaps its most widely known achievement. Compared to the original BERT model, it retains 97% of language understanding while being 40% smaller and 60% faster. You can try it [here](https://huggingface.co/distilbert-base-uncased). The same approach has been applied to other models, such as Facebook's [BART](https://arxiv.org/abs/1910.13461), and you can try DistilBART [here](https://huggingface.co/models?search=distilbart).\n\nRecent models from the [Big Science](https://bigscience.huggingface.co/) project are also very impressive. As visible in this graph included in the [research paper](https://arxiv.org/abs/2110.08207), their T0 model outperforms GPT-3 on many tasks while being 16x smaller.\n\n\n \n\n\nYou can try T0 [here](https://huggingface.co/bigscience/T0pp). This is the kind of research we need more of!\n\n### Fine-Tune Models\n\nIf you need to specialize a model, there should be very few reasons to train it from scratch. Instead, you should fine-tune it, that is to say train it only for a few epochs on your own data. If you're short on data, maybe of one these [datasets](https://huggingface.co/datasets) can get you started.\n\nYou guessed it, that's another way to do transfer learning, and it'll help you save on everything!\n \n* Less data to collect, store, clean and annotate,\n* Faster experiments and iterations,\n* Fewer resources required in production.\n\nIn other words: save time, save money, save hardware resources, save the world! \n\nIf you need a tutorial, the Hugging Face [course](https://huggingface.co/course) will get you started in no time.\n\n### Use Cloud-Based Infrastructure\n\nLike them or not, cloud companies know how to build efficient infrastructure. Sustainability studies show that cloud-based infrastructure is more energy and carbon efficient than the alternative: see [AWS](https://sustainability.aboutamazon.com/environment/the-cloud), [Azure](https://azure.microsoft.com/en-us/global-infrastructure/sustainability), and [Google](https://cloud.google.com/sustainability). Earth.org [says](https://earth.org/environmental-impact-of-cloud-computing/) that while cloud infrastructure is not perfect, \"[*it's] more energy efficient than the alternative and facilitates environmentally beneficial services and economic growth.*\"\n\nCloud certainly has a lot going for it when it comes to ease of use, flexibility and pay as you go. It's also a little greener than you probably thought. If you're short on GPUs, why not try fine-tune your Hugging Face models on [Amazon SageMaker](https://aws.amazon.com/sagemaker/), AWS' managed service for Machine Learning? We've got [plenty of examples](https://huggingface.co/docs/sagemaker/train) for you.\n\n### Optimize Your Models\n\nFrom compilers to virtual machines, software engineers have long used tools that automatically optimize their code for whatever hardware they're running on. \n\nHowever, the Machine Learning community is still struggling with this topic, and for good reason. Optimizing models for size and speed is a devilishly complex task, which involves techniques such as:\n\n* Specialized hardware that speeds up training ([Graphcore](https://www.graphcore.ai/), [Habana](https://habana.ai/)) and inference ([Google TPU](https://cloud.google.com/tpu), [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/)).\n* Pruning: remove model parameters that have little or no impact on the predicted outcome.\n* Fusion: merge model layers (say, convolution and activation).\n* Quantization: storing model parameters in smaller values (say, 8 bits instead of 32 bits)\n\nFortunately, automated tools are starting to appear, such as the [Optimum](https://huggingface.co/hardware) open source library, and [Infinity](https://huggingface.co/infinity), a containerized solution that delivers Transformers accuracy at 1-millisecond latency.\n\n### Conclusion \n\nLarge language model size has been increasing 10x every year for the last few years. This is starting to look like another [Moore's Law](https://en.wikipedia.org/wiki/Moore%27s_law). \n\nWe've been there before, and we should know that this road leads to diminishing returns, higher cost, more complexity, and new risks. Exponentials tend not to end well. Remember [Meltdown and Spectre](https://meltdownattack.com/)? Do we want to find out what that looks like for AI?\n\nInstead of chasing trillion-parameter models (place your bets), wouldn't all be better off if we built practical and efficient solutions that all developers can use to solve real-world problems?\n\n*Interested in how Hugging Face can help your organization build and deploy production-grade Machine Learning solutions? Get in touch at [julsimon@huggingface.co](mailto:julsimon@huggingface.co) (no recruiters, no sales pitches, please).*", "source": "huggingface_doc", "domain": "software" }, { "text": "hat is dynamic padding? In the \"Batching Inputs together\" video, we have seen that to be able to group inputs of different lengths in the same batch, we need to add padding tokens to all the short inputs until they are all of the same length. Here for instance, the longest sentence is the third one, and we need to add 5, 2 and 7 pad tokens to the other to have four sentences of the same lengths. When dealing with a whole dataset, there are various padding strategies we can apply. The most obvious one is to pad all the elements of the dataset to the same length: the length of the longest sample. This will then give us batches that all have the same shape determined by the maximum sequence length. The downside is that batches composed from short sentences will have a lot of padding tokens which introduce more computations in the model we ultimately don't need. To avoid this, another strategy is to pad the elements when we batch them together, to the longest sentence inside the batch. This way batches composed of short inputs will be smaller than the batch containing the longest sentence in the dataset. This will yield some nice speedup on CPU and GPU. The downside is that all batches will then have different shapes, which slows down training on other accelerators like TPUs. Let's see how to apply both strategies in practice. We have actually seen how to apply fixed padding in the Datasets Overview video, when we preprocessed the MRPC dataset: after loading the dataset and tokenizer, we applied the tokenization to all the dataset with padding and truncation to make all samples of length 128. As a result, if we pass this dataset to a PyTorch DataLoader, we get batches of shape batch size (here 16) by 128. To apply dynamic padding, we must defer the padding to the batch preparation, so we remove that part from our tokenize function. We still leave the truncation part so that inputs that are bigger than the maximum length accepted by the model (usually 512) get truncated to that length. Then we pad our samples dynamically by using a data collator. Those classes in the Transformers library are responsible for applying all the final processing needed before forming a batch, here DataCollatorWithPadding will pad the samples to the maximum length inside the batch of sentences. We pass it to the PyTorch DataLoader as a collate function, then observe that the batches generated have various lenghs, all way below the 128 from before. Dynamic batching will almost always be faster on CPUs and GPUs, so you should apply it if you can. Remember to switch back to fixed padding however if you run your training script on TPU or need batches of fixed shapes.", "source": "huggingface_doc", "domain": "software" }, { "text": "Datasets server - worker\n\n> Workers that pre-compute and cache the response to /splits, /first-rows, /parquet, /info and /size.\n\n## Configuration\n\nUse environment variables to configure the workers. The prefix of each environment variable gives its scope.\n\n### Uvicorn\n\nThe following environment variables are used to configure the Uvicorn server (`WORKER_UVICORN_` prefix). It is used for the /healthcheck and the /metrics endpoints:\n\n- `WORKER_UVICORN_HOSTNAME`: the hostname. Defaults to `\"localhost\"`.\n- `WORKER_UVICORN_NUM_WORKERS`: the number of uvicorn workers. Defaults to `2`.\n- `WORKER_UVICORN_PORT`: the port. Defaults to `8000`.\n\n### Prometheus\n\n- `PROMETHEUS_MULTIPROC_DIR`: the directory where the uvicorn workers share their prometheus metrics. See https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn. Defaults to empty, in which case every uvicorn worker manages its own metrics, and the /metrics endpoint returns the metrics of a random worker.\n\n## Worker configuration\n\nSet environment variables to configure the worker.\n\n- `WORKER_CONTENT_MAX_BYTES`: the maximum size in bytes of the response content computed by a worker (to prevent returning big responses in the REST API). Defaults to `10_000_000`.\n- `WORKER_DIFFICULTY_MAX`: the maximum difficulty of the jobs to process. Defaults to None.\n- `WORKER_DIFFICULTY_MIN`: the minimum difficulty of the jobs to process. Defaults to None.\n- `WORKER_HEARTBEAT_INTERVAL_SECONDS`: the time interval between two heartbeats. Each heartbeat updates the job \"last_heartbeat\" field in the queue. Defaults to `60` (1 minute).\n- `WORKER_JOB_TYPES_BLOCKED`: comma-separated list of job types that will not be processed, e.g. \"dataset-config-names,dataset-split-names\". If empty, no job type is blocked. Defaults to empty.\n- `WORKER_JOB_TYPES_ONLY`: comma-separated list of the non-blocked job types to process, e.g. \"dataset-config-names,dataset-split-names\". If empty, the worker processes all the non-blocked jobs. Defaults to empty.\n- `WORKER_KILL_LONG_JOB_INTERVAL_SECONDS`: the time interval at which the worker looks for long jobs to kill them. Defaults to `60` (1 minute).\n- `WORKER_KILL_ZOMBIES_INTERVAL_SECONDS`: the time interval at which the worker looks for zombie jobs to kill them. Defaults to `600` (10 minutes).\n- `WORKER_MAX_DISK_USAGE_PCT`: maximum disk usage of every storage disk in the list (in percentage) to allow a job to start. Set to 0 to disable the test. Defaults to 90.\n- `WORKER_MAX_JOB_DURATION_SECONDS`: the maximum duration allowed for a job to run. If the job runs longer, it is killed (see `WORKER_KILL_LONG_JOB_INTERVAL_SECONDS`). Defaults to `1200` (20 minutes).\n- `WORKER_MAX_LOAD_PCT`: maximum load of the machine (in percentage: the max between the 1m load and the 5m load divided by the number of CPUs \\*100) allowed to start a job. Set to 0 to disable the test. Defaults to 70.\n- `WORKER_MAX_MEMORY_PCT`: maximum memory (RAM + SWAP) usage of the machine (in percentage) allowed to start a job. Set to 0 to disable the test. Defaults to 80.\n- `WORKER_MAX_MISSING_HEARTBEATS`: the number of hearbeats a job must have missed to be considered a zombie job. Defaults to `5`.\n- `WORKER_SLEEP_SECONDS`: wait duration in seconds at each loop iteration before checking if resources are available and processing a job if any is available. Note that the loop doesn't wait just after finishing a job: the next job is immediately processed. Defaults to `15`.\n- `WORKER_STORAGE_PATHS`: comma-separated list of paths to check for disk usage. Defaults to empty.\n\nAlso, it's possible to force the parent directory in which the temporary files (as the current job state file and its associated lock file) will be created by setting `TMPDIR` to a writable directory. If not set, the worker will use the default temporary directory of the system, as described in https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir.\n\n### Datasets based worker\n\nSet environment variables to configure the datasets-based worker (`DATASETS_BASED_` prefix):\n\n- `DATASETS_BASED_HF_DATASETS_CACHE`: directory where the `datasets` library will store the cached datasets' data. If not set, the datasets library will choose the default location. Defaults to None.\n\nAlso, set the modules cache configuration for the datasets-based worker. See [../../libs/libcommon/README.md](../../libs/libcommon/README.md). Note that this variable has no `DATASETS_BASED_` prefix:\n\n- `HF_MODULES_CACHE`: directory where the `datasets` library will store the cached dataset scripts. If not set, the datasets library will choose the default location. Defaults to None.\n\nNote that both directories will be appended to `WORKER_STORAGE_PATHS` (see [../../libs/libcommon/README.md](../../libs/libcommon/README.md)) to hold the workers when the disk is full.\n\n### Numba library\n\nNumba requires setting the `NUMBA_CACHE_DIR` environment variable to a writable directory to cache the compiled functions. Required on cloud infrastructure (see https://stackoverflow.com/a/63367171/7351594):\n\n- `NUMBA_CACHE_DIR`: directory where the `numba` decorators (used by `librosa`) can write cache.\n\nNote that this directory will be appended to `WORKER_STORAGE_PATHS` (see [../../libs/libcommon/README.md](../../libs/libcommon/README.md)) to hold the workers when the disk is full.\n\n### Huggingface_hub library\n\nIf the Hub is not https://huggingface.co (i.e., if you set the `COMMON_HF_ENDPOINT` environment variable), you must set the `HF_ENDPOINT` environment variable to the same value. See https://github.com/huggingface/datasets/pull/5196#issuecomment-1322191411 for more details:\n\n- `HF_ENDPOINT`: the URL of the Hub. Defaults to `https://huggingface.co`.\n\n### First rows worker\n\nSet environment variables to configure the `first-rows` worker (`FIRST_ROWS_` prefix):\n\n- `FIRST_ROWS_MAX_BYTES`: the max size of the /first-rows response in bytes. Defaults to `1_000_000` (1 MB).\n- `FIRST_ROWS_MAX_NUMBER`: the max number of rows fetched by the worker for the split and provided in the /first-rows response. Defaults to `100`.\n- `FIRST_ROWS_MIN_CELL_BYTES`: the minimum size in bytes of a cell when truncating the content of a row (see `FIRST_ROWS_ROWS_MAX_BYTES`). Below this limit, the cell content will not be truncated. Defaults to `100`.\n- `FIRST_ROWS_MIN_NUMBER`: the min number of rows fetched by the worker for the split and provided in the /first-rows response. Defaults to `10`.\n- `FIRST_ROWS_COLUMNS_MAX_NUMBER`: the max number of columns (features) provided in the /first-rows response. If the number of columns is greater than the limit, an error is returned. Defaults to `1_000`.\n\nAlso, set the assets-related configuration for the first-rows worker. See [../../libs/libcommon/README.md](../../libs/libcommon/README.md).\n\n### Parquet and info worker\n\nSet environment variables to configure the `parquet-and-info` worker (`PARQUET_AND_INFO_` prefix):\n\n- `PARQUET_AND_INFO_COMMIT_MESSAGE`: the git commit message when the worker uploads the parquet files to the Hub. Defaults to `Update parquet files`.\n- `PARQUET_AND_INFO_COMMITTER_HF_TOKEN`: the HuggingFace token to commit the parquet files to the Hub. The token must be an app token associated with a user that has the right to 1. create the `refs/convert/parquet` branch (see `PARQUET_AND_INFO_TARGET_REVISION`) and 2. push commits to it on any dataset. [Datasets maintainers](https://huggingface.co/datasets-maintainers) members have these rights. The token must have permission to write. If not set, the worker will fail. Defaults to None.\n- `PARQUET_AND_INFO_MAX_DATASET_SIZE_BYTES`: the maximum size in bytes of the dataset to pre-compute the parquet files. Bigger datasets, or datasets without that information, are partially streamed to get parquet files up to this value. Defaults to `100_000_000`.\n- `PARQUET_AND_INFO_MAX_EXTERNAL_DATA_FILES`: the maximum number of external files of the datasets. Bigger datasets, or datasets without that information, are partially streamed to get parquet files up to `PARQUET_AND_INFO_MAX_DATASET_SIZE_BYTES` bytes. Defaults to `10_000`.\n- `PARQUET_AND_INFO_MAX_ROW_GROUP_BYTE_SIZE_FOR_COPY`: the maximum size in bytes of the row groups of parquet datasets that are copied to the target revision. Bigger datasets, or datasets without that information, are partially streamed to get parquet files up to `PARQUET_AND_INFO_MAX_DATASET_SIZE_BYTES` bytes. Defaults to `100_000_000`.\n- `PARQUET_AND_INFO_SOURCE_REVISION`: the git revision of the dataset to use to prepare the parquet files. Defaults to `main`.\n- `PARQUET_AND_INFO_TARGET_REVISION`: the git revision of the dataset where to store the parquet files. Make sure the committer token (`PARQUET_AND_INFO_COMMITTER_HF_TOKEN`) has the permission to write there. Defaults to `refs/convert/parquet`.\n- `PARQUET_AND_INFO_URL_TEMPLATE`: the URL template to build the parquet file URLs. Defaults to `/datasets/%s/resolve/%s/%s`.\n\n### Duckdb Index worker\n\nSet environment variables to configure the `duckdb-index` worker (`DUCKDB_INDEX_` prefix):\n\n- `DUCKDB_INDEX_CACHE_DIRECTORY`: directory where the temporal duckdb index files are stored. Defaults to empty.\n- `DUCKDB_INDEX_COMMIT_MESSAGE`: the git commit message when the worker uploads the duckdb index file to the Hub. Defaults to `Update duckdb index file`.\n- `DUCKDB_INDEX_COMMITTER_HF_TOKEN`: the HuggingFace token to commit the duckdb index file to the Hub. The token must be an app token associated with a user that has the right to 1. create the `refs/convert/parquet` branch (see `DUCKDB_INDEX_TARGET_REVISION`) and 2. push commits to it on any dataset. [Datasets maintainers](https://huggingface.co/datasets-maintainers) members have these rights. The token must have permission to write. If not set, the worker will fail. Defaults to None.\n- `DUCKDB_INDEX_MAX_DATASET_SIZE_BYTES`: the maximum size in bytes of the dataset's parquet files to index. Datasets with bigger size are ignored. Defaults to `100_000_000`.\n- `DUCKDB_INDEX_TARGET_REVISION`: the git revision of the dataset where to store the duckdb index file. Make sure the committer token (`DUCKDB_INDEX_COMMITTER_HF_TOKEN`) has the permission to write there. Defaults to `refs/convert/parquet`.\n- `DUCKDB_INDEX_URL_TEMPLATE`: the URL template to build the duckdb index file URL. Defaults to `/datasets/%s/resolve/%s/%s`.\n- `DUCKDB_INDEX_EXTENSIONS_DIRECTORY`: directory where the duckdb extensions will be downloaded. Defaults to empty.\n\n### Descriptive statistics worker\n\nSet environment variables to configure the `descriptive-statistics` worker (`DESCRIPTIVE_STATISTICS_` prefix):\n\n- `DESCRIPTIVE_STATISTICS_CACHE_DIRECTORY`: directory to which a dataset in parquet format is downloaded. Defaults to empty.\n- `DESCRIPTIVE_STATISTICS_HISTOGRAM_NUM_BINS`: number of histogram bins (see examples below for more info).\n- `DESCRIPTIVE_STATISTICS_MAX_PARQUET_SIZE_BYTES`: maximum size in bytes of the dataset's parquet files to compute statistics. Datasets with bigger size are ignored. Defaults to `100_000_000`.\n\n#### How descriptive statistics are computed \n\nDescriptive statistics are currently computed for the following data types: strings, floats, and ints (including `ClassLabel` int). \nResponse has two fields: `num_examples` and `statistics`. `statistics` field is a list of dicts with three keys: `column_name`, `column_type`, and `column_statistics`.\n\n`column_type` is one of the following values:\n* `class_label` - for `datasets.ClassLabel` feature\n* `float` - for float dtypes (\"float16\", \"float32\", \"float64\")\n* `int` - for integer dtypes (\"int8\", \"int16\", \"int32\", \"int64\", \"uint8\", \"uint16\", \"uint32\", \"uint64\")\n* `string_label` - for string dtypes (\"string\", \"large_string\") - if there are less than or equal to `MAX_NUM_STRING_LABELS` unique values (hardcoded in worker's code, for now it's 30)\n* `string_text` - for string dtypes (\"string\", \"large_string\") - if there are more than `MAX_NUM_STRING_LABELS` unique values\n* `bool` - for boolean dtype (\"bool\")\n\n`column_statistics` content depends on the feature type, see examples below.\n##### class_label\n\n
example: \n

\n\n```python\n{\n \"column_name\": \"class_col\",\n \"column_type\": \"class_label\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"no_label_count\": 0, # number of -1 values - special value of the `datasets` lib to encode `no label` \n \"no_label_proportion\": 0.0,\n \"n_unique\": 5, # number of unique values (excluding `no label` and nan)\n \"frequencies\": { # mapping value -> its count\n \"this\": 19834,\n \"are\": 20159,\n \"random\": 20109,\n \"words\": 20172,\n \"test\": 19726\n }\n }\n}\n```\n

\n
\n\n##### float\n\nBin size for histogram is counted as `(max_value - min_value) / DESCRIPTIVE_STATISTICS_HISTOGRAM_NUM_BINS`\n\n
example: \n

\n\n```python\n{\n \"column_name\": \"delay\",\n \"column_type\": \"float\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"min\": -10.206,\n \"max\": 8.48053,\n \"mean\": 2.10174,\n \"median\": 3.4012,\n \"std\": 3.12487,\n \"histogram\": {\n \"hist\": [\n 2,\n 34,\n 256,\n 15198,\n 9037,\n 2342,\n 12743,\n 45114,\n 14904,\n 370\n ],\n \"bin_edges\": [\n -10.206,\n -8.33734,\n -6.46869,\n -4.60004,\n -2.73139,\n -0.86273,\n 1.00592,\n 2.87457,\n 4.74322,\n 6.61188,\n 8.48053 # includes maximum value, so len is always len(hist) + 1\n ]\n }\n }\n}\n```\n

\n
\n\n##### int\n\nAs bin edges for integer values also must be integers, bin size is counted as `np.ceil((max_value - min_value + 1) / DESCRIPTIVE_STATISTICS_HISTOGRAM_NUM_BINS)`. Rounding up means that there might be smaller number of bins in response then provided `DESCRIPTIVE_STATISTICS_HISTOGRAM_NUM_BINS`. The last bin's size might be smaller than that of the others if the feature's range is not divisible by the rounded bin size. \n\n
examples: \n

\n\n```python\n{\n \"column_name\": \"direction\",\n \"column_type\": \"int\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"min\": 0,\n \"max\": 1,\n \"mean\": 0.49925,\n \"median\": 0.0,\n \"std\": 0.5,\n \"histogram\": {\n \"hist\": [\n 50075,\n 49925\n ],\n \"bin_edges\": [\n 0,\n 1,\n 1 # if the last value is equal to the last but one, that means that this bin includes only this value\n ]\n }\n }\n},\n{\n \"column_name\": \"hour\",\n \"column_type\": \"int\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"min\": 0,\n \"max\": 23,\n \"mean\": 13.44402,\n \"median\": 14.0,\n \"std\": 5.49455,\n \"histogram\": {\n \"hist\": [\n 2694,\n 2292,\n 16785,\n 16326,\n 16346,\n 17809,\n 16546,\n 11202\n ],\n \"bin_edges\": [\n 0,\n 3,\n 6,\n 9,\n 12,\n 15,\n 18,\n 21,\n 23\n ]\n }\n }\n},\n{\n \"column_name\": \"humidity\",\n \"column_type\": \"int\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"min\": 54,\n \"max\": 99,\n \"mean\": 83.89878,\n \"median\": 85.0,\n \"std\": 8.65174,\n \"histogram\": {\n \"hist\": [\n 554,\n 1662,\n 3823,\n 6532,\n 12512,\n 17536,\n 23871,\n 20355,\n 12896,\n 259\n ],\n \"bin_edges\": [\n 54,\n 59,\n 64,\n 69,\n 74,\n 79,\n 84,\n 89,\n 94,\n 99,\n 99\n ]\n }\n }\n},\n{\n \"column_name\": \"weekday\",\n \"column_type\": \"int\",\n \"column_statistics\": {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"min\": 0,\n \"max\": 6,\n \"mean\": 3.08063,\n \"median\": 3.0,\n \"std\": 1.90347,\n \"histogram\": {\n \"hist\": [\n 10282,\n 15416,\n 15291,\n 15201,\n 15586,\n 15226,\n 12998\n ],\n \"bin_edges\": [\n 0,\n 1,\n 2,\n 3,\n 4,\n 5,\n 6,\n 6\n ]\n }\n }\n}\n```\n\n

\n
\n\n##### string_label\n\nIf the number of unique values in a column (within requested split) is <= `MAX_NUM_STRING_LABELS` (currently 30), the column is considered to be a category and the categories counts are computed.\n\n
examples: \n

\n\n```python\n{\n 'column_name': 'string_col',\n 'column_type': 'string_label',\n 'column_statistics': \n {\n \"nan_count\": 0,\n \"nan_proportion\": 0.0,\n \"n_unique\": 5, # number of unique values (excluding nan)\n \"frequencies\": { # mapping value -> its count\n \"this\": 19834,\n \"are\": 20159,\n \"random\": 20109,\n \"words\": 20172,\n \"test\": 19726\n }\n }\n}\n```\n

\n
\n\n##### string_text\n\nIf the number of unique values in a column (within requested split) is > `MAX_NUM_STRING_LABELS` (currently 30), the column is considered to be text and the distribution of text **lengths** is computed.\n\n
example: \n

\n\n```python\n{\n 'column_name': 'text_col',\n 'column_type': 'string_text',\n 'column_statistics': {\n 'max': 296,\n 'mean': 97.46649,\n 'median': 88.0,\n 'min': 11,\n 'nan_count': 0,\n 'nan_proportion': 0.0,\n 'std': 55.82714,\n 'histogram': {\n 'bin_edges': [\n 11,\n 40,\n 69,\n 98,\n 127,\n 156,\n 185,\n 214,\n 243,\n 272,\n 296\n ],\n 'hist': [\n 171,\n 224,\n 235,\n 180,\n 102,\n 99,\n 53,\n 28,\n 10,\n 2\n ]\n },\n }\n}\n```\n

\n
\n\n##### bool\n\n
example: \n

\n\n```python\n{\n 'column_name': 'bool__nan_column', \n 'column_type': 'bool', \n 'column_statistics': \n {\n 'nan_count': 3, \n 'nan_proportion': 0.15, \n 'frequencies': {\n 'False': 7, \n 'True': 10\n }\n }\n}\n```\n

\n
\n\n### Splits worker\n\nThe `splits` worker does not need any additional configuration.\n\n### Common\n\nSee [../../libs/libcommon/README.md](../../libs/libcommon/README.md) for more information about the common configuration.", "source": "huggingface_doc", "domain": "software" }, { "text": "Differences between Dataset and IterableDataset\n\nThere are two types of dataset objects, a [`Dataset`] and an [`IterableDataset`].\nWhichever type of dataset you choose to use or create depends on the size of the dataset.\nIn general, an [`IterableDataset`] is ideal for big datasets (think hundreds of GBs!) due to its lazy behavior and speed advantages, while a [`Dataset`] is great for everything else.\nThis page will compare the differences between a [`Dataset`] and an [`IterableDataset`] to help you pick the right dataset object for you.\n\n## Downloading and streaming\n\nWhen you have a regular [`Dataset`], you can access it using `my_dataset[0]`. This provides random access to the rows.\nSuch datasets are also called \"map-style\" datasets.\nFor example you can download ImageNet-1k like this and access any row:\n\n```python\nfrom datasets import load_dataset\n\nimagenet = load_dataset(\"imagenet-1k\", split=\"train\") # downloads the full dataset\nprint(imagenet[0])\n```\n\nBut one caveat is that you must have the entire dataset stored on your disk or in memory, which blocks you from accessing datasets bigger than the disk.\nBecause it can become inconvenient for big datasets, there exists another type of dataset, the [`IterableDataset`].\nWhen you have an `IterableDataset`, you can access it using a `for` loop to load the data progressively as you iterate over the dataset.\nThis way, only a small fraction of examples is loaded in memory, and you don't write anything on disk.\n\nFor example, you can stream the ImageNet-1k dataset without downloading it on disk:\n\n```python\nfrom datasets import load_dataset\n\nimagenet = load_dataset(\"imagenet-1k\", split=\"train\", streaming=True) # will start loading the data when iterated over\nfor example in imagenet:\n print(example)\n break\n```\n\nStreaming can read online data without writing any file to disk.\nFor example, you can stream datasets made out of multiple shards, each of which is hundreds of gigabytes like [C4](https://huggingface.co/datasets/c4), [OSCAR](https://huggingface.co/datasets/oscar) or [LAION-2B](https://huggingface.co/datasets/laion/laion2B-en).\nLearn more about how to stream a dataset in the [Dataset Streaming Guide](./stream).\n\nThis is not the only difference though, because the \"lazy\" behavior of an `IterableDataset` is also present when it comes to dataset creation and processing.\n\n## Creating map-style datasets and iterable datasets\n\nYou can create a [`Dataset`] using lists or dictionaries, and the data is entirely converted to Arrow so you can easily access any row:\n```python\nmy_dataset = Dataset.from_dict({\"col_1\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})\nprint(my_dataset[0])\n```\n\nTo create an `IterableDataset` on the other hand, you must provide a \"lazy\" way to load the data.\nIn Python, we generally use generator functions. These functions `yield` one example at a time, which means you can't access a row by slicing it like a regular `Dataset`:\n```python\ndef my_generator(n):\n for i in range(n):\n yield {\"col_1\": i}\n\nmy_iterable_dataset = IterableDataset.from_generator(my_generator, gen_kwargs={\"n\": 10})\nfor example in my_iterable_dataset:\n print(example)\n break\n```\n\n## Loading local files entirely and progressively\n\nIt is possible to convert local or remote data files to an Arrow [`Dataset`] using [`load_dataset`]:\n```python\ndata_files = {\"train\": [\"path/to/data.csv\"]}\nmy_dataset = load_dataset(\"csv\", data_files=data_files, split=\"train\")\nprint(my_dataset[0])\n```\n\nHowever, this requires a conversion step from CSV to Arrow format, which takes time and disk space if your dataset is big.\n\nTo save disk space and skip the conversion step, you can define an `IterableDataset` by streaming from the local files directly.\nThis way, the data is read progressively from the local files as you iterate over the dataset:\n\n```python\ndata_files = {\"train\": [\"path/to/data.csv\"]}\nmy_iterable_dataset = load_dataset(\"csv\", data_files=data_files, split=\"train\", streaming=True)\nfor example in my_iterable_dataset: # this reads the CSV file progressively as you iterate over the dataset\n print(example)\n break\n```\n\nMany file formats are supported, like CSV, JSONL, and Parquet, as well as image and audio files.\nYou can find more information in the corresponding guides for loading [tabular](./tabular_load), [text](./nlp_load), [vision](./image_load), and [audio](./audio_load]) datasets.\n\n## Eager data processing and lazy data processing\n\nWhen you process a [`Dataset`] object using [`Dataset.map`], the entire dataset is processed immediately and returned.\nThis is similar to how `pandas` works for example.\n\n```python\nmy_dataset = my_dataset.map(process_fn) # process_fn is applied on all the examples of the dataset\nprint(my_dataset[0])\n```\n\nOn the other hand, due to the \"lazy\" nature of an `IterableDataset`, calling [`IterableDataset.map`] does not apply your `map` function over the full dataset.\nInstead, your `map` function is applied on-the-fly.\n\nBecause of that, you can chain multiple processing steps and they will all run at once when you start iterating over the dataset:\n\n```python\nmy_iterable_dataset = my_iterable_dataset.map(process_fn_1)\nmy_iterable_dataset = my_iterable_dataset.filter(filter_fn)\nmy_iterable_dataset = my_iterable_dataset.map(process_fn_2)\n\n# process_fn_1, filter_fn and process_fn_2 are applied on-the-fly when iterating over the dataset\nfor example in my_iterable_dataset: \n print(example)\n break\n```\n\n## Exact and fast approximate shuffling\n\nWhen you shuffle a [`Dataset`] using [`Dataset.shuffle`], you apply an exact shuffling of the dataset.\nIt works by taking a list of indices `[0, 1, 2, ... len(my_dataset) - 1]` and shuffling this list.\nThen, accessing `my_dataset[0]` returns the row and index defined by the first element of the indices mapping that has been shuffled:\n```python\nmy_dataset = my_dataset.shuffle(seed=42)\nprint(my_dataset[0])\n```\n\nSince we don't have random access to the rows in the case of an `IterableDataset`, we can't use a shuffled list of indices and access a row at an arbitrary position.\nThis prevents the use of exact shuffling.\nInstead, a fast approximate shuffling is used in [`IterableDataset.shuffle`].\nIt uses a shuffle buffer to sample random examples iteratively from the dataset.\nSince the dataset is still read iteratively, it provides excellent speed performance:\n```python\nmy_iterable_dataset = my_iterable_dataset.shuffle(seed=42, buffer_size=100)\nfor example in my_iterable_dataset:\n print(example)\n break\n```\n\nBut using a shuffle buffer is not enough to provide a satisfactory shuffling for machine learning model training. So [`IterableDataset.shuffle`] also shuffles the dataset shards if your dataset is made of multiple files or sources:\n\n```python\n# Stream from the internet\nmy_iterable_dataset = load_dataset(\"deepmind/code_contests\", split=\"train\", streaming=True)\nmy_iterable_dataset.n_shards # 39\n\n# Stream from local files\ndata_files = {\"train\": [f\"path/to/data_{i}.csv\" for i in range(1024)]}\nmy_iterable_dataset = load_dataset(\"csv\", data_files=data_files, split=\"train\", streaming=True)\nmy_iterable_dataset.n_shards # 1024\n\n# From a generator function\ndef my_generator(n, sources):\n for source in sources:\n for example_id_for_current_source in range(n):\n yield {\"example_id\": f\"{source}_{example_id_for_current_source}\"}\n\ngen_kwargs = {\"n\": 10, \"sources\": [f\"path/to/data_{i}\" for i in range(1024)]}\nmy_iterable_dataset = IterableDataset.from_generator(my_generator, gen_kwargs=gen_kwargs)\nmy_iterable_dataset.n_shards # 1024\n```\n\n## Speed differences\n\nRegular [`Dataset`] objects are based on Arrow which provides fast random access to the rows.\nThanks to memory mapping and the fact that Arrow is an in-memory format, reading data from disk doesn't do expensive system calls and deserialization.\nIt provides even faster data loading when iterating using a `for` loop by iterating on contiguous Arrow record batches.\n\nHowever as soon as your [`Dataset`] has an indices mapping (via [`Dataset.shuffle`] for example), the speed can become 10x slower.\nThis is because there is an extra step to get the row index to read using the indices mapping, and most importantly, you aren't reading contiguous chunks of data anymore.\nTo restore the speed, you'd need to rewrite the entire dataset on your disk again using [`Dataset.flatten_indices`], which removes the indices mapping.\nThis may take a lot of time depending of the size of your dataset though:\n\n```python\nmy_dataset[0] # fast\nmy_dataset = my_dataset.shuffle(seed=42)\nmy_dataset[0] # up to 10x slower\nmy_dataset = my_dataset.flatten_indices() # rewrite the shuffled dataset on disk as contiguous chunks of data\nmy_dataset[0] # fast again\n```\n\nIn this case, we recommend switching to an [`IterableDataset`] and leveraging its fast approximate shuffling method [`IterableDataset.shuffle`].\nIt only shuffles the shards order and adds a shuffle buffer to your dataset, which keeps the speed of your dataset optimal.\nYou can also reshuffle the dataset easily:\n\n```python\nfor example in enumerate(my_iterable_dataset): # fast\n pass\n\nshuffled_iterable_dataset = my_iterable_dataset.shuffle(seed=42, buffer_size=100)\n\nfor example in enumerate(shuffled_iterable_dataset): # as fast as before\n pass\n\nshuffled_iterable_dataset = my_iterable_dataset.shuffle(seed=1337, buffer_size=100) # reshuffling using another seed is instantaneous\n\nfor example in enumerate(shuffled_iterable_dataset): # still as fast as before\n pass\n```\n\nIf you're using your dataset on multiple epochs, the effective seed to shuffle the shards order in the shuffle buffer is `seed + epoch`.\nIt makes it easy to reshuffle a dataset between epochs:\n```python\nfor epoch in range(n_epochs):\n my_iterable_dataset.set_epoch(epoch)\n for example in my_iterable_dataset: # fast + reshuffled at each epoch using `effective_seed = seed + epoch`\n pass\n```\n\n## Switch from map-style to iterable\n\nIf you want to benefit from the \"lazy\" behavior of an [`IterableDataset`] or their speed advantages, you can switch your map-style [`Dataset`] to an [`IterableDataset`]:\n```python\nmy_iterable_dataset = my_dataset.to_iterable_dataset()\n```\n\nIf you want to shuffle your dataset or [use it with a PyTorch DataLoader](./use_with_pytorch#stream-data), we recommend generating a sharded [`IterableDataset`]:\n```python\nmy_iterable_dataset = my_dataset.to_iterable_dataset(num_shards=1024)\nmy_iterable_dataset.n_shards # 1024\n```", "source": "huggingface_doc", "domain": "software" }, { "text": "Quiz\n\nThe best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.\n\n### Q1: Which of the following interpretations of bias-variance tradeoff is the most accurate in the field of Reinforcement Learning?\n\n\n\n### Q2: Which of the following statements are true, when talking about models with bias and/or variance in RL?\n\n\n\n### Q3: Which of the following statements are true about Monte Carlo method?\n\n\n\n### Q4: How would you describe, with your own words, the Actor-Critic Method (A2C)?\n\n
\nSolution\n\nThe idea behind Actor-Critic is that we learn two function approximations:\n1. A `policy` that controls how our agent acts (Ļ€)\n2. A `value` function to assist the policy update by measuring how good the action taken is (q)\n\n\"Actor-Critic,\n\n
\n\n### Q5: Which of the following statements are true about the Actor-Critic Method?\n\n\n\n### Q6: What is `Advantage` in the A2C method?\n\n
\nSolution\n\nInstead of using directly the Action-Value function of the Critic as it is, we could use an `Advantage` function. The idea behind an `Advantage` function is that we calculate the relative advantage of an action compared to the others possible at a state, averaging them.\n\nIn other words: how taking that action at a state is better compared to the average value of the state\n\n\"Advantage\n\n
\n\nCongrats on finishing this Quiz 🄳, if you missed some elements, take time to read the chapter again to reinforce (šŸ˜) your knowledge.", "source": "huggingface_doc", "domain": "software" }, { "text": "Res2Net\n\n**Res2Net** is an image model that employs a variation on bottleneck residual blocks, [Res2Net Blocks](https://paperswithcode.com/method/res2net-block). The motivation is to be able to represent features at multiple scales. This is achieved through a novel building block for CNNs that constructs hierarchical residual-like connections within one single residual block. This represents multi-scale features at a granular level and increases the range of receptive fields for each network layer.\n\n## How do I use this model on an image?\n\nTo load a pretrained model:\n\n```py\n>>> import timm\n>>> model = timm.create_model('res2net101_26w_4s', pretrained=True)\n>>> model.eval()\n```\n\nTo load and preprocess the image:\n\n```py \n>>> import urllib\n>>> from PIL import Image\n>>> from timm.data import resolve_data_config\n>>> from timm.data.transforms_factory import create_transform\n\n>>> config = resolve_data_config({}, model=model)\n>>> transform = create_transform(**config)\n\n>>> url, filename = (\"https://github.com/pytorch/hub/raw/master/images/dog.jpg\", \"dog.jpg\")\n>>> urllib.request.urlretrieve(url, filename)\n>>> img = Image.open(filename).convert('RGB')\n>>> tensor = transform(img).unsqueeze(0) # transform and add batch dimension\n```\n\nTo get the model predictions:\n\n```py\n>>> import torch\n>>> with torch.no_grad():\n... out = model(tensor)\n>>> probabilities = torch.nn.functional.softmax(out[0], dim=0)\n>>> print(probabilities.shape)\n>>> # prints: torch.Size([1000])\n```\n\nTo get the top-5 predictions class names:\n\n```py\n>>> # Get imagenet class mappings\n>>> url, filename = (\"https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt\", \"imagenet_classes.txt\")\n>>> urllib.request.urlretrieve(url, filename) \n>>> with open(\"imagenet_classes.txt\", \"r\") as f:\n... categories = [s.strip() for s in f.readlines()]\n\n>>> # Print top categories per image\n>>> top5_prob, top5_catid = torch.topk(probabilities, 5)\n>>> for i in range(top5_prob.size(0)):\n... print(categories[top5_catid[i]], top5_prob[i].item())\n>>> # prints class names and probabilities like:\n>>> # [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]\n```\n\nReplace the model name with the variant you want to use, e.g. `res2net101_26w_4s`. You can find the IDs in the model summaries at the top of this page.\n\nTo extract image features with this model, follow the [timm feature extraction examples](../feature_extraction), just change the name of the model you want to use.\n\n## How do I finetune this model?\n\nYou can finetune any of the pre-trained models just by changing the classifier (the last layer).\n\n```py\n>>> model = timm.create_model('res2net101_26w_4s', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)\n```\nTo finetune on your own dataset, you have to write a training loop or adapt [timm's training\nscript](https://github.com/rwightman/pytorch-image-models/blob/master/train.py) to use your dataset.\n\n## How do I train this model?\n\nYou can follow the [timm recipe scripts](../scripts) for training a new model afresh.\n\n## Citation\n\n```BibTeX\n@article{Gao_2021,\n title={Res2Net: A New Multi-Scale Backbone Architecture},\n volume={43},\n ISSN={1939-3539},\n url={http://dx.doi.org/10.1109/TPAMI.2019.2938758},\n DOI={10.1109/tpami.2019.2938758},\n number={2},\n journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n publisher={Institute of Electrical and Electronics Engineers (IEEE)},\n author={Gao, Shang-Hua and Cheng, Ming-Ming and Zhao, Kai and Zhang, Xin-Yu and Yang, Ming-Hsuan and Torr, Philip},\n year={2021},\n month={Feb},\n pages={652–662}\n}\n```\n\n", "source": "huggingface_doc", "domain": "software" }, { "text": "ow to slice and dice a dataset. Most of the time, the data you work with won’t be perfectly prepared for training models. In this video we’ll explore various features that Datasets provides to clean up your datasets. The Datasets library provides several built-in methods that allow you to wrangle your data. In this video we'll see how you can shuffle and split your data, select the rows you're interested in, tweak the columns, and apply processing functions with the map() method. Let's start with shuffling. It is generally a good idea to apply shuffling to the training set so that your model doesn't learn any artificial ordering in the data. If you want to shuffle the whole dataset, you can apply the appropriately named shuffle() method to your dataset. You can see an example of this method in action here, where we've downloaded the training split of the SQUAD dataset and shuffled all the rows randomly.Another way to shuffle the data is to create random train and test splits. This can be useful if you have to create your own test splits from raw data. To do this, you just apply the train_test_split method and specify how large the test split should be. In this example, we've specified that the test set should be 10% of the total dataset size. You can see that the output of train_test_split is a DatasetDict object, whose keys correspond to the new splits. Now that we know how to shuffle a dataset, let's take a look at returning the rows we're interested in. The most common way to do this is with the select method. This method expects a list or generator of the dataset's indices, and will then return a new Dataset object containing just those rows. If you want to create a random sample of rows, you can do this by chaining the shuffle and select methods together. In this example, we've created a sample of 5 elements from the SQuAD dataset. The last way to pick out specific rows in a dataset is by applying the filter method. This method checks whether each rows fulfills some condition or not. For example, here we've created a small lambda function that checks whether the title starts with the letter \"L\". Once we apply this function with the filter method, we get a subset of the data consisting of just these titles. So far we've been talking about the rows of a dataset, but what about the columns? The Datasets library has two main methods for transforming columns: a rename_column method to change the name of a column, and a remove_columns method to delete them. You can see examples of both these method here. Some datasets have nested columns and you can expand these by applying the flatten method. For example in the SQUAD dataset, the answers column contains a text and answer_start field. If we want to promote them to their own separate columns, we can apply flatten as shown here. Of course, no discussion of the Datasets library would be complete without mentioning the famous map method. This method applies a custom processing function to each row in the dataset. For example,here we first define a lowercase_title function that simply lowercases the text in the title column and then we feed that to the map method and voila! we now have lowercase titles. The map method can also be used to feed batches of rows to the processing function. This is especially useful for tokenization, where the tokenizers are backed by the Tokenizers library can use fast multithreading to process batches in parallel.", "source": "huggingface_doc", "domain": "software" }, { "text": "!--Copyright 2022 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n\nāš ļø Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be\nrendered properly in your Markdown viewer.\n\n-->\n\n# Share a model\n\nThe last two tutorials showed how you can fine-tune a model with PyTorch, Keras, and šŸ¤— Accelerate for distributed setups. The next step is to share your model with the community! At Hugging Face, we believe in openly sharing knowledge and resources to democratize artificial intelligence for everyone. We encourage you to consider sharing your model with the community to help others save time and resources.\n\nIn this tutorial, you will learn two methods for sharing a trained or fine-tuned model on the [Model Hub](https://huggingface.co/models):\n\n- Programmatically push your files to the Hub.\n- Drag-and-drop your files to the Hub with the web interface.\n\n\n\n\n\nTo share a model with the community, you need an account on [huggingface.co](https://huggingface.co/join). You can also join an existing organization or create a new one.\n\n\n\n## Repository features\n\nEach repository on the Model Hub behaves like a typical GitHub repository. Our repositories offer versioning, commit history, and the ability to visualize differences.\n\nThe Model Hub's built-in versioning is based on git and [git-lfs](https://git-lfs.github.com/). In other words, you can treat one model as one repository, enabling greater access control and scalability. Version control allows *revisions*, a method for pinning a specific version of a model with a commit hash, tag or branch.\n\nAs a result, you can load a specific model version with the `revision` parameter:\n\n```py\n>>> model = AutoModel.from_pretrained(\n... \"julien-c/EsperBERTo-small\", revision=\"v2.0.1\" # tag name, or branch name, or commit hash\n... )\n```\n\nFiles are also easily edited in a repository, and you can view the commit history as well as the difference:\n\n![vis_diff](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/vis_diff.png)\n\n## Setup\n\nBefore sharing a model to the Hub, you will need your Hugging Face credentials. If you have access to a terminal, run the following command in the virtual environment where šŸ¤— Transformers is installed. This will store your access token in your Hugging Face cache folder (`~/.cache/` by default):\n\n```bash\nhuggingface-cli login\n```\n\nIf you are using a notebook like Jupyter or Colaboratory, make sure you have the [`huggingface_hub`](https://huggingface.co/docs/hub/adding-a-library) library installed. This library allows you to programmatically interact with the Hub.\n\n```bash\npip install huggingface_hub\n```\n\nThen use `notebook_login` to sign-in to the Hub, and follow the link [here](https://huggingface.co/settings/token) to generate a token to login with:\n\n```py\n>>> from huggingface_hub import notebook_login\n\n>>> notebook_login()\n```\n\n## Convert a model for all frameworks\n\nTo ensure your model can be used by someone working with a different framework, we recommend you convert and upload your model with both PyTorch and TensorFlow checkpoints. While users are still able to load your model from a different framework if you skip this step, it will be slower because šŸ¤— Transformers will need to convert the checkpoint on-the-fly.\n\nConverting a checkpoint for another framework is easy. Make sure you have PyTorch and TensorFlow installed (see [here](installation) for installation instructions), and then find the specific model for your task in the other framework. \n\n\n\nSpecify `from_tf=True` to convert a checkpoint from TensorFlow to PyTorch:\n\n```py\n>>> pt_model = DistilBertForSequenceClassification.from_pretrained(\"path/to/awesome-name-you-picked\", from_tf=True)\n>>> pt_model.save_pretrained(\"path/to/awesome-name-you-picked\")\n```\n\n\nSpecify `from_pt=True` to convert a checkpoint from PyTorch to TensorFlow:\n\n```py\n>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained(\"path/to/awesome-name-you-picked\", from_pt=True)\n```\n\nThen you can save your new TensorFlow model with its new checkpoint:\n\n```py\n>>> tf_model.save_pretrained(\"path/to/awesome-name-you-picked\")\n```\n\n\nIf a model is available in Flax, you can also convert a checkpoint from PyTorch to Flax:\n\n```py\n>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(\n... \"path/to/awesome-name-you-picked\", from_pt=True\n... )\n```\n\n\n\n## Push a model during training\n\n\n\n\n\nSharing a model to the Hub is as simple as adding an extra parameter or callback. Remember from the [fine-tuning tutorial](training), the [`TrainingArguments`] class is where you specify hyperparameters and additional training options. One of these training options includes the ability to push a model directly to the Hub. Set `push_to_hub=True` in your [`TrainingArguments`]:\n\n```py\n>>> training_args = TrainingArguments(output_dir=\"my-awesome-model\", push_to_hub=True)\n```\n\nPass your training arguments as usual to [`Trainer`]:\n\n```py\n>>> trainer = Trainer(\n... model=model,\n... args=training_args,\n... train_dataset=small_train_dataset,\n... eval_dataset=small_eval_dataset,\n... compute_metrics=compute_metrics,\n... )\n```\n\nAfter you fine-tune your model, call [`~transformers.Trainer.push_to_hub`] on [`Trainer`] to push the trained model to the Hub. šŸ¤— Transformers will even automatically add training hyperparameters, training results and framework versions to your model card!\n\n```py\n>>> trainer.push_to_hub()\n```\n\n\nShare a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:\n\n- An output directory for your model.\n- A tokenizer.\n- The `hub_model_id`, which is your Hub username and model name.\n\n```py\n>>> from transformers import PushToHubCallback\n\n>>> push_to_hub_callback = PushToHubCallback(\n... output_dir=\"./your_model_save_path\", tokenizer=tokenizer, hub_model_id=\"your-username/my-awesome-model\"\n... )\n```\n\nAdd the callback to [`fit`](https://keras.io/api/models/model_training_apis/), and šŸ¤— Transformers will push the trained model to the Hub:\n\n```py\n>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)\n```\n\n\n\n## Use the `push_to_hub` function\n\nYou can also call `push_to_hub` directly on your model to upload it to the Hub.\n\nSpecify your model name in `push_to_hub`:\n\n```py\n>>> pt_model.push_to_hub(\"my-awesome-model\")\n```\n\nThis creates a repository under your username with the model name `my-awesome-model`. Users can now load your model with the `from_pretrained` function:\n\n```py\n>>> from transformers import AutoModel\n\n>>> model = AutoModel.from_pretrained(\"your_username/my-awesome-model\")\n```\n\nIf you belong to an organization and want to push your model under the organization name instead, just add it to the `repo_id`:\n\n```py\n>>> pt_model.push_to_hub(\"my-awesome-org/my-awesome-model\")\n```\n\nThe `push_to_hub` function can also be used to add other files to a model repository. For example, add a tokenizer to a model repository:\n\n```py\n>>> tokenizer.push_to_hub(\"my-awesome-model\")\n```\n\nOr perhaps you'd like to add the TensorFlow version of your fine-tuned PyTorch model:\n\n```py\n>>> tf_model.push_to_hub(\"my-awesome-model\")\n```\n\nNow when you navigate to your Hugging Face profile, you should see your newly created model repository. Clicking on the **Files** tab will display all the files you've uploaded to the repository.\n\nFor more details on how to create and upload files to a repository, refer to the Hub documentation [here](https://huggingface.co/docs/hub/how-to-upstream).\n\n## Upload with the web interface\n\nUsers who prefer a no-code approach are able to upload a model through the Hub's web interface. Visit [huggingface.co/new](https://huggingface.co/new) to create a new repository:\n\n![new_model_repo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/new_model_repo.png)\n\nFrom here, add some information about your model:\n\n- Select the **owner** of the repository. This can be yourself or any of the organizations you belong to.\n- Pick a name for your model, which will also be the repository name.\n- Choose whether your model is public or private.\n- Specify the license usage for your model.\n\nNow click on the **Files** tab and click on the **Add file** button to upload a new file to your repository. Then drag-and-drop a file to upload and add a commit message.\n\n![upload_file](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/upload_file.png)\n\n## Add a model card\n\nTo make sure users understand your model's capabilities, limitations, potential biases and ethical considerations, please add a model card to your repository. The model card is defined in the `README.md` file. You can add a model card by:\n\n* Manually creating and uploading a `README.md` file.\n* Clicking on the **Edit model card** button in your model repository.\n\nTake a look at the DistilBert [model card](https://huggingface.co/distilbert-base-uncased) for a good example of the type of information a model card should include. For more details about other options you can control in the `README.md` file such as a model's carbon footprint or widget examples, refer to the documentation [here](https://huggingface.co/docs/hub/models-cards).", "source": "huggingface_doc", "domain": "software" }, { "text": "--\ntitle: MAPE\nemoji: šŸ¤— \ncolorFrom: blue\ncolorTo: red\nsdk: gradio\nsdk_version: 3.19.1\napp_file: app.py\npinned: false\ntags:\n- evaluate\n- metric\ndescription: >-\n Mean Absolute Percentage Error (MAPE) is the mean percentage error difference between the predicted and actual\n values.\n---\n\n# Metric Card for MAPE\n\n## Metric Description\n\nMean Absolute Error (MAPE) is the mean of the percentage error of difference between the predicted $x_i$ and actual $y_i$ numeric values:\n![image](https://user-images.githubusercontent.com/8100/200005316-c3975d32-8978-40f3-b541-c2ef57ec7c5b.png)\n\n## How to Use\n\nAt minimum, this metric requires predictions and references as inputs.\n\n```python\n>>> mape_metric = evaluate.load(\"mape\")\n>>> predictions = [2.5, 0.0, 2, 8]\n>>> references = [3, -0.5, 2, 7]\n>>> results = mape_metric.compute(predictions=predictions, references=references)\n```\n\n### Inputs\n\nMandatory inputs: \n- `predictions`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the estimated target values.\n- `references`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the ground truth (correct) target values.\n\nOptional arguments:\n- `sample_weight`: numeric array-like of shape (`n_samples,`) representing sample weights. The default is `None`.\n- `multioutput`: `raw_values`, `uniform_average` or numeric array-like of shape (`n_outputs,`), which defines the aggregation of multiple output values. The default value is `uniform_average`.\n - `raw_values` returns a full set of errors in case of multioutput input.\n - `uniform_average` means that the errors of all outputs are averaged with uniform weight. \n - the array-like value defines weights used to average errors.\n\n### Output Values\nThis metric outputs a dictionary, containing the mean absolute error score, which is of type:\n- `float`: if multioutput is `uniform_average` or an ndarray of weights, then the weighted average of all output errors is returned.\n- numeric array-like of shape (`n_outputs,`): if multioutput is `raw_values`, then the score is returned for each output separately. \n\nEach MAPE `float` value is postive with the best value being 0.0.\n\nOutput Example(s):\n```python\n{'mape': 0.5}\n```\n\nIf `multioutput=\"raw_values\"`:\n```python\n{'mape': array([0.5, 1. ])}\n```\n\n#### Values from Popular Papers\n\n### Examples\n\nExample with the `uniform_average` config:\n```python\n>>> mape_metric = evaluate.load(\"mape\")\n>>> predictions = [2.5, 0.0, 2, 8]\n>>> references = [3, -0.5, 2, 7]\n>>> results = mape_metric.compute(predictions=predictions, references=references)\n>>> print(results)\n{'mape': 0.3273...}\n```\n\nExample with multi-dimensional lists, and the `raw_values` config:\n```python\n>>> mape_metric = evaluate.load(\"mape\", \"multilist\")\n>>> predictions = [[0.5, 1], [-1, 1], [7, -6]]\n>>> references = [[0.1, 2], [-1, 2], [8, -5]]\n>>> results = mape_metric.compute(predictions=predictions, references=references)\n>>> print(results)\n{'mape': 0.8874...}\n>>> results = mape_metric.compute(predictions=predictions, references=references, multioutput='raw_values')\n>>> print(results)\n{'mape': array([1.3749..., 0.4])}\n```\n\n## Limitations and Bias\nOne limitation of MAPE is that it cannot be used if the ground truth is zero or close to zero. This metric is also asymmetric in that it puts a heavier penalty on predictions less than the ground truth and a smaller penalty on predictions bigger than the ground truth and thus can lead to a bias of methods being select which under-predict if selected via this metric.\n\n## Citation(s)\n```bibtex\n@article{scikit-learn,\n title={Scikit-learn: Machine Learning in {P}ython},\n author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.\n and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.\n and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and\n Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},\n journal={Journal of Machine Learning Research},\n volume={12},\n pages={2825--2830},\n year={2011}\n}\n```\n\n```bibtex\n@article{DEMYTTENAERE201638,\n title = {Mean Absolute Percentage Error for regression models},\n journal = {Neurocomputing},\n volume = {192},\n pages = {38--48},\n year = {2016},\n note = {Advances in artificial neural networks, machine learning and computational intelligence},\n issn = {0925-2312},\n doi = {https://doi.org/10.1016/j.neucom.2015.12.114},\n url = {https://www.sciencedirect.com/science/article/pii/S0925231216003325},\n author = {Arnaud {de Myttenaere} and Boris Golden and BĆ©nĆ©dicte {Le Grand} and Fabrice Rossi},\n}\n```\n\n## Further References\n- [Mean absolute percentage error - Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_percentage_error)", "source": "huggingface_doc", "domain": "software" }, { "text": "# Ensemble Adversarial Inception ResNet v2\n\n**Inception-ResNet-v2** is a convolutional neural architecture that builds on the Inception family of architectures but incorporates [residual connections](https://paperswithcode.com/method/residual-connection) (replacing the filter concatenation stage of the Inception architecture).\n\nThis particular model was trained for study of adversarial examples (adversarial training).\n\nThe weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).\n\n## How do I use this model on an image?\nTo load a pretrained model:\n\n```python\nimport timm\nmodel = timm.create_model('ens_adv_inception_resnet_v2', pretrained=True)\nmodel.eval()\n```\n\nTo load and preprocess the image:\n```python \nimport urllib\nfrom PIL import Image\nfrom timm.data import resolve_data_config\nfrom timm.data.transforms_factory import create_transform\n\nconfig = resolve_data_config({}, model=model)\ntransform = create_transform(**config)\n\nurl, filename = (\"https://github.com/pytorch/hub/raw/master/images/dog.jpg\", \"dog.jpg\")\nurllib.request.urlretrieve(url, filename)\nimg = Image.open(filename).convert('RGB')\ntensor = transform(img).unsqueeze(0) # transform and add batch dimension\n```\n\nTo get the model predictions:\n```python\nimport torch\nwith torch.no_grad():\n out = model(tensor)\nprobabilities = torch.nn.functional.softmax(out[0], dim=0)\nprint(probabilities.shape)\n# prints: torch.Size([1000])\n```\n\nTo get the top-5 predictions class names:\n```python\n# Get imagenet class mappings\nurl, filename = (\"https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt\", \"imagenet_classes.txt\")\nurllib.request.urlretrieve(url, filename) \nwith open(\"imagenet_classes.txt\", \"r\") as f:\n categories = [s.strip() for s in f.readlines()]\n\n# Print top categories per image\ntop5_prob, top5_catid = torch.topk(probabilities, 5)\nfor i in range(top5_prob.size(0)):\n print(categories[top5_catid[i]], top5_prob[i].item())\n# prints class names and probabilities like:\n# [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]\n```\n\nReplace the model name with the variant you want to use, e.g. `ens_adv_inception_resnet_v2`. You can find the IDs in the model summaries at the top of this page.\n\nTo extract image features with this model, follow the [timm feature extraction examples](https://rwightman.github.io/pytorch-image-models/feature_extraction/), just change the name of the model you want to use.\n\n## How do I finetune this model?\nYou can finetune any of the pre-trained models just by changing the classifier (the last layer).\n```python\nmodel = timm.create_model('ens_adv_inception_resnet_v2', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)\n```\nTo finetune on your own dataset, you have to write a training loop or adapt [timm's training\nscript](https://github.com/rwightman/pytorch-image-models/blob/master/train.py) to use your dataset.\n\n## How do I train this model?\n\nYou can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.\n\n## Citation\n\n```BibTeX\n@article{DBLP:journals/corr/abs-1804-00097,\n author = {Alexey Kurakin and\n Ian J. Goodfellow and\n Samy Bengio and\n Yinpeng Dong and\n Fangzhou Liao and\n Ming Liang and\n Tianyu Pang and\n Jun Zhu and\n Xiaolin Hu and\n Cihang Xie and\n Jianyu Wang and\n Zhishuai Zhang and\n Zhou Ren and\n Alan L. Yuille and\n Sangxia Huang and\n Yao Zhao and\n Yuzhe Zhao and\n Zhonglin Han and\n Junjiajia Long and\n Yerkebulan Berdibekov and\n Takuya Akiba and\n Seiya Tokui and\n Motoki Abe},\n title = {Adversarial Attacks and Defences Competition},\n journal = {CoRR},\n volume = {abs/1804.00097},\n year = {2018},\n url = {http://arxiv.org/abs/1804.00097},\n archivePrefix = {arXiv},\n eprint = {1804.00097},\n timestamp = {Thu, 31 Oct 2019 16:31:22 +0100},\n biburl = {https://dblp.org/rec/journals/corr/abs-1804-00097.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n\n", "source": "huggingface_doc", "domain": "software" }, { "text": "he Hugging Face Datasets library: A Quick overview. The Hugging Face Datasets library is a library that provides an API to quickly download many public datasets and preprocess them. In this video we will explore how to do that. The downloading part is easy: with the load_dataset function, you can directly download and cache a dataset from its identifier on the Dataset hub. Here we fetch the MRPC dataset from the GLUE benchmark, which is a dataset containing pairs of sentences where the task is to determine the paraphrases. The object returned by the load_dataset function is a DatasetDict, which is a sort of dictionary containing each split of our dataset. We can access each split by indexing with its name. This split is then an instance of the Dataset class, with columns (here sentence1, sentence2. label and idx) and rows. We can access a given element by its index. The amazing thing about the Hugging Face Datasets library is that everything is saved to disk using Apache Arrow, which means that even if your dataset is huge you won't get out of RAM: only the elements you request are loaded in memory. Accessing a slice of your dataset is as easy as one element. The result is then a dictionary with list of values for each keys (here the list of labels, the list of first sentences and the list of second sentences). The features attribute of a Dataset gives us more information about its columns. In particular, we can see here it gives us the correspondence between the integers and names for the labels. 0 stands for not equivalent and 1 for equivalent. To preprocess all the elements of our dataset, we need to tokenize them. Have a look at the video \"Preprocess sentence pairs\" for a refresher, but you just have to send the two sentences to the tokenizer with some additional keyword arguments. Here we indicate a maximum length of 128 and pad inputs shorter than this length, truncate inputs that are longer. We put all of this in a tokenize_function that we can directly apply to all the splits in our dataset with the map method. As long as the function returns a dictionary-like object, the map method will add new columns as needed or update existing ones. To speed up preprocessing and take advantage of the fact our tokenizer is backed by Rust thanks to the Hugging Face Tokenizers library, we can process several elements at the same time to our tokenize function, using the batched=True argument. Since the tokenizer can handle list of first/second sentences, the tokenize_function does not need to change for this. You can also use multiprocessing with the map method, check out its documentation! Once this is done, we are almost ready for training: we just remove the columns we don't need anymore with the remove_columns method, rename label to labels (since the models from Hugging Face Transformers expect that) and set the output format to our desired backend: torch, tensorflow or numpy. If needed, we can also generate a short sample of a dataset using the select method.", "source": "huggingface_doc", "domain": "software" }, { "text": "--\ntitle: \"Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too\"\nthumbnail: /blog/assets/78_ml_director_insights/mantis1.png\nauthors:\n- user: mattupson\n guest: true\n---\n\n# Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too\n\nHugging Face recently launched [Inference Endpoints](https://huggingface.co/inference-endpoints); which as they put it: solves transformers in production. Inference Endpoints is a managed service that allows you to:\n\n- Deploy (almost) any model on Hugging Face Hub\n- To any cloud (AWS, and Azure, GCP on the way)\n- On a range of instance types (including GPU)\n- We’re switching some of our Machine Learning (ML) models that do inference on a CPU to this new service. This blog is about why, and why you might also want to consider it.\n\n## What were we doing?\n\nThe models that we have switched over to Inference Endpoints were previously managed internally and were running on AWS [Elastic Container Service](https://aws.amazon.com/ecs/) (ECS) backed by [AWS Fargate](https://aws.amazon.com/fargate/). This gives you a serverless cluster which can run container based tasks. Our process was as follows:\n\n- Train model on a GPU instance (provisioned by [CML](https://cml.dev/), trained with [transformers](https://huggingface.co/docs/transformers/main/))\n- Upload to [Hugging Face Hub](https://huggingface.co/models)\n- Build API to serve model [(FastAPI)](https://fastapi.tiangolo.com/)\n- Wrap API in container [(Docker)](https://www.docker.com/)\n- Upload container to AWS [Elastic Container Repository](https://aws.amazon.com/ecr/) (ECR)\n- Deploy model to ECS Cluster\n\nNow, you can reasonably argue that ECS was not the best approach to serving ML models, but it served us up until now, and also allowed ML models to sit alongside other container based services, so it reduced cognitive load.\n\n## What do we do now?\n\nWith Inference Endpoints, our flow looks like this:\n\n- Train model on a GPU instance (provisioned by [CML](https://cml.dev/), trained with [transformers](https://huggingface.co/docs/transformers/main/))\n- Upload to [Hugging Face Hub](https://huggingface.co/models)\n- Deploy using Hugging Face Inference Endpoints.\n\nSo this is significantly easier. We could also use another managed service such as [SageMaker](https://aws.amazon.com/es/sagemaker/), [Seldon](https://www.seldon.io/), or [Bento ML](https://www.bentoml.com/), etc., but since we are already uploading our model to Hugging Face hub to act as a model registry, and we’re pretty invested in Hugging Face’s other tools (like transformers, and [AutoTrain](https://huggingface.co/autotrain)) using Inference Endpoints makes a lot of sense for us.\n\n## What about Latency and Stability?\n\nBefore switching to Inference Endpoints we tested different CPU endpoints types using [ab](https://httpd.apache.org/docs/2.4/programs/ab.html).\n\nFor ECS we didn’t test so extensively, but we know that a large container had a latency of about ~200ms from an instance in the same region. The tests we did for Inference Endpoints we based on text classification model fine tuned on [RoBERTa](https://huggingface.co/roberta-base) with the following test parameters:\n\n- Requester region: eu-east-1\n- Requester instance size: t3-medium\n- Inference endpoint region: eu-east-1\n- Endpoint Replicas: 1\n- Concurrent connections: 1\n- Requests: 1000 (1000 requests in 1–2 minutes even from a single connection would represent very heavy use for this particular application)\n\nThe following table shows latency (ms ± standard deviation and time to complete test in seconds) for four Intel Ice Lake equipped CPU endpoints.\n\n```bash\nsize | vCPU (cores) | Memory (GB) | ECS (ms) | šŸ¤— (ms)\n----------------------------------------------------------------------\nsmall | 1 | 2 | _ | ~ 296 \nmedium | 2 | 4 | _ | 156 ± 51 (158s) \nlarge | 4 | 8 | ~200 | 80 ± 30 (80s) \nxlarge | 8 | 16 | _ | 43 ± 31 (43s) \n```\nWhat we see from these results is pretty encouraging. The application that will consume these endpoints serves requests in real time, so we need as low latency as possible. We can see that the vanilla Hugging Face container was more than twice as fast as our bespoke container run on ECS — the slowest response we received from the large Inference Endpoint was just 108ms.\n\n## What about the cost?\n\nSo how much does this all cost? The table below shows a price comparison for what we were doing previously (ECS + Fargate) and using Inference Endpoints.\n\n```bash\nsize | vCPU | Memory (GB) | ECS | šŸ¤— | % diff\n----------------------------------------------------------------------\nsmall | 1 | 2 | $ 33.18 | $ 43.80 | 0.24\nmedium | 2 | 4 | $ 60.38 | $ 87.61 | 0.31 \nlarge | 4 | 8 | $ 114.78 | $ 175.22 | 0.34\nxlarge | 8 | 16 | $ 223.59 | $ 350.44 | 0.5 \n```\n\nWe can say a couple of things about this. Firstly, we want a managed solution to deployment, we don’t have a dedicated MLOPs team (yet), so we’re looking for a solution that helps us minimize the time we spend on deploying models, even if it costs a little more than handling the deployments ourselves.\n\nInference Endpoints are more expensive that what we were doing before, there’s an increased cost of between 24% and 50%. At the scale we’re currently operating, this additional cost, a difference of ~$60 a month for a large CPU instance is nothing compared to the time and cognitive load we are saving by not having to worry about APIs, and containers. If we were deploying 100s of ML microservices we would probably want to think again, but that is probably true of many approaches to hosting.\n\n## Some notes and caveats:\n\n- You can find pricing for Inference Endpoints [here](https://huggingface.co/pricing#endpoints), but a different number is displayed when you deploy a new endpoint from the [GUI](https://ui.endpoints.huggingface.co/new). I’ve used the latter, which is higher.\n- The values that I present in the table for ECS + Fargate are an underestimate, but probably not by much. I extracted them from the [fargate pricing page](https://aws.amazon.com/fargate/pricing/) and it includes just the cost of hosting the instance. I’m not including the data ingress/egress (probably the biggest thing is downloading the model from Hugging Face hub), nor have I included the costs related to ECR.\n\n## Other considerations\n\n### Deployment Options\n\nCurrently you can deploy an Inference Endpoint from the [GUI](https://ui.endpoints.huggingface.co/new) or using a [RESTful API](https://huggingface.co/docs/inference-endpoints/api_reference). You can also make use of our command line tool [hugie](https://github.com/MantisAI/hfie) (which will be the subject of a future blog) to launch Inference Endpoints in one line of code by passing a configuration, it’s really this simple:\n\n```bash\nhugie endpoint create example/development.json\n```\n\nFor me, what’s lacking is a [custom terraform provider](https://www.hashicorp.com/blog/writing-custom-terraform-providers). It’s all well and good deploying an inference endpoint from a [GitHub action](https://github.com/features/actions) using hugie, as we do, but it would be better if we could use the awesome state machine that is terraform to keep track of these. I’m pretty sure that someone (if not Hugging Face) will write one soon enough — if not, we will.\n\n### Hosting multiple models on a single endpoint\n\nPhilipp Schmid posted a really nice blog about how to write a custom [Endpoint Handler](https://www.philschmid.de/multi-model-inference-endpoints) class to allow you to host multiple models on a single endpoint, potentially saving you quite a bit of money. His blog was about GPU inference, and the only real limitation is how many models you can fit into the GPU memory. I assume this will also work for CPU instances, though I’ve not tried yet.\n\n## To conclude…\n\nWe find Hugging Face Inference Endpoints to be a very simple and convenient way to deploy transformer (and [sklearn](https://huggingface.co/scikit-learn)) models into an endpoint so they can be consumed by an application. Whilst they cost a little more than the ECS approach we were using before, it’s well worth it because it saves us time on thinking about deployment, we can concentrate on the thing we want to: building NLP solutions for our clients to help solve their problems.\n\n_If you’re interested in Hugging Face Inference Endpoints for your company, please contact us [here](https://huggingface.co/inference-endpoints/enterprise) - our team will contact you to discuss your requirements!_\n\n_This article was originally published on February 15, 2023 [in Medium](https://medium.com/mantisnlp/why-were-switching-to-hugging-face-inference-endpoints-and-maybe-you-should-too-829371dcd330)._", "source": "huggingface_doc", "domain": "software" }, { "text": "--\ntitle: ROUGE\nemoji: šŸ¤— \ncolorFrom: blue\ncolorTo: red\nsdk: gradio\nsdk_version: 3.19.1\napp_file: app.py\npinned: false\ntags:\n- evaluate\n- metric\ndescription: >-\n ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for\n evaluating automatic summarization and machine translation software in natural language processing.\n The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation.\n \n Note that ROUGE is case insensitive, meaning that upper case letters are treated the same way as lower case letters.\n \n This metrics is a wrapper around Google Research reimplementation of ROUGE:\n https://github.com/google-research/google-research/tree/master/rouge\n---\n\n# Metric Card for ROUGE\n\n## Metric Description\nROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation.\n\nNote that ROUGE is case insensitive, meaning that upper case letters are treated the same way as lower case letters.\n\nThis metrics is a wrapper around the [Google Research reimplementation of ROUGE](https://github.com/google-research/google-research/tree/master/rouge)\n\n## How to Use\nAt minimum, this metric takes as input a list of predictions and a list of references:\n```python\n>>> rouge = evaluate.load('rouge')\n>>> predictions = [\"hello there\", \"general kenobi\"]\n>>> references = [\"hello there\", \"general kenobi\"]\n>>> results = rouge.compute(predictions=predictions,\n... references=references)\n>>> print(results)\n{'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}\n```\n\nOne can also pass a custom tokenizer which is especially useful for non-latin languages.\n```python\n>>> results = rouge.compute(predictions=predictions,\n... references=references,\n tokenizer=lambda x: x.split())\n>>> print(results)\n{'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}\n```\n\nIt can also deal with lists of references for each predictions:\n```python\n>>> rouge = evaluate.load('rouge')\n>>> predictions = [\"hello there\", \"general kenobi\"]\n>>> references = [[\"hello\", \"there\"], [\"general kenobi\", \"general yoda\"]]\n>>> results = rouge.compute(predictions=predictions,\n... references=references)\n>>> print(results)\n{'rouge1': 0.8333, 'rouge2': 0.5, 'rougeL': 0.8333, 'rougeLsum': 0.8333}```\n```\n\n### Inputs\n- **predictions** (`list`): list of predictions to score. Each prediction\n should be a string with tokens separated by spaces.\n- **references** (`list` or `list[list]`): list of reference for each prediction or a list of several references per prediction. Each\n reference should be a string with tokens separated by spaces.\n- **rouge_types** (`list`): A list of rouge types to calculate. Defaults to `['rouge1', 'rouge2', 'rougeL', 'rougeLsum']`.\n - Valid rouge types:\n - `\"rouge1\"`: unigram (1-gram) based scoring\n - `\"rouge2\"`: bigram (2-gram) based scoring\n - `\"rougeL\"`: Longest common subsequence based scoring.\n - `\"rougeLSum\"`: splits text using `\"\\n\"`\n - See [here](https://github.com/huggingface/datasets/issues/617) for more information\n- **use_aggregator** (`boolean`): If True, returns aggregates. Defaults to `True`.\n- **use_stemmer** (`boolean`): If `True`, uses Porter stemmer to strip word suffixes. Defaults to `False`.\n\n### Output Values\nThe output is a dictionary with one entry for each rouge type in the input list `rouge_types`. If `use_aggregator=False`, each dictionary entry is a list of scores, with one score for each sentence. E.g. if `rouge_types=['rouge1', 'rouge2']` and `use_aggregator=False`, the output is:\n\n```python\n{'rouge1': [0.6666666666666666, 1.0], 'rouge2': [0.0, 1.0]}\n```\n\nIf `rouge_types=['rouge1', 'rouge2']` and `use_aggregator=True`, the output is of the following format:\n```python\n{'rouge1': 1.0, 'rouge2': 1.0}\n```\n\nThe ROUGE values are in the range of 0 to 1.\n\n#### Values from Popular Papers\n\n### Examples\nAn example without aggregation:\n```python\n>>> rouge = evaluate.load('rouge')\n>>> predictions = [\"hello goodbye\", \"ankh morpork\"]\n>>> references = [\"goodbye\", \"general kenobi\"]\n>>> results = rouge.compute(predictions=predictions,\n... references=references,\n... use_aggregator=False)\n>>> print(list(results.keys()))\n['rouge1', 'rouge2', 'rougeL', 'rougeLsum']\n>>> print(results[\"rouge1\"])\n[0.5, 0.0]\n```\n\nThe same example, but with aggregation:\n```python\n>>> rouge = evaluate.load('rouge')\n>>> predictions = [\"hello goodbye\", \"ankh morpork\"]\n>>> references = [\"goodbye\", \"general kenobi\"]\n>>> results = rouge.compute(predictions=predictions,\n... references=references,\n... use_aggregator=True)\n>>> print(list(results.keys()))\n['rouge1', 'rouge2', 'rougeL', 'rougeLsum']\n>>> print(results[\"rouge1\"])\n0.25\n```\n\nThe same example, but only calculating `rouge_1`:\n```python\n>>> rouge = evaluate.load('rouge')\n>>> predictions = [\"hello goodbye\", \"ankh morpork\"]\n>>> references = [\"goodbye\", \"general kenobi\"]\n>>> results = rouge.compute(predictions=predictions,\n... references=references,\n... rouge_types=['rouge_1'],\n... use_aggregator=True)\n>>> print(list(results.keys()))\n['rouge1']\n>>> print(results[\"rouge1\"])\n0.25\n```\n\n## Limitations and Bias\nSee [Schluter (2017)](https://aclanthology.org/E17-2007/) for an in-depth discussion of many of ROUGE's limits.\n\n## Citation\n```bibtex\n@inproceedings{lin-2004-rouge,\n title = \"{ROUGE}: A Package for Automatic Evaluation of Summaries\",\n author = \"Lin, Chin-Yew\",\n booktitle = \"Text Summarization Branches Out\",\n month = jul,\n year = \"2004\",\n address = \"Barcelona, Spain\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://www.aclweb.org/anthology/W04-1013\",\n pages = \"74--81\",\n}\n```\n\n## Further References\n- This metrics is a wrapper around the [Google Research reimplementation of ROUGE](https://github.com/google-research/google-research/tree/master/rouge)", "source": "huggingface_doc", "domain": "software" }, { "text": "efore diving in character-based tokenization, understanding why this kind of tokenization is interesting requires understanding the flaws of word-based tokenization. If you haven't seen the first video on word-based tokenization we recommend you check it out before looking at this video. Let's take a look at character-based tokenization. We now split our text into individual characters, rather than words. There are generally a lot of different words in languages, while the number of characters stays low. Here for example, for the English language that has an estimated 170,000 different words, we would need a very large vocabulary to encompass all words. With a character-based vocabulary, we can get by with only 256 characters! Even languages with a lot of different characters like the Chinese languages have dictionaries with ~20,000 different characters but more than 375,000 different words. Character-based vocabularies let us fewer different tokens than the word-based tokenization dictionaries we would otherwise use. These vocabularies are also more complete than their word-based vocabularies counterparts. As our vocabulary contains all characters used in a language, even words unseen during the tokenizer training can still be tokenized, so out-of-vocabulary tokens will be less frequent. This includes the ability to correctly tokenize misspelled words, rather than discarding them as unknown straight away. However, this algorithm isn't perfect either! Intuitively, characters do not hold as much information individually as a word would hold. For example, \"Let's\" holds more information than \"l\". Of course, this is not true for all languages, as some languages like ideogram-based languages have a lot of information held in single characters, but for others like roman-based languages, the model will have to make sense of multiple tokens at a time to get the information held in a single word. This leads to another issue with character-based tokenizers: their sequences are translated into very large amount of tokens to be processed by the model. This can have an impact on the size of the context the model will carry around, and will reduce the size of the text we can use as input for our model. This tokenization, while it has some issues, has seen some very good results in the past and should be considered when approaching a new problem as it solves some issues encountered in the word-based algorithm.", "source": "huggingface_doc", "domain": "software" }, { "text": "Hands-on\n\nNow that you learned the basics of multi-agents, you're ready to train your first agents in a multi-agent system: **a 2vs2 soccer team that needs to beat the opponent team**.\n\nAnd you’re going to participate in AI vs. AI challenges where your trained agent will compete against other classmates’ **agents every day and be ranked on a new leaderboard.**\n\nTo validate this hands-on for the certification process, you just need to push a trained model. There **are no minimal results to attain to validate it.**\n\nFor more information about the certification process, check this section šŸ‘‰ [https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process)\n\nThis hands-on will be different since to get correct results **you need to train your agents from 4 hours to 8 hours**. And given the risk of timeout in Colab, we advise you to train on your computer. You don’t need a supercomputer: a simple laptop is good enough for this exercise.\n\nLet's get started! šŸ”„\n\n## What is AI vs. AI?\n\nAI vs. AI is an open-source tool we developed at Hugging Face to compete agents on the Hub against one another in a multi-agent setting. These models are then ranked in a leaderboard.\n\nThe idea of this tool is to have a robust evaluation tool: **by evaluating your agent with a lot of others, you’ll get a good idea of the quality of your policy.**\n\nMore precisely, AI vs. AI is three tools:\n\n- A *matchmaking process* defining the matches (which model against which) and running the model fights using a background task in the Space.\n- A *leaderboard* getting the match history results and displaying the models’ ELO ratings: [https://huggingface.co/spaces/huggingface-projects/AIvsAI-SoccerTwos](https://huggingface.co/spaces/huggingface-projects/AIvsAI-SoccerTwos)\n- A *Space demo* to visualize your agents playing against others: [https://huggingface.co/spaces/unity/ML-Agents-SoccerTwos](https://huggingface.co/spaces/unity/ML-Agents-SoccerTwos)\n\nIn addition to these three tools, your classmate cyllum created a šŸ¤— SoccerTwos Challenge Analytics where you can check the detailed match results of a model: [https://huggingface.co/spaces/cyllum/soccertwos-analytics](https://huggingface.co/spaces/cyllum/soccertwos-analytics)\n\nWe're [wrote a blog post to explain this AI vs. AI tool in detail](https://huggingface.co/blog/aivsai), but to give you the big picture it works this way:\n\n- Every four hours, our algorithm **fetches all the available models for a given environment (in our case ML-Agents-SoccerTwos).**\n- It creates a **queue of matches with the matchmaking algorithm.**\n- We simulate the match in a Unity headless process and **gather the match result** (1 if the first model won, 0.5 if it’s a draw, 0 if the second model won) in a Dataset.\n- Then, when all matches from the matches queue are done, **we update the ELO score for each model and update the leaderboard.**\n\n### Competition Rules\n\nThis first AI vs. AI competition **is an experiment**: the goal is to improve the tool in the future with your feedback. So some **breakups can happen during the challenge**. But don't worry\n**all the results are saved in a dataset so we can always restart the calculation correctly without losing information**.\n\nIn order for your model to get correctly evaluated against others you need to follow these rules:\n\n1. **You can't change the observation space or action space of the agent.** By doing that your model will not work during evaluation.\n2. You **can't use a custom trainer for now,** you need to use the Unity MLAgents ones.\n3. We provide executables to train your agents. You can also use the Unity Editor if you prefer **, but to avoid bugs, we advise that you use our executables**.\n\nWhat will make the difference during this challenge are **the hyperparameters you choose**.\n\nWe're constantly trying to improve our tutorials, soĀ **if you find some issues in this notebook**, pleaseĀ [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues).\n\n### Chat with your classmates, share advice and ask questions on Discord\n\n- We created a new channel called `ai-vs-ai-challenge` to exchange advice and ask questions.\n- If you didn’t join the discord server yet, you can [join here](https://discord.gg/ydHrjt3WP5)\n\n## Step 0: Install MLAgents and download the correct executable\n\nWe advise you to use [conda](https://docs.conda.io/en/latest/) as a package manager and create a new environment.\n\nWith conda, we create a new environment called rl with **Python 3.10.12**:\n\n```bash\nconda create --name rl python=3.10.12\nconda activate rl\n```\n\nTo be able to train our agents correctly and push to the Hub, we need to install ML-Agents\n\n```bash\ngit clone https://github.com/Unity-Technologies/ml-agents\n```\n\nWhen the cloning is done (it takes 2.63 GB), we go inside the repository and install the package\n\n```bash\ncd ml-agents\npip install -e ./ml-agents-envs\npip install -e ./ml-agents\n```\n\nFinally, you need to install git-lfs: https://git-lfs.com/\n\nNow that it’s installed, we need to add the environment training executable. Based on your operating system you need to download one of them, unzip it and place it in a new folder inside `ml-agents` that you call `training-envs-executables`\n\nAt the end your executable should be in `ml-agents/training-envs-executables/SoccerTwos`\n\nWindows: Download [this executable](https://drive.google.com/file/d/1sqFxbEdTMubjVktnV4C6ICjp89wLhUcP/view?usp=sharing)\n\nLinux (Ubuntu): Download [this executable](https://drive.google.com/file/d/1KuqBKYiXiIcU4kNMqEzhgypuFP5_45CL/view?usp=sharing)\n\nMac: Download [this executable](https://drive.google.com/drive/folders/1h7YB0qwjoxxghApQdEUQmk95ZwIDxrPG?usp=share_link)\n⚠ For Mac you need also to call this `xattr -cr training-envs-executables/SoccerTwos/SoccerTwos.app` to be able to run SoccerTwos\n\n## Step 1: Understand the environment\n\nThe environment is called `SoccerTwos`. The Unity MLAgents Team made it. You can find its documentation [here](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md#soccer-twos)\n\nThe goal in this environment **is to get the ball into the opponent's goal while preventing the ball from entering your own goal.**\n\n
\n\"SoccerTwos\"/\n\n
This environment was made by the Unity MLAgents Team
\n\n
\n\n### The reward function\n\nThe reward function is:\n\n\"SoccerTwos\n\n### The observation space\n\nThe observation space is composed of vectors of size 336:\n\n- 11 ray-casts forward distributed over 120 degrees (264 state dimensions)\n- 3 ray-casts backward distributed over 90 degrees (72 state dimensions)\n- Both of these ray-casts can detect 6 objects:\n - Ball\n - Blue Goal\n - Purple Goal\n - Wall\n - Blue Agent\n - Purple Agent\n\n### The action space\n\nThe action space is three discrete branches:\n\n\"SoccerTwos\n\n## Step 2: Understand MA-POCA\n\nWe know how to train agents to play against others: **we can use self-play.** This is a perfect technique for a 1vs1.\n\nBut in our case we’re 2vs2, and each team has 2 agents. How then can we **train cooperative behavior for groups of agents?**\n\nAs explained in the [Unity Blog](https://blog.unity.com/technology/ml-agents-v20-release-now-supports-training-complex-cooperative-behaviors), agents typically receive a reward as a group (+1 - penalty) when the team scores a goal. This implies that **every agent on the team is rewarded even if each agent didn’t contribute the same to the win**, which makes it difficult to learn what to do independently.\n\nThe Unity MLAgents team developed the solution in a new multi-agent trainer called *MA-POCA (Multi-Agent POsthumous Credit Assignment)*.\n\nThe idea is simple but powerful: a centralized critic **processes the states of all agents in the team to estimate how well each agent is doing**. Think of this critic as a coach.\n\nThis allows each agent to **make decisions based only on what it perceives locally**, and **simultaneously evaluate how good its behavior is in the context of the whole group**.\n\n
\n\"MA\n\n
This illustrates MA-POCA’s centralized learning and decentralized execution. Source: MLAgents Plays Dodgeball\n
\n\n
\n\nThe solution then is to use Self-Play with an MA-POCA trainer (called poca). The poca trainer will help us to train cooperative behavior and self-play to win against an opponent team.\n\nIf you want to dive deeper into this MA-POCA algorithm, you need to read the paper they published [here](https://arxiv.org/pdf/2111.05992.pdf) and the sources we put on the additional readings section.\n\n## Step 3: Define the config file\n\nWe already learned in [Unit 5](https://huggingface.co/deep-rl-course/unit5/introduction) that in ML-Agents, you define **the training hyperparameters in `config.yaml` files.**\n\nThere are multiple hyperparameters. To understand them better, you should read the explanations for each of them inĀ **[the documentation](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Training-Configuration-File.md)**\n\nThe config file we’re going to use here is in `./config/poca/SoccerTwos.yaml`. It looks like this:\n\n```csharp\nbehaviors:\n SoccerTwos:\n trainer_type: poca\n hyperparameters:\n batch_size: 2048\n buffer_size: 20480\n learning_rate: 0.0003\n beta: 0.005\n epsilon: 0.2\n lambd: 0.95\n num_epoch: 3\n learning_rate_schedule: constant\n network_settings:\n normalize: false\n hidden_units: 512\n num_layers: 2\n vis_encode_type: simple\n reward_signals:\n extrinsic:\n gamma: 0.99\n strength: 1.0\n keep_checkpoints: 5\n max_steps: 5000000\n time_horizon: 1000\n summary_freq: 10000\n self_play:\n save_steps: 50000\n team_change: 200000\n swap_steps: 2000\n window: 10\n play_against_latest_model_ratio: 0.5\n initial_elo: 1200.0\n```\n\nCompared to Pyramids or SnowballTarget, we have new hyperparameters with a self-play part. How you modify them can be critical in getting good results.\n\nThe advice I can give you here is to check the explanation and recommended value for each parameters (especially self-play ones) againstĀ **[the documentation](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Training-Configuration-File.md).**\n\nNow that you’ve modified our config file, you’re ready to train your agents.\n\n## Step 4: Start the training\n\nTo train the agents, we need toĀ **launch mlagents-learn and select the executable containing the environment.**\n\nWe define four parameters:\n\n1. `mlagents-learn `: the path where the hyperparameter config file is.\n2. `-env`: where the environment executable is.\n3. `-run_id`: the name you want to give to your training run id.\n4. `-no-graphics`: to not launch the visualization during the training.\n\nDepending on your hardware, 5M timesteps (the recommended value, but you can also try 10M) will take 5 to 8 hours of training. You can continue using your computer in the meantime, but I advise deactivating the computer standby mode to prevent the training from being stopped.\n\nDepending on the executable you use (windows, ubuntu, mac) the training command will look like this (your executable path can be different so don’t hesitate to check before running).\n\n```bash\nmlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos.exe --run-id=\"SoccerTwos\" --no-graphics\n```\n\nThe executable contains 8 copies of SoccerTwos.\n\nāš ļø It’s normal if you don’t see a big increase of ELO score (and even a decrease below 1200) before 2M timesteps, since your agents will spend most of their time moving randomly on the field before being able to goal.\n\nāš ļø You can stop the training with Ctrl + C but beware of typing this command only once to stop the training since MLAgents needs to generate a final .onnx file before closing the run.\n\n## Step 5: **Push the agent to the Hugging Face Hub**\n\nNow that we trained our agents, we’reĀ **ready to push them to the Hub to be able to participate in the AI vs. AI challenge and visualize them playing on your browseršŸ”„.**\n\nTo be able to share your model with the community, there are three more steps to follow:\n\n1ļøāƒ£ (If it’s not already done) create an account to HF āž”Ā [https://huggingface.co/join](https://huggingface.co/join)\n\n2ļøāƒ£ Sign in and store your authentication token from the Hugging Face website.\n\nCreate a new token (https://huggingface.co/settings/tokens)Ā **with write role**\n\n\"Create\n\nCopy the token, run this, and paste the token\n\n```bash\nhuggingface-cli login\n```\n\nThen, we need to run `mlagents-push-to-hf`.\n\nAnd we define four parameters:\n\n1. `-run-id`: the name of the training run id.\n2. `-local-dir`: where the agent was saved, it’s results/, so in my case results/First Training.\n3. `-repo-id`: the name of the Hugging Face repo you want to create or update. It’s always /\nIf the repo does not exist **it will be created automatically**\n4. `--commit-message`: since HF repos are git repositories you need to give a commit message.\n\nIn my case\n\n```bash\nmlagents-push-to-hf --run-id=\"SoccerTwos\" --local-dir=\"./results/SoccerTwos\" --repo-id=\"ThomasSimonini/poca-SoccerTwos\" --commit-message=\"First Push\"`\n```\n\n```bash\nmlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message=\"First Push\"\n```\n\nIf everything worked you should see this at the end of the process (but with a different url šŸ˜†) :\n\nYour model is pushed to the Hub. You can view your model here: https://huggingface.co/ThomasSimonini/poca-SoccerTwos\n\nIt's the link to your model. It contains a model card that explains how to use it, your Tensorboard, and your config file. **What's awesome is that it's a git repository, which means you can have different commits, update your repository with a new push, etc.**\n\n## Step 6: Verify that your model is ready for AI vs AI Challenge\n\nNow that your model is pushed to the Hub, **it’s going to be added automatically to the AI vs AI Challenge model pool.** It can take a little bit of time before your model is added to the leaderboard given we do a run of matches every 4h.\n\nBut to ensure that everything works perfectly you need to check:\n\n1. That you have this tag in your model: ML-Agents-SoccerTwos. This is the tag we use to select models to be added to the challenge pool. To do that go to your model and check the tags\n\n\"Verify\"/\n\nIf it’s not the case you just need to modify the readme and add it\n\n\"Verify\"/\n\n2. That you have a `SoccerTwos.onnx` file\n\n\"Verify\"/\n\nWe strongly suggest that you create a new model when you push to the Hub if you want to train it again or train a new version.\n\n## Step 7: Visualize some match in our demo\n\nNow that your model is part of AI vs AI Challenge, **you can visualize how good it is compared to others**: https://huggingface.co/spaces/unity/ML-Agents-SoccerTwos\n\nIn order to do that, you just need to go to this demo:\n\n- Select your model as team blue (or team purple if you prefer) and another model to compete against. The best opponents to compare your model to are either whoever is on top of the leaderboard or the [baseline model](https://huggingface.co/unity/MLAgents-SoccerTwos)\n\nThe matches you see live are not used in the calculation of your result **but they are a good way to visualize how good your agent is**.\n\nAnd don't hesitate to share the best score your agent gets on discord in the #rl-i-made-this channel šŸ”„", "source": "huggingface_doc", "domain": "software" }, { "text": "Metric Card for F1\n\n## Metric Description\n\nThe F1 score is the harmonic mean of the precision and recall. It can be computed with the equation:\nF1 = 2 * (precision * recall) / (precision + recall)\n\n## How to Use\n\nAt minimum, this metric requires predictions and references as input\n\n```python\n>>> f1_metric = datasets.load_metric(\"f1\")\n>>> results = f1_metric.compute(predictions=[0, 1], references=[0, 1])\n>>> print(results)\n[\"{'f1': 1.0}\"]\n```\n\n### Inputs\n- **predictions** (`list` of `int`): Predicted labels.\n- **references** (`list` of `int`): Ground truth labels.\n- **labels** (`list` of `int`): The set of labels to include when `average` is not set to `'binary'`, and the order of the labels if `average` is `None`. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class. Labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in `predictions` and `references` are used in sorted order. Defaults to None.\n- **pos_label** (`int`): The class to be considered the positive class, in the case where `average` is set to `binary`. Defaults to 1.\n- **average** (`string`): This parameter is required for multiclass/multilabel targets. If set to `None`, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data. Defaults to `'binary'`.\n - 'binary': Only report results for the class specified by `pos_label`. This is applicable only if the classes found in `predictions` and `references` are binary.\n - 'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.\n - 'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.\n - 'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `'macro'` to account for label imbalance. This option can result in an F-score that is not between precision and recall.\n - 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification).\n- **sample_weight** (`list` of `float`): Sample weights Defaults to None.\n\n### Output Values\n- **f1**(`float` or `array` of `float`): F1 score or list of f1 scores, depending on the value passed to `average`. Minimum possible value is 0. Maximum possible value is 1. Higher f1 scores are better.\n\nOutput Example(s):\n```python\n{'f1': 0.26666666666666666}\n```\n```python\n{'f1': array([0.8, 0.0, 0.0])}\n```\n\nThis metric outputs a dictionary, with either a single f1 score, of type `float`, or an array of f1 scores, with entries of type `float`.\n\n#### Values from Popular Papers\n\n### Examples\n\nExample 1-A simple binary example\n```python\n>>> f1_metric = datasets.load_metric(\"f1\")\n>>> results = f1_metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0])\n>>> print(results)\n{'f1': 0.5}\n```\n\nExample 2-The same simple binary example as in Example 1, but with `pos_label` set to `0`.\n```python\n>>> f1_metric = datasets.load_metric(\"f1\")\n>>> results = f1_metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], pos_label=0)\n>>> print(round(results['f1'], 2))\n0.67\n```\n\nExample 3-The same simple binary example as in Example 1, but with `sample_weight` included.\n```python\n>>> f1_metric = datasets.load_metric(\"f1\")\n>>> results = f1_metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], sample_weight=[0.9, 0.5, 3.9, 1.2, 0.3])\n>>> print(round(results['f1'], 2))\n0.35\n```\n\nExample 4-A multiclass example, with different values for the `average` input.\n```python\n>>> predictions = [0, 2, 1, 0, 0, 1]\n>>> references = [0, 1, 2, 0, 1, 2]\n>>> results = f1_metric.compute(predictions=predictions, references=references, average=\"macro\")\n>>> print(round(results['f1'], 2))\n0.27\n>>> results = f1_metric.compute(predictions=predictions, references=references, average=\"micro\")\n>>> print(round(results['f1'], 2))\n0.33\n>>> results = f1_metric.compute(predictions=predictions, references=references, average=\"weighted\")\n>>> print(round(results['f1'], 2))\n0.27\n>>> results = f1_metric.compute(predictions=predictions, references=references, average=None)\n>>> print(results)\n{'f1': array([0.8, 0. , 0. ])}\n```\n\n## Limitations and Bias\n\n## Citation(s)\n```bibtex\n@article{scikit-learn,\n title={Scikit-learn: Machine Learning in {P}ython},\n author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.\n and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.\n and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and\n Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},\n journal={Journal of Machine Learning Research},\n volume={12},\n pages={2825--2830},\n year={2011}\n}\n```\n\n## Further References", "source": "huggingface_doc", "domain": "software" }, { "text": "!--Copyright 2020 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n\nāš ļø Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be\nrendered properly in your Markdown viewer.\n\n-->\n\n# RemBERT\n\n## Overview\n\nThe RemBERT model was proposed in [Rethinking Embedding Coupling in Pre-trained Language Models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault FĆ©vry, Henry Tsai, Melvin Johnson, Sebastian Ruder.\n\nThe abstract from the paper is the following:\n\n*We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art\npre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to\nsignificantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By\nreallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on\nstandard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that\nallocating additional capacity to the output embedding provides benefits to the model that persist through the\nfine-tuning stage even though the output embedding is discarded after pre-training. Our analysis shows that larger\noutput embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage\nTransformer representations to be more general and more transferable to other tasks and languages. Harnessing these\nfindings, we are able to train models that achieve strong performance on the XTREME benchmark without increasing the\nnumber of parameters at the fine-tuning stage.*\n\n## Usage tips\n\nFor fine-tuning, RemBERT can be thought of as a bigger version of mBERT with an ALBERT-like factorization of the\nembedding layer. The embeddings are not tied in pre-training, in contrast with BERT, which enables smaller input\nembeddings (preserved during fine-tuning) and bigger output embeddings (discarded at fine-tuning). The tokenizer is\nalso similar to the Albert one rather than the BERT one.\n\n## Resources\n\n- [Text classification task guide](../tasks/sequence_classification)\n- [Token classification task guide](../tasks/token_classification)\n- [Question answering task guide](../tasks/question_answering)\n- [Causal language modeling task guide](../tasks/language_modeling)\n- [Masked language modeling task guide](../tasks/masked_language_modeling)\n- [Multiple choice task guide](../tasks/multiple_choice)\n\n## RemBertConfig\n\n[[autodoc]] RemBertConfig\n\n## RemBertTokenizer\n\n[[autodoc]] RemBertTokenizer\n - build_inputs_with_special_tokens\n - get_special_tokens_mask\n - create_token_type_ids_from_sequences\n - save_vocabulary\n\n## RemBertTokenizerFast\n\n[[autodoc]] RemBertTokenizerFast\n - build_inputs_with_special_tokens\n - get_special_tokens_mask\n - create_token_type_ids_from_sequences\n - save_vocabulary\n\n\n\n\n## RemBertModel\n\n[[autodoc]] RemBertModel\n - forward\n\n## RemBertForCausalLM\n\n[[autodoc]] RemBertForCausalLM\n - forward\n\n## RemBertForMaskedLM\n\n[[autodoc]] RemBertForMaskedLM\n - forward\n\n## RemBertForSequenceClassification\n\n[[autodoc]] RemBertForSequenceClassification\n - forward\n\n## RemBertForMultipleChoice\n\n[[autodoc]] RemBertForMultipleChoice\n - forward\n\n## RemBertForTokenClassification\n\n[[autodoc]] RemBertForTokenClassification\n - forward\n\n## RemBertForQuestionAnswering\n\n[[autodoc]] RemBertForQuestionAnswering\n - forward\n\n\n\n\n## TFRemBertModel\n\n[[autodoc]] TFRemBertModel\n - call\n\n## TFRemBertForMaskedLM\n\n[[autodoc]] TFRemBertForMaskedLM\n - call\n\n## TFRemBertForCausalLM\n\n[[autodoc]] TFRemBertForCausalLM\n - call\n\n## TFRemBertForSequenceClassification\n\n[[autodoc]] TFRemBertForSequenceClassification\n - call\n\n## TFRemBertForMultipleChoice\n\n[[autodoc]] TFRemBertForMultipleChoice\n - call\n\n## TFRemBertForTokenClassification\n\n[[autodoc]] TFRemBertForTokenClassification\n - call\n\n## TFRemBertForQuestionAnswering\n\n[[autodoc]] TFRemBertForQuestionAnswering\n - call\n\n\n", "source": "huggingface_doc", "domain": "software" }, { "text": "Latent Consistency Distillation Example:\n\n[Latent Consistency Models (LCMs)](https://arxiv.org/abs/2310.04378) is a method to distill a latent diffusion model to enable swift inference with minimal steps. This example demonstrates how to use latent consistency distillation to distill SDXL for inference with few timesteps.\n\n## Full model distillation\n\n### Running locally with PyTorch\n\n#### Installing the dependencies\n\nBefore running the scripts, make sure to install the library's training dependencies:\n\n**Important**\n\nTo make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:\n```bash\ngit clone https://github.com/huggingface/diffusers\ncd diffusers\npip install -e .\n```\n\nThen cd in the example folder and run\n```bash\npip install -r requirements.txt\n```\n\nAnd initialize an [šŸ¤— Accelerate](https://github.com/huggingface/accelerate/) environment with:\n\n```bash\naccelerate config\n```\n\nOr for a default accelerate configuration without answering questions about your environment\n\n```bash\naccelerate config default\n```\n\nOr if your environment doesn't support an interactive shell e.g. a notebook\n\n```python\nfrom accelerate.utils import write_basic_config\nwrite_basic_config()\n```\n\nWhen running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups.\n\n#### Example\n\nThe following uses the [Conceptual Captions 12M (CC12M) dataset](https://github.com/google-research-datasets/conceptual-12m) as an example, and for illustrative purposes only. For best results you may consider large and high-quality text-image datasets such as [LAION](https://laion.ai/blog/laion-400-open-dataset/). You may also need to search the hyperparameter space according to the dataset you use.\n\n```bash\nexport MODEL_NAME=\"stabilityai/stable-diffusion-xl-base-1.0\"\nexport OUTPUT_DIR=\"path/to/saved/model\"\n\naccelerate launch train_lcm_distill_sdxl_wds.py \\\n --pretrained_teacher_model=$MODEL_NAME \\\n --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \\\n --output_dir=$OUTPUT_DIR \\\n --mixed_precision=fp16 \\\n --resolution=1024 \\\n --learning_rate=1e-6 --loss_type=\"huber\" --use_fix_crop_and_size --ema_decay=0.95 --adam_weight_decay=0.0 \\\n --max_train_steps=1000 \\\n --max_train_samples=4000000 \\\n --dataloader_num_workers=8 \\\n --train_shards_path_or_url=\"pipe:curl -L -s https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset/resolve/main/data/{00000..01099}.tar?download=true\" \\\n --validation_steps=200 \\\n --checkpointing_steps=200 --checkpoints_total_limit=10 \\\n --train_batch_size=12 \\\n --gradient_checkpointing --enable_xformers_memory_efficient_attention \\\n --gradient_accumulation_steps=1 \\\n --use_8bit_adam \\\n --resume_from_checkpoint=latest \\\n --report_to=wandb \\\n --seed=453645634 \\\n --push_to_hub \\\n```\n\n## LCM-LoRA\n\nInstead of fine-tuning the full model, we can also just train a LoRA that can be injected into any SDXL model.\n\n### Example\n\nThe following uses the [Conceptual Captions 12M (CC12M) dataset](https://github.com/google-research-datasets/conceptual-12m) as an example. For best results you may consider large and high-quality text-image datasets such as [LAION](https://laion.ai/blog/laion-400-open-dataset/).\n\n```bash\nexport MODEL_NAME=\"stabilityai/stable-diffusion-xl-base-1.0\"\nexport OUTPUT_DIR=\"path/to/saved/model\"\n\naccelerate launch train_lcm_distill_lora_sdxl_wds.py \\\n --pretrained_teacher_model=$MODEL_DIR \\\n --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \\\n --output_dir=$OUTPUT_DIR \\\n --mixed_precision=fp16 \\\n --resolution=1024 \\\n --lora_rank=64 \\\n --learning_rate=1e-6 --loss_type=\"huber\" --use_fix_crop_and_size --adam_weight_decay=0.0 \\\n --max_train_steps=1000 \\\n --max_train_samples=4000000 \\\n --dataloader_num_workers=8 \\\n --train_shards_path_or_url=\"pipe:curl -L -s https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset/resolve/main/data/{00000..01099}.tar?download=true\" \\\n --validation_steps=200 \\\n --checkpointing_steps=200 --checkpoints_total_limit=10 \\\n --train_batch_size=12 \\\n --gradient_checkpointing --enable_xformers_memory_efficient_attention \\\n --gradient_accumulation_steps=1 \\\n --use_8bit_adam \\\n --resume_from_checkpoint=latest \\\n --report_to=wandb \\\n --seed=453645634 \\\n --push_to_hub \\\n```", "source": "huggingface_doc", "domain": "software" } ]