Buckets:
MisterAI/LocalAI_Demo_backends / cpu-diffusers.upgrade-tmp /venv /lib /python3.10 /site-packages /regex /_main.py
| # | |
| # Secret Labs' Regular Expression Engine | |
| # | |
| # Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved. | |
| # | |
| # This version of the SRE library can be redistributed under CNRI's | |
| # Python 1.6 license. For any other use, please contact Secret Labs | |
| # AB (info@pythonware.com). | |
| # | |
| # Portions of this engine have been developed in cooperation with | |
| # CNRI. Hewlett-Packard provided funding for 1.6 integration and | |
| # other compatibility work. | |
| # | |
| # 2010-01-16 mrab Python front-end re-written and extended | |
| r"""Support for regular expressions (RE). | |
| This module provides regular expression matching operations similar to those | |
| found in Perl. It supports both 8-bit and Unicode strings; both the pattern and | |
| the strings being processed can contain null bytes and characters outside the | |
| US ASCII range. | |
| Regular expressions can contain both special and ordinary characters. Most | |
| ordinary characters, like "A", "a", or "0", are the simplest regular | |
| expressions; they simply match themselves. You can concatenate ordinary | |
| characters, so last matches the string 'last'. | |
| There are a few differences between the old (legacy) behaviour and the new | |
| (enhanced) behaviour, which are indicated by VERSION0 or VERSION1. | |
| The special characters are: | |
| "." Matches any character except a newline. | |
| "^" Matches the start of the string. | |
| "$" Matches the end of the string or just before the | |
| newline at the end of the string. | |
| "*" Matches 0 or more (greedy) repetitions of the preceding | |
| RE. Greedy means that it will match as many repetitions | |
| as possible. | |
| "+" Matches 1 or more (greedy) repetitions of the preceding | |
| RE. | |
| "?" Matches 0 or 1 (greedy) of the preceding RE. | |
| *?,+?,?? Non-greedy versions of the previous three special | |
| characters. | |
| *+,++,?+ Possessive versions of the previous three special | |
| characters. | |
| {m,n} Matches from m to n repetitions of the preceding RE. | |
| {m,n}? Non-greedy version of the above. | |
| {m,n}+ Possessive version of the above. | |
| {...} Fuzzy matching constraints. | |
| "\\" Either escapes special characters or signals a special | |
| sequence. | |
| [...] Indicates a set of characters. A "^" as the first | |
| character indicates a complementing set. | |
| "|" A|B, creates an RE that will match either A or B. | |
| (...) Matches the RE inside the parentheses. The contents are | |
| captured and can be retrieved or matched later in the | |
| string. | |
| (?flags-flags) VERSION1: Sets/clears the flags for the remainder of | |
| the group or pattern; VERSION0: Sets the flags for the | |
| entire pattern. | |
| (?:...) Non-capturing version of regular parentheses. | |
| (?>...) Atomic non-capturing version of regular parentheses. | |
| (?flags-flags:...) Non-capturing version of regular parentheses with local | |
| flags. | |
| (?P<name>...) The substring matched by the group is accessible by | |
| name. | |
| (?<name>...) The substring matched by the group is accessible by | |
| name. | |
| (?P=name) Matches the text matched earlier by the group named | |
| name. | |
| (?#...) A comment; ignored. | |
| (?=...) Matches if ... matches next, but doesn't consume the | |
| string. | |
| (?!...) Matches if ... doesn't match next. | |
| (?<=...) Matches if preceded by .... | |
| (?<!...) Matches if not preceded by .... | |
| (?(id)yes|no) Matches yes pattern if group id matched, the (optional) | |
| no pattern otherwise. | |
| (?(DEFINE)...) If there's no group called "DEFINE", then ... will be | |
| ignored, but any group definitions will be available. | |
| (?|...|...) (?|A|B), creates an RE that will match either A or B, | |
| but reuses capture group numbers across the | |
| alternatives. | |
| (*FAIL) Forces matching to fail, which means immediate | |
| backtracking. | |
| (*F) Abbreviation for (*FAIL). | |
| (*PRUNE) Discards the current backtracking information. Its | |
| effect doesn't extend outside an atomic group or a | |
| lookaround. | |
| (*SKIP) Similar to (*PRUNE), except that it also sets where in | |
| the text the next attempt at matching the entire | |
| pattern will start. Its effect doesn't extend outside | |
| an atomic group or a lookaround. | |
| The fuzzy matching constraints are: "i" to permit insertions, "d" to permit | |
| deletions, "s" to permit substitutions, "e" to permit any of these. Limits are | |
| optional with "<=" and "<". If any type of error is provided then any type not | |
| provided is not permitted. | |
| A cost equation may be provided. | |
| Examples: | |
| (?:fuzzy){i<=2} | |
| (?:fuzzy){i<=1,s<=2,d<=1,1i+1s+1d<3} | |
| VERSION1: Set operators are supported, and a set can include nested sets. The | |
| set operators, in order of increasing precedence, are: | |
| || Set union ("x||y" means "x or y"). | |
| ~~ (double tilde) Symmetric set difference ("x~~y" means "x or y, but not | |
| both"). | |
| && Set intersection ("x&&y" means "x and y"). | |
| -- (double dash) Set difference ("x--y" means "x but not y"). | |
| Implicit union, ie, simple juxtaposition like in [ab], has the highest | |
| precedence. | |
| VERSION0 and VERSION1: | |
| The special sequences consist of "\\" and a character from the list below. If | |
| the ordinary character is not on the list, then the resulting RE will match the | |
| second character. | |
| \number Matches the contents of the group of the same number if | |
| number is no more than 2 digits, otherwise the character | |
| with the 3-digit octal code. | |
| \a Matches the bell character. | |
| \A Matches only at the start of the string. | |
| \b Matches the empty string, but only at the start or end of a | |
| word. | |
| \B Matches the empty string, but not at the start or end of a | |
| word. | |
| \d Matches any decimal digit; equivalent to the set [0-9] when | |
| matching a bytestring or a Unicode string with the ASCII | |
| flag, or the whole range of Unicode digits when matching a | |
| Unicode string. | |
| \D Matches any non-digit character; equivalent to [^\d]. | |
| \f Matches the formfeed character. | |
| \g<name> Matches the text matched by the group named name. | |
| \G Matches the empty string, but only at the position where | |
| the search started. | |
| \h Matches horizontal whitespace. | |
| \K Keeps only what follows for the entire match. | |
| \L<name> Named list. The list is provided as a keyword argument. | |
| \m Matches the empty string, but only at the start of a word. | |
| \M Matches the empty string, but only at the end of a word. | |
| \n Matches the newline character. | |
| \N{name} Matches the named character. | |
| \p{name=value} Matches the character if its property has the specified | |
| value. | |
| \P{name=value} Matches the character if its property hasn't the specified | |
| value. | |
| \r Matches the carriage-return character. | |
| \s Matches any whitespace character; equivalent to | |
| [ \t\n\r\f\v]. | |
| \S Matches any non-whitespace character; equivalent to [^\s]. | |
| \t Matches the tab character. | |
| \uXXXX Matches the Unicode codepoint with 4-digit hex code XXXX. | |
| \UXXXXXXXX Matches the Unicode codepoint with 8-digit hex code | |
| XXXXXXXX. | |
| \v Matches the vertical tab character. | |
| \w Matches any alphanumeric character; equivalent to | |
| [a-zA-Z0-9_] when matching a bytestring or a Unicode string | |
| with the ASCII flag, or the whole range of Unicode | |
| alphanumeric characters (letters plus digits plus | |
| underscore) when matching a Unicode string. With LOCALE, it | |
| will match the set [0-9_] plus characters defined as | |
| letters for the current locale. | |
| \W Matches the complement of \w; equivalent to [^\w]. | |
| \xXX Matches the character with 2-digit hex code XX. | |
| \X Matches a grapheme. | |
| \Z Matches only at the end of the string. | |
| \z Matches only at the end of the string. Alias of \Z. | |
| \\ Matches a literal backslash. | |
| This module exports the following functions: | |
| match Match a regular expression pattern at the beginning of a string. | |
| prefixmatch Match a regular expression pattern at the beginning of a string. | |
| Alias of match. | |
| fullmatch Match a regular expression pattern against all of a string. | |
| search Search a string for the presence of a pattern. | |
| sub Substitute occurrences of a pattern found in a string using a | |
| template string. | |
| subf Substitute occurrences of a pattern found in a string using a | |
| format string. | |
| subn Same as sub, but also return the number of substitutions made. | |
| subfn Same as subf, but also return the number of substitutions made. | |
| split Split a string by the occurrences of a pattern. VERSION1: will | |
| split at zero-width match; VERSION0: won't split at zero-width | |
| match. | |
| splititer Return an iterator yielding the parts of a split string. | |
| findall Find all occurrences of a pattern in a string. | |
| finditer Return an iterator yielding a match object for each match. | |
| compile Compile a pattern into a Pattern object. | |
| purge Clear the regular expression cache. | |
| escape Backslash all non-alphanumerics or special characters in a | |
| string. | |
| Most of the functions support a concurrent parameter: if True, the GIL will be | |
| released during matching, allowing other Python threads to run concurrently. If | |
| the string changes during matching, the behaviour is undefined. This parameter | |
| is not needed when working on the builtin (immutable) string classes. | |
| Some of the functions in this module take flags as optional parameters. Most of | |
| these flags can also be set within an RE: | |
| A a ASCII Make \w, \W, \b, \B, \d, and \D match the | |
| corresponding ASCII character categories. Default | |
| when matching a bytestring. | |
| B b BESTMATCH Find the best fuzzy match (default is first). | |
| D DEBUG Print the parsed pattern. | |
| E e ENHANCEMATCH Attempt to improve the fit after finding the first | |
| fuzzy match. | |
| F f FULLCASE Use full case-folding when performing | |
| case-insensitive matching in Unicode. | |
| I i IGNORECASE Perform case-insensitive matching. | |
| L L LOCALE Make \w, \W, \b, \B, \d, and \D dependent on the | |
| current locale. (One byte per character only.) | |
| M m MULTILINE "^" matches the beginning of lines (after a newline) | |
| as well as the string. "$" matches the end of lines | |
| (before a newline) as well as the end of the string. | |
| P p POSIX Perform POSIX-standard matching (leftmost longest). | |
| R r REVERSE Searches backwards. | |
| S s DOTALL "." matches any character at all, including the | |
| newline. | |
| U u UNICODE Make \w, \W, \b, \B, \d, and \D dependent on the | |
| Unicode locale. Default when matching a Unicode | |
| string. | |
| V0 V0 VERSION0 Turn on the old legacy behaviour. | |
| V1 V1 VERSION1 Turn on the new enhanced behaviour. This flag | |
| includes the FULLCASE flag. | |
| W w WORD Make \b and \B work with default Unicode word breaks | |
| and make ".", "^" and "$" work with Unicode line | |
| breaks. | |
| X x VERBOSE Ignore whitespace and comments for nicer looking REs. | |
| This module also defines an exception 'error'. | |
| """ | |
| # Public symbols. | |
| __all__ = ["cache_all", "compile", "DEFAULT_VERSION", "escape", "findall", | |
| "finditer", "fullmatch", "match", "prefixmatch", "purge", "search", "split", "splititer", | |
| "sub", "subf", "subfn", "subn", "template", "Scanner", "A", "ASCII", "B", | |
| "BESTMATCH", "D", "DEBUG", "E", "ENHANCEMATCH", "S", "DOTALL", "F", | |
| "FULLCASE", "I", "IGNORECASE", "L", "LOCALE", "M", "MULTILINE", "P", "POSIX", | |
| "R", "REVERSE", "T", "TEMPLATE", "U", "UNICODE", "V0", "VERSION0", "V1", | |
| "VERSION1", "X", "VERBOSE", "W", "WORD", "error", "Regex", "__version__", | |
| "__doc__", "RegexFlag"] | |
| __version__ = "2026.5.9" | |
| # -------------------------------------------------------------------- | |
| # Public interface. | |
| def match(pattern, string, flags=0, pos=None, endpos=None, partial=False, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Try to apply the pattern at the start of the string, returning a match | |
| object, or None if no match was found.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.match(string, pos, endpos, concurrent, partial, timeout) | |
| def prefixmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Try to apply the pattern at the start of the string, returning a match | |
| object, or None if no match was found. Alias of match().""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.match(string, pos, endpos, concurrent, partial, timeout) | |
| def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Try to apply the pattern against all of the string, returning a match | |
| object, or None if no match was found.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.fullmatch(string, pos, endpos, concurrent, partial, timeout) | |
| def search(pattern, string, flags=0, pos=None, endpos=None, partial=False, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Search through string looking for a match to the pattern, returning a | |
| match object, or None if no match was found.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.search(string, pos, endpos, concurrent, partial, timeout) | |
| def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return the string obtained by replacing the leftmost (or rightmost with a | |
| reverse pattern) non-overlapping occurrences of the pattern in string by the | |
| replacement repl. repl can be either a string or a callable; if a string, | |
| backslash escapes in it are processed; if a callable, it's passed the match | |
| object and must return a replacement string to be used.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.sub(repl, string, count, pos, endpos, concurrent, timeout) | |
| def subf(pattern, format, string, count=0, flags=0, pos=None, endpos=None, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return the string obtained by replacing the leftmost (or rightmost with a | |
| reverse pattern) non-overlapping occurrences of the pattern in string by the | |
| replacement format. format can be either a string or a callable; if a string, | |
| it's treated as a format string; if a callable, it's passed the match object | |
| and must return a replacement string to be used.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.subf(format, string, count, pos, endpos, concurrent, timeout) | |
| def subn(pattern, repl, string, count=0, flags=0, pos=None, endpos=None, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return a 2-tuple containing (new_string, number). new_string is the string | |
| obtained by replacing the leftmost (or rightmost with a reverse pattern) | |
| non-overlapping occurrences of the pattern in the source string by the | |
| replacement repl. number is the number of substitutions that were made. repl | |
| can be either a string or a callable; if a string, backslash escapes in it | |
| are processed; if a callable, it's passed the match object and must return a | |
| replacement string to be used.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.subn(repl, string, count, pos, endpos, concurrent, timeout) | |
| def subfn(pattern, format, string, count=0, flags=0, pos=None, endpos=None, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return a 2-tuple containing (new_string, number). new_string is the string | |
| obtained by replacing the leftmost (or rightmost with a reverse pattern) | |
| non-overlapping occurrences of the pattern in the source string by the | |
| replacement format. number is the number of substitutions that were made. format | |
| can be either a string or a callable; if a string, it's treated as a format | |
| string; if a callable, it's passed the match object and must return a | |
| replacement string to be used.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.subfn(format, string, count, pos, endpos, concurrent, timeout) | |
| def split(pattern, string, maxsplit=0, flags=0, concurrent=None, timeout=None, | |
| ignore_unused=False, **kwargs): | |
| """Split the source string by the occurrences of the pattern, returning a | |
| list containing the resulting substrings. If capturing parentheses are used | |
| in pattern, then the text of all groups in the pattern are also returned as | |
| part of the resulting list. If maxsplit is nonzero, at most maxsplit splits | |
| occur, and the remainder of the string is returned as the final element of | |
| the list.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.split(string, maxsplit, concurrent, timeout) | |
| def splititer(pattern, string, maxsplit=0, flags=0, concurrent=None, | |
| timeout=None, ignore_unused=False, **kwargs): | |
| "Return an iterator yielding the parts of a split string." | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.splititer(string, maxsplit, concurrent, timeout) | |
| def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, | |
| concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return a list of all matches in the string. The matches may be overlapped | |
| if overlapped is True. If one or more groups are present in the pattern, | |
| return a list of groups; this will be a list of tuples if the pattern has | |
| more than one group. Empty matches are included in the result.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.findall(string, pos, endpos, overlapped, concurrent, timeout) | |
| def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, | |
| partial=False, concurrent=None, timeout=None, ignore_unused=False, **kwargs): | |
| """Return an iterator over all matches in the string. The matches may be | |
| overlapped if overlapped is True. For each match, the iterator returns a | |
| match object. Empty matches are included in the result.""" | |
| pat = _compile(pattern, flags, ignore_unused, kwargs, True) | |
| return pat.finditer(string, pos, endpos, overlapped, concurrent, partial, | |
| timeout) | |
| def compile(pattern, flags=0, ignore_unused=False, cache_pattern=None, **kwargs): | |
| "Compile a regular expression pattern, returning a pattern object." | |
| if cache_pattern is None: | |
| cache_pattern = _cache_all | |
| return _compile(pattern, flags, ignore_unused, kwargs, cache_pattern) | |
| def purge(): | |
| "Clear the regular expression cache" | |
| _cache.clear() | |
| _locale_sensitive.clear() | |
| # Whether to cache all patterns. | |
| _cache_all = True | |
| def cache_all(value=True): | |
| """Sets whether to cache all patterns, even those are compiled explicitly. | |
| Passing None has no effect, but returns the current setting.""" | |
| global _cache_all | |
| if value is None: | |
| return _cache_all | |
| _cache_all = value | |
| def template(pattern, flags=0): | |
| "Compile a template pattern, returning a pattern object." | |
| return _compile(pattern, flags | TEMPLATE, False, {}, False) | |
| def escape(pattern, special_only=True, literal_spaces=False): | |
| """Escape a string for use as a literal in a pattern. If special_only is | |
| True, escape only special characters, else escape all non-alphanumeric | |
| characters. If literal_spaces is True, don't escape spaces.""" | |
| # Convert it to Unicode. | |
| if isinstance(pattern, bytes): | |
| p = pattern.decode("latin-1") | |
| else: | |
| p = pattern | |
| s = [] | |
| if special_only: | |
| for c in p: | |
| if c == " " and literal_spaces: | |
| s.append(c) | |
| elif c in _METACHARS or c.isspace(): | |
| s.append("\\") | |
| s.append(c) | |
| else: | |
| s.append(c) | |
| else: | |
| for c in p: | |
| if c == " " and literal_spaces: | |
| s.append(c) | |
| elif c in _ALNUM: | |
| s.append(c) | |
| else: | |
| s.append("\\") | |
| s.append(c) | |
| r = "".join(s) | |
| # Convert it back to bytes if necessary. | |
| if isinstance(pattern, bytes): | |
| r = r.encode("latin-1") | |
| return r | |
| # -------------------------------------------------------------------- | |
| # Internals. | |
| from regex import _regex_core | |
| from regex import _regex | |
| from threading import RLock as _RLock | |
| from locale import getpreferredencoding as _getpreferredencoding | |
| from regex._regex_core import * | |
| from regex._regex_core import (_ALL_VERSIONS, _ALL_ENCODINGS, _FirstSetError, | |
| _UnscopedFlagSet, _check_group_features, _compile_firstset, | |
| _compile_replacement, _flatten_code, _fold_case, _get_required_string, | |
| _parse_pattern, _shrink_cache) | |
| from regex._regex_core import (ALNUM as _ALNUM, Info as _Info, OP as _OP, Source | |
| as _Source, Fuzzy as _Fuzzy) | |
| # Version 0 is the old behaviour, compatible with the original 're' module. | |
| # Version 1 is the new behaviour, which differs slightly. | |
| DEFAULT_VERSION = RegexFlag.VERSION0 | |
| _METACHARS = frozenset("()[]{}?*+|^$\\.-#&~") | |
| _regex_core.DEFAULT_VERSION = DEFAULT_VERSION | |
| # Caches for the patterns and replacements. | |
| _cache = {} | |
| _cache_lock = _RLock() | |
| _named_args = {} | |
| _replacement_cache = {} | |
| _locale_sensitive = {} | |
| # Maximum size of the cache. | |
| _MAXCACHE = 500 | |
| _MAXREPCACHE = 500 | |
| def _compile(pattern, flags, ignore_unused, kwargs, cache_it): | |
| "Compiles a regular expression to a PatternObject." | |
| global DEFAULT_VERSION | |
| try: | |
| from regex import DEFAULT_VERSION | |
| except ImportError: | |
| pass | |
| # We won't bother to cache the pattern if we're debugging. | |
| if (flags & DEBUG) != 0: | |
| cache_it = False | |
| # What locale is this pattern using? | |
| locale_key = (type(pattern), pattern) | |
| if _locale_sensitive.get(locale_key, True) or (flags & LOCALE) != 0: | |
| # This pattern is, or might be, locale-sensitive. | |
| pattern_locale = _getpreferredencoding() | |
| else: | |
| # This pattern is definitely not locale-sensitive. | |
| pattern_locale = None | |
| def complain_unused_args(): | |
| if ignore_unused: | |
| return | |
| # Complain about any unused keyword arguments, possibly resulting from a typo. | |
| unused_kwargs = set(kwargs) - {k for k, v in args_needed} | |
| if unused_kwargs: | |
| any_one = next(iter(unused_kwargs)) | |
| raise ValueError('unused keyword argument {!a}'.format(any_one)) | |
| if cache_it: | |
| try: | |
| # Do we know what keyword arguments are needed? | |
| args_key = pattern, type(pattern), flags | |
| args_needed = _named_args[args_key] | |
| # Are we being provided with its required keyword arguments? | |
| args_supplied = set() | |
| if args_needed: | |
| for k, v in args_needed: | |
| try: | |
| args_supplied.add((k, frozenset(kwargs[k]))) | |
| except KeyError: | |
| raise error("missing named list: {!r}".format(k)) | |
| complain_unused_args() | |
| args_supplied = frozenset(args_supplied) | |
| # Have we already seen this regular expression and named list? | |
| pattern_key = (pattern, type(pattern), flags, args_supplied, | |
| DEFAULT_VERSION, pattern_locale) | |
| return _cache[pattern_key] | |
| except KeyError: | |
| # It's a new pattern, or new named list for a known pattern. | |
| pass | |
| # Guess the encoding from the class of the pattern string. | |
| if isinstance(pattern, str): | |
| guess_encoding = UNICODE | |
| elif isinstance(pattern, bytes): | |
| guess_encoding = ASCII | |
| elif isinstance(pattern, Pattern): | |
| if flags: | |
| raise ValueError("cannot process flags argument with a compiled pattern") | |
| return pattern | |
| else: | |
| raise TypeError("first argument must be a string or compiled pattern") | |
| # Set the default version in the core code in case it has been changed. | |
| _regex_core.DEFAULT_VERSION = DEFAULT_VERSION | |
| global_flags = flags | |
| while True: | |
| caught_exception = None | |
| try: | |
| source = _Source(pattern) | |
| info = _Info(global_flags, source.char_type, kwargs) | |
| info.guess_encoding = guess_encoding | |
| source.ignore_space = bool(info.flags & VERBOSE) | |
| parsed = _parse_pattern(source, info) | |
| break | |
| except _UnscopedFlagSet: | |
| # Remember the global flags for the next attempt. | |
| global_flags = info.global_flags | |
| except error as e: | |
| caught_exception = e | |
| if caught_exception: | |
| raise error(caught_exception.msg, caught_exception.pattern, | |
| caught_exception.pos) | |
| if not source.at_end(): | |
| raise error("unbalanced parenthesis", pattern, source.pos) | |
| # Check the global flags for conflicts. | |
| version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION | |
| if version not in (0, VERSION0, VERSION1): | |
| raise ValueError("VERSION0 and VERSION1 flags are mutually incompatible") | |
| if (info.flags & _ALL_ENCODINGS) not in (0, ASCII, LOCALE, UNICODE): | |
| raise ValueError("ASCII, LOCALE and UNICODE flags are mutually incompatible") | |
| if isinstance(pattern, bytes) and (info.flags & UNICODE): | |
| raise ValueError("cannot use UNICODE flag with a bytes pattern") | |
| if not (info.flags & _ALL_ENCODINGS): | |
| if isinstance(pattern, str): | |
| info.flags |= UNICODE | |
| else: | |
| info.flags |= ASCII | |
| reverse = bool(info.flags & REVERSE) | |
| fuzzy = isinstance(parsed, _Fuzzy) | |
| # Remember whether this pattern as an inline locale flag. | |
| _locale_sensitive[locale_key] = info.inline_locale | |
| # Fix the group references. | |
| caught_exception = None | |
| try: | |
| parsed.fix_groups(pattern, reverse, False) | |
| except error as e: | |
| caught_exception = e | |
| if caught_exception: | |
| raise error(caught_exception.msg, caught_exception.pattern, | |
| caught_exception.pos) | |
| # Should we print the parsed pattern? | |
| if flags & DEBUG: | |
| parsed.dump(indent=0, reverse=reverse) | |
| # Optimise the parsed pattern. | |
| parsed = parsed.optimise(info, reverse) | |
| parsed = parsed.pack_characters(info) | |
| # Get the required string. | |
| req_offset, req_chars, req_flags = _get_required_string(parsed, info.flags) | |
| # Build the named lists. | |
| named_lists = {} | |
| named_list_indexes = [None] * len(info.named_lists_used) | |
| args_needed = set() | |
| for key, index in info.named_lists_used.items(): | |
| name, case_flags = key | |
| values = frozenset(kwargs[name]) | |
| if case_flags: | |
| items = frozenset(_fold_case(info, v) for v in values) | |
| else: | |
| items = values | |
| named_lists[name] = values | |
| named_list_indexes[index] = items | |
| args_needed.add((name, values)) | |
| complain_unused_args() | |
| # Check the features of the groups. | |
| _check_group_features(info, parsed) | |
| # Compile the parsed pattern. The result is a list of tuples. | |
| code = parsed.compile(reverse) | |
| # Is there a group call to the pattern as a whole? | |
| key = (0, reverse, fuzzy) | |
| ref = info.call_refs.get(key) | |
| if ref is not None: | |
| code = [(_OP.CALL_REF, ref)] + code + [(_OP.END, )] | |
| # Add the final 'success' opcode. | |
| code += [(_OP.SUCCESS, )] | |
| # Compile the additional copies of the groups that we need. | |
| for group, rev, fuz in info.additional_groups: | |
| code += group.compile(rev, fuz) | |
| # Flatten the code into a list of ints. | |
| code = _flatten_code(code) | |
| if not parsed.has_simple_start(): | |
| # Get the first set, if possible. | |
| try: | |
| fs_code = _compile_firstset(info, parsed.get_firstset(reverse)) | |
| fs_code = _flatten_code(fs_code) | |
| code = fs_code + code | |
| except _FirstSetError: | |
| pass | |
| # The named capture groups. | |
| index_group = dict((v, n) for n, v in info.group_index.items()) | |
| # Create the PatternObject. | |
| # | |
| # Local flags like IGNORECASE affect the code generation, but aren't needed | |
| # by the PatternObject itself. Conversely, global flags like LOCALE _don't_ | |
| # affect the code generation but _are_ needed by the PatternObject. | |
| compiled_pattern = _regex.compile(pattern, info.flags | version, code, | |
| info.group_index, index_group, named_lists, named_list_indexes, | |
| req_offset, req_chars, req_flags, info.group_count) | |
| # Do we need to reduce the size of the cache? | |
| if len(_cache) >= _MAXCACHE: | |
| with _cache_lock: | |
| _shrink_cache(_cache, _named_args, _locale_sensitive, _MAXCACHE) | |
| if cache_it: | |
| if (info.flags & LOCALE) == 0: | |
| pattern_locale = None | |
| args_needed = frozenset(args_needed) | |
| # Store this regular expression and named list. | |
| pattern_key = (pattern, type(pattern), flags, args_needed, | |
| DEFAULT_VERSION, pattern_locale) | |
| _cache[pattern_key] = compiled_pattern | |
| # Store what keyword arguments are needed. | |
| _named_args[args_key] = args_needed | |
| return compiled_pattern | |
| def _compile_replacement_helper(pattern, template): | |
| "Compiles a replacement template." | |
| # This function is called by the _regex module. | |
| # Have we seen this before? | |
| key = pattern.pattern, pattern.flags, template | |
| compiled = _replacement_cache.get(key) | |
| if compiled is not None: | |
| return compiled | |
| if len(_replacement_cache) >= _MAXREPCACHE: | |
| _replacement_cache.clear() | |
| is_unicode = isinstance(template, str) | |
| source = _Source(template) | |
| if is_unicode: | |
| def make_string(char_codes): | |
| return "".join(chr(c) for c in char_codes) | |
| else: | |
| def make_string(char_codes): | |
| return bytes(char_codes) | |
| compiled = [] | |
| literal = [] | |
| while True: | |
| ch = source.get() | |
| if not ch: | |
| break | |
| if ch == "\\": | |
| # '_compile_replacement' will return either an int group reference | |
| # or a string literal. It returns items (plural) in order to handle | |
| # a 2-character literal (an invalid escape sequence). | |
| is_group, items = _compile_replacement(source, pattern, is_unicode) | |
| if is_group: | |
| # It's a group, so first flush the literal. | |
| if literal: | |
| compiled.append(make_string(literal)) | |
| literal = [] | |
| compiled.extend(items) | |
| else: | |
| literal.extend(items) | |
| else: | |
| literal.append(ord(ch)) | |
| # Flush the literal. | |
| if literal: | |
| compiled.append(make_string(literal)) | |
| _replacement_cache[key] = compiled | |
| return compiled | |
| # We define Pattern here after all the support objects have been defined. | |
| _pat = _compile('', 0, False, {}, False) | |
| Pattern = type(_pat) | |
| Match = type(_pat.match('')) | |
| del _pat | |
| # Make Pattern public for typing annotations. | |
| __all__.append("Pattern") | |
| __all__.append("Match") | |
| # We'll define an alias for the 'compile' function so that the repr of a | |
| # pattern object is eval-able. | |
| Regex = compile | |
| # Register myself for pickling. | |
| import copyreg as _copy_reg | |
| def _pickle(pattern): | |
| return _regex.compile, pattern._pickled_data | |
| _copy_reg.pickle(Pattern, _pickle) | |
Xet Storage Details
- Size:
- 33.3 kB
- Xet hash:
- 63c162053f634a2ae3f2b5a436a5c9f94af93b44fe31fdba05df65e68e639c36
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.