Buckets:
| """Windows UTF-8 bootstrap for Hermes entry points. | |
| Python on Windows has two long-standing text-encoding footguns: | |
| 1. ``sys.stdout`` / ``sys.stderr`` are bound to the console code page | |
| (``cp1252`` on US-locale installs), so ``print("café")`` crashes with | |
| ``UnicodeEncodeError: 'charmap' codec can't encode character``. | |
| 2. Child processes spawned via ``subprocess`` don't know to use UTF-8 | |
| unless ``PYTHONUTF8`` and/or ``PYTHONIOENCODING`` are set in their | |
| environment — so any Python subprocess (the execute_code sandbox, | |
| delegation children, linter subprocesses, etc.) inherits the same | |
| cp1252 defaults and hits the same UnicodeEncodeError. | |
| This module fixes both on Windows *only* — POSIX is untouched. It | |
| should be imported at the very top of every Hermes entry point | |
| (``hermes``, ``hermes-agent``, ``hermes-acp``, ``python -m gateway.run``, | |
| ``batch_runner.py``, ``cron/scheduler.py``) before any other imports | |
| that might do file I/O or print to stdout. | |
| What this module does on Windows: | |
| - Sets ``os.environ["PYTHONUTF8"] = "1"`` (PEP 540 UTF-8 mode) so | |
| every child process we spawn uses UTF-8 for ``open()`` and stdio. | |
| - Sets ``os.environ["PYTHONIOENCODING"] = "utf-8"`` for belt-and- | |
| suspenders — some tools read this instead of / in addition to | |
| ``PYTHONUTF8``. | |
| - Reconfigures ``sys.stdout`` / ``sys.stderr`` to UTF-8 in the current | |
| process, using the ``reconfigure()`` API (Python 3.7+). This fixes | |
| ``print("café")`` in the parent without a re-exec. | |
| What this module does NOT do: | |
| - It does not re-exec Python with ``-X utf8``, so ``open()`` calls in | |
| the *current* process still default to locale encoding. Those need | |
| an explicit ``encoding="utf-8"`` at the call site (lint rule | |
| ``PLW1514`` / ``PYI058``). Ruff is the right tool for that sweep. | |
| What this module does on POSIX: | |
| - Nothing. POSIX systems are already UTF-8 by default in 99% of cases, | |
| and we don't want to touch ``LANG``/``LC_*`` behavior that users may | |
| have configured intentionally. If someone hits a C/POSIX locale on | |
| Linux, they can export ``PYTHONUTF8=1`` themselves — we won't override. | |
| Idempotent: safe to call multiple times. ``_bootstrap_once`` guards | |
| against double-reconfigure. | |
| """ | |
| from __future__ import annotations | |
| import os | |
| import sys | |
| _IS_WINDOWS = sys.platform == "win32" | |
| _bootstrap_applied = False | |
| def apply_windows_utf8_bootstrap() -> bool: | |
| """Apply the Windows UTF-8 bootstrap if we're on Windows. | |
| Returns True if bootstrap was applied (i.e. we're on Windows and | |
| haven't already done this), False otherwise. The return value is | |
| advisory — callers normally don't need it, but tests may want to | |
| assert the path was taken. | |
| Idempotent: subsequent calls after the first are a no-op. | |
| """ | |
| global _bootstrap_applied | |
| if not _IS_WINDOWS: | |
| return False | |
| if _bootstrap_applied: | |
| return False | |
| # 1. Child processes inherit these and run in UTF-8 mode. | |
| # We use setdefault() rather than overwriting so the user can | |
| # explicitly opt out by setting PYTHONUTF8=0 in their environment | |
| # (or PYTHONIOENCODING=something-else) if they really want to. | |
| os.environ.setdefault("PYTHONUTF8", "1") | |
| os.environ.setdefault("PYTHONIOENCODING", "utf-8") | |
| # 2. Reconfigure the current process's stdio to UTF-8. Needed | |
| # because os.environ changes don't retroactively rebind sys.stdout | |
| # — those were bound at interpreter startup based on the console | |
| # code page. ``reconfigure`` is a TextIOWrapper method since 3.7. | |
| # | |
| # errors="replace" means that if we ever *read* something from | |
| # stdin that isn't UTF-8 (unlikely but possible with piped input | |
| # from legacy tools), we'll get U+FFFD replacement chars rather | |
| # than a crash. Output is pure UTF-8. | |
| for stream_name in ("stdout", "stderr"): | |
| stream = getattr(sys, stream_name, None) | |
| if stream is None: | |
| continue | |
| reconfigure = getattr(stream, "reconfigure", None) | |
| if reconfigure is None: | |
| # Not a TextIOWrapper (could be redirected to a BytesIO in | |
| # tests, or a non-standard stream in some embedded cases). | |
| # Skip silently — the env-var fix is still in effect for | |
| # child processes, which is the bigger win. | |
| continue | |
| try: | |
| reconfigure(encoding="utf-8", errors="replace") | |
| except (OSError, ValueError): | |
| # Already closed, or someone replaced it with something | |
| # non-reconfigurable. Non-fatal. | |
| pass | |
| # stdin is reconfigured separately with errors="replace" too — input | |
| # from a legacy pipe shouldn't crash the process. | |
| stdin = getattr(sys, "stdin", None) | |
| if stdin is not None: | |
| reconfigure = getattr(stdin, "reconfigure", None) | |
| if reconfigure is not None: | |
| try: | |
| reconfigure(encoding="utf-8", errors="replace") | |
| except (OSError, ValueError): | |
| pass | |
| _bootstrap_applied = True | |
| return True | |
| # Apply on import — entry points just need ``import hermes_bootstrap`` | |
| # (or ``from hermes_bootstrap import apply_windows_utf8_bootstrap``) at | |
| # the very top of their module, before importing anything else. The | |
| # import side effect does the right thing. | |
| apply_windows_utf8_bootstrap() | |
Xet Storage Details
- Size:
- 5.4 kB
- Xet hash:
- 4bbc85049532a2aa63b1be6a3da45223d4ef6a43401964d077f8d4fa09a2ff20
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.