site_name: transformer-deploy by Lefebvre Dalloz
repo_name: ELS-RD/transformer-deploy/
repo_url: https://github.com/ELS-RD/transformer-deploy/
site_description: Efficient, scalable and enterprise-grade CPU/GPU inference server for Hugging Face transformer models
copyright: Copyright © 2020 - 2021 Lefebvre Dalloz
edit_uri: ""  # empty value hides the "edit this page" links

theme:
  name: material
  custom_dir: docs/overrides
  palette:
    scheme: default
    primary: black
    accent: deep orange
#    toggle:
#      icon: material/weather-night
#      name: Switch to light mode
#  - scheme: default
#    primary: black
#    accent: deep orange
#    toggle:
#      icon: material/weather-sunny
#      name: Switch to dark mode
  icon:
    repo: fontawesome/brands/github
  logo: material/speedometer

markdown_extensions:
  - admonition
  - pymdownx.details
  - pymdownx.superfences
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - toc:
      permalink: "#"
  - pymdownx.critic
  - pymdownx.caret
  - pymdownx.mark
  - pymdownx.tilde
  - abbr
  - attr_list
  - md_in_html

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          selection:
            docstring_style: restructured-text
  - include-markdown
  - mkdocs-jupyter:
      theme: dark
      ignore_h1_titles: true
  - gen-files:
      scripts:
        - resources/gen_doc_stubs.py
  - literate-nav:
      nav_file: SUMMARY.md

extra:
  social:
    - icon: fontawesome/brands/twitter
      link: https://twitter.com/pommedeterre33
    - icon: fontawesome/brands/medium
      link: https://medium.com/@pommedeterre33
  generator: false  # hide the "Made with Material for MkDocs" footer notice

nav:
  - Getting started: index.md
  - Installation (local or Docker only): setup_local.md
  - Run (1 command): run.md
  - Which tool to choose for your inference?: compare.md
  - How does ONNX conversion work?: onnx_convert.md
  - Understanding model optimization: optimizations.md
  - Direct TensorRT use in a Python script (no server): python.md
  - GPU quantization for 2X speed-up:
      - Why use quantization?: quantization/quantization_intro.md
      - Quantization theory: quantization/quantization_theory.md
      - How is it implemented in this library?: quantization/quantization_ast.md
      - PTQ and QAT, what are they?: quantization/quantization_ptq.md
      - End-to-end demo: quantization/quantization.ipynb
  - "From optimization to deployment: end-to-end demo": demo.md
  - "Accelerate text generation with GPT-2": gpt2.ipynb
  - "Accelerate text generation with T5": t5.ipynb
  - Benchmarks run on AWS GPU instances: benchmarks.md
  - FAQ: faq.md
  - API: reference/
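
# --- Local preview (a minimal sketch, not part of the MkDocs schema) ---
# Assuming the theme and plugins referenced above are installed from PyPI
# (the package names below are the usual distributions; this file does not
# pin them, so exact names/versions are an assumption):
#
#   pip install mkdocs-material mkdocstrings mkdocs-include-markdown-plugin \
#       mkdocs-jupyter mkdocs-gen-files mkdocs-literate-nav
#
#   mkdocs serve   # live-reloading preview at http://127.0.0.1:8000
#   mkdocs build   # writes the static site to ./site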