Mirrowel committed on
Commit ae7ffce · 1 Parent(s): e2f4e9e

docs(timeout): 📚 add comprehensive HTTP timeout configuration documentation

Adds detailed documentation section explaining the TimeoutConfig class and its usage across LLM providers:

- Explains timeout types (connect, read, write, pool) and their purposes
- Documents default values with rationale for streaming vs non-streaming requests
- Provides environment variable override reference
- Details behavioral differences between streaming (3 min read timeout) and non-streaming (10 min read timeout) configurations
- Maps which providers use which timeout configurations
- Includes tuning recommendations for different use cases
- Provides example configurations for complex reasoning tasks and unstable networks

Files changed (1)
  1. DOCUMENTATION.md +100 -0
DOCUMENTATION.md CHANGED
@@ -858,6 +858,106 @@ class AntigravityAuthBase(GoogleOAuthBase):
  ---

+ ### 2.14. HTTP Timeout Configuration (`timeout_config.py`)
+
+ Centralized timeout configuration for all HTTP requests to LLM providers.
+
+ #### Purpose
+
+ The `TimeoutConfig` class provides fine-grained control over HTTP timeouts for streaming and non-streaming LLM requests. This addresses the common issue of proxy hangs when upstream providers stall during connection establishment or response generation.
+
+ #### Timeout Types Explained
+
+ | Timeout | Description |
+ |---------|-------------|
+ | **connect** | Maximum time to establish a TCP/TLS connection to the upstream server |
+ | **read** | Maximum time to wait between receiving data chunks (resets on each chunk for streaming) |
+ | **write** | Maximum time to wait while sending the request body |
+ | **pool** | Maximum time to wait for a connection from the connection pool |
+
+ #### Default Values
+
+ | Setting | Streaming | Non-Streaming | Rationale |
+ |---------|-----------|---------------|-----------|
+ | **connect** | 30s | 30s | Fast fail if server is unreachable |
+ | **read** | 180s (3 min) | 600s (10 min) | Streaming expects periodic chunks; non-streaming may wait for full generation |
+ | **write** | 30s | 30s | Request bodies are typically small |
+ | **pool** | 60s | 60s | Reasonable wait for connection pool |
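
The defaults in the table can be captured in a small immutable structure. The following is a minimal sketch, not the contents of `timeout_config.py`: the class name `TimeoutProfile` and its field layout are hypothetical, and only the numeric defaults and the `streaming()` / `non_streaming()` names come from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutProfile:
    """Hypothetical stand-in for TimeoutConfig; all values are in seconds."""

    connect: float = 30.0
    read: float = 180.0
    write: float = 30.0
    pool: float = 60.0

    @classmethod
    def streaming(cls) -> "TimeoutProfile":
        # Streaming: the read timeout bounds the gap between two chunks.
        return cls(read=180.0)

    @classmethod
    def non_streaming(cls) -> "TimeoutProfile":
        # Non-streaming: the server may send nothing until generation finishes.
        return cls(read=600.0)
```

The four fields match the keyword arguments of `httpx.Timeout` (`connect`, `read`, `write`, `pool`). If the proxy is built on httpx, which this section does not state, such a profile could be passed through as `httpx.Timeout(**dataclasses.asdict(profile))`.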
+
+ #### Environment Variable Overrides
+
+ All timeout values can be customized via environment variables:
+
+ ```env
+ # Connection establishment timeout (seconds)
+ TIMEOUT_CONNECT=30
+
+ # Request body send timeout (seconds)
+ TIMEOUT_WRITE=30
+
+ # Connection pool acquisition timeout (seconds)
+ TIMEOUT_POOL=60
+
+ # Read timeout between chunks for streaming requests (seconds)
+ # If no data arrives for this duration, the connection is considered stalled
+ TIMEOUT_READ_STREAMING=180
+
+ # Read timeout for non-streaming responses (seconds)
+ # Longer to accommodate models that take time to generate full responses
+ TIMEOUT_READ_NON_STREAMING=600
+ ```
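
One way such overrides could be resolved is sketched below. The helper name `read_timeout` and the `DEFAULTS` mapping are hypothetical, and falling back to the default for missing, non-numeric, or non-positive values is an assumption of this sketch; the section does not describe how `TimeoutConfig` treats invalid input.

```python
import os
from typing import Mapping, Optional

# Defaults (seconds) copied from the Default Values table.
DEFAULTS = {
    "TIMEOUT_CONNECT": 30.0,
    "TIMEOUT_WRITE": 30.0,
    "TIMEOUT_POOL": 60.0,
    "TIMEOUT_READ_STREAMING": 180.0,
    "TIMEOUT_READ_NON_STREAMING": 600.0,
}


def read_timeout(name: str, env: Optional[Mapping[str, str]] = None) -> float:
    """Return the override for `name` if it is a positive number, else the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return DEFAULTS[name]
    try:
        value = float(raw)
    except ValueError:
        # Assumption: an unparsable value is ignored rather than raised.
        return DEFAULTS[name]
    return value if value > 0 else DEFAULTS[name]
```

Passing the environment as an argument keeps the helper easy to test without mutating `os.environ`.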
+
+ #### Streaming vs Non-Streaming Behavior
+
+ **Streaming Requests** (`TimeoutConfig.streaming()`):
+ - Uses shorter read timeout (default 3 minutes)
+ - Timer resets every time a chunk arrives
+ - If no data arrives within the read timeout (3 minutes by default) → connection considered dead → failover to next credential
+ - Appropriate for chat completions where tokens should arrive periodically
+
+ **Non-Streaming Requests** (`TimeoutConfig.non_streaming()`):
+ - Uses longer read timeout (default 10 minutes)
+ - Server may take significant time to generate the complete response before sending anything
+ - Complex reasoning tasks or large outputs may legitimately take several minutes
+ - Only used by the Antigravity provider's `_handle_non_streaming()` method
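
The "timer resets on every chunk" behavior can be illustrated without any HTTP client. The sketch below is only an illustration of the idea using `asyncio.wait_for`; the function name `iter_with_read_timeout` is hypothetical, and in the real providers the HTTP library presumably enforces the read timeout itself.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def iter_with_read_timeout(
    stream: AsyncIterator[T], read_timeout: float
) -> AsyncIterator[T]:
    """Yield items from `stream`, allowing at most `read_timeout` seconds per item."""
    iterator = stream.__aiter__()
    while True:
        try:
            # A fresh wait_for call per chunk is what "resets" the timer.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=read_timeout)
        except StopAsyncIteration:
            return  # the stream ended normally
        yield chunk
```

A stalled stream surfaces as `asyncio.TimeoutError` in the consumer, which is the point at which a caller could fail over to the next credential.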
+
+ #### Provider Usage
+
+ The following providers use `TimeoutConfig`:
+
+ | Provider | Method | Timeout Type |
+ |----------|--------|--------------|
+ | `antigravity_provider.py` | `_handle_non_streaming()` | `non_streaming()` |
+ | `antigravity_provider.py` | `_handle_streaming()` | `streaming()` |
+ | `gemini_cli_provider.py` | `acompletion()` | `streaming()` |
+ | `iflow_provider.py` | `acompletion()` | `streaming()` |
+ | `qwen_code_provider.py` | `acompletion()` | `streaming()` |
+
+ **Note:** iFlow, Qwen Code, and Gemini CLI providers always use streaming internally (even for non-streaming requests), aggregating chunks into a complete response. Only Antigravity has a true non-streaming path.
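
The aggregation mentioned in the note can be pictured as follows. This is a simplified sketch under stated assumptions: the helper `aggregate_chunks` is hypothetical, and chunks are modeled as plain text deltas, whereas real provider chunks are structured objects.

```python
import asyncio
from typing import AsyncIterator, List


async def aggregate_chunks(chunks: AsyncIterator[str]) -> str:
    """Consume a stream of text deltas and return the concatenated response."""
    parts: List[str] = []
    async for delta in chunks:
        # Each arriving delta also counts as activity for the streaming read timeout.
        parts.append(delta)
    return "".join(parts)
```

Because the upstream call is still a stream, the shorter streaming read timeout applies to these providers even when the client asked for a non-streaming response.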
+
+ #### Tuning Recommendations
+
+ | Use Case | Recommendation |
+ |----------|----------------|
+ | **Long thinking tasks** | Increase `TIMEOUT_READ_STREAMING` to 300-360s |
+ | **Unstable network** | Increase `TIMEOUT_CONNECT` to 60s |
+ | **High concurrency** | Increase `TIMEOUT_POOL` if seeing pool exhaustion |
+ | **Large context/output** | Increase `TIMEOUT_READ_NON_STREAMING` to 900s+ |
+
+ #### Example Configuration
+
+ ```env
+ # For environments with complex reasoning tasks
+ TIMEOUT_READ_STREAMING=300
+ TIMEOUT_READ_NON_STREAMING=900
+
+ # For unstable network conditions
+ TIMEOUT_CONNECT=60
+ TIMEOUT_POOL=120
+ ```
+
+ ---
+

  ---