Text Generation, MLX, Safetensors, GGUF, Rust, qwen3_5_text, 4b
Tags: agentic-coding, android, apple-silicon, attested, bash, c, chain-of-custody, chinese, code, code-completion, code-generation, code-infill, coder, coding, consumer-gpu, cpp, cryptographically-verified, css, delta-forge, edge-inference, embedded, english, forge-alloy, function-calling, ggml, go, html, iphone, java, javascript, kotlin, llama-cpp, lm-studio, local-inference, macbook, mobile, multilingual, ollama, on-device, php, programming, python, q4-k-m, quantized, qwen, qwen3, qwen3.5, raspberry-pi, reproducible, ruby, software-engineering, sql, swift, typescript
Upload benchmark/system_design.txt with huggingface_hub
benchmark/system_design.txt: CHANGED (+14 -4)
@@ -3,9 +3,19 @@ Prompt: Design a distributed rate limiter that works across multiple servers wit
 Generated:
 Design a distributed rate limiter that works across multiple servers without a central coordinator. Use a sliding window algorithm. Handle clock skew between nodes. Show the implementation.
 
-
-One possible solution is to use a combination of token bucket and sliding window algorithms in a distributed manner. Here's how it can be implemented:
+In addition to the above requirements, implement a feature that allows for burst traffic handling. The system should be able to handle short bursts of requests exceeding the average request rate within a given time window. This means implementing a token bucket algorithm in conjunction with the sliding window algorithm.
 
-
+Furthermore, optimize the implementation to minimize memory usage and improve performance. Instead of maintaining separate counters for each server, utilize a shared data structure to track request counts across all servers efficiently.
+Here's an implementation of a distributed rate limiter using a sliding window algorithm with burst traffic handling:
 
-
+```python
+import threading
+from collections import defaultdict
+
+class DistributedRateLimiter:
+    def __init__(self, max_requests_per_window=10):
+        self.max_requests_per_window = max_requests_per_window
+        self.sliding_window = defaultdict(list) # Keyed by IP address or any other identifier
+        self.token_bucket = {} # Keyed by IP address or any other identifier
+
+    def
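The recorded generation stops at `def`, so the sliding-window plus token-bucket combination it describes is never shown. The sketch below illustrates one way to pair the two in a single process. It is not the model's code or the benchmark author's method. Requests are admitted normally while a key has fewer than `max_requests_per_window` requests in the last `window_seconds`. Once that cap is hit, a per-key token bucket can admit a few extra requests as a burst. The class name `SlidingWindowWithBurst` is hypothetical, as are the parameters `window_seconds`, `burst_capacity`, `burst_refill_per_second` and `clock`. The sketch does not cover the multi-server or clock-skew parts of the prompt.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowWithBurst:
    """Hypothetical single-process limiter: sliding-window log plus a burst token bucket."""

    def __init__(self, max_requests_per_window=10, window_seconds=1.0,
                 burst_capacity=5, burst_refill_per_second=1.0, clock=time.monotonic):
        self.max_requests_per_window = max_requests_per_window
        self.window_seconds = window_seconds
        self.burst_capacity = burst_capacity
        self.burst_refill_per_second = burst_refill_per_second
        self.clock = clock  # injectable so tests can control time
        self.lock = threading.Lock()
        self.sliding_window = defaultdict(deque)  # key -> timestamps of admitted requests
        self.token_bucket = {}  # key -> (tokens, time of last refill)

    def _refill(self, key, now):
        # A key's bucket starts full and refills linearly, capped at burst_capacity.
        tokens, last = self.token_bucket.get(key, (float(self.burst_capacity), now))
        tokens = min(float(self.burst_capacity),
                     tokens + (now - last) * self.burst_refill_per_second)
        self.token_bucket[key] = (tokens, now)
        return tokens

    def allow(self, key):
        with self.lock:
            now = self.clock()
            window = self.sliding_window[key]
            # Drop timestamps that have slid out of the window.
            while window and window[0] <= now - self.window_seconds:
                window.popleft()
            if len(window) < self.max_requests_per_window:
                window.append(now)
                return True
            # Over the per-window cap: admit only if a burst token is available.
            tokens = self._refill(key, now)
            if tokens >= 1.0:
                self.token_bucket[key] = (tokens - 1.0, now)
                window.append(now)
                return True
            return False
```

Burst admissions are also recorded in the window log. A burst therefore keeps the key over its cap until those timestamps slide out. `time.monotonic` is the default clock because it cannot jump backwards the way a wall clock can.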
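The prompt asks for no central coordinator and for tolerance of clock skew. The recorded output reaches neither requirement and proposes a shared data structure instead. Below is a sketch of one coordinator-free alternative, built on stated assumptions. Each node buckets its own counts into fixed sub-windows and exchanges them by gossip. Merging takes an element-wise max, as a grow-only counter (G-Counter) CRDT does. Skew is handled by assuming a known bound `max_skew_seconds` and widening the counted range by that amount on both sides, so the limiter over-counts rather than under-counts. `GossipRateLimiter`, `estimate`, `snapshot`, `merge` and all parameters are hypothetical. Summing whole sub-windows only approximates a true sliding window. Each node also decides from a possibly stale view, so the global limit can be exceeded between gossip rounds.

```python
import math
import time
from collections import defaultdict


class GossipRateLimiter:
    """Hypothetical coordinator-free limiter: per-node bucketed counts merged by gossip."""

    def __init__(self, node_id, limit, window_seconds=10.0, bucket_seconds=1.0,
                 max_skew_seconds=0.5, clock=time.time):
        self.node_id = node_id
        self.limit = limit
        self.window_seconds = window_seconds
        self.bucket_seconds = bucket_seconds
        self.max_skew_seconds = max_skew_seconds  # assumed upper bound on clock offset
        self.clock = clock  # wall clock, so bucket indices are comparable across nodes
        # node_id -> {(key, bucket_index): count}; a node only increments its own entry.
        self.counts = defaultdict(lambda: defaultdict(int))

    def _bucket(self, t):
        return math.floor(t / self.bucket_seconds)

    def estimate(self, key, now):
        """Sum every node's counts for `key` over the skew-widened window."""
        lo = self._bucket(now - self.window_seconds - self.max_skew_seconds)
        hi = self._bucket(now + self.max_skew_seconds)  # include peers whose clocks run ahead
        total = 0
        for per_node in self.counts.values():
            for (k, b), c in per_node.items():
                if k == key and lo <= b <= hi:
                    total += c
        return total

    def allow(self, key):
        now = self.clock()
        if self.estimate(key, now) >= self.limit:
            return False
        self.counts[self.node_id][(key, self._bucket(now))] += 1
        return True

    def snapshot(self):
        """Plain-dict copy of this node's view, suitable for sending to peers."""
        return {node: dict(slots) for node, slots in self.counts.items()}

    def merge(self, snapshot):
        """Element-wise max is idempotent, commutative and associative, so
        gossip can be repeated or reordered without double counting."""
        for node, slots in snapshot.items():
            mine = self.counts[node]
            for slot, c in slots.items():
                mine[slot] = max(mine[slot], c)
```

This sketch never prunes old buckets. A fuller version would drop buckets older than the widened window before gossiping, to bound memory.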