Update README.md

#3
by wamreyaz - opened
Files changed (1): README.md (+6 −0)
README.md CHANGED
@@ -42,6 +42,9 @@ print(texts[0])
 
 > The first `generate()` call is slower due to `torch.compile` building optimized kernels. Subsequent calls are much faster.
 
+We already use PagedInference, which is quite fast for most interactive tasks, but for large-scale deployment, check the Deployment section,
+which provides a vLLM backend.
+
 ## Categories
 
 By default, category is `"plain"` (general text extraction). You can specify a category to use a task-specific prompt:
@@ -96,6 +99,9 @@ for det in results[0]:
 }
 ```
 
+## Deployment
+TODO: explain how to set up the vLLM server.
+
 ## Citation
 
 
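Since the new Deployment section is still a TODO, here is a hedged starting point for it. This is a sketch only: the model id and port below are placeholders, not taken from this repo, and it assumes a recent vLLM release that ships the `vllm serve` entrypoint.

```shell
# Sketch: serve the checkpoint with vLLM's OpenAI-compatible API server.
# "your-org/your-model" is a placeholder; substitute the repo's actual model id.
vllm serve your-org/your-model --port 8000

# Then query it with a standard OpenAI-style chat request
# (the /v1/chat/completions path is vLLM's default OpenAI-compatible endpoint):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-org/your-model", "messages": [{"role": "user", "content": "Hello"}]}'
```

Whether the task-specific category prompts from the Categories section map onto this server's request format would need to be spelled out in the Deployment section itself.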