Update README.md
README.md
CHANGED
@@ -74,6 +74,25 @@ Stride: 255 tokens (50% overlap)
 - **Fallback mechanisms**: Intelligent splitting when no semantic boundaries found
 - **Combined limits**: Supports both token AND character limits simultaneously
 
+
+# Use Cases
+
+## Perfect for RAG Systems
+- **Vector Databases**: Ensure chunks fit embedding model limits
+- **Search Applications**: Optimal chunk sizes for retrieval
+- **Question Answering**: Maintain semantic coherence
+
+## Document Processing
+- **Academic Papers**: Respect section and paragraph boundaries
+- **Legal Documents**: Maintain clause integrity
+- **News Articles**: Preserve story flow and context
+
+## Content Management
+- **CMS Integration**: Automatic content segmentation
+- **API Limits**: Respect external service constraints
+- **Storage Optimization**: Consistent chunk sizes for databases
+
+
 ## Quick Start
 
 ### Installation
@@ -963,25 +982,6 @@ Average tokens per chunk: 236.9
 - Semantic boundaries preserved
 - No text loss or duplication
 
-
-
-# Use Cases
-
-## Perfect for RAG Systems
-- **Vector Databases**: Ensure chunks fit embedding model limits
-- **Search Applications**: Optimal chunk sizes for retrieval
-- **Question Answering**: Maintain semantic coherence
-
-## Document Processing
-- **Academic Papers**: Respect section and paragraph boundaries
-- **Legal Documents**: Maintain clause integrity
-- **News Articles**: Preserve story flow and context
-
-## Content Management
-- **CMS Integration**: Automatic content segmentation
-- **API Limits**: Respect external service constraints
-- **Storage Optimization**: Consistent chunk sizes for databases
-
 ---
 
 # Chunking Strategies