9x25dillon committed on
Commit 9f04601 · verified · 1 Parent(s): 968c919

Upload folder using huggingface_hub

Files changed (2)
  1. README_HF.md +39 -0
  2. requirements.txt +14 -58
README_HF.md ADDED
@@ -0,0 +1,39 @@
+ # Advanced Tokenizer System for LiMp
+
+ ## 🧠 Overview
+ Sophisticated multi-modal tokenization system with semantic awareness, mathematical processing, and fractal-based tokenization.
+
+ ## 🚀 Key Features
+ - **Multi-Modal Tokenization**: Traditional, semantic, mathematical, and fractal
+ - **High Capacity Processing**: Handles very large inputs with no fixed character limit
+ - **Intelligent Chunking**: Semantic-aware with context preservation
+ - **Batch Processing**: High-performance parallel processing
+ - **Training Data Generation**: Creates high-quality training datasets
+ - **Mathematical AI**: Advanced mathematical expression processing
+
+ ## 🛠 Quick Start
+ ```python
+ import asyncio
+ from advanced_tokenizer_system import AdvancedTokenizer, TokenizerConfig
+
+ async def main():  # 'await' is only valid inside an async function in a script
+     tokenizer = AdvancedTokenizer(TokenizerConfig())
+     result = await tokenizer.tokenize("Hello world! x^2 + y^2 = z^2")
+     print(f"Tokens: {result.total_tokens}")
+ asyncio.run(main())
+ ```
+
+ ## 📁 Files
+ - `advanced_tokenizer_system.py` - Main tokenizer
+ - `batch_processing_system.py` - Batch processing
+ - `high_capacity_input_processor.py` - Large text processing
+ - `intelligent_chunking_processor.py` - Smart chunking
+ - `advanced_training_data_generator.py` - Training data
+ - `matrix_training_data.jsonl` - Sample data
+
+ ## 🧪 Test
+ ```bash
+ python3 working_test.py
+ ```
+
+ Ready for advanced AI tokenization! 🚀
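The Key Features list mentions semantic-aware chunking with context preservation, but the logic in `intelligent_chunking_processor.py` is not part of this diff. As a rough illustration only, here is a minimal sketch of one common approach, assuming sentence boundaries stand in for semantic boundaries: sentences are packed into chunks up to a character budget, and each new chunk repeats the last sentence(s) of the previous one so context carries across the boundary. The function `chunk_with_overlap` and its `max_chars` / `overlap_sentences` parameters are hypothetical, not the project's API.

```python
import re
from typing import List


def chunk_with_overlap(text: str, max_chars: int = 200,
                       overlap_sentences: int = 1) -> List[str]:
    """Hypothetical sketch of sentence-aware chunking with overlap.

    Sentences are packed into chunks of roughly `max_chars` characters.
    When a chunk is closed, its last `overlap_sentences` sentences are
    carried into the next chunk as shared context. A chunk can exceed
    `max_chars` when one sentence (or the carried-over overlap plus one
    sentence) is longer than the budget, because sentences are never split.
    """
    # Naive split after '.', '!' or '?' followed by whitespace; a real
    # semantic chunker would use a proper sentence segmenter.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Keep the tail of the closed chunk as context for the next one.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = ("First sentence here. Second one follows! "
              "Is this the third? Yes, the fourth.")
    for chunk in chunk_with_overlap(sample, max_chars=40, overlap_sentences=1):
        print(chunk)
```

With `max_chars=40` and one sentence of overlap, the sample prints three chunks: `First sentence here. Second one follows!`, `Second one follows! Is this the third?`, and `Is this the third? Yes, the fourth.` Splitting on sentence boundaries rather than raw character offsets avoids cutting a word or an expression such as `x^2 + y^2` in half; the overlap trades some duplicated text for context at chunk edges.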
requirements.txt CHANGED
@@ -1,58 +1,14 @@
- # Numbskull - Advanced AI Embedding Pipeline Requirements
- # Core dependencies for the sophisticated multi-modal embedding system
- # Updated: October 2024 - Pinned to latest secure versions
-
- # Core scientific computing
- numpy==2.3.3 # Updated from >=1.24.0
- scipy==1.16.2 # Updated from >=1.10.0
-
- # Mathematical processing
- sympy==1.14.0 # Updated from >=1.12
- matplotlib==3.10.7 # Updated from >=3.7.0
-
- # Machine learning
- scikit-learn==1.7.2 # Updated from >=1.3.0
-
- # Async HTTP and networking
- httpx==0.28.1 # Updated from >=0.24.0 - includes security fixes
- aiofiles==25.1.0 # Updated from >=23.2.1
-
- # Database connectivity
- asyncpg==0.30.0 # Updated from >=0.28.0
- psycopg2-binary==2.9.11 # Updated from >=2.9.0 - includes security patches
-
- # Data processing
- pandas==2.3.3 # Updated from >=2.0.0
- pydantic==2.12.0 # Updated from >=2.0.0 - includes validation improvements
-
- # Web framework (for API endpoints)
- fastapi==0.118.3 # Updated from >=0.100.0 - includes security fixes
- uvicorn==0.37.0 # Updated from >=0.23.0 - includes security updates
-
- # Utilities
- python-dateutil==2.9.0.post0 # Updated from >=2.8.0
- python-multipart==0.0.20 # Updated from >=0.0.6
-
- # Development and testing
- pytest==8.4.2 # Updated from >=7.4.0
- pytest-asyncio==1.2.0 # Updated from >=0.21.0
- black==25.9.0 # Updated from >=23.0.0
- flake8==7.3.0 # Updated from >=6.0.0
-
- # Graph/complex networks for emergent modules
- networkx==3.5 # Updated from >=3.1
-
- # Optional dependencies (install separately if needed)
- # sentence-transformers>=2.2.0
- # transformers>=4.30.0
- # torch>=2.0.0
- # faiss-cpu>=1.7.4
- # annoy>=1.17.0
- # hnswlib>=0.7.0
-
- # Numbskull integration - Advanced embedding pipeline
- # Install as editable package from local path
- -e /home/kill/numbskull
-
- # Additional dependency for HTTP requests in dual orchestrator
- requests>=2.31.0
+ numpy>=1.21.0
+ torch>=1.9.0
+ asyncio
+ pathlib
+ dataclasses
+ typing
+ datetime
+ json
+ hashlib
+ re
+ multiprocessing
+ threading
+ queue
+ psutil