RAG_AIEXP_01 / documents_prep.py

Commit History

Fixed HF_REPO_ID + Added force_download=True + Enhanced status messages
da3779b

MrSimple07 committed on

new version with document uploading + fixed readme + xlsx processor inside documents prep
5099a0a

MrSimple07 committed on

old version with fixed, 3000, 30
0d6b2c5

MrSimple07 committed on

added the new llm query expanding + 4000,30 + latin to cyrillic
3dcab53

MrSimple07 committed on

added the new llm query expanding
6db5f4f

MrSimple07 committed on

new keyboard based approach
379f6e4

MrSimple07 committed on

from cyrillic to latin + 2000,30
15ae02f

MrSimple07 committed on

added new loggers for normalization
4c96122

MrSimple07 committed on

added new loggers for normalization
8362ae9

MrSimple07 committed on

added new loggers for normalization
035fbdc

MrSimple07 committed on

added new loggers for normalization
0f89c6b

MrSimple07 committed on

added new loggers for normalization
bd2b030

MrSimple07 committed on

added new loggers for normalization
5263b61

MrSimple07 committed on

added new loggers for normalization
9ce9909

MrSimple07 committed on

new steel latin to cyrillic
fbed18d

MrSimple07 committed on

added the sheet name + - table number handling + 4500, 20
07e9959

MrSimple07 committed on

new normalizer C to Latin C + max table = 20, max chunk = 4000
6c839c3

MrSimple07 committed on

new normalizer C to Latin C + max table = 20, max chunk = 3000
7c27a96

MrSimple07 committed on

new normalizer C to Latin C
78e6c03

MrSimple07 committed on

old version of documents prep
0fa3553

MrSimple07 committed on

added the 100 topk
75fe00d

MrSimple07 committed on

removed the part removing hyphens + top 80, cutoff = 0.55
429d2d4

MrSimple07 committed on

normalized fixed + in header text as well
57a8908

MrSimple07 committed on

fixing normalizing hyphens
9c77451

MrSimple07 committed on

remove hyphens
60178fd

MrSimple07 committed on

normalize anyways + max row = 15 + max chars = 3000
30336c3

MrSimple07 committed on

removed normalization
4834e86

MrSimple07 committed on

added the new function to replace latin cyrillic c25
11e130c

MrSimple07 committed on

big debug change
9c9aff4

MrSimple07 committed on

big debug change
04f5154

MrSimple07 committed on

new debug functions + 2000 chunk size
aafe88b

MrSimple07 committed on

new debug functions + 2000 chunk size
05c597d

MrSimple07 committed on

added debugging functions for the c25
8d6a517

MrSimple07 committed on

top k = 150 + max chunk size is 4000 + max rows =15 + sim cut off = 0.45
35eb459

MrSimple07 committed on

top k = 80 + max chunk size is 3000
9f55dc6

MrSimple07 committed on

new api = retrieve chunks + some more text fixing
33c996e

MrSimple07 committed on

max chunk size= 4000 + max row = 5
5f6b6af

MrSimple07 committed on

max chunk size= 4000 + max row = 5
8c371f8

MrSimple07 committed on

max chunk size= 4000 + max row = 5
15a7dee

MrSimple07 committed on

new documents prep
63ebb90

MrSimple07 committed on

new documents prep
d1e7fd2

MrSimple07 committed on

new function for should keep table whole for some files
b91dfb0

MrSimple07 committed on

max rows = 20, 150 + 150 bm25
4c7b0a2

MrSimple07 committed on

chunk size = 2048 + rows=15
2eb8b63

MrSimple07 committed on

chunk size = 2048 + rows=15
2b217eb

MrSimple07 committed on

chunk size = 2048 + rows=15
634c04c

MrSimple07 committed on

chunk size = 2048 + rows=15
2676cd6

MrSimple07 committed on

chunk size = 2048 + rows=15
f1379ba

MrSimple07 committed on

chunk size = 2048 + rows=15
54b1e69

MrSimple07 committed on