RAG_AIEXP_01 / documents_prep.py

Commit History

added the 100 topk
75fe00d

MrSimple07 committed on

removed the part removing hyphen + top 80, cutoff = 0.55
429d2d4

MrSimple07 committed on

normalized fixed + in header text as well
57a8908

MrSimple07 committed on

fixing normalizing hyphens
9c77451

MrSimple07 committed on

remove hyphens
60178fd

MrSimple07 committed on

normalize anyways + max row = 15 + max chars = 3000
30336c3

MrSimple07 committed on

removed normalization
4834e86

MrSimple07 committed on

added the new function to replace latin cyrillic c25
11e130c

MrSimple07 committed on

big debug change
9c9aff4

MrSimple07 committed on

big debug change
04f5154

MrSimple07 committed on

new debug functions + 2000 chunk size
aafe88b

MrSimple07 committed on

new debug functions + 2000 chunk size
05c597d

MrSimple07 committed on

added debugging functions for the c25
8d6a517

MrSimple07 committed on

top k = 150 + max chunk size is 4000 + max rows =15 + sim cut off = 0.45
35eb459

MrSimple07 committed on

top k = 80 + max chunk size is 3000
9f55dc6

MrSimple07 committed on

new api = retrieve chunks + some more text fixing
33c996e

MrSimple07 committed on

max chunk size= 4000 + max row = 5
5f6b6af

MrSimple07 committed on

max chunk size= 4000 + max row = 5
8c371f8

MrSimple07 committed on

max chunk size= 4000 + max row = 5
15a7dee

MrSimple07 committed on

new documents prep
63ebb90

MrSimple07 committed on

new documents prep
d1e7fd2

MrSimple07 committed on

new function for should keep table whole for some files
b91dfb0

MrSimple07 committed on

max rows = 20, 150 + 150 bm25
4c7b0a2

MrSimple07 committed on

chunk size = 2048 + rows=15
2eb8b63

MrSimple07 committed on

chunk size = 2048 + rows=15
2b217eb

MrSimple07 committed on

chunk size = 2048 + rows=15
634c04c

MrSimple07 committed on

chunk size = 2048 + rows=15
2676cd6

MrSimple07 committed on

chunk size = 2048 + rows=15
f1379ba

MrSimple07 committed on

chunk size = 2048 + rows=15
54b1e69

MrSimple07 committed on

chunk size = 1500
c354c08

MrSimple07 committed on

max rows = 20, 100 + 100 bm25
7504d82

MrSimple07 committed on

top k reranker = 20, max rows = 10, max chars= 2000 + new deduplication
ec64429

MrSimple07 committed on

top k reranker = 20, max rows = 10, max chars= 4000 + new deduplication
d577496

MrSimple07 committed on

top k reranker = 25, max rows = 5, max chars= 4000
c0c8ab9

MrSimple07 committed on

adaptive table chunking
e04e66f

MrSimple07 committed on

Much lower reranking threshold (-0.5 instead of 0.1) + detailed score logging
d99512d

MrSimple07 committed on

Much lower reranking threshold (-0.5 instead of 0.1) + detailed score logging
a83db61

MrSimple07 committed on

Much lower reranking threshold (-0.5 instead of 0.1) + detailed score logging
806f3f9

MrSimple07 committed on

removed normalization doc id
ad8e8ec

MrSimple07 committed on

index retriever = 100 + 100
31659d7

MrSimple07 committed on

max_chars = 1500 + doc id retriever
ae5a669

MrSimple07 committed on

chunk size = 1024 + max chars = 1200 + deduplication variant
f79b229

MrSimple07 committed on

chunk size = 1024 + max chars = 1200 + keyword based
26c4970

MrSimple07 committed on

chunk size = 512, max_chars = 1000
9bad02a

MrSimple07 committed on

chunk size = 1000, max_chars = 1500
7f19939

MrSimple07 committed on

max chars = 3000 + removed normalize_doc_id
7c138ed

MrSimple07 committed on

max chars = 2000 + removed normalize_doc_id
05822e9

MrSimple07 committed on

max chars = 2000 for tables + new answer_question
7565a55

MrSimple07 committed on

max rows = 10 + new answer_question + reranking
2edec29

MrSimple07 committed on