Spaces: DataEyond / Agentic-Service-Data-Eyond
[KM-438][KM-439] Improve Retrieval and Querying feature
#15
by
rhbt6767
- opened
1 day ago
base:
refs/heads/main
←
from:
refs/pr/15
Discussion
Files changed
+4430
-281
[noticket] add gitignore
c87f27f4
[NOTICKET]: add document pipeline, simplify document API
fb871f3d
[NOTICKET]: update folder document_pipelines after pipelines
a4cf97ab
[NOTICKET][DB] refactor code to new repo
7f3bb978
[KM-441] add mean and median
9b593342
[NOTICKET] new metadata format for cleaner code
6b590d94
update document
5a69e0ee
delete duplicate file
3848d7b2
edit document for new pipeline
425e0210
[NOTICKET]: add CSV and XLSX file type
31920c3b
[DB] fix/rename db_pipeline.py
d913315c
[NOTICKET][DB] align the db_pipeline structure format with the other files
e13a9017
[NOTICKET][DB] move db credential into the model folder. add ingestion endpoint at db_client to use db pipeline. add db_client router in main.
347a73aa
[NOTICKET]: use tesseract for extract PDF
6b9a13d4
[NOTICKET]: add Tesseract and Poppler binaries via Git LFS
0a9101a1
[NOTICKET]: update uv.lock
bb79f64b
[NOTICKET][DB] update credential & databaseclient. update settings
0e079550
[NOTICKET] update settings
65a5c6b1
[KM-437][DB] add mysql, sqlserver, bigquery, snowflake connections
43539293
[NOTICKET]: adjusted pyproject.toml for OCR PDF
a00e2ad5
[NOTICKET]: fix merge conflict
6c873460
[NOTICKET][DB] fix mysql pipeline
060c8cc8
[NOTICKET] edit imports
b145c06e
[NOTICKET] minor code refactor
52415b6a
[NOTICKET] add duplicate check for storing database
d310770f
[NOTICKET][DB] add supported dbtype for frontend
a531fcc7
[NOTICKET]: add doctypes endpoint & 10MB file size limit
9debae56
[NOTICKET]: add comments to flag that file type lists must stay in sync
023b7cfe
[NOTICKET]: add to gitignore
bbc8c584
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
7757da18
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
9c090a04
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
5398fec4
Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
20bf3f8f
[NOTICKET] add total token logging
b9703fc5
[NOTICKET] add updated_at field for metadata & delete old embedding before appending
cb5ab327
[NO TICKET][document]: add updated_at on metadata
d2f7a483
[NO TICKET][document]: delete vector embeddings from table langchain_pg_embedding when a user deletes a document from knowledge
ac3d8c19
[NOTICKET][document]: make a clean output to status error unsupported file type
2814813f
[KM-438][KM-439] framework for knowledge retriever
d1e12641
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
a701ac37
[NOTICKET] fix single source to multiple sources
589ca324
[NOTICKET] fixed multiple sources
e9f2a263
[KM-507] add multiple retrieval method to compare (dense, mmr, bm25, hybrid)
ac6b78d1
[KM-507] add changes to methods
82186504
[NOTICKET] add db_client for querying
e49db601
add to gitignore
83ed7447
[KM-507] add different methods, now using dense cosine
145bca39
[KM-512] create folder for querying from db/tabular docs
2c8a3e89
[NOTICKET] minor fix in chat.py, add package for query, change schema used to hybrid (cosine+bm25)
15cd3a7f
[KM-512] add Pydantic model the LLM fills via function calling in sql_query, and add same signature for db and tabular
220f59eb
[NOTICKET] rename file name, updated after uv sync
948d6dda
[NOTICKET] update .gitignore
240251c4
Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
29efec67
[KM-513][document] add conversion to Parquet when the file type is XLSX or CSV
770f26b1
add to gitignore
1fef470b
[KM-512] connect query executor to user question. add logging for db_executor
abc494f9
[NOTICKET] fix delete, now can filter by user
f273db05
[NOTICKET] db_executor: CTE DML check now walks entire AST root, schema: cast instead of string interpolation
bd2b1d9d
[NOTICKET] fix-revert string change
110ee343
[KM-520] Integrate db query executor pipeline with existing rag retrieve pipeline
a25febe2
[KM-516][KM-517] add new feature; ai can now see table & column names that have fk relationship with retrieved result
f86da27b
[NOTICKET] fix query now use orchestrator msg, rework db pipeline replace ingestion logic
be9bbd9d
[KM-507] now only uses hybrid (cosine and bm25)
40925b45
[NOTICKET] untrack software/ folder (ignored via .gitignore)
0931c10d
[NOTICKET] add pyarrow
432c1fa9
[NOTICKET] add pyarrow
7ff66c9b
[KM-515][document] Make Query for Tabular Type (XLSX & CSV)
36049948
[KM-455][document] decide on retrieval methods for documents
cf77d20e
[KM-533] add table level schema, differentiate with chunk level. expand retrieval result with FK exploration
fc1239ae
[KM-533] now also retrieves table level chunk
4150ba7e
fix: fix dedup logic
c9d3b337
[NOTICKET] rrf merge now at router level
de32ab04
[NOTICKET] minor refactoring
e4f62b85
fix: query executor now uses the user question as prompt (previously used the orchestrator output)
0935ede4
fix: increase K in chat endpoint to 10
b59ef76f
feat: add sheet-level chunk on CSV/XLSX ingestion
8daf9b59
fix: 5 bug fixes on tabular executor
a49dc1b5
[NOTICKET][doc] fix aggregate count operation when value_col is not specified
00aa61d9
[NOTICKET] now retrieve db tables first, then get column from the obtained tables. reduce k to 5
bb29492a
[NOTICKET][doc] add guard if filename None
b7fbaebb
fix fallback to fresh retrieval on corrupted Redis cache
9cb950f7
[NOTICKET][doc] validate embedding vector for NaN/Infinity in manhattan retriever
2167a5bd
[NOTICKET] pass orchestrator search_query to sql executor for multi-turn context
23eeb2d3
[NOTICKET][db] add sheet-level retrieval and focus LLM schema context to retrieved columns
a205d0c5
fix: minor return type fix when the SQL specifies a LIMIT that exceeds the cap
b4df8b1d
[NOTICKET][doc] add sheet-level leg and RRF voting for tabular retrieval
5f86993f
[NOTICKET][doc] remove column filter and fallback cap for full-schema approach
959b1b00
[NOTICKET][doc] correct metadata key path in _format_context
16ab9164
[NOTICKET] add dev dependency group and update gitignore
36ffff42
make executors self-contained, remove redundant pre-filter
73b7fe32
fix sorted ranking so model uses overall sorted retrieved chunks
3e7924d5
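Several commits above mention hybrid retrieval (cosine + BM25) and an RRF merge at the router level. As a reading aid, here is a minimal sketch of reciprocal rank fusion in Python. It is not code from this PR: the function name `rrf_merge`, the smoothing constant `k = 60`, and the sample chunk IDs are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def rrf_merge(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,  # assumed smoothing constant; the PR's actual value is not shown
    top_n: Optional[int] = None,
) -> List[Hashable]:
    """Fuse several ranked ID lists with reciprocal rank fusion (RRF).

    Each list contributes 1 / (k + rank) to an ID's score, where rank
    starts at 1. IDs are returned by descending fused score.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:
                # Count each ID once per list, at its best position.
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties on the string form of the ID so the order is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], str(d)))
    return ordered if top_n is None else ordered[:top_n]


if __name__ == "__main__":
    dense = ["a", "b", "c"]  # hypothetical IDs from a cosine-similarity leg
    bm25 = ["b", "d", "a"]   # hypothetical IDs from a BM25 leg
    print(rrf_merge([dense, bm25]))  # ['b', 'a', 'd', 'c']
```

Because RRF works on ranks rather than raw scores, the cosine and BM25 legs do not need score normalization before merging, which is one plausible reason to do the merge in a single place such as the router.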
rhbt6767
DataEyond org
1 day ago
No description provided.
rhbt6767
changed pull request title from
test
to
[KM-438][KM-439] Improve Retrieval and Querying feature
1 day ago
merge dev_new to main
9257b7bc
rhbt6767
changed pull request status to
open
1 day ago
ishaq101
changed pull request status to
merged
1 day ago