Commit 90e01a8 · Parent(s): 83056c6
Update tools section
Expanded the vision for Open Data BIDSifier and added guidelines for tools, coding practices, and LLM usage.
README.md
CHANGED
|
@@ -14,7 +14,34 @@ In this proof-of-concept, we aim to determine whether a coordinated system of AI
|
|
| 14 |
|
| 15 |
## The Vision
|
| 16 |
|
| 17 |
-
If successful, Open Data BIDSifier will serve as a foundation for
|
| 18 |
|
| 19 |
## Rough Work Plan
|
| 20 |
|
|
@@ -40,19 +67,11 @@ Harmonizing **metadata** *with LLM based tools*:
|
|
| 40 |
2. File structure
|
| 41 |
3. Study metadata (fetching from repository HTMLs too)
|
| 42 |
|
| 43 |
-
For the LLM-assisted workflow, the following **tools** are suggested: GitHub Copilot in VS Code, LLMAnything.
|
| 44 |
-
Pick **any LLM**, really. Smaller LLMs tend to hallucinate more, so it is all the more interesting if they can manage the task too!
|
| 45 |
-
Suggested larger LLMs: GPT-5, Claude, Kimi-K2, DeepSeek-R1
|
| 46 |
-
Suggested smaller LLMs: SmolLM, LLaMA-7B, Qwen-7B
|
| 47 |
-
|
| 48 |
Harmonizing **metadata** *by hand*:
|
| 49 |
4. Annotation column names (from non-BIDS to BIDS) - working with tabular data
|
| 50 |
5. File structure
|
| 51 |
6. Study metadata (fetching from repository HTMLs too)
|
| 52 |
|
| 53 |
-
For manual harmonization, an IDE with Python / R is useful, as is OpenRefine, an open-source tool for working with tabular data.
|
| 54 |
-
|
| 55 |
-
|
| 56 |
Record the problems and the working time for both manual and LLM-assisted harmonization.
|
| 57 |
|
| 58 |
Time planned: ~4 hours of working time for this step.
|
|
|
|
| 14 |
|
| 15 |
## The Vision
|
| 16 |
|
| 17 |
+
If successful, Open Data BIDSifier will serve as a foundation for an AI agent that can identify and harmonize different datasets from open data, making sure these are immediately usable for machine learning and statistical analysis.
|
| 18 |
+
|
| 19 |
+
## Tools and Structure
|
| 20 |
+
|
| 21 |
+
We will use this repository (https://github.com/stefanches7/AI-assisted-Neuroimaging-harmonization) as the place to commit intermediate work. Please familiarize yourself with Git and GitHub. [This intro](https://docs.github.com/de/get-started/start-your-journey/hello-world) can be useful for that.
|
| 22 |
+
|
| 23 |
+
### Git guidelines
|
| 24 |
+
|
| 25 |
+
Open a new branch for your additions and create pull requests against `main`.
|
| 26 |
+
|
| 27 |
+
### Working with data
|
| 28 |
+
|
| 29 |
+
We will work with raw data (neuroimaging) and annotations / metadata (tabular data).
|
| 30 |
+
For neuroimaging data, `nibabel` (NIfTI, `.nii` files) and `pydicom` (DICOM, `.dcm` files) are widely used Python libraries.
|
| 31 |
+
For tabular data and manual harmonization, the Python package `pandas` is the standard choice; OpenRefine, an open-source tool for working with tabular data, is also useful.
|
| 32 |
+
|
| 33 |
+
### LLM usage
|
| 34 |
+
|
| 35 |
+
For the LLM-assisted workflow, the following **tools** are suggested: GitHub Copilot in VS Code, LLMAnything.
|
| 36 |
+
Pick **any LLM**, really. Smaller LLMs tend to hallucinate more, so it is all the more interesting if they can manage the task too!
|
| 37 |
+
Suggested larger LLMs: GPT-5, Claude, Kimi-K2, DeepSeek-R1
|
| 38 |
+
Suggested smaller LLMs: SmolLM, LLaMA-7B, Qwen-7B
|
| 39 |
+
|
| 40 |
+
### Coding & Vibe Coding
|
| 41 |
+
|
| 42 |
+
We will use **Python** to code. I recommend using **Anaconda** (a Python distribution that ships the `conda` package manager) to manage Python package environments. If you are not sure what the previous two sentences mean, I recommend [reading this intro to Python & Conda](https://www.anaconda.com/topics/choosing-between-anaconda-vs-python#:~:text=Anaconda%20is%20a%20distribution%20that,machine%20learning%2C%20and%20scientific%20computing.).
|
| 43 |
+
|
| 44 |
+
LLMs can assist in writing code, but they can also prove counterproductive and produce bad (spaghetti), duplicated, and erroneous code. It is instructive to be able to check their output and correct it manually.
|
| 45 |
|
| 46 |
## Rough Work Plan
|
| 47 |
|
|
|
|
| 67 |
2. File structure
|
| 68 |
3. Study metadata (fetching from repository HTMLs too)
|
| 69 |
|
| 70 |
Harmonizing **metadata** *by hand*:
|
| 71 |
4. Annotation column names (from non-BIDS to BIDS) - working with tabular data
|
| 72 |
5. File structure
|
| 73 |
6. Study metadata (fetching from repository HTMLs too)
|
| 74 |
|
| 75 |
Record the problems and the working time for both manual and LLM-assisted harmonization.
|
| 76 |
|
| 77 |
Time planned: ~4 hours of working time for this step.
|
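The manual step "Annotation column names (from non-BIDS to BIDS)" in the work plan above could look roughly like the sketch below. It is only an illustration under stated assumptions: the source column names (`SubjectID`, `Age_years`, `Gender`, `Site`), the mapping, and the helper names are hypothetical and not taken from this repository. It uses only the standard library so it runs without extra packages; with `pandas`, the renaming itself would typically be a single `DataFrame.rename(columns=...)` call.

```python
import csv
import io

# Hypothetical mapping from source column names to BIDS-style names.
# Real datasets will need their own mapping.
COLUMN_MAP = {"SubjectID": "participant_id", "Age_years": "age", "Gender": "sex"}


def harmonize_rows(rows, column_map):
    """Rename known columns and keep unknown columns unchanged."""
    harmonized = []
    for row in rows:
        new_row = {column_map.get(key, key): value for key, value in row.items()}
        # BIDS participant labels carry a "sub-" prefix.
        pid = new_row.get("participant_id", "")
        if pid and not pid.startswith("sub-"):
            new_row["participant_id"] = f"sub-{pid}"
        # BIDS tabular files mark missing values as "n/a".
        harmonized.append({k: (v if v != "" else "n/a") for k, v in new_row.items()})
    return harmonized


def to_tsv(rows):
    """Serialize rows as tab-separated text, as used for participants.tsv."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example input, standing in for a non-BIDS annotation table.
raw_csv = "SubjectID,Age_years,Gender,Site\n01,34,F,Munich\n02,,M,Berlin\n"
rows = harmonize_rows(list(csv.DictReader(io.StringIO(raw_csv))), COLUMN_MAP)
print(to_tsv(rows))
```

Columns without an entry in the mapping (here `Site`) pass through unchanged, which makes it easy to spot what still needs a manual decision when comparing the manual and LLM-assisted runs.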