Commit cf27acf
committed
dataset processing jupiter
1 parent d662d18 commit cf27acf

1 file changed
Lines changed: 381 additions & 0 deletions
@@ -0,0 +1,381 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Overview\n",
    "\n",
    "This tutorial gives you an overview of how to get from a DICOM dump to a processed dataset with segmentations.\n",
    "\n",
    "Abbreviations:\n",
    "POI: point of interest\n",
    "\n",
    "Steps:\n",
    "\n",
    "(1) DICOM export to BIDS dataset\n",
    "\n",
    "(2) ~~Inter-scan image registration.~~\n",
    "\n",
    "(2.1) ~~Rigid movement correction with automatic spine POIs~~\n",
    "\n",
    "(2.2) ~~Deformable movement~~\n",
    "\n",
    "(3) Stitching\n",
    "\n",
    "(3.1) ~~Stitching with rigid movement compensation (from 2.1)~~\n",
    "\n",
    "(3.2) ~~Stitching with deformable movement compensation (from 2.2)~~\n",
    "\n",
    "(4) Segmentation: TotalVibeSegmentator, SPINEPS, ...\n",
    "\n",
    "(5) ~~MR deformable registration (from 2.1, 2.2)~~\n",
    "\n",
    "(6) ~~Water-fat swap detection in VIBE and MEVIBE~~\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 DICOM export to BIDS dataset\n",
    "\n",
    "Short overview:\n",
    "\n",
    "A BIDS dataset is a file-naming convention.\n",
    "\n",
    "The following rules should be known and are weakly enforced:\n",
    "\n",
    "- A dataset folder should start with 'dataset-{YOUR-NAME}'\n",
    "- The next-level folders are:\n",
    "  - rawdata: for all imaging data.\n",
    "  - derivative: for all generated data, like segmentations.\n",
    "\n",
    "A file name should look like:\n",
    "\n",
    "sub-{Subject name}_ses-{Session}_{key}-{value}*_{format}.{filetype}\n",
    "- Subject name: unique identifier\n",
    "- Session: session id. Optional if there is only one session.\n",
    "- Any number of key-value pairs. Keys are unique. The defined keys are listed here: https://bids-specification.readthedocs.io/en/stable/appendices/entities.html . Our tool enforces a certain order; see tutorial_BIDS_files.ipynb.\n",
    "- format: type of acquisition, like ct, T2w, VIBE, MPRage\n",
    "\n",
    "Do not use '_' in any keys or values.\n",
    "\n",
    "See https://bids-specification.readthedocs.io/en/stable/ for a detailed description of what BIDS is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox.core.bids_files import formats, entities_keys\n",
    "\n",
    "print('Known formats:\\n', '\\n'.join(formats))\n",
    "print()\n",
    "print(\"Order of keys we enforce:\\n\", '\\n'.join(entities_keys.keys()))"
   ]
  },
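  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the naming convention, we can parse an example file name (the path below is made up; we assume `BIDS_FILE` exposes the parsed key-value pairs via `.info` and the format via `.bids_format`, as used later in this tutorial):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox import BIDS_FILE\n",
    "\n",
    "# Hypothetical file path following the naming scheme above\n",
    "example = BIDS_FILE(\"dataset-example/rawdata/sub-01/T2w/sub-01_ses-20240101_sequ-301_acq-sag_T2w.nii.gz\", \"dataset-example\")\n",
    "print(example.info)  # the parsed key-value pairs\n",
    "print(example.bids_format)  # the format, here T2w"
   ]
  },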
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "This function extracts a DICOM folder to a BIDS-like NIfTI folder.\n",
    "\n",
    "The names are created like this (DICOM:Key is the given DICOM key):\n",
    "\n",
    "'dataset-{NAME}/rawdata/sub-{DICOM:PatientID}/ses-{DICOM:StudyDate}/{format}/sub-{DICOM:PatientID}_ses-{DICOM:StudyDate}_sequ-{DICOM:SeriesNumber}_acq-{sag|ax|cor|iso}_{format}.nii.gz'\n",
    "\n",
    "and a .json, where the DICOM keys are saved.\n",
    "\n",
    "To get {format} we use string matching on the DICOM \"SeriesDescription\" key. As this is free text, this will not always work; then we default to \"mr\" and you have to rename the files manually.\n",
    "\n",
    "For very large datasets you can use make_subject_chunks = n [int]. Then we put an additional folder with the first n letters between rawdata and the sub- folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from TPTBox.core.dicom.dicom_extract import extract_dicom_folder\n",
    "\n",
    "# Set these to your own data:\n",
    "# path_to_dicom_dataset = \"TODO\"\n",
    "# dataset_name = 'example-name'\n",
    "path_to_dicom_dataset = \"/media/data/robert/datasets/dicom_example/VR-DICOM/\"\n",
    "dataset_name = 'VR-DICOM2'\n",
    "target_folder = Path(path_to_dicom_dataset).parent\n",
    "dataset = target_folder / f\"dataset-{dataset_name}\"\n",
    "extract_dicom_folder(Path(path_to_dicom_dataset), dataset, use_session=True, n_cpu=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-define the paths, so the following cells can be run without re-running the extraction.\n",
    "from pathlib import Path\n",
    "\n",
    "path_to_dicom_dataset = \"/media/data/robert/datasets/dicom_example/VR-DICOM/\"\n",
    "dataset_name = 'VR-DICOM2'\n",
    "target_folder = Path(path_to_dicom_dataset).parent\n",
    "dataset = target_folder / f\"dataset-{dataset_name}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have a tool that automatically scans BIDS folders and creates a grouped dictionary, from which you can pick the relevant files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox import BIDS_Global_info, BIDS_FILE\n",
    "from TPTBox.core.bids_constants import sequence_splitting_keys\n",
    "\n",
    "print(\"If one of the values of these keys is different, then it is considered a different sequence:\", sequence_splitting_keys)\n",
    "print(\"sub will always split\")\n",
    "\n",
    "print(\"Let's search for stitching candidates. For this we have to remove the sequ key from sequence_splitting_keys.\")\n",
    "my_splitting_keys = sequence_splitting_keys.copy()\n",
    "my_splitting_keys.remove(\"sequ\")\n",
    "my_splitting_keys.append(\"part\")\n",
    "\n",
    "bgi = BIDS_Global_info(dataset, [\"rawdata\", \"derivative\"], sequence_splitting_keys=my_splitting_keys)\n",
    "stitching_candidate: list[list[BIDS_FILE]] = []\n",
    "epsilon = 0.2  # tolerance for zoom differences\n",
    "for name, subj in bgi.iter_subjects():\n",
    "    print('Subject identifier', name)\n",
    "    q = subj.new_query()\n",
    "    # Filter by some rules\n",
    "    q.flatten()\n",
    "    q.filter_filetype('nii.gz')\n",
    "    q.unflatten()\n",
    "    for fam in q.loop_dict():\n",
    "        print(fam)\n",
    "        for key, file_list in fam.items():\n",
    "            if key == \"mr\":\n",
    "                continue\n",
    "            if len(file_list) == 1:\n",
    "                continue\n",
    "            # This code is only an example, where we group images with the same orientation and zoom, so we know which images are potential stitching targets.\n",
    "            # We use the format key as the initial split, so T1w and T2w will not be stitched.\n",
    "            for i in range(len(file_list)):\n",
    "                f1 = file_list[i]\n",
    "                if f1 is None:\n",
    "                    continue\n",
    "                grid1 = f1.get_grid_info()\n",
    "                if grid1 is None:\n",
    "                    continue\n",
    "                current_group = [f1]  # Start a new group with the current file\n",
    "                for j in range(i + 1, len(file_list)):\n",
    "                    f2 = file_list[j]\n",
    "                    if f2 is None:\n",
    "                        continue\n",
    "                    grid2 = f2.get_grid_info()\n",
    "                    if grid2 is None:\n",
    "                        continue\n",
    "                    # Check if the orientation matches\n",
    "                    if grid1.orientation == grid2.orientation:\n",
    "                        # Check if the zoom is within the tolerance\n",
    "                        zoom_diff = [abs(z1 - z2) for z1, z2 in zip(grid1.zoom, grid2.zoom, strict=False)]\n",
    "                        if all(diff <= epsilon for diff in zoom_diff):\n",
    "                            current_group.append(f2)\n",
    "                            file_list[j] = None  # type: ignore\n",
    "                # Add the group if it has more than one file\n",
    "                if len(current_group) > 1:\n",
    "                    stitching_candidate.append(current_group)\n",
    "for files in stitching_candidate:\n",
    "    print(files)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3 Stitching\n",
    "Thorax/full-body images are often acquired in chunks. We can stitch them together with the stitching function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox.stitching import stitching\n",
    "from TPTBox import to_nii\n",
    "from concurrent.futures import ProcessPoolExecutor\n",
    "\n",
    "derivative_folder = \"derivative_stiched\"\n",
    "\n",
    "def process_files(files):\n",
    "    files = sorted(files)  # noqa: PLW2901\n",
    "    sequ: str = (files[0].get(\"sequ\", \"\") + \"-\" if \"sequ\" in files[0].info else \"\") + \"stiched\"  # type: ignore\n",
    "    out_name = files[0].get_changed_path(\"nii.gz\", info={\"sequ\": sequ}, parent=derivative_folder)\n",
    "    if not out_name.exists():\n",
    "        stitching(*files, out=out_name, is_seg=False, is_ct=files[0].bids_format == \"ct\", dtype=to_nii(files[0]).dtype)\n",
    "        nii = to_nii(out_name)\n",
    "        nii.apply_crop_(nii.compute_crop())\n",
    "        nii.save(out_name)\n",
    "\n",
    "# Test on a single group first\n",
    "process_files(stitching_candidate[0])\n",
    "# Execute the loop in parallel using a ProcessPoolExecutor\n",
    "with ProcessPoolExecutor() as executor:\n",
    "    executor.map(process_files, stitching_candidate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4 Segmentation\n",
    "\n",
    "Note: by default we do not install the deep-learning dependencies.\n",
    "\n",
    "Install:\n",
    "\n",
    "```pip install SPINEPS ruamel.yaml configargparse```\n",
    "\n",
    "Troubleshooting: try pinning nnunetv2==2.4.2\n"
   ]
  },
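  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check that the optional deep-learning dependencies are installed (a sketch; it only tries the imports used by the cells below):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try importing the optional deep-learning dependencies\n",
    "try:\n",
    "    import spineps  # noqa: F401\n",
    "    import nnunetv2  # noqa: F401\n",
    "    print('Deep-learning dependencies found.')\n",
    "except ImportError as e:\n",
    "    print('Missing dependency, see the install command above:', e)"
   ]
  },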
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TotalVibeSegmentator\n",
    "\n",
    "https://arxiv.org/abs/2406.00125\n",
    "\n",
    "https://github.com/robert-graf/TotalVibeSegmentator\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox.segmentation import run_totalvibeseg\n",
    "from TPTBox import BIDS_FILE\n",
    "\n",
    "# You can also use a string/Path if you want to set the output path yourself.\n",
    "dataset = \"/media/data/robert/datasets/dicom_example/dataset-VR-DICOM2/\"\n",
    "in_file = BIDS_FILE(f\"{dataset}/derivative_stiched/sub-111168222/T2w/sub-111168222_sequ-301-stiched_acq-ax_part-water_T2w.nii.gz\", dataset)\n",
    "out_file = in_file.get_changed_path(\"nii.gz\", \"msk\", parent=\"derivative\", info={\"seg\": \"TotalVibeSegmentator\", \"mod\": in_file.bids_format})\n",
    "run_totalvibeseg(in_file, out_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SPINEPS\n",
    "\n",
    "SPINEPS can segment spine images into an instance mask and a semantic mask. Running it automatically over a dataset is very opinionated about what to segment.\n",
    "TODO: make a way to manually define output paths\n",
    "\n",
    "https://github.com/Hendrik-code/spineps/tree/main"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If your dataset is BIDS compliant, you can auto-run SPINEPS over all of it:\n",
    "from TPTBox.segmentation import run_spineps_all\n",
    "\n",
    "# run_spineps_all(dataset)\n"
   ]
  },
305+
{
306+
"cell_type": "code",
307+
"execution_count": null,
308+
"metadata": {},
309+
"outputs": [],
310+
"source": [
311+
"# Pick a fitting model:\n",
312+
"from spineps.models import modelid2folder_semantic,modelid2folder_instance\n",
313+
"print('Available Semantic Models',modelid2folder_semantic())\n",
314+
"print('Available Instance Models',modelid2folder_instance())\n",
315+
"\n",
316+
"print(modelid2folder_semantic().keys())\n",
317+
"print(modelid2folder_instance().keys())\n",
318+
"dataset = \"/media/data/robert/datasets/dicom_example/dataset-VR-DICOM2\"\n",
319+
"file_path = f\"{dataset}/derivative_stiched/sub-111168223/T2w/sub-111168223_sequ-401-stiched_acq-sag_part-inphase_T2w.nii.gz\"\n",
320+
"#file_path = f\"{dataset}/derivative_stiched/sub-111168223/T2w/sub-111168223_sequ-201-stiched_acq-ax_part-inphase_T2w.nii.gz\"\n",
321+
"\n",
322+
"model_semantic = \"t2w\"\n",
323+
"model_instance = \"instance\"\n",
324+
"derivative_name = \"derivative\"\n"
325+
]
326+
},
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from TPTBox.segmentation.spineps import run_spineps_single\n",
    "\n",
    "# With 'ignore_compatibility_issues = True' you can force it to run anyway.\n",
    "out_paths = run_spineps_single(\n",
    "    file_path,\n",
    "    dataset=dataset,\n",
    "    model_semantic=model_semantic,\n",
    "    model_instance=model_instance,\n",
    "    derivative_name=derivative_name,\n",
    "    ignore_compatibility_issues=False,\n",
    ")\n",
    "print(out_paths)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
