{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tile Generation Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the tile generation tutorial!\n", "\n", "Because a whole slide image is too large to train a deep learning model on directly, a slide is often divided into a set of small tiles that are used for training instead. For tile-based whole slide image analysis, generating tiles and labels is an important and laborious step. With the LUNA tiling CLIs and tutorials, you can easily generate tile labels and get your data ready for downstream analysis. In this notebook, we will see how to generate tiles and labels using the LUNA tiling CLIs. Here are the main steps we will review:\n", "\n", "1. Load slides\n", "2. Generate tiles and labels\n", "3. Collect tiles for model training\n", "\n", "Throughout this notebook, we will use different method parameter files. Please refer to the example parameter files in the `configs` directory to follow these steps.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load slides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step in generating tiles is to load slides into a data store, where our results will be generated. We will use the **load_slide** CLI to prepare slides from a whole slide image (WSI) table at our analysis location. The slide is represented as a WholeSlideImage data type.\n", "\n", "All LUNA tiling CLIs offer a help option. To check the CLI arguments, simply run the CLI with the `--help` option." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: load_slide [OPTIONS]\r\n", "\r\n", " Load a slide to the datastore from the whole slide image table.\r\n", "\r\n", " app_config - application configuration yaml file. See config.yaml.template\r\n", " for details.\r\n", "\r\n", " datastore_id - datastore name. 
usually a slide id.\r\n", "\r\n", " method_param_path - json parameter file with path to a WSI delta table.\r\n", "\r\n", " - job_tag: job tag to use for loading the slide\r\n", "\r\n", " - table_path: path to the whole slide image table\r\n", "\r\n", " - datastore_path: path to store data\r\n", "\r\n", "Options:\r\n", " -a, --app_config TEXT application configuration yaml file. See\r\n", " config.yaml.template for details. [required]\r\n", " -s, --datastore_id TEXT datastore name. usually a slide id.\r\n", " [required]\r\n", " -m, --method_param_path TEXT json parameter file with path to a WSI delta\r\n", " table. [required]\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!load_slide --help" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import multiprocessing\n", "import subprocess\n", "\n", "slide_ids = ['01OV002-bd8cdc70-3d46-40ae-99c4-90ef77', '01OV002-ed65cf94-8bc6-492b-9149-adc16f',\n", "             '01OV007-9b90eb78-2f50-4aeb-b010-d642f9', '01OV008-308ad404-7079-4ff8-8232-12ee2e',\n", "             '01OV008-7579323e-2fae-43a9-b00f-a15c28']\n", "\n", "# simple wrapper that runs a CLI call for multiple slides using 3 worker processes\n", "def pool_process(func, slides):\n", "    pool = multiprocessing.Pool(3)\n", "    pool.map(func, slides)\n", "    pool.close()\n", "    pool.join()\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# call load_slide as a subprocess for a single slide\n", "def call_load_slide(slide):\n", "    subprocess.run(f\"python3 -m luna.pathology.cli.load_slide -a ../conf/app_config.yaml -s {slide} -m ../conf/load_slides.yaml\", shell=True)\n", "    return slide\n", "\n", "pool_process(call_load_slide, slide_ids)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once this step is done, the data store will be created at your `datastore_path`, which is `PRO_12-123/tables/tiles` with the example method parameters.\n", "\n", "Let's take a look at the WholeSlideImage location for slide `01OV002-bd8cdc70-3d46-40ae-99c4-90ef77`. 
We'll see that this process created a symbolic link pointing to the SVS image path, along with a `metadata.json` file." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 4.0K\r\n", "lrwxr-xr-x 1 rosed2 rosed2 91 Dec 20 21:06 data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs\r\n", "-rw-r--r-- 1 rosed2 rosed2 3.1K Dec 20 21:06 metadata.json\r\n" ] } ], "source": [ "!ls -lhtr ~/vmount/PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77/ov_slides/WholeSlideImage/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Generate tiles and labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the main tiling step. The CLI generates tiles and populates Otsu and purple scores along with the regional annotation label. The Otsu score is calculated using Otsu's foreground/background detection algorithm, which is commonly used to filter out the background of the slide. Purple scores provide additional guidance for H&E slide analysis.\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: generate_tiles [OPTIONS]\r\n", "\r\n", " Generate tile addresses, scores and optionally annotation labels.\r\n", "\r\n", " app_config - application configuration yaml file. See config.yaml.template\r\n", " for details.\r\n", "\r\n", " datastore_id - datastore name. 
usually a slide id.\r\n", "\r\n", " method_param_path - json file with method parameters for tile generation and\r\n", " filtering.\r\n", "\r\n", " - input_wsi_tag: job tag used to load slides\r\n", "\r\n", " - job_tag: job tag for generating tile labels\r\n", "\r\n", " - tile_size: size of patches\r\n", "\r\n", " - scale_factor: desired downscale factor\r\n", "\r\n", " - requested_magnification: desired magnification\r\n", "\r\n", " - root_path: path to output data\r\n", "\r\n", " - filter: optional filter map to select subset of the tiles e.g. {\r\n", " \"otsu_score\": 0.5 }\r\n", "\r\n", " - project_id: optional project id, if using regional annotations\r\n", "\r\n", " - labelset: optional annotation labelset name, if using regional annotations\r\n", "\r\n", " - annotation_table_path: optional path to the regional annotation table\r\n", "\r\n", "Options:\r\n", " -a, --app_config TEXT application configuration yaml file. See\r\n", " config.yaml.template for details. [required]\r\n", " -s, --datastore_id TEXT datastore name. usually a slide id.\r\n", " [required]\r\n", " -m, --method_param_path TEXT json file with method parameters for tile\r\n", " generation and filtering. [required]\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!generate_tiles --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this method configuration, the tile size is set to 128, scale factor to 16 and slide magnification (from slide metadata) to 20. In this example, we label the tiles with the default labels provided by the regional annotations. Note that we keep only the tiles that have been annotated and have an otsu score above 0.5 for our analysis. Please refer to `~/luna/conf/generate_tiles.yaml` for more details on the method parameters.\n", "\n", "Here we reserve 4 slides for model training, and 1 slide for testing. 
For training, we generate tiles only for the areas that pathologists have annotated, so the model has ground-truth labels. For testing, we generate tiles for the whole slide.\n", "\n", "The test slide is held out so that the model can annotate it in the inference notebook. For this slide, we generate tiles for *all* tissue regions (Otsu score > 0.5). Note that here we use a different config file, `~/luna/conf/generate_tiles_all_tissues.yaml`, which omits the parameters `project_id`, `labelset`, and `annotation_table_path` because they pertain only to the regional annotations.\n", "\n", "Depending on the size of the WSI and tiles, this step can take up to 10 minutes per slide." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "../PRO_12-123/tables/tiles\n", "├── 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77\n", "│   ├── ov_default_labels\n", "│   │   └── TileImages\n", "│   │   └── data\n", "│   │   ├── address.slice.csv\n", "│   │   ├── metadata.json\n", "│   │   └── tiles.slice.pil\n", "│   └── ov_slides\n", "│   └── WholeSlideImage\n", "│   ├── data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs\n", "│   └── metadata.json\n", "├── 01OV002-ed65cf94-8bc6-492b-9149-adc16f\n", "│   ├── ov_default_labels\n", "│   │   └── TileImages\n", "│   │   └── data\n", "│   │   ├── address.slice.csv\n", "│   │   ├── metadata.json\n", "│   │   └── tiles.slice.pil\n", "│   └── ov_slides\n", "│   └── WholeSlideImage\n", "│   ├── data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV002-ed65cf94-8bc6-492b-9149-adc16f.svs\n", "│   └── metadata.json\n", "├── 01OV007-9b90eb78-2f50-4aeb-b010-d642f9\n", "│   ├── ov_default_labels\n", "│   │   └── TileImages\n", "│   │   └── data\n", "│   │   ├── address.slice.csv\n", "│   │   ├── metadata.json\n", "│   │   └── tiles.slice.pil\n", "│   └── ov_slides\n", "│   └── 
WholeSlideImage\n", "│   ├── data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV007-9b90eb78-2f50-4aeb-b010-d642f9.svs\n", "│   └── metadata.json\n", "├── 01OV008-308ad404-7079-4ff8-8232-12ee2e\n", "│   ├── ov_default_labels\n", "│   │   └── TileImages\n", "│   │   └── data\n", "│   │   ├── address.slice.csv\n", "│   │   ├── metadata.json\n", "│   │   └── tiles.slice.pil\n", "│   └── ov_slides\n", "│   └── WholeSlideImage\n", "│   ├── data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV008-308ad404-7079-4ff8-8232-12ee2e.svs\n", "│   └── metadata.json\n", "└── 01OV008-7579323e-2fae-43a9-b00f-a15c28\n", " └── ov_slides\n", " └── WholeSlideImage\n", " ├── data -> /home/rosed2/vmount/PRO_12-123/data/toy_data_set/01OV008-7579323e-2fae-43a9-b00f-a15c28.svs\n", " └── metadata.json\n", "\n", "27 directories, 22 files\n" ] } ], "source": [ "slide_ids_train = ['01OV002-bd8cdc70-3d46-40ae-99c4-90ef77', '01OV002-ed65cf94-8bc6-492b-9149-adc16f',\n", " '01OV007-9b90eb78-2f50-4aeb-b010-d642f9', '01OV008-308ad404-7079-4ff8-8232-12ee2e']\n", "slide_ids_test = '01OV008-7579323e-2fae-43a9-b00f-a15c28'\n", "\n", "# call generate_tiles as subprocess\n", "def call_generate_tiles(slide):\n", " subprocess.run(f\"generate_tiles -a ../conf/app_config.yaml -s {slide} -m ../conf/generate_tiles.yaml\", shell=True)\n", " return slide\n", "\n", "pool_process(call_generate_tiles, slide_ids_train)\n", "\n", "!tree ../PRO_12-123/tables/tiles" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021-12-20 21:23:17,430 - INFO - root - FYI: Initalized logger, log file at: data-processing.log with handlers: [ (INFO)>, ]\n", "2021-12-20 21:23:17,436 - INFO - luna.common.config - loading config file ../conf/app_config.yaml\n", "2021-12-20 21:23:17,441 - INFO - luna.common.config - loading config file /home/rosed2/vmount/conf/datastore.cfg\n", "2021-12-20 21:23:17,445 - INFO - luna.common.DataStore - 
Configured datastore with {'GRAPH_STORE_ENABLED': False, 'GRAPH_URI': 'neo4j://localhost:7687', 'GRAPH_USER': 'neo4j', 'GRAPH_PASSWORD': 'password', 'OBJECT_STORE_ENABLED': False, 'MINIO_URI': 'localhost:8001', 'MINIO_USER': 'minio', 'MINIO_PASSWORD': 'password', 'DOC_STORE_ENABLED': False, 'MONGODB_URI': 'mongodb://localhost:27017/'}\n", "2021-12-20 21:23:17,447 - INFO - luna.common.DataStore - Datstore file backend= ../PRO_12-123/tables/tiles\n", "2021-12-20 21:23:17,448 - INFO - [datastore=01OV008-7579323e-2fae-43a9-b00f-a15c28] - Whole slide image path: ../PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_slides/WholeSlideImage/data\n", "2021-12-20 21:23:17,457 - INFO - [datastore=01OV008-7579323e-2fae-43a9-b00f-a15c28] - Writing to output dir: ../PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_default_labels/TileImages/data\n", "2021-12-20 21:23:17,458 - INFO - luna.pathology.common.preprocess - Processing slide ../PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_slides/WholeSlideImage/data\n", "2021-12-20 21:23:17,461 - INFO - luna.pathology.common.preprocess - Params = {'input_wsi_tag': 'ov_slides', 'job_tag': 'ov_default_labels', 'tile_size': 128, 'scale_factor': 16, 'requested_magnification': 20, 'filter': {'otsu_score': 0.5}, 'root_path': '../PRO_12-123/tables/tiles'}\n", "2021-12-20 21:23:17,707 - INFO - luna.pathology.common.preprocess - Slide size = [42240,59209]\n", "2021-12-20 21:23:17,708 - INFO - luna.pathology.common.preprocess - Normalized magnification scale factor for 20x is 2, overall thumbnail scale factor is 32\n", "2021-12-20 21:23:17,709 - INFO - luna.pathology.common.preprocess - Requested tile size=128, tile size at full magnficiation=256, tile size at thumbnail=8\n", "2021-12-20 21:23:18,073 - INFO - luna.pathology.common.preprocess - tiles x 165, tiles y 232\n", "2021-12-20 21:23:18,180 - INFO - luna.pathology.common.preprocess - Number of tiles in raster: 37490\n", 
"/home/rosed2/.local/lib/python3.6/site-packages/pandas/core/indexing.py:1596: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " self.obj[key] = _infer_fill_value(value)\n", "/home/rosed2/.local/lib/python3.6/site-packages/pandas/core/indexing.py:1763: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " isetter(loc, value)\n", "2021-12-20 21:24:55,830 - INFO - luna.pathology.common.preprocess - Proccessing tiles [10000,11342]\n", "2021-12-20 21:25:08,371 - INFO - luna.pathology.common.preprocess - Saved tile scores and images at ../PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_default_labels/TileImages/data\n" ] } ], "source": [ "!generate_tiles \\\n", "-a ../conf/app_config.yaml \\\n", "-s 01OV008-7579323e-2fae-43a9-b00f-a15c28 \\\n", "-m ../conf/generate_tiles_all_tissues.yaml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the step is done, you can find the tiles and score CSV for your slide, at your output location. For slide id 01OV008-7579323e-2fae-43a9-b00f-a15c28, we have the tile image and metadata stored at `~/vmount/PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_default_labels/TileImages/data`." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 545M\r\n", "-rw-r--r-- 1 rosed2 rosed2 532M Dec 20 21:25 tiles.slice.pil\r\n", "-rw-r--r-- 1 rosed2 rosed2 763K Dec 20 21:25 address.slice.csv\r\n", "-rw-r--r-- 1 rosed2 rosed2 572 Dec 20 21:25 metadata.json\r\n" ] } ], "source": [ "!ls -lhtr ~/vmount/PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_default_labels/TileImages/data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the tile metadata in the output CSV.\n", "\n", "The tile `otsu_score`, `purple_score`, and regional annotation labels are stored alongside tile metadata such as address, coordinates, size, and offset.\n", "\n", "For the training slide, we see that only the tiles that meet the filter criteria have been kept. For the test slide, we keep all tissue regions, so far more tiles are generated. Notice that the test slide has no `regional_label` column." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>address</th><th>coordinates</th><th>otsu_score</th><th>purple_score</th><th>regional_label</th><th>tile_image_offset</th><th>tile_image_length</th><th>tile_image_size_xy</th><th>tile_image_mode</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>x54_y114_z20</td><td>(54, 114)</td><td>0.843750</td><td>1.000000</td><td>stroma</td><td>199409664.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>1</th><td>x55_y113_z20</td><td>(55, 113)</td><td>0.640625</td><td>0.968750</td><td>stroma</td><td>203882496.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>2</th><td>x55_y114_z20</td><td>(55, 114)</td><td>0.734375</td><td>0.937500</td><td>stroma</td><td>203931648.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>3</th><td>x56_y113_z20</td><td>(56, 113)</td><td>0.546875</td><td>1.000000</td><td>stroma</td><td>208502784.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>4</th><td>x56_y114_z20</td><td>(56, 114)</td><td>0.656250</td><td>1.000000</td><td>stroma</td><td>208551936.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>144</th><td>x110_y170_z20</td><td>(110, 170)</td><td>1.000000</td><td>0.968750</td><td>tumor</td><td>477806592.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>145</th><td>x110_y171_z20</td><td>(110, 171)</td><td>0.984375</td><td>1.000000</td><td>tumor</td><td>477855744.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>146</th><td>x110_y172_z20</td><td>(110, 172)</td><td>0.984375</td><td>1.000000</td><td>tumor</td><td>477904896.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>147</th><td>x113_y94_z20</td><td>(113, 94)</td><td>0.546875</td><td>0.703125</td><td>fat</td><td>493928448.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>148</th><td>x115_y94_z20</td><td>(115, 94)</td><td>0.656250</td><td>0.859375</td><td>fat</td><td>505921536.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>149 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " address coordinates otsu_score purple_score regional_label \\\n", "0 x54_y114_z20 (54, 114) 0.843750 1.000000 stroma \n", "1 x55_y113_z20 (55, 113) 0.640625 0.968750 stroma \n", "2 x55_y114_z20 (55, 114) 0.734375 0.937500 stroma \n", "3 x56_y113_z20 (56, 113) 0.546875 1.000000 stroma \n", "4 x56_y114_z20 (56, 114) 0.656250 1.000000 stroma \n", ".. ... ... ... ... ... \n", "144 x110_y170_z20 (110, 170) 1.000000 0.968750 tumor \n", "145 x110_y171_z20 (110, 171) 0.984375 1.000000 tumor \n", "146 x110_y172_z20 (110, 172) 0.984375 1.000000 tumor \n", "147 x113_y94_z20 (113, 94) 0.546875 0.703125 fat \n", "148 x115_y94_z20 (115, 94) 0.656250 0.859375 fat \n", "\n", " tile_image_offset tile_image_length tile_image_size_xy tile_image_mode \n", "0 199409664.0 49152.0 128.0 RGB \n", "1 203882496.0 49152.0 128.0 RGB \n", "2 203931648.0 49152.0 128.0 RGB \n", "3 208502784.0 49152.0 128.0 RGB \n", "4 208551936.0 49152.0 128.0 RGB \n", ".. ... ... ... ... \n", "144 477806592.0 49152.0 128.0 RGB \n", "145 477855744.0 49152.0 128.0 RGB \n", "146 477904896.0 49152.0 128.0 RGB \n", "147 493928448.0 49152.0 128.0 RGB \n", "148 505921536.0 49152.0 128.0 RGB \n", "\n", "[149 rows x 9 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# For a train slide, we have generated tiles for annotated regions, and populated regional_labels\n", "df = pd.read_csv(\"../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77/ov_default_labels/TileImages/data/address.slice.csv\")\n", "df" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>address</th><th>coordinates</th><th>otsu_score</th><th>purple_score</th><th>tile_image_offset</th><th>tile_image_length</th><th>tile_image_size_xy</th><th>tile_image_mode</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>x7_y146_z20</td><td>(7, 146)</td><td>0.546875</td><td>0.593750</td><td>0.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>1</th><td>x8_y133_z20</td><td>(8, 133)</td><td>0.687500</td><td>0.796875</td><td>49152.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>2</th><td>x8_y134_z20</td><td>(8, 134)</td><td>0.890625</td><td>0.968750</td><td>98304.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>3</th><td>x8_y135_z20</td><td>(8, 135)</td><td>0.859375</td><td>1.000000</td><td>147456.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>4</th><td>x8_y136_z20</td><td>(8, 136)</td><td>0.968750</td><td>0.968750</td><td>196608.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>11337</th><td>x126_y136_z20</td><td>(126, 136)</td><td>0.843750</td><td>0.843750</td><td>557236224.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>11338</th><td>x126_y137_z20</td><td>(126, 137)</td><td>0.875000</td><td>1.000000</td><td>557285376.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>11339</th><td>x126_y138_z20</td><td>(126, 138)</td><td>0.796875</td><td>1.000000</td><td>557334528.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>11340</th><td>x126_y139_z20</td><td>(126, 139)</td><td>0.796875</td><td>0.968750</td><td>557383680.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"    <tr><th>11341</th><td>x126_y140_z20</td><td>(126, 140)</td><td>0.562500</td><td>0.812500</td><td>557432832.0</td><td>49152.0</td><td>128.0</td><td>RGB</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>11342 rows × 8 columns</p>\n",
"</div>
" ], "text/plain": [ " address coordinates otsu_score purple_score tile_image_offset \\\n", "0 x7_y146_z20 (7, 146) 0.546875 0.593750 0.0 \n", "1 x8_y133_z20 (8, 133) 0.687500 0.796875 49152.0 \n", "2 x8_y134_z20 (8, 134) 0.890625 0.968750 98304.0 \n", "3 x8_y135_z20 (8, 135) 0.859375 1.000000 147456.0 \n", "4 x8_y136_z20 (8, 136) 0.968750 0.968750 196608.0 \n", "... ... ... ... ... ... \n", "11337 x126_y136_z20 (126, 136) 0.843750 0.843750 557236224.0 \n", "11338 x126_y137_z20 (126, 137) 0.875000 1.000000 557285376.0 \n", "11339 x126_y138_z20 (126, 138) 0.796875 1.000000 557334528.0 \n", "11340 x126_y139_z20 (126, 139) 0.796875 0.968750 557383680.0 \n", "11341 x126_y140_z20 (126, 140) 0.562500 0.812500 557432832.0 \n", "\n", " tile_image_length tile_image_size_xy tile_image_mode \n", "0 49152.0 128.0 RGB \n", "1 49152.0 128.0 RGB \n", "2 49152.0 128.0 RGB \n", "3 49152.0 128.0 RGB \n", "4 49152.0 128.0 RGB \n", "... ... ... ... \n", "11337 49152.0 128.0 RGB \n", "11338 49152.0 128.0 RGB \n", "11339 49152.0 128.0 RGB \n", "11340 49152.0 128.0 RGB \n", "11341 49152.0 128.0 RGB \n", "\n", "[11342 rows x 8 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For the test slide, we have generated tiles for all tissue regions\n", "df = pd.read_csv(\"../PRO_12-123/tables/tiles/01OV008-7579323e-2fae-43a9-b00f-a15c28/ov_default_labels/TileImages/data/address.slice.csv\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Collect tiles for model training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have created tile labels, we can use **collect_tiles** CLI to collect the tile metadata as a set of parquet tables and save the outputs for multiple slide ids in the same dataset. This step is done to gather our dataset for model training." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: collect_tiles [OPTIONS]\r\n", "\r\n", " Save tiles as a parquet file, indexed by slide id, address, and optionally\r\n", " patient_id.\r\n", "\r\n", " app_config - application configuration yaml file. See config.yaml.template\r\n", " for details.\r\n", "\r\n", " datastore_id - datastore name. usually a slide id.\r\n", "\r\n", " method_param_path - json file with method parameters including input, output\r\n", " details.\r\n", "\r\n", " - input_label_tag: job tag used for generating tile labels\r\n", "\r\n", " - input_wsi_tag: job tag used for loading the slide\r\n", "\r\n", " - output_datastore: job tag for collecting tiles\r\n", "\r\n", " - root_path: path to output data\r\n", "\r\n", "Options:\r\n", " -a, --app_config TEXT application configuration yaml file. See\r\n", " config.yaml.template for details. [required]\r\n", " -s, --datastore_id TEXT datastore name. usually a slide id.\r\n", " [required]\r\n", " -m, --method_param_path TEXT json file with method parameters including\r\n", " input, output details. [required]\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!collect_tiles --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, it is important to note that our model will train only on the 4 slides reserved for training. We have held one slide out of the model training step so that it can be used for the inference step.\n", "\n", "We will call **collect_tiles** on the training slides to prepare a dataset for training."
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "slide_ids_train = ['01OV002-bd8cdc70-3d46-40ae-99c4-90ef77', '01OV002-ed65cf94-8bc6-492b-9149-adc16f',\n", "                   '01OV007-9b90eb78-2f50-4aeb-b010-d642f9', '01OV008-308ad404-7079-4ff8-8232-12ee2e']\n", "# call collect_tiles as a subprocess for a single slide\n", "def call_collect_tiles(slide):\n", "    subprocess.run(f\"collect_tiles -a ~/luna/conf/app_config.yaml -s {slide} -m ~/luna/conf/collect_tiles.yaml\", shell=True)\n", "\n", "pool_process(call_collect_tiles, slide_ids_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check the output. The collected parquet files can be loaded as a pyarrow `ParquetDataset` and converted to a pandas DataFrame.\n", "\n", "You'll notice the table is indexed by `patient_id`, `id_slide_container` (the slide id), and `address`. The `data_path` column points to the tile image file. The rest of the metadata stored in this table is similar to the output of the **generate_tiles** CLI." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th></th><th>coordinates</th><th>otsu_score</th><th>purple_score</th><th>regional_label</th><th>tile_image_offset</th><th>tile_image_length</th><th>tile_image_size_xy</th><th>tile_image_mode</th><th>data_path</th></tr>\n",
"    <tr><th>patient_id</th><th>id_slide_container</th><th>address</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th rowspan=\"5\" valign=\"top\">4</th><th rowspan=\"5\" valign=\"top\">01OV002-bd8cdc70-3d46-40ae-99c4-90ef77</th><th>x54_y114_z20</th><td>(54, 114)</td><td>0.843750</td><td>1.000000</td><td>stroma</td><td>199409664.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d...</td></tr>\n",
"    <tr><th>x55_y113_z20</th><td>(55, 113)</td><td>0.640625</td><td>0.968750</td><td>stroma</td><td>203882496.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d...</td></tr>\n",
"    <tr><th>x55_y114_z20</th><td>(55, 114)</td><td>0.734375</td><td>0.937500</td><td>stroma</td><td>203931648.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d...</td></tr>\n",
"    <tr><th>x56_y113_z20</th><td>(56, 113)</td><td>0.546875</td><td>1.000000</td><td>stroma</td><td>208502784.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d...</td></tr>\n",
"    <tr><th>x56_y114_z20</th><td>(56, 114)</td><td>0.656250</td><td>1.000000</td><td>stroma</td><td>208551936.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d...</td></tr>\n",
"    <tr><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th rowspan=\"5\" valign=\"top\">5</th><th rowspan=\"5\" valign=\"top\">01OV008-308ad404-7079-4ff8-8232-12ee2e</th><th>x146_y89_z20</th><td>(146, 89)</td><td>0.875000</td><td>0.953125</td><td>stroma</td><td>484786176.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV008-308ad404-70...</td></tr>\n",
"    <tr><th>x146_y90_z20</th><td>(146, 90)</td><td>0.953125</td><td>0.984375</td><td>stroma</td><td>484835328.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV008-308ad404-70...</td></tr>\n",
"    <tr><th>x147_y90_z20</th><td>(147, 90)</td><td>0.734375</td><td>0.937500</td><td>stroma</td><td>488816640.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV008-308ad404-70...</td></tr>\n",
"    <tr><th>x147_y91_z20</th><td>(147, 91)</td><td>0.968750</td><td>1.000000</td><td>stroma</td><td>488865792.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV008-308ad404-70...</td></tr>\n",
"    <tr><th>x148_y91_z20</th><td>(148, 91)</td><td>1.000000</td><td>1.000000</td><td>stroma</td><td>493191168.0</td><td>49152.0</td><td>128.0</td><td>RGB</td><td>../PRO_12-123/tables/tiles/01OV008-308ad404-70...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>1506 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " coordinates \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 (54, 114) \n", " x55_y113_z20 (55, 113) \n", " x55_y114_z20 (55, 114) \n", " x56_y113_z20 (56, 113) \n", " x56_y114_z20 (56, 114) \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 (146, 89) \n", " x146_y90_z20 (146, 90) \n", " x147_y90_z20 (147, 90) \n", " x147_y91_z20 (147, 91) \n", " x148_y91_z20 (148, 91) \n", "\n", " otsu_score \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 0.843750 \n", " x55_y113_z20 0.640625 \n", " x55_y114_z20 0.734375 \n", " x56_y113_z20 0.546875 \n", " x56_y114_z20 0.656250 \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 0.875000 \n", " x146_y90_z20 0.953125 \n", " x147_y90_z20 0.734375 \n", " x147_y91_z20 0.968750 \n", " x148_y91_z20 1.000000 \n", "\n", " purple_score \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 1.000000 \n", " x55_y113_z20 0.968750 \n", " x55_y114_z20 0.937500 \n", " x56_y113_z20 1.000000 \n", " x56_y114_z20 1.000000 \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 0.953125 \n", " x146_y90_z20 0.984375 \n", " x147_y90_z20 0.937500 \n", " x147_y91_z20 1.000000 \n", " x148_y91_z20 1.000000 \n", "\n", " regional_label \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 stroma \n", " x55_y113_z20 stroma \n", " x55_y114_z20 stroma \n", " x56_y113_z20 stroma \n", " x56_y114_z20 stroma \n", "... ... 
\n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 stroma \n", " x146_y90_z20 stroma \n", " x147_y90_z20 stroma \n", " x147_y91_z20 stroma \n", " x148_y91_z20 stroma \n", "\n", " tile_image_offset \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 199409664.0 \n", " x55_y113_z20 203882496.0 \n", " x55_y114_z20 203931648.0 \n", " x56_y113_z20 208502784.0 \n", " x56_y114_z20 208551936.0 \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 484786176.0 \n", " x146_y90_z20 484835328.0 \n", " x147_y90_z20 488816640.0 \n", " x147_y91_z20 488865792.0 \n", " x148_y91_z20 493191168.0 \n", "\n", " tile_image_length \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 49152.0 \n", " x55_y113_z20 49152.0 \n", " x55_y114_z20 49152.0 \n", " x56_y113_z20 49152.0 \n", " x56_y114_z20 49152.0 \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 49152.0 \n", " x146_y90_z20 49152.0 \n", " x147_y90_z20 49152.0 \n", " x147_y91_z20 49152.0 \n", " x148_y91_z20 49152.0 \n", "\n", " tile_image_size_xy \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 128.0 \n", " x55_y113_z20 128.0 \n", " x55_y114_z20 128.0 \n", " x56_y113_z20 128.0 \n", " x56_y114_z20 128.0 \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 128.0 \n", " x146_y90_z20 128.0 \n", " x147_y90_z20 128.0 \n", " x147_y91_z20 128.0 \n", " x148_y91_z20 128.0 \n", "\n", " tile_image_mode \\\n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 RGB \n", " x55_y113_z20 RGB \n", " x55_y114_z20 RGB \n", " x56_y113_z20 RGB \n", " x56_y114_z20 RGB \n", "... ... 
\n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 RGB \n", " x146_y90_z20 RGB \n", " x147_y90_z20 RGB \n", " x147_y91_z20 RGB \n", " x148_y91_z20 RGB \n", "\n", " data_path \n", "patient_id id_slide_container address \n", "4 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 x54_y114_z20 ../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d... \n", " x55_y113_z20 ../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d... \n", " x55_y114_z20 ../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d... \n", " x56_y113_z20 ../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d... \n", " x56_y114_z20 ../PRO_12-123/tables/tiles/01OV002-bd8cdc70-3d... \n", "... ... \n", "5 01OV008-308ad404-7079-4ff8-8232-12ee2e x146_y89_z20 ../PRO_12-123/tables/tiles/01OV008-308ad404-70... \n", " x146_y90_z20 ../PRO_12-123/tables/tiles/01OV008-308ad404-70... \n", " x147_y90_z20 ../PRO_12-123/tables/tiles/01OV008-308ad404-70... \n", " x147_y91_z20 ../PRO_12-123/tables/tiles/01OV008-308ad404-70... \n", " x148_y91_z20 ../PRO_12-123/tables/tiles/01OV008-308ad404-70... \n", "\n", "[1506 rows x 9 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyarrow.parquet import ParquetDataset\n", "\n", "ds = ParquetDataset('../PRO_12-123/tables/tiles/ov_tileset').read().to_pandas()\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Congratulations! You now have the tile images and labels ready to train your model." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }