{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tile Generation Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the tile generation tutorial!\n", "\n", "Because a whole slide image is too large to be used directly for deep learning model training, a slide is often divided into a set of small tiles, which are then used for training. For tile-based whole slide image analysis, generating tiles and labels is an important and laborious step. With the LUNA tiling CLIs and tutorials, you can easily generate tile labels and get your data ready for downstream analysis. In this notebook, we will see how to generate tiles and labels using the LUNA tiling CLIs. Here are the main steps we will review:\n", "\n", "- Load slides\n", "- Generate tiles and labels\n", "- Collect tiles for model training\n", "\n", "Throughout this notebook, we will use different method parameter files. Please refer to the example parameter files in the `configs` directory to follow these steps.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load slides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step in generating tiles is to load slides into a data store, where our results will be generated. We will use the **load_slide** CLI to load slides from a whole slide image (WSI) table into our analysis location. Each slide is represented as a WholeSlideImage data type.\n", "\n", "All LUNA tiling CLIs offer a help option. To check the CLI arguments, simply run the CLI with the `--help` option." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: load_slide [OPTIONS]\n", "\n", "Options:\n", " -a, --app_config TEXT application configuration yaml file. See\n", " config.yaml.template for details. [required]\n", "\n", " -s, --datastore_id TEXT datastore name. 
usually a slide id.\n", " [required]\n", "\n", " -m, --method_param_path TEXT json parameter file with path to a WSI delta\n", " table. [required]\n", "\n", " --help Show this message and exit.\n" ] } ], "source": [ "%%bash\n", "\n", "load_slide --help" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import multiprocessing\n", "import subprocess\n", "\n", "slide_ids = ['2551571', '2551531', '2551028', '2551389', '2551129']\n", "\n", "# simple wrapper that runs a CLI call for multiple slides in parallel (3 worker processes)\n", "def pool_process(func, slides):\n", " pool = multiprocessing.Pool(3)\n", " pool.map(func, slides)\n", " pool.close()\n", " pool.join()\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# call load_slide as a subprocess for one slide\n", "def call_load_slide(slide):\n", " subprocess.run(f\"load_slide -a configs/app_config.yaml -s {slide} -m configs/load_slides.yaml\", shell=True)\n", " return slide\n", "\n", "pool_process(call_load_slide, slide_ids)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once this step is done, the data store will be created at your `datastore_path` (`PRO_12-123/tiles` with the example method parameters).\n", "\n", "Let's take a look at the WholeSlideImage location for slide 2551571. 
We'll see that this process created a symbolic link (`data`) pointing to the SVS image path, along with a `metadata.json` file." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 1.0K\n", "lrwxrwxrwx 1 pashaa pashaa 104 Jul 13 13:29 data -> /gpfs/mskmindhdp_emc/user/shared_data_folder/pathology-tutorial/PRO_12-123/data/toy_data_set/2551571.svs\n", "-rwxrwxrwx 1 pashaa pashaa 3.1K Jul 13 13:29 metadata.json\n" ] } ], "source": [ "%%bash\n", "\n", "ls -lhtr PRO_12-123/tiles/2551571/ov_slides/WholeSlideImage/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate tiles and labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the main tiling step. The CLI generates tiles and records, for each tile, an Otsu score, a purple score, and its regional annotation label. The Otsu score is based on Otsu's method, a foreground/background thresholding algorithm commonly used to filter out the background of the slide. The purple score provides additional guidance for H&E slide analysis.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: generate_tiles [OPTIONS]\n", "\n", "Options:\n", " -a, --app_config TEXT application configuration yaml file. See\n", " config.yaml.template for details. [required]\n", "\n", " -s, --datastore_id TEXT datastore name. usually a slide id.\n", " [required]\n", "\n", " -m, --method_param_path TEXT json file with method parameters for tile\n", " generation and filtering. [required]\n", "\n", " --help Show this message and exit.\n" ] } ], "source": [ "%%bash\n", "\n", "generate_tiles --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this method configuration, the tile size is set to 128 pixels, the scale factor to 16, and the slide magnification (from slide metadata) to 20. 
In this example, we label the tiles with the default labels provided by the regional annotations. Note that we keep only the tiles that have been annotated and have an Otsu score above 0.5. Please refer to `configs/generate_tiles.yaml` for more details on the method parameters.\n", "\n", "Here we reserve four slides for model training and one slide for testing. For training, we only generate tiles for the areas that have been annotated by pathologists, so the model will have ground-truth labels. For testing, we generate tiles for the whole slide.\n", "\n", "The test slide is reserved to be annotated by the model in the inference notebook. As mentioned above, for this slide we generate tiles for *all* tissue regions (Otsu score > 0.5). Note that we use a different config file, `configs/generate_tiles_all_tissues.yaml`, which omits the parameters `project_id`, `labelset`, and `annotation_table_path`, since they pertain to the regional annotations.\n", "\n", "Depending on the size of the WSI and the tiles, this step can take up to 10 minutes per slide." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "slide_ids_train = ['2551571', '2551531', '2551028', '2551389']\n", "slide_ids_test = '2551129'\n", "\n", "# call generate_tiles as a subprocess for one training slide\n", "def call_generate_tiles(slide):\n", " subprocess.run(f\"generate_tiles -a configs/app_config.yaml -s {slide} -m configs/generate_tiles.yaml\", shell=True)\n", " return slide\n", "\n", "pool_process(call_generate_tiles, slide_ids_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "generate_tiles -a configs/app_config.yaml -s 2551129 -m configs/generate_tiles_all_tissues.yaml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the step is done, you can find the tiles and the score CSV for your slide at your output location. 
For slide 2551571, the tile images and metadata are stored at `PRO_12-123/tiles/2551571/ov_default_labels/TileImages/data`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 3.6G\n", "-rwxrwxrwx 1 pashaa pashaa 3.6G Jul 13 13:44 tiles.slice.pil\n", "-rwxrwxrwx 1 pashaa pashaa 207K Jul 13 13:45 address.slice.csv\n", "-rwxrwxrwx 1 pashaa pashaa 635 Jul 13 13:45 metadata.json\n" ] } ], "source": [ "%%bash\n", "\n", "ls -lhtr PRO_12-123/tiles/2551571/ov_default_labels/TileImages/data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the tile metadata in the output CSV.\n", "\n", "The tile `otsu_score`, `purple_score`, and `regional_label` values are stored alongside tile metadata such as address, coordinates, size, and offset. From the log, we see that out of 206830 total tiles, only the subset that meets the filter criteria has been kept." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | address | coordinates | otsu_score | purple_score | regional_label | tile_image_offset | tile_image_length | tile_image_size_xy | tile_image_mode |
|---|---|---|---|---|---|---|---|---|---|
| 0 | x107_y183_z20 | (107, 183) | 0.859375 | 0.984375 | veins | 2.845409e+08 | 49152.0 | 128.0 | RGB |
| 1 | x107_y184_z20 | (107, 184) | 0.890625 | 0.984375 | veins | 2.845901e+08 | 49152.0 | 128.0 | RGB |
| 2 | x107_y185_z20 | (107, 185) | 1.000000 | 1.000000 | veins | 2.846392e+08 | 49152.0 | 128.0 | RGB |
| 3 | x107_y192_z20 | (107, 192) | 0.593750 | 1.000000 | veins | 2.849833e+08 | 49152.0 | 128.0 | RGB |
| 4 | x108_y183_z20 | (108, 183) | 0.875000 | 0.953125 | veins | 2.918154e+08 | 49152.0 | 128.0 | RGB |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2615 | x453_y148_z20 | (453, 148) | 1.000000 | 1.000000 | lympho_rich_tumor | 3.706257e+09 | 49152.0 | 128.0 | RGB |
| 2616 | x454_y146_z20 | (454, 146) | 1.000000 | 1.000000 | lympho_rich_tumor | 3.713237e+09 | 49152.0 | 128.0 | RGB |
| 2617 | x454_y147_z20 | (454, 147) | 1.000000 | 1.000000 | lympho_rich_tumor | 3.713286e+09 | 49152.0 | 128.0 | RGB |
| 2618 | x454_y148_z20 | (454, 148) | 1.000000 | 1.000000 | lympho_rich_tumor | 3.713335e+09 | 49152.0 | 128.0 | RGB |
| 2619 | x455_y143_z20 | (455, 143) | 1.000000 | 1.000000 | lympho_rich_stroma | 3.719725e+09 | 49152.0 | 128.0 | RGB |
2620 rows × 9 columns
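Here is a note on how the `otsu_score` values may arise. Assume the scale factor of 16 is a downsampling factor. A 128-pixel tile then spans 128 / 16 = 8 pixels on a side of the downsampled slide, or 64 pixels in total. Every score in these tables is a multiple of 1/64 (for example, 0.859375 = 55/64 and 0.59375 = 38/64). One plausible reading, not confirmed against the LUNA source, is that the score is the fraction of those 64 pixels classified as foreground by an Otsu threshold. The NumPy sketch below illustrates that idea on a synthetic grayscale thumbnail. `otsu_threshold` and `tile_otsu_scores` are hypothetical helpers, not LUNA functions.

```python
import numpy as np


def otsu_threshold(gray):
    # Otsu's method: pick the threshold t that maximizes the between-class
    # variance of the two pixel classes {value <= t} and {value > t}.
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    omega = np.cumsum(prob)                 # weight of the class {value <= t}
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(256)
    valid = denom > 1e-12                   # skip thresholds that leave one class empty
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))


def tile_otsu_scores(gray_thumb, tile_px):
    # Fraction of foreground pixels in each tile_px x tile_px block of the thumbnail.
    t = otsu_threshold(gray_thumb)
    mask = gray_thumb <= t                  # tissue is darker than the bright glass background
    nh, nw = mask.shape[0] // tile_px, mask.shape[1] // tile_px
    blocks = mask[: nh * tile_px, : nw * tile_px].reshape(nh, tile_px, nw, tile_px)
    return blocks.mean(axis=(1, 3))         # shape (nh, nw): one score per tile


# Tile size 128 at scale factor 16 -> 8 x 8 = 64 thumbnail pixels per tile (assumed)
tile_px = 128 // 16

# Synthetic 16 x 16 grayscale thumbnail: dark tissue on the left, bright background on the right
thumb = np.full((16, 16), 230, dtype=np.uint8)
thumb[:, :8] = 50

scores = tile_otsu_scores(thumb, tile_px)  # array([[1., 0.], [1., 0.]])
keep = scores > 0.5                        # keep tiles whose score exceeds 0.5, as described above
```

The sketch thresholds the whole thumbnail once and then averages the mask per tile, so the cost is a single pass over the downsampled image rather than one threshold per tile.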
|  | address | coordinates | otsu_score | purple_score | tile_image_offset | tile_image_length | tile_image_size_xy | tile_image_mode |
|---|---|---|---|---|---|---|---|---|
| 0 | x1_y1_z20 | (1, 1) | 1.0000 | 0.0 | 0.000000e+00 | 49152.0 | 128.0 | RGB |
| 1 | x1_y2_z20 | (1, 2) | 1.0000 | 0.0 | 4.915200e+04 | 49152.0 | 128.0 | RGB |
| 2 | x2_y1_z20 | (2, 1) | 1.0000 | 0.0 | 9.830400e+04 | 49152.0 | 128.0 | RGB |
| 3 | x2_y2_z20 | (2, 2) | 1.0000 | 0.0 | 1.474560e+05 | 49152.0 | 128.0 | RGB |
| 4 | x3_y1_z20 | (3, 1) | 1.0000 | 0.0 | 1.966080e+05 | 49152.0 | 128.0 | RGB |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 28750 | x636_y2_z20 | (636, 2) | 0.8750 | 0.0 | 1.413120e+09 | 49152.0 | 128.0 | RGB |
| 28751 | x636_y3_z20 | (636, 3) | 0.8125 | 0.0 | 1.413169e+09 | 49152.0 | 128.0 | RGB |
| 28752 | x637_y1_z20 | (637, 1) | 1.0000 | 0.0 | 1.413218e+09 | 49152.0 | 128.0 | RGB |
| 28753 | x637_y2_z20 | (637, 2) | 0.9375 | 0.0 | 1.413267e+09 | 49152.0 | 128.0 | RGB |
| 28754 | x637_y3_z20 | (637, 3) | 0.8750 | 0.0 | 1.413317e+09 | 49152.0 | 128.0 | RGB |
28755 rows × 8 columns
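The tile tables also hint at how `tiles.slice.pil` is laid out. Every `tile_image_length` is 49152 = 128 × 128 × 3, which is the byte size of a 128 × 128 RGB tile with one byte per channel. In the table directly above, the offsets start at 0 and grow by exactly 49152 per row. From this, we assume (an inference from these values, not documented behavior) that the file is a flat concatenation of raw uint8 RGB tiles. Under that assumption, a single tile can be recovered by seeking to its offset. `read_tile` is a hypothetical helper, and the self-check writes a small synthetic file instead of touching real data.

```python
import os
import tempfile

import numpy as np

TILE_SIZE = 128  # tile_image_size_xy in the tables
CHANNELS = 3     # tile_image_mode == 'RGB', one uint8 per channel


def read_tile(slice_path, offset, length, size=TILE_SIZE):
    # ASSUMPTION: slice_path is a flat file of raw uint8 RGB tiles, each tile
    # stored as size * size * 3 bytes starting at its tile_image_offset.
    offset, length = int(offset), int(length)  # the tables store these as floats
    if length != size * size * CHANNELS:
        raise ValueError(f'unexpected tile length {length} for a {size}x{size} RGB tile')
    with open(slice_path, 'rb') as fh:
        fh.seek(offset)
        buf = fh.read(length)
    if len(buf) != length:
        raise EOFError(f'expected {length} bytes at offset {offset}, got {len(buf)}')
    # Reinterpret the bytes as a (height, width, channel) image array
    return np.frombuffer(buf, dtype=np.uint8).reshape(size, size, CHANNELS)


# Self-check on a synthetic file in which tile i is filled with the value i
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'tiles.slice.pil')
    with open(demo_path, 'wb') as fh:
        for i in range(3):
            fh.write(np.full((TILE_SIZE, TILE_SIZE, CHANNELS), i, dtype=np.uint8).tobytes())
    # 9.830400e+04 = 2 * 49152, i.e. the third tile, as in row 2 of the table above
    demo_tile = read_tile(demo_path, 9.830400e+04, 49152.0)
```

For real data, you would pass the `tile_image_offset` and `tile_image_length` of one row from the slide 2551571 table together with the path `PRO_12-123/tiles/2551571/ov_default_labels/TileImages/data/tiles.slice.pil`. `PIL.Image.fromarray(tile)` then turns the array into an image for viewing.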
|  |  |  | coordinates | otsu_score | purple_score | regional_label | tile_image_offset | tile_image_length | tile_image_size_xy | tile_image_mode | data_path |
|---|---|---|---|---|---|---|---|---|---|---|---|
| patient_id | id_slide_container | address |  |  |  |  |  |  |  |  |  |
| 4 | 2551028 | x128_y72_z20 | (128, 72) | 1.0 | 1.0 | arteries | 4.446781e+08 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x128_y73_z20 | (128, 73) | 1.0 | 1.0 | arteries | 4.447273e+08 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x128_y74_z20 | (128, 74) | 1.0 | 1.0 | arteries | 4.447764e+08 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x128_y75_z20 | (128, 75) | 1.0 | 1.0 | arteries | 4.448256e+08 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x128_y76_z20 | (128, 76) | 1.0 | 1.0 | arteries | 4.448748e+08 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1 | 2551571 | x453_y148_z20 | (453, 148) | 1.0 | 1.0 | lympho_rich_tumor | 3.706257e+09 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x454_y146_z20 | (454, 146) | 1.0 | 1.0 | lympho_rich_tumor | 3.713237e+09 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x454_y147_z20 | (454, 147) | 1.0 | 1.0 | lympho_rich_tumor | 3.713286e+09 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x454_y148_z20 | (454, 148) | 1.0 | 1.0 | lympho_rich_tumor | 3.713335e+09 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
|  |  | x455_y143_z20 | (455, 143) | 1.0 | 1.0 | lympho_rich_stroma | 3.719725e+09 | 49152.0 | 128.0 | RGB | /gpfs/mskmindhdp_emc/user/shared_data_folder/p... |
14696 rows × 9 columns
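The combined table is indexed by `patient_id`, `id_slide_container`, and `address`, which makes per-slide bookkeeping straightforward before training. The pandas sketch below copies four rows from that table and maps each `regional_label` to an integer class id. It then counts labels per slide to expose class imbalance and holds out one slide for validation. Splitting by slide rather than by tile keeps adjacent, highly correlated tiles from the same slide out of both sets. The names `class_ids`, `counts`, and `val_slides`, and the choice of validation slide, are illustrative and not part of LUNA. Slide ids are stored as strings here, matching `slide_ids` earlier in the notebook.

```python
import pandas as pd

# Four rows copied from the combined table above (other columns omitted)
tiles = pd.DataFrame(
    {'regional_label': ['arteries', 'arteries', 'lympho_rich_tumor', 'lympho_rich_stroma']},
    index=pd.MultiIndex.from_tuples(
        [
            (4, '2551028', 'x128_y72_z20'),
            (4, '2551028', 'x128_y73_z20'),
            (1, '2551571', 'x453_y148_z20'),
            (1, '2551571', 'x455_y143_z20'),
        ],
        names=['patient_id', 'id_slide_container', 'address'],
    ),
)

# Stable label -> integer class id mapping (sorted, so it is reproducible across runs)
class_ids = {label: i for i, label in enumerate(sorted(tiles['regional_label'].unique()))}
tiles['label_id'] = tiles['regional_label'].map(class_ids)

# Tiles per (slide, label): a quick check for class imbalance before training
counts = tiles.groupby(level='id_slide_container')['regional_label'].value_counts()

# Slide-level split so tiles from one slide never appear in both sets
val_slides = {'2551571'}  # hypothetical choice of validation slide
is_val = tiles.index.get_level_values('id_slide_container').isin(val_slides)
train_tiles, val_tiles = tiles[~is_val], tiles[is_val]
```

On the full 14696-row table, the same three steps apply unchanged. Only `val_slides` needs to name slides from the training set.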
\n", "