{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DSA Annotation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digital Slide Archive (DSA) is an open-source web application where users can create regional and point annotations in its high-power slide viewer. Luna Pathology CLIs pull the different annotation types from DSA and save the annotations in GeoJSON format along with metadata. In this notebook, we will review:\n", "\n", "- Project setup on DSA\n", "- Create annotations on DSA\n", "- Run regional annotation ETL\n", "- Run point annotation ETL\n", "\n", "DSA provides an excellent [video tutorial](https://www.youtube.com/watch?v=HTvLMyKYyGs&ab_channel=DigitalSlideArchive%2FHistomicsTK) that covers platform features. For the first two topics, the information below is an abridged version of the tutorial for your reference." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project setup on DSA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digital Slide Archive (DSA) is a platform for storing, managing, visualizing, and annotating large imaging data sets. The DSA consists of an interface to visualize slides and manage annotations (HistomicsUI) and a web server that provides a rich API and data management tools (using Girder). This system can:\n", "\n", "- Organize images from a variety of assetstores, such as local file systems and S3.\n", "- Provide user access controls.\n", "- Support image annotation and review.\n", "\n", "HistomicsUI is a web-based application for examining, annotating, and processing histology images to extract both low- and high-level features (e.g. cellular structure, feature types).\n", "\n", "Concepts:\n", "\n", "- **Collections** correspond to a project. Collections are the top-level objects in the data organization hierarchy.\n", "- **Folders** help organize slides under a project, e.g. `hne_slides`.\n", "- **Items** correspond to a slide. 
An item can have metadata, annotations, and files associated with it.\n", "- **Annotation** is a single rectangle, point, or polygon.\n", "- **Annotation Document** is a set of annotations created by the pathologist.\n", "- **Annotation Style** is a predefined set of labels (morphology such as tumor, stroma, or necrosis) and colors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a collection for your project.\n", "Your images can be organized in a folder.\n", "In this example, we have a `pathology-tutorial` collection with a `slides` folder where we organized the images.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create annotations on DSA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please see this [video tutorial](https://youtu.be/HTvLMyKYyGs?t=369) for creating and viewing annotations. The information below is an abridged version of the tutorial for your reference.\n", "\n", "**1. To navigate to HistomicsUI, go to Actions → Open in HistomicsUI on the upper right side. HistomicsUI will open in a new tab in your browser.**\n", "\n", "**2. Create an annotation document**\n", "- Click the + New button on the Annotation panel. This will bring up a Create annotation modal.\n", "- Name your annotation document **regional** or **point**. These are the two types of annotations we support. The annotation document name is used in the ETL, so it is important to standardize your document names so that the ETL can download all documents for the annotation type.\n", "- Optionally add a description, then click Save.\n", "\n", "**3. Create annotations**\n", "\n", "- Select a label (e.g. regional_tumor).\n", "- Click on **Point** or **Polygon**. When an annotation shape is highlighted, your cursor on the slide area will look like a +.\n", "- For Point annotation, zoom to an appropriate magnification and click on the cell. 
The annotation will appear as a circle.\n", "- For Polygon annotation, click and drag your mouse. As you drag, the area will be highlighted. Return to the starting point, or double-click, to close the polygon.\n", "\n", "**Note**: Using standardized annotation styles is recommended. A uniform annotation style JSON file can be created and shared among the pathologists making annotations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run regional annotation ETL\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: LUNA_HOME=/gpfs/mskmind_emc/data_user/rosed2/luna\n" ] } ], "source": [ "# TEMP: set LUNA_HOME for this session; replace the path below with your own\n", "%env LUNA_HOME=/gpfs/mskmind_emc/data_user/rosed2/luna\n", "#%env PYTHONPATH=/gpfs/mskmind_emc/data_user/rosed2/luna/pyluna-pathology:/gpfs/mskmind_emc/data_user/rosed2/luna/pyluna-common/luna\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have created annotations on DSA, you can run the annotation ETL CLI. This ETL downloads the annotations, converts them to GeoJSON format, and creates a Parquet table that makes the annotations and metadata queryable.\n", "\n", "For details of the data and app configuration, please refer to the example configurations.\n", "\n", "**Note**: update the DSA instance details to reflect your DSA setup. 
If you are using the luna tutorial Docker setup, replace `localhost` with the IP address you get from running:\n", "\n", "```docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' luna_tutorial_girder_1```\n", "\n", "First, let's look at the CLI arguments by running the command with `--help`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022-04-06 21:21:30,463 - INFO - root - Initalized logger, log file at: data-processing.log\r\n", "Usage: dsa_annotation [OPTIONS] INPUT_DSA_ENDPOINT\r\n", "\r\n", " A cli tool\r\n", "\r\n", " Inputs:\r\n", " input_dsa_endpoint: Path to the DSA endpoint like http://localhost:8080/dsa/api/v1\r\n", " \b\r\n", " Outputs:\r\n", " slide_annotation_dataset\r\n", " \b\r\n", " Example:\r\n", " export DSA_USERNAME=username\r\n", " export DSA_PASSWORD=password\r\n", " dsa_annotation_etl http://localhost:8080/dsa/api/v1\r\n", " --collection_name tcga-data\r\n", " --annotation_name TumorVsOther\r\n", " -o /data/annotations/\r\n", "\r\n", "Options:\r\n", " -o, --output_dir TEXT path to output directory to save results\r\n", " -c, --collection_name TEXT name of the collection to pull data from in\r\n", " DSA\r\n", "\r\n", " -a, --annotation_name TEXT name of the annotations to pull from DSA (same\r\n", " annotation name for all slides)\r\n", "\r\n", " -u, --username TEXT DSA username, can be inferred from\r\n", " DSA_USERNAME\r\n", "\r\n", " -p, --password TEXT DSA password, should be inferred from\r\n", " DSA_PASSWORD\r\n", "\r\n", " -nc, --num_cores INTEGER Number of cores to use (default: 4)\r\n", " -m, --method_param_path TEXT path to a metadata json/yaml file with method\r\n", " parameters to reproduce results\r\n", "\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!dsa_annotation --help" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "2022-04-06 21:20:59,193 - INFO - root - Initalized logger, log file at: data-processing.log\n", "2022-04-06 21:20:59,195 - INFO - luna.common.utils - Running with {'output_dir': '../dsa_annotations', 'collection_name': 'tcga', 'annotation_name': 'test', 'num_cores': 1, 'username': 'admin', 'password': 'password', 'input_dsa_endpoint': 'http://localhost:8080/api/v1', 'method_param_path': None}\n", "2022-04-06 21:20:59,196 - INFO - luna.common.utils - Param input_dsa_endpoint set = http://localhost:8080/api/v1\n", "2022-04-06 21:20:59,196 - INFO - luna.common.utils - Param collection_name set = tcga\n", "2022-04-06 21:20:59,196 - INFO - luna.common.utils - Param annotation_name set = test\n", "2022-04-06 21:20:59,197 - INFO - luna.common.utils - Param num_cores set = 1\n", "2022-04-06 21:20:59,197 - INFO - luna.common.utils - Param username set = *****\n", "2022-04-06 21:20:59,197 - INFO - luna.common.utils - Param password set = *****\n", "2022-04-06 21:20:59,197 - INFO - luna.common.utils - Param output_dir set = ../dsa_annotations\n", "http://localhost:8080/api/v1/metadata.yml\n", "2022-04-06 21:20:59,198 - INFO - luna.common.utils - Full segment key set: {}\n", "\n", "----------------------------------- Running transform::dsa_annotation_etl -----------------------------------\n", "\n", "2022-04-06 21:20:59,499 - INFO - luna.pathology.dsa.dsa_api_handler - Connected to DSA @ http://localhost:8080/api/v1/\n", "2022-04-06 21:20:59,558 - INFO - numexpr.utils - Note: detected 160 virtual cores but NumExpr set to maximum of 64, check \"NUMEXPR_MAX_THREADS\" environment variable.\n", "2022-04-06 21:20:59,558 - INFO - numexpr.utils - Note: NumExpr detected 160 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n", "2022-04-06 21:20:59,558 - INFO - numexpr.utils - NumExpr defaulting to 8 threads.\n", "2022-04-06 21:20:59,562 - INFO - luna.pathology.dsa.dsa_api_handler - Found collection id=610acb50150bd39c9d5e49e2 for 
collection=tcga\n", "2022-04-06 21:20:59,582 - INFO - luna.pathology.dsa.dsa_api_handler - Found 2 slides!\n", "2022-04-06 21:21:07,009 - INFO - root - Initalized logger, log file at: data-processing.log\n", "2022-04-06 21:21:07,147 - INFO - dsa_annotation_etl - Dashboard: http://10.254.130.15:8787/status\n", "2022-04-06 21:21:07,156 - INFO - dsa_annotation_etl - Trying to process annotation for slide_id=123, item_id=624c33a8150bd39c9d7ae182\n", "2022-04-06 21:21:07,157 - INFO - dsa_annotation_etl - Trying to process annotation for slide_id=TCGA-GM-A2DB-01Z-00-DX1.9EE36AA6-2594-44C7-B05C-91A0AEC7E511, item_id=610ad128150bd39c9d5e49e5\n", "2022-04-06 21:21:07,193 - INFO - luna.pathology.dsa.dsa_api_handler - Found 1 total annotations: {'otsu_score_tile-based heatmap'}\n", "2022-04-06 21:21:07,196 - INFO - luna.pathology.dsa.dsa_api_handler - Found 1 total annotations: {'test'}\n", "2022-04-06 21:21:07,210 - INFO - numexpr.utils - Note: detected 160 virtual cores but NumExpr set to maximum of 64, check \"NUMEXPR_MAX_THREADS\" environment variable.\n", "2022-04-06 21:21:07,211 - INFO - numexpr.utils - Note: NumExpr detected 160 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n", "2022-04-06 21:21:07,211 - INFO - numexpr.utils - NumExpr defaulting to 8 threads.\n", "2022-04-06 21:21:07,218 - WARNING - luna.pathology.dsa.dsa_api_handler - No matching annotation 'test'\n", "2022-04-06 21:21:07,220 - INFO - luna.pathology.dsa.dsa_api_handler - Found an annotation called test!!!!\n", " annotation_girder_id ... y_coords\n", "0 624c33ba150bd39c9d7ae186 ... [2038, 1989, 1976, 1973, 1989, 2138, 2202, 220...\n", "1 624c33ba150bd39c9d7ae186 ... 
[1170, 1164, 1144, 1125, 1115, 1102, 1070, 102...\n", "\n", "[2 rows x 26 columns]\n", "2022-04-06 21:21:07,287 - INFO - dsa_annotation_etl - About to turn 2 geometric annotations into a geojson!\n", "2022-04-06 21:21:07,290 - INFO - dsa_annotation_etl - \tCreated geometry POLYGON ((1203 2038, 1255 1989, 1271 197...\n", "2022-04-06 21:21:07,290 - INFO - dsa_annotation_etl - \tCreated geometry POLYGON ((1235 1170, 1200 1164, 1168 114...\n", "2022-04-06 21:21:07,291 - INFO - dsa_annotation_etl - Checking geojson, errors with geojson FeatureCollection: []\n", " _id ... annotation_name\n", "slide_id ... \n", "123 624c33a8150bd39c9d7ae182 ... test\n", "123 624c33a8150bd39c9d7ae182 ... test\n", "123 624c33a8150bd39c9d7ae182 ... test\n", "\n", "[3 rows x 40 columns]\n", "2022-04-06 21:21:08,858 - INFO - dsa_annotation_etl - Created 1 geojsons, 0 points, and 2 polygons\n", "2022-04-06 21:21:09,160 - INFO - luna.common.utils - Code block 'transform::dsa_annotation_etl' took: 9.961238082963973s\n", "2022-04-06 21:21:09,162 - INFO - luna.common.utils - Done.\n" ] } ], "source": [ "# ingest annotations\n", "!dsa_annotation http://localhost:8080/api/v1 \\\n", "--output_dir ../dsa_annotations \\\n", "--collection_name tcga \\\n", "--annotation_name test \\\n", "--num_cores 1 \\\n", "--username admin --password password" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 33K\r\n", "-rw-rw-r-- 1 rosed2 rosed2 1.2K Apr 6 21:21 123.annotation.geojson\r\n", "-rw-rw-r-- 1 rosed2 rosed2 290 Apr 6 21:21 metadata.yml\r\n", "-rw-rw-r-- 1 rosed2 rosed2 29K Apr 6 21:21 slide_annotation_dataset_tcga_test.parquet\r\n" ] } ], "source": [ "# metadata, geojson, parquet table output\n", "!ls -lh ../dsa_annotations/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Annotations are saved in a parquet format, where 1 row represents an annotation element.\n", "\n", "We collect 
metadata about the annotation, such as the creation timestamp and the creating user.\n", "Note that different annotation types (point, regional) can be ingested using the same CLI." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['_id', 'baseParentId', 'baseParentType', 'created', 'creatorId',\n", " 'description', 'folderId', 'largeImage', 'lowerName', 'name', 'size',\n", " 'updated', 'annotation_girder_id', '_modelType', '_version',\n", " 'createdannotation', 'creatorIdannotation', 'public',\n", " 'updatedannotation', 'updatedId', 'groups', 'element_count',\n", " 'element_details', 'annidx', 'elementidx', 'element_girder_id', 'type',\n", " 'group_name', 'label', 'color', 'xmin', 'xmax', 'ymin', 'ymax',\n", " 'bbox_area', 'x_coords', 'y_coords', 'slide_geojson', 'collection_name',\n", " 'annotation_name'],\n", " dtype='object')\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " _id baseParentId baseParentType \\\n", "slide_id \n", "123 624c33a8150bd39c9d7ae182 610acb50150bd39c9d5e49e2 collection \n", "123 624c33a8150bd39c9d7ae182 610acb50150bd39c9d5e49e2 collection \n", "123 624c33a8150bd39c9d7ae182 610acb50150bd39c9d5e49e2 collection \n", "\n", " created creatorId \\\n", "slide_id \n", "123 2022-04-05T12:18:48.428000+00:00 608072ed93e6d1c34ffe27b2 \n", "123 2022-04-05T12:18:48.428000+00:00 608072ed93e6d1c34ffe27b2 \n", "123 2022-04-05T12:18:48.428000+00:00 608072ed93e6d1c34ffe27b2 \n", "\n", " description folderId \\\n", "slide_id \n", "123 610acb5f150bd39c9d5e49e3 \n", "123 610acb5f150bd39c9d5e49e3 \n", "123 610acb5f150bd39c9d5e49e3 \n", "\n", " largeImage lowerName \\\n", "slide_id \n", "123 {'fileId': '624c33a8150bd39c9d7ae183', 'source... 123.svs \n", "123 {'fileId': '624c33a8150bd39c9d7ae183', 'source... 123.svs \n", "123 {'fileId': '624c33a8150bd39c9d7ae183', 'source... 123.svs \n", "\n", " name ... xmin xmax ymin ymax bbox_area \\\n", "slide_id ... \n", "123 123.svs ... 1171.0 1345.0 1973.0 2202.0 39846.0 \n", "123 123.svs ... 1077.0 1461.0 783.0 1193.0 157440.0 \n", "123 123.svs ... NaN NaN NaN NaN NaN \n", "\n", " x_coords \\\n", "slide_id \n", "123 [1203, 1255, 1271, 1287, 1345, 1329, 1238, 120... \n", "123 [1235, 1200, 1168, 1148, 1132, 1116, 1100, 108... \n", "123 None \n", "\n", " y_coords \\\n", "slide_id \n", "123 [2038, 1989, 1976, 1973, 1989, 2138, 2202, 220... \n", "123 [1170, 1164, 1144, 1125, 1115, 1102, 1070, 102... 
\n", "123 None \n", "\n", " slide_geojson collection_name \\\n", "slide_id \n", "123 None tcga \n", "123 None tcga \n", "123 ../dsa_annotations/123.annotation.geojson tcga \n", "\n", " annotation_name \n", "slide_id \n", "123 test \n", "123 test \n", "123 test \n", "\n", "[3 rows x 40 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check annotation metadata table\n", "import pyarrow.parquet as pq\n", "\n", "annotation_table = pq.read_table('../dsa_annotations/slide_annotation_dataset_tcga_test.parquet').to_pandas()\n", "print(annotation_table.columns)\n", "annotation_table" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }