========== User Guide ========== .. _Basic CLI Use: Basic Command Line Use ====================== After installation, the basic pattern for luna CLI tools looks like this: :: cli_call -a app_config.yaml -s id -m method_config.yaml ``app_config.yaml`` is a file that specifies how the backend is configured. It'll include parameters related to Spark and settings related to the object store. The ``id`` refers to the input object. This will vary depending on the CLI, but is often the filename or unique identifier of something like a slide. ``method_config.yaml`` are the parameters that are passed to the underlying python function call. To see examples of various CLI calls, take a look at some of the various :ref:`Tutorials` for our pathology workflows. API Use ======= In addition to launching jobs from the command lines, Luna can also be used in Python files or Jupyter Notebooks. After installation, you can simply import it into a Python file and access the built-in functions just as you would for any other library :: import luna from luna.pathology.common import preprocess Tutorials ========= The following notebook tutorials walk you through analysis workflows. .. toctree:: :maxdepth: 1 tutorials/dataset-prep.ipynb tutorials/tiling.ipynb tutorials/model-training.ipynb tutorials/inference-and-visualization.ipynb tutorials/end-to-end-pipeline.ipynb tutorials/dsa-tools.ipynb Pathology Workflow ================== At a high level, the pathology workflow that Luna supports resembles :numref:`fig-path_workflow` .. _fig-path_workflow: .. figure:: ./_static/media/fig-path_workflow.png :width: 600 The pathology workflow offers CLI utilities to take raw WSIs through the process of feature generation, model training and inference and visualization. Data Ingestion -------------- The process starts with Whole Slide Images (WSIs) and annotations from expert pathologists. WSIs are stored on disk with their cooresponding file paths and associated metadata are organized in Parquet_ tables. The format of the annotations can vary depending on the application the pathologists use to annotate the slide in, for example Digital Slide Archive (DSA) exports annotations as JSON objects. Internally, we convert and store these annotations as GeoJSON objects. Our GeoJSON format supports both point annotations, for objects like cells and lymphocytes, as well as regional annotation for when the pathologist annotates a large region of tissue. .. _Parquet: https://databricks.com/glossary/what-is-parquet Feature Generation and Model Development ---------------------------------------- We use the Tiling ETL to generate sequential tiles of arbitrary size from WSIs, at a specified magnification. At this point, tiles can also be filtered to remove extraneous or non-informative tiles, such as tiles within glass areas. Additionally, the Tiling ETL can be combined with available annotations, allowing users to generate exclusively labled tiles for use in building datasets for machine learning models. The tissue classifier package is simply a wrapper around a PyTorch training loop and can be easily changed to experiment with and develop custom models. The point annotations can be also be extracted and used to build models. Luna supports whole-slide cellular segmentation via QuPath through StarDist, a deep-learning based segmentation library. We’ve dockerized QuPath and StarDist together to allow for independent and headless processing of individual slides, which in practice, allow us to support processing parallelization across larger cohorts of slides. Visualization and Statistical Analysis -------------------------------------- Finally, as a final endpoint for visualizing models and slides, we have coupled DSA into our platform to provide a web interface for quality controlling slides and interactive evaluation of built model performance. In order to support the varying types of models that may arise during computational pathology workflows, we have built several CLIs to allow for robust model visualization through DSA for both tissue-level models and cellular models. We also offer statistical analysis modules to take features derived from infered tissue types and cells to predict clinical endpoints Troubleshooting =============== Here are some troubleshooting instructions and fixes to common issues.