Local Installation Setup Tutorial

Hello! This tutorial shows you how to set up and tear down your workspace in a Jupyter Lab notebook (or ipython environment) in order to run the luna library. Here are the steps we will review:

  1. Prerequisites

  2. Set up your virtual environment

  3. Clone the repository and install dependencies

  4. Set up Luna packages and configurations

  5. Tear down your project and virtual environment

  6. References

1. Prerequisites

It is assumed you have a Jupyter Lab environment set up for executing these notebooks. If not, follow the instructions at https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html to install the lab environment on your host system of choice.

The prerequisites listed here must be installed on the host system and not through the jupyter lab (or ipython) environment.

If Apache Spark is not already installed on your local computer, download it from https://spark.apache.org/downloads.html.

Make sure that compatible versions of Java, Scala, Python, and R are installed and available on your PATH. Apache Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+.
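As a quick sanity check of the Java requirement, the banner printed by `java -version` can be parsed in Python. This is only a minimal sketch: `java_major_version` is a hypothetical helper (not part of luna or Spark), and it assumes the banner contains a quoted version string such as "1.8.0_275" or "11.0.9".

```python
import re


def java_major_version(banner: str) -> int:
    """Return the Java major version found in a `java -version` banner.

    Hypothetical helper for illustration; not part of luna or Spark.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no quoted version string found in banner")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (e.g. 1.8.0_275 is Java 8).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


# The tutorial lists Java 8 and 11 as the supported versions.
print(java_major_version('openjdk version "1.8.0_275"') in (8, 11))
```

Note that `java -version` writes its banner to stderr, so to feed real output to the helper you would capture it with something like `subprocess.run(["java", "-version"], stderr=subprocess.PIPE, universal_newlines=True).stderr`.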

Here are the links for installations of Java, Scala, Python, and R. Again, make sure you download the correct versions:

  • Java AdoptOpenJDK: https://adoptopenjdk.net/installation.html

  • Scala: https://www.scala-lang.org/download/

  • Python: https://www.python.org/downloads/

  • R: https://www.r-project.org/

It is important to have the path to your Java installation in your JAVA_HOME environment variable.

[1]:
!java -version
!python3 --version

import os, subprocess
# `which java` prints the path of the java executable followed by a newline; strip it
os.environ['JAVA_HOME'] = subprocess.check_output(['bash','-c', 'which java']).decode("utf-8").strip()
!echo 'JAVA_HOME=' $JAVA_HOME
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)
Python 3.6.9
JAVA_HOME= /gpfs/mskmindhdp_emc/sw/env/bin/java
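The cell above stores the path of the java executable itself. JAVA_HOME conventionally names the JDK installation directory, that is, the directory containing bin/java. If a tool complains about JAVA_HOME, one option is to derive the directory from the executable. The sketch below assumes the usual <JAVA_HOME>/bin/java layout; `java_home_from_executable` is a hypothetical helper name.

```python
import os
import shutil


def java_home_from_executable(java_path: str) -> str:
    """Return the directory two levels above the resolved java executable.

    Hypothetical helper; assumes the usual <JAVA_HOME>/bin/java layout.
    """
    # Drop any trailing newline, then follow symlinks to the real binary.
    resolved = os.path.realpath(java_path.strip())
    return os.path.dirname(os.path.dirname(resolved))


java = shutil.which("java")  # None when java is not on PATH
if java is not None:
    print("candidate JAVA_HOME:", java_home_from_executable(java))
```

Treat the result as a candidate to verify rather than a guaranteed answer. On macOS in particular, /usr/bin/java is a launcher stub, and `/usr/libexec/java_home` is the more reliable way to find the JDK directory.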

You must also install Hadoop on your computer. On macOS, you can install it with Homebrew:

brew install hadoop

Hadoop needs some additional setup on macOS. The single-node cluster guide is a useful reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html.

Next, install OpenSlide (https://openslide.org/download/). This library is used to read SVS images and their tiles. On macOS, you can install it with Homebrew:

brew install openslide

Lastly, find where Spark is installed on your machine and set the SPARK_HOME environment variable yourself. You can locate your Spark installation by executing:

which spark-submit

If, for example, the output is “/opt/spark-3.0.0-bin-hadoop3.2/bin/spark-submit”, then set your SPARK_HOME environment variable to “/opt/spark-3.0.0-bin-hadoop3.2” by running the code below in a code cell.

import os
os.environ['SPARK_HOME']='/opt/spark-3.0.0-bin-hadoop3.2'
!echo $SPARK_HOME
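The same lookup can be done without copying the path by hand. The following sketch is one possible approach, not an official Spark or luna utility: `find_spark_home` is a hypothetical helper, and it assumes a plain tarball layout like the example above, where spark-submit lives in <SPARK_HOME>/bin. Package-manager installs that wrap spark-submit in a launcher script may need a different directory.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_spark_home(spark_submit: Optional[str] = None) -> Optional[str]:
    """Guess SPARK_HOME from the location of spark-submit.

    Hypothetical helper; assumes the <SPARK_HOME>/bin/spark-submit layout.
    Returns None when spark-submit cannot be found.
    """
    path = spark_submit or shutil.which("spark-submit")
    if path is None:
        return None
    # Follow symlinks, then step up from bin/spark-submit to the install root.
    return str(Path(path).resolve().parent.parent)


spark_home = find_spark_home()
if spark_home is not None:
    os.environ["SPARK_HOME"] = spark_home
print("SPARK_HOME =", os.environ.get("SPARK_HOME"))
```

If the printed value is `None`, spark-submit is not on the PATH seen by the kernel, and SPARK_HOME has to be set manually as shown in the cell above.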

2. Set up your virtual environment

Next, set up your virtual environment within the jupyter lab (or ipython) environment. The end of this tutorial has steps for tearing down this virtual environment.

Open a terminal in your Jupyter Lab environment by selecting File -> New -> Terminal and execute the following commands. It is assumed that your default python environment on the host system has python3-venv installed (on Debian or Ubuntu: sudo apt-get install python3-venv -y).

# change directory to your pathology tutorial sandbox directory
cd [LOCATION-WHERE-YOU-WANT-TO-CREATE-THE-VIRTUAL-ENV]

# create the virtual environment
python3 -m venv pt-venv

# activate the virtual environment
source pt-venv/bin/activate

# upgrade pip
pip install --upgrade pip

# upgrade setuptools
pip install --upgrade setuptools

# install ipykernel
pip install ipykernel

# Register this env with jupyter lab. It’ll now show up in the
# launcher & kernels list once you refresh the page
python3 -m ipykernel install --user --name pt-venv --display-name "luna venv"

# List kernels to ensure it was created successfully
jupyter kernelspec list

# deactivate the virtual environment in the terminal
deactivate

Now, apply the new kernel to your notebook: click the current kernel name (typically “Python 3”) and then select your new kernel, “luna venv”, from the drop-down list. NOTE: It may take a minute for the drop-down list to update.

Any Python packages you pip install from a notebook running this kernel will now be installed only in this virtual environment.
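To confirm that the notebook really is running on the new kernel, you can inspect the interpreter from a code cell. This is a small sketch that assumes the environment was created with `python3 -m venv` as above; `in_virtualenv` is a hypothetical helper name, and environments created by older versions of other tools may report differently.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment, while sys.base_prefix
    # keeps pointing at the host Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

With the “luna venv” kernel selected, the interpreter path should point inside the pt-venv directory.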

3. Clone the repository and install dependencies

[2]:
!pwd
!mkdir sandbox && cd sandbox && git clone https://github.com/msk-mind/luna.git
!tree -d -L 2 sandbox
/gpfs/mskmindhdp_emc/user/shared_data_folder/rosed2
Cloning into 'luna'...
remote: Enumerating objects: 9764, done.
remote: Counting objects: 100% (1615/1615), done.
remote: Compressing objects: 100% (1121/1121), done.
remote: Total 9764 (delta 831), reused 1075 (delta 382), pack-reused 8149
Receiving objects: 100% (9764/9764), 129.57 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (6390/6390), done.
Checking out files: 100% (772/772), done.
sandbox
└── luna
    ├── conf
    ├── data_processing
    ├── docker
    ├── docs
    ├── integration
    ├── pyluna-common
    ├── pyluna-pathology
    ├── pyluna-radiology
    ├── src
    └── tests

11 directories

Next, install the luna package. To get more features, name a subpackage (radiology or pathology) in the install command, or use .[all] to install all features.

In this example, we’ll install pathology subpackage only.

Note: pyluna-* packages that are not on PyPI must be referenced by their local path in the setup.cfg files for the installation to work correctly. Replace absolute-path-to-luna-repo with the absolute path of your clone and update each setup.cfg with a line such as:

pyluna-pathology @ file://localhost/absolute-path-to-luna-repo/pyluna-pathology/
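If you would rather not type the path by hand, the requirement line can be generated from the location of your clone. The sketch below is illustrative only: `local_requirement` is a hypothetical helper, and it assumes a POSIX-style path with no characters that need URL escaping.

```python
import os


def local_requirement(repo_path: str, package: str) -> str:
    """Build a 'name @ file://localhost/...' requirement for a local package.

    Hypothetical helper; assumes a POSIX-style absolute path.
    """
    abs_repo = os.path.abspath(repo_path)  # starts with "/" on POSIX systems
    return f"{package} @ file://localhost{abs_repo}/{package}/"


# Prints the line to paste into setup.cfg for the clone under ./sandbox/luna
print(local_requirement("sandbox/luna", "pyluna-pathology"))
```

Run it from the directory that contains sandbox/, since the relative path is resolved against the current working directory.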

[4]:
%pip install sandbox/luna/.[pathology] -q
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Note: you may need to restart the kernel to use updated packages.

At this point, you can check whether additional dependencies were installed by running “%pip list” in a notebook cell (or “pip list” in a terminal with the virtual environment activated).

Note: if the previous command halts with an error about a particular package, run ‘%pip install x’, where x is that package, and then run the previous command again.

If you have followed all of these steps so far, your jupyter installation should be set up! Try importing the luna libraries.

[7]:
import luna
import luna.pathology

You should have no errors with this step. Congratulations, you are ready to move on to the dataset prep!
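If an import does fail, it can help to see which modules the kernel cannot locate before reinstalling anything. The following is a minimal diagnostic sketch using only the standard library; `missing_modules` is a hypothetical helper and not part of luna.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# An empty list means both modules are visible to the current kernel.
print(missing_modules(["luna", "luna.pathology"]))
```

A non-empty result usually means the notebook is running on a different kernel than the one the packages were installed into, or that the kernel needs a restart after installation.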

4. Set up Luna packages and configurations

To run luna packages, we need to set up $LUNA_HOME and configurations.

Currently, we have two configuration files.

  • datastore.cfg: configuration for the backend store of your data. POSIX and Minio backends are supported.

  • logging.cfg: configuration for logging level and optional central logging in MongoDB.

  1. Set the LUNA_HOME environment variable to point to a location where luna configs can be stored.

[24]:
!mkdir sandbox/luna_home
%env LUNA_HOME=sandbox/luna_home
env: LUNA_HOME=sandbox/luna_home
[25]:
# check $LUNA_HOME
!echo $LUNA_HOME
sandbox/luna_home

  2. Copy the conf/ folder in the luna repo to $LUNA_HOME/conf

[26]:
!cp -r sandbox/luna/conf/ $LUNA_HOME/conf

  3. In the conf/ folder, copy logging.default.yml to logging.cfg and datastore.default.yml to datastore.cfg, then modify the .cfg files to reflect your setup.

[32]:
%%sh

cd $LUNA_HOME/conf
cp logging.default.yml logging.cfg
cp datastore.default.yml datastore.cfg
[35]:
# check config directory
!ls $LUNA_HOME/conf
datastore.cfg  datastore.default.yml  logging.cfg  logging.default.yml
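The three steps above can also be sketched as a single Python function, which is convenient when the setup has to be repeated. This is an illustration under stated assumptions, not a luna utility: `init_luna_home` is a hypothetical helper, and unlike the cells above it stores LUNA_HOME as an absolute path, on the assumption that the variable should keep working if the notebook's working directory changes.

```python
import os
import shutil
from pathlib import Path


def init_luna_home(repo_conf: str, luna_home: str) -> Path:
    """Copy the repo conf/ folder into luna_home and create the .cfg files.

    Hypothetical helper mirroring the three notebook steps above.
    """
    home = Path(luna_home).resolve()  # absolute, so it survives a chdir
    conf = home / "conf"
    home.mkdir(parents=True, exist_ok=True)
    if not conf.exists():
        shutil.copytree(repo_conf, str(conf))
    for name in ("logging", "datastore"):
        target = conf / f"{name}.cfg"
        if not target.exists():  # never overwrite a config you already edited
            shutil.copyfile(str(conf / f"{name}.default.yml"), str(target))
    os.environ["LUNA_HOME"] = str(home)
    return conf
```

From the directory that contains sandbox/, calling `init_luna_home("sandbox/luna/conf", "sandbox/luna_home")` should leave the same four files in the conf folder as the listing above. You still need to edit the two .cfg files by hand to reflect your setup.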

5. Tear down your project and virtual environment

WARNING: Follow these steps only after you are done using this jupyter environment and are ready to restore your system to its original state.

# in your jupyter terminal, uninstall the pt-venv kernel
jupyter kernelspec uninstall pt-venv

# delete the virtual environment
rm -rf pt-venv

Next, delete the sandbox.

[ ]:
!rm -rf sandbox

6. References:

Use Virtual Environments Inside Jupyter Notebooks & Jupter Lab [Best Practices] - https://www.zainrizvi.io/blog/jupyter-notebooks-best-practices-use-virtual-environments/

Installing the IPython kernel - https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-python-2-and-3
