{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Local Installation Setup Tutorial\n", "\n", "Hello! This tutorial shows you how to set up and tear down your workspace in a Jupyter Lab notebook (or ipython environment) in order to run the luna library. Here are the steps we will review:\n", "\n", "1. Prerequisites\n", "2. Set up your virtual environment\n", "3. Clone the repository and install dependencies\n", "4. Setup Luna packages and configurations\n", "5. Teardown your project and virtual environment\n", "6. References\n", "\n", "## 1. Prerequisites\n", "\n", "It is assumed you have a Jupyter lab environment set up for executing these notebooks. If not, you may follow the instruction at https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html to install the lab environment on your host system of choice. \n", "\n", "The prerequisites listed here must be installed on the host system and not through the jupyter lab (or ipython) environment. \n", "\n", "You must download Apache Spark to your local computer in the case that it is not already downloaded (https://spark.apache.org/downloads.html).\n", "\n", "Make sure that you have the correct version of Java, Scala, Python, and R installed in the correct place on your computer. Apache Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+.\n", "\n", "Here are the links for installations of Java, Scala, Python, and R. Again, make sure you download the correct versions:\n", "\n", "Java AdoptOpenJDK: https://adoptopenjdk.net/installation.html\n", "Scala: https://www.scala-lang.org/download/\n", "Python: https://www.python.org/downloads/\n", "R: https://www.r-project.org/\n", "\n", "It is important to have the path to your Java installation in your JAVA_HOME environment variable. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "openjdk version \"1.8.0_275\"\n", "OpenJDK Runtime Environment (build 1.8.0_275-b01)\n", "OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)\n", "Python 3.6.9\n", "JAVA_HOME= /gpfs/mskmindhdp_emc/sw/env/bin/java\n" ] } ], "source": [ "!java -version\n", "!python3 --version\n", "\n", "import os, subprocess\n", "os.environ['JAVA_HOME'] = subprocess.check_output(['bash','-c', 'which java']).decode(\"utf-8\")\n", "!echo 'JAVA_HOME=' $JAVA_HOME" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You must also download Hadoop for your computer. On mac, you may install with this command:\n", "\n", " brew install hadoop\n", "\n", "Hadoop has special installation instructions for MacBooks. Here is an instruction link for a single cluster as a guide: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html.\n", "\n", "Next, install Openslide (https://openslide.org/download/). This library will help with reading the svs images and their tiles. On mac, you may install with this command:\n", "\n", " brew install openslide\n", "\n", "Lastly, you must find the location where your Spark software is installed on your machine and the SPARK_HOME environnment variable yourself. You may find your Spark installation directory by executing, \n", "\n", " which spark-submit\n", " \n", "If for example, the output is \"/opt/spark-3.0.0-bin-hadoop3.2/bin/spark-submit\", then set your SPARK_HOME environment variable to \"/opt/spark-3.0.0-bin-hadoop3.2\" running the code below in a code cell.\n", "\n", " import os\n", " os.environ['SPARK_HOME']='/opt/spark-3.0.0-bin-hadoop3.2'\n", " !echo $SPARK_HOME" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Set up your virtual environment\n", "\n", "Next, set up your virtual environment within the jupyter lab (or ipython) environment. The end of this tutorial has steps for tearing down this virtual environment. \n", "\n", "Open a terminal in your Jupyter Lab environment by selecting File -> New -> Terminal and execute the following commands. It is assumed that your default python environment on the host system has python3-venv installed (sudo apt-get install python3-venv -y).\n", "\n", " # change directory to your pathology tutorial sandbox directory\n", " cd [LOCATION-WHERE-YOU-WANT-TO-CREATE-THE-VIRTUAL-ENV]\n", "\n", " # create the virtual environment\n", " python3 -m venv pt-venv\n", " \n", " # activate the virtual environment\n", " source pt-venv/bin/activate \n", " \n", " # upgrade pip\n", " pip install --upgrade pip\n", " \n", " # upgrade setuptools\n", " pip install --upgrade setuptools\n", " \n", " # install ipykernel\n", " pip install ipykernel\n", "\n", " # Register this env with jupyter lab. It’ll now show up in the\n", " # launcher & kernels list once you refresh the page\n", " python3 -m ipykernel install --user --name pt-venv --display-name \"luna venv\"\n", "\n", " # List kernels to ensure it was created successfully\n", " jupyter kernelspec list\n", " \n", " # deactivate the virtual environment in the terminal\n", " deactivate\n", "\n", "Now, apply the new kernel to your notebook by first selecting the default kernel (which is typically \"Python 3\") and then selecting your new kernel \"pathology tutorial venv\" from the drop-down list. **NOTE:** It may take a minute for the drop-down list to update. \n", "\n", "Any python packages you pip install through the jupyter environment will now persist only in this environment.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Clone the repository and install dependencies" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/gpfs/mskmindhdp_emc/user/shared_data_folder/rosed2\n", "Cloning into 'luna'...\n", "remote: Enumerating objects: 9764, done.\u001b[K\n", "remote: Counting objects: 100% (1615/1615), done.\u001b[K\n", "remote: Compressing objects: 100% (1121/1121), done.\u001b[K\n", "remote: Total 9764 (delta 831), reused 1075 (delta 382), pack-reused 8149\u001b[K\n", "Receiving objects: 100% (9764/9764), 129.57 MiB | 5.91 MiB/s, done.\n", "Resolving deltas: 100% (6390/6390), done.\n", "Checking out files: 100% (772/772), done.\n", "\u001b[01;34msandbox\u001b[00m\n", "└── \u001b[01;34mluna\u001b[00m\n", " ├── \u001b[01;34mconf\u001b[00m\n", " ├── \u001b[01;34mdata_processing\u001b[00m\n", " ├── \u001b[01;34mdocker\u001b[00m\n", " ├── \u001b[01;34mdocs\u001b[00m\n", " ├── \u001b[01;34mintegration\u001b[00m\n", " ├── \u001b[01;34mpyluna-common\u001b[00m\n", " ├── \u001b[01;34mpyluna-pathology\u001b[00m\n", " ├── \u001b[01;34mpyluna-radiology\u001b[00m\n", " ├── \u001b[01;34msrc\u001b[00m\n", " └── \u001b[01;34mtests\u001b[00m\n", "\n", "11 directories\n" ] } ], "source": [ "!pwd\n", "!mkdir sandbox && cd sandbox && git clone https://github.com/msk-mind/luna.git \n", "!tree -d -L 2 sandbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, install the luna package. To install luna subpackages, with more features specify a subpackage {radiology, pathology} or install all with `.[all]` to installl all features.\n", "\n", "In this example, we'll install pathology subpackage only.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Note: for pyluna-* packages that are not on pypi, add your local path in setup.cfg files for installation to work correctly. For example, modify `absolute-path-to-luna-repo` and update your setup.cfgs with:\n", "\n", "> pyluna-pathology @ file://localhost/absolute-path-to-luna-repo/pyluna-pathology/" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33m DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.\n", " pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.\u001b[0m\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install sandbox/luna/.[pathology] -q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, you may check if additional dependencies were installed by running the \"%pip list\" in the terminal.\n", "\n", "*Note: if you receive an error message about a particular installation during this process that halts the previous command from being fully executed, run '%pip install x', where x is the package, and then run the previous command again.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have followed all of these steps so far, your jupyter installation should be set up! Try importing the luna libraries." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import luna\n", "import luna.pathology" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should have no errors with this step. Congratulations, you are ready to move on to the dataset prep!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Setup Luna packages and configurations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To run luna packages, we need to set up `$LUNA_HOME` and configurations.\n", "\n", "Currently, we have two configuration files.\n", "\n", "- **datastore.cfg**: configuration for the backend store of your data. POSIX and Minio backends are supported.\n", "\n", "- **logging.cfg**: configuration for logging level and optional central logging in MongoDB.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Setup `LUNA_HOME` environment variable to point to a location where luna configs can be stored.\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: LUNA_HOME=sandbox/luna_home\n" ] } ], "source": [ "!mkdir sandbox/luna_home\n", "%env LUNA_HOME=sandbox/luna_home" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sandbox/luna_home\n" ] } ], "source": [ "# check $LUNA_HOME\n", "!echo $LUNA_HOME" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Copy `conf/` folder in luna repo to $LUNA_HOME/conf" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cp -r sandbox/luna/conf/ $LUNA_HOME/conf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. In the `conf/` folder, copy `logging.default.yml` to `logging.cfg` and `datastore.default.yml` to `datastore.cfg` and modify the `.cfg` files to reflect your setup.\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "%%sh\n", "\n", "cd $LUNA_HOME/conf\n", "cp logging.default.yml logging.cfg\n", "cp datastore.default.yml datastore.cfg" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "datastore.cfg datastore.default.yml logging.cfg logging.default.yml\n" ] } ], "source": [ "# check config directory\n", "!ls $LUNA_HOME/conf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Teardown your project and virtual environment\n", "\n", "**WARNING:** Follow these steps only after you are done with using this jupyter environment and you are ready to restore you sytem back to its original state. \n", "\n", " # in your jupyter terminal, uninstall the pt-venv kernel\n", " jupyter kernelspec uninstall pt-venv\n", " \n", " # delete the virtual environment \n", " rm -rf pt-venv\n", " \n", "Next, delete the sandbox." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!rm -rf sandbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. References:\n", "\n", "Use Virtual Environments Inside Jupyter Notebooks & Jupter Lab [Best Practices] -\n", "https://www.zainrizvi.io/blog/jupyter-notebooks-best-practices-use-virtual-environments/\n", "\n", "Installing the IPython kernel - \n", "https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-python-2-and-3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "luna venv", "language": "python", "name": "pt-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }