The Spylon kernel instantiates a `SparkContext` for you in the variable `sc` after you configure Spark options in a `%%init_spark` magic cell.
```python
%%init_spark
...
```

```scala
val rdd = sc.parallelize(0 to 999)
rdd.takeSample(false,5)
```
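As an illustration of the configuration step above, an `%%init_spark` cell might look something like the sketch below. It assumes spylon-kernel's `launcher` configuration object; the master URL and executor settings are placeholders to adapt to your environment.

```python
%%init_spark
# Placeholder values: point the launcher at your own master and resources.
launcher.master = "local[*]"
launcher.conf.spark.executor.cores = 1
launcher.conf.spark.executor.memory = "1g"
```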
### Connecting to a Spark Cluster on Mesos
This configuration allows your compute cluster to scale with your data.
0. [Deploy Spark on Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html).
1. Configure each slave with [the `--no-switch_user` flag](https://open.mesosphere.com/reference/mesos-slave/) or create the `$NB_USER` account on every slave node.
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
   * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details. A sample command combining these flags appears after this list.
3. Follow the language-specific instructions below.
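For example, on a host that is network addressable by your Spark workers, the container might be started as follows. This is a sketch only: the `jupyter/all-spark-notebook` image name is an assumption, so substitute whichever Spark-enabled stack you actually run.

```
docker run -d --net=host --pid=host -e TINI_SUBREAPER=true jupyter/all-spark-notebook
```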
The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environment variables. You can pass information about your Mesos cluster via the `SPARK_OPTS` environment variable when you spawn a container.
For instance, to pass information about a Mesos master, Spark binary location in HDFS, and executor options, you could start the container like so:
```
docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
    ...
```
Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
```scala
// should print the value of --master in the kernel spec
println(sc.master)

val rdd = sc.parallelize(0 to 99999999)
rdd.sum()
```
### Connecting to a Spark Cluster in Standalone Mode
Connecting to a Spark cluster in Standalone Mode requires the following steps:
0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
1. [Deploy Spark in Standalone Mode](http://spark.apache.org/docs/latest/spark-standalone.html).
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
   * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
3. The language-specific instructions are almost the same as those for Mesos above; the only difference is that the master URL would now be something like `spark://10.10.10.10:7077`, as in the example below.
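For instance, the Apache Toree container from the Mesos section could be pointed at a standalone master instead. This is a sketch only: the `jupyter/all-spark-notebook` image name is an assumption, and any additional `SPARK_OPTS` your cluster needs still apply.

```
docker run -d -p 8888:8888 -e SPARK_OPTS='--master=spark://10.10.10.10:7077' jupyter/all-spark-notebook
```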
## Tensorflow
The `jupyter/tensorflow-notebook` image supports the use of [Tensorflow](https://www.tensorflow.org/) in single machine or distributed mode.