Bump Spark to 1.6.1

(c) Copyright IBM Corp. 2016

Commit eb754a1a authored May 06, 2016 by Peter Parente Committed by Gino Bustelo May 10, 2016

Showing with 8 additions and 28 deletions

all-spark-notebook/README.md +6 -26

pyspark-notebook/Dockerfile +2 -2

--- a/all-spark-notebook/README.md
+++ b/all-spark-notebook/README.md
@@ -161,37 +161,17 @@ head(filter(df, df$Petal_Width > 0.2))
 ### In an Apache Toree (Scala) Notebook
 0. Open a terminal via *New -> Terminal* in the notebook interface.
-1. Add information about your cluster to the Scala kernel spec file in `~/.local/share/jupyter/kernels/apache_toree/kernel.json`. (See below.)
+1. Add information about your cluster to the `SPARK_OPTS` environment variable when running the container.
 2. Open an Apache Toree (Scala) notebook.
 3. Use the pre-configured `SparkContext` in variable `sc`.
-The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environments. Therefore, you must add it to the Toree kernel spec file. You cannot, at present, configure it yourself within a notebook.
+The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environment variables. You can pass information about your Mesos cluster via the `SPARK_OPTS` environment variable when you spawn a container.
-For instance, a kernel spec file with information about a Mesos master, Spark binary location in HDFS, and an executor option appears here:
+For instance, to pass information about a Mesos master, Spark binary location in HDFS, and an executor option, you could start the container like so:
-```
-{
-  "codemirror_mode": "scala",
-  "display_name": "Apache_Toree",
-  "language_info": {
-    "name": "scala"
-  },
-  "argv": [
-    "/home/jovyan/.local/share/jupyter/kernels/apache_toree/bin/run.sh",
-    "--profile",
-    "{connection_file}"
-  ],
-  "env": {
-    "CAPTURE_STANDARD_ERR": "true",
-    "SPARK_HOME": "/usr/local/spark",
-    "SEND_EMPTY_OUTPUT": "false",
-    "SPARK_OPTS": "--master=mesos://10.10.10.10:5050 --driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --spark.executor.memory=8g --spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz",
-    "CAPTURE_STANDARD_OUT": "true",
-    "PYTHONPATH": "/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.9-src.zip",
-    "MAX_INTERPRETER_THREADS": "16"
-  }
-}
-```
+`docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
+    --spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz \
+    --spark.executor.memory=8g' jupyter/all-spark-notebook`
 Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
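
Review note on the README hunk above (not part of the commit): the sketch below shows one way to assemble the `SPARK_OPTS` value and hand it to `docker run`. It assumes the placeholder addresses from the README and only prints the command, so it can be inspected before anything is started. Two shell details matter here: Docker's `-e` flag takes a single `NAME=value` argument, and a backslash-newline is only treated as a line continuation inside double quotes (inside single quotes it is kept literally and ends up in the value).

```shell
#!/bin/sh
# Placeholder cluster details taken from the README example; substitute your own.
# Inside double quotes, backslash-newline is removed, so the options are
# joined onto one line separated by single spaces.
SPARK_OPTS="--master=mesos://10.10.10.10:5050 \
--spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz \
--spark.executor.memory=8g"

# Build the command as a string so it can be reviewed first.
# -e expects NAME=value as one argument, hence SPARK_OPTS='...'.
CMD="docker run -d -p 8888:8888 -e SPARK_OPTS='${SPARK_OPTS}' jupyter/all-spark-notebook"
echo "$CMD"
```

Copying the printed line into a terminal starts the container with the options set; whether the Toree kernel accepts each individual option depends on the Toree and Spark versions in the image.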

--- a/pyspark-notebook/Dockerfile
+++ b/pyspark-notebook/Dockerfile
@@ -10,14 +10,14 @@ USER root
 RUN apt-get -y update && apt-get -y install jq && apt-get clean && rm -rf /var/lib/apt/lists/*
 # Spark dependencies
-ENV APACHE_SPARK_VERSION 1.6.0
+ENV APACHE_SPARK_VERSION 1.6.1
 RUN apt-get -y update && \
    apt-get install -y --no-install-recommends openjdk-7-jre-headless && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
 RUN cd /tmp && \
        wget -q http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz && \
-        echo "439fe7793e0725492d3d36448adcd1db38f438dd1392bffd556b58bb9a3a2601 *spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz" | sha256sum -c - && \
+        echo "09f3b50676abc9b3d1895773d18976953ee76945afa72fa57e6473ce4e215970 *spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz" | sha256sum -c - && \
        tar xzf spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz -C /usr/local && \
        rm spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz
 RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6 spark
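
Review note on the Dockerfile hunk above (not part of the commit): the version bump has to change the SHA-256 digest together with `APACHE_SPARK_VERSION`, because `sha256sum -c -` aborts the build when the downloaded archive does not match. The sketch below illustrates that check in isolation. The file `artifact.tgz` is a hypothetical stand-in for the Spark tarball, and its digest is computed on the spot only to keep the example self-contained; in the Dockerfile the expected digest is hard-coded.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the downloaded Spark tarball.
workdir=$(mktemp -d)
cd "$workdir"
printf 'example payload\n' > artifact.tgz

# Computed here for illustration; the Dockerfile pins this value instead.
expected=$(sha256sum artifact.tgz | cut -d ' ' -f 1)

# "<digest> *<file>" is the checksum-line format ('*' marks binary mode).
# sha256sum -c - reads that line from stdin and exits non-zero on mismatch.
echo "${expected} *artifact.tgz" | sha256sum -c - && ok=yes

# Modify the file: the same check is now expected to fail.
printf 'tampered\n' >> artifact.tgz
if echo "${expected} *artifact.tgz" | sha256sum -c - >/dev/null 2>&1; then
    tampered=accepted
else
    tampered=rejected
fi

echo "intact=${ok} tampered=${tampered}"
```

In a `RUN` chain joined with `&&`, the non-zero exit from a failed check stops the remaining commands, so a mismatched archive is never unpacked.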