Commit 628fbcb2 authored by Peter Parente, committed by GitHub

Merge pull request #404 from pando85/master

[Update] Spark updated to 2.1.1
parents d7b570a1 e304b3a7
@@ -11,8 +11,8 @@
 * Scala 2.11.x
 * pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed for Python
 * ggplot2, rcurl preinstalled for R
-* Spark 2.0.2 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
-* Mesos client 0.25 binary that can communicate with a Mesos master
+* Spark 2.1.1 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
+* Mesos client 1.2 binary that can communicate with a Mesos master
 * spylon-kernel
 * Unprivileged user `jovyan` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/jovyan` and `/opt/conda`
 * [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../base-notebook/start-notebook.sh) as the default command
@@ -124,8 +124,8 @@ conf = pyspark.SparkConf()
 # point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
 conf.setMaster("mesos://10.10.10.10:5050")
 # point to spark binary package in HDFS or on local filesystem on all slave
-# nodes (e.g., file:///opt/spark/spark-2.0.2-bin-hadoop2.7.tgz)
-conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-2.0.2-bin-hadoop2.7.tgz")
+# nodes (e.g., file:///opt/spark/spark-2.1.1-bin-hadoop2.7.tgz)
+conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz")
 # set other options as desired
 conf.set("spark.executor.memory", "8g")
 conf.set("spark.core.connection.ack.wait.timeout", "1200")
@@ -157,10 +157,10 @@ library(SparkR)
 # point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
 # as the first argument
 # point to spark binary package in HDFS or on local filesystem on all slave
-# nodes (e.g., file:///opt/spark/spark-2.0.2-bin-hadoop2.7.tgz) in sparkEnvir
+# nodes (e.g., file:///opt/spark/spark-2.1.1-bin-hadoop2.7.tgz) in sparkEnvir
 # set other options in sparkEnvir
 sc <- sparkR.session("mesos://10.10.10.10:5050", sparkEnvir=list(
-  spark.executor.uri="hdfs://10.10.10.10/spark/spark-2.0.2-bin-hadoop2.7.tgz",
+  spark.executor.uri="hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz",
   spark.executor.memory="8g"
   )
 )
@@ -183,7 +183,7 @@ The Apache Toree kernel automatically creates a `SparkContext` when it starts ba
 For instance, to pass information about a Mesos master, the Spark binary location in HDFS, and executor options, you could start the container like so:

 `docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
---spark.executor.uri=hdfs://10.10.10.10/spark/spark-2.0.2-bin-hadoop2.7.tgz \
+--spark.executor.uri=hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz \
 --spark.executor.memory=8g' jupyter/all-spark-notebook`

 Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
......
@@ -7,7 +7,7 @@ MAINTAINER Jupyter Project <jupyter@googlegroups.com>
 USER root

 # Spark dependencies
-ENV APACHE_SPARK_VERSION 2.1.0
+ENV APACHE_SPARK_VERSION 2.1.1
 ENV HADOOP_VERSION 2.7

 # Temporarily add jessie backports to get openjdk 8, but then remove that source
@@ -19,7 +19,7 @@ RUN echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /
     rm -rf /var/lib/apt/lists/*
 RUN cd /tmp && \
     wget -q http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
-    echo "3fc94096ae34f9a1a148d37e5ed640a7e5de1812f1f2ecd715d92bbf2901e895cf4b93e6d8ee0d64debb5df7c56d673c0a36e5fc49503ec0f4507eb0edf961a4 *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
+    echo "4b6427ca6dc6f888b21bff9f9a354260af4a0699a1f43caabf58ae6030951ee5fa8b976497aa33de7e4ae55609d47a80bfe66dfc48c79ea28e3e5b03bdaaba11 *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
     tar xzf spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz -C /usr/local && \
     rm spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
 RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark
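The `sha512sum -c` step above is what this hunk really changes: a new release means a new expected digest, and the build fails if the downloaded tarball does not match it. To reproduce the same check outside the Dockerfile, a minimal Python sketch follows; the local file path is an assumption, while the expected digest is the one from the diff above:

```python
import hashlib

# Compute the SHA-512 digest of a file in streaming fashion, so the
# multi-hundred-megabyte Spark archive is never loaded into memory at once.
def sha512_of(path, chunk_size=1 << 20):
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest taken from the updated Dockerfile line above.
EXPECTED = ("4b6427ca6dc6f888b21bff9f9a354260af4a0699a1f43caabf58ae6030951ee5"
            "fa8b976497aa33de7e4ae55609d47a80bfe66dfc48c79ea28e3e5b03bdaaba11")

# Assumed local path; adjust to wherever the tarball was downloaded.
actual = sha512_of("spark-2.1.1-bin-hadoop2.7.tgz")
assert actual == EXPECTED, "SHA-512 mismatch: refusing to install this tarball"
```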
@@ -7,8 +7,8 @@
 * Jupyter Notebook 5.0.x
 * Conda Python 3.x and Python 2.7.x environments
 * pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed
-* Spark 2.1.0 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
-* Mesos client 0.25 binary that can communicate with a Mesos master
+* Spark 2.1.1 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
+* Mesos client 1.2 binary that can communicate with a Mesos master
 * Unprivileged user `jovyan` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/jovyan` and `/opt/conda`
 * [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../base-notebook/start-notebook.sh) as the default command
 * A [start-singleuser.sh](../base-notebook/start-singleuser.sh) script useful for running a single-user instance of the Notebook server, as required by JupyterHub
@@ -70,8 +70,8 @@ conf = pyspark.SparkConf()
 # point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
 conf.setMaster("mesos://10.10.10.10:5050")
 # point to spark binary package in HDFS or on local filesystem on all slave
-# nodes (e.g., file:///opt/spark/spark-2.0.2-bin-hadoop2.7.tgz)
-conf.set("spark.executor.uri", "hdfs://10.122.193.209/spark/spark-2.0.2-bin-hadoop2.7.tgz")
+# nodes (e.g., file:///opt/spark/spark-2.1.1-bin-hadoop2.7.tgz)
+conf.set("spark.executor.uri", "hdfs://10.122.193.209/spark/spark-2.1.1-bin-hadoop2.7.tgz")
 # set other options as desired
 conf.set("spark.executor.memory", "8g")
 conf.set("spark.core.connection.ack.wait.timeout", "1200")