Commit 401575ed authored by Peter Parente

Merge pull request #204 from lbustelo/UpgradeToree0.1.0.dev5

Bump Spark to 1.6.1, Toree to 0.1.0.dev7
parents 83ec4d26 b412388f
@@ -28,5 +28,5 @@ RUN conda config --add channels r && \
     'r-rcurl=1.95*' && conda clean -tipsy
 
 # Apache Toree kernel
-RUN pip install toree==0.1.0.dev4
-RUN jupyter toree install --user --kernel_name='Apache_Toree'
+RUN pip install toree==0.1.0.dev7
+RUN jupyter toree install --user
@@ -161,37 +161,17 @@ head(filter(df, df$Petal_Width > 0.2))
 ### In an Apache Toree (Scala) Notebook
 
 0. Open a terminal via *New -> Terminal* in the notebook interface.
-1. Add information about your cluster to the Scala kernel spec file in `~/.local/share/jupyter/kernels/apache_toree/kernel.json`. (See below.)
+1. Add information about your cluster to the `SPARK_OPTS` environment variable when running the container.
 2. Open an Apache Toree (Scala) notebook.
 3. Use the pre-configured `SparkContext` in variable `sc`.
 
-The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environments. Therefore, you must add it to the Toree kernel spec file. You cannot, at present, configure it yourself within a notebook.
+The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environment variables. You can pass information about your Mesos cluster via the `SPARK_OPTS` environment variable when you spawn a container.
 
-For instance, a kernel spec file with information about a Mesos master, Spark binary location in HDFS, and an executor option appears here:
+For instance, to pass information about a Mesos master, a Spark binary location in HDFS, and executor options, you could start the container like so:
 
-```
-{
-  "codemirror_mode": "scala",
-  "display_name": "Apache_Toree",
-  "language_info": {
-    "name": "scala"
-  },
-  "argv": [
-    "/home/jovyan/.local/share/jupyter/kernels/apache_toree/bin/run.sh",
-    "--profile",
-    "{connection_file}"
-  ],
-  "env": {
-    "CAPTURE_STANDARD_ERR": "true",
-    "SPARK_HOME": "/usr/local/spark",
-    "SEND_EMPTY_OUTPUT": "false",
-    "SPARK_OPTS": "--master=mesos://10.10.10.10:5050 --driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --spark.executor.memory=8g --spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz",
-    "CAPTURE_STANDARD_OUT": "true",
-    "PYTHONPATH": "/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.9-src.zip",
-    "MAX_INTERPRETER_THREADS": "16"
-  }
-}
-```
+`docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
+    --spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz \
+    --spark.executor.memory=8g' jupyter/all-spark-notebook`
 
 Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
...
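The remainder of this README hunk (collapsed above) shows a notebook cell for testing the cluster. As a rough, hedged sketch of that kind of smoke test, not the repository's exact snippet, a Toree (Scala) cell could exercise the Mesos executors through the pre-configured `sc` like so:

```
// Runs in an Apache Toree (Scala) notebook cell; `sc` is the SparkContext the
// kernel already built from SPARK_OPTS, so no setup code is needed here.
// The range size is an arbitrary, illustrative value.
val rdd = sc.parallelize(1 to 100000)
val total = rdd.map(_ * 2).sum()   // forces a distributed job on the executors
println(total)                     // 2 * (1 + ... + 100000) = 1.00001E10
```

If the job never acquires executors, the `SPARK_OPTS` values (master URL, executor URI, executor memory) are the first things to re-check.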
@@ -10,14 +10,14 @@ USER root
 RUN apt-get -y update && apt-get -y install jq && apt-get clean && rm -rf /var/lib/apt/lists/*
 
 # Spark dependencies
-ENV APACHE_SPARK_VERSION 1.6.0
+ENV APACHE_SPARK_VERSION 1.6.1
 RUN apt-get -y update && \
     apt-get install -y --no-install-recommends openjdk-7-jre-headless && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
 RUN cd /tmp && \
     wget -q http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz && \
-    echo "439fe7793e0725492d3d36448adcd1db38f438dd1392bffd556b58bb9a3a2601 *spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz" | sha256sum -c - && \
+    echo "09f3b50676abc9b3d1895773d18976953ee76945afa72fa57e6473ce4e215970 *spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz" | sha256sum -c - && \
     tar xzf spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz -C /usr/local && \
     rm spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz
 RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6 spark
...
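Not shown in the diff, but after rebuilding the image, a quick way to confirm that the bumped Spark actually ships is to ask the pre-configured context for its version from a Toree (Scala) cell; this is a hedged sketch, not part of the commit:

```
// Sanity check after the upgrade: the kernel's pre-configured SparkContext
// reports the bundled Spark version, which should now read "1.6.1".
println(sc.version)
```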