Commit 418f9f6c authored by Peter Parente

Merge pull request #143 from apurva3000/spark_standalone_readme

Added instructions for connecting to spark on standalone mode.
parents 4a4937f3 4597488b
@@ -191,6 +191,15 @@ println(sc.master)
val rdd = sc.parallelize(0 to 99999999)
rdd.sum()
```
## Connecting to a Spark Cluster in Standalone Mode

Connecting to a Spark cluster in standalone mode requires the following steps:

0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
1. [Deploy Spark in standalone mode](http://spark.apache.org/docs/latest/spark-standalone.html).
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
    * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details, and the example launch command below.
3. Follow the language-specific instructions above for Mesos; the only difference is that the master URL now takes a form like `spark://10.10.10.10:7077`, as in the sketch below.
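For example, a container launch that satisfies step 2 and its note might look like the following sketch (the image name is illustrative; substitute the image for this stack):

```
docker run -d --net=host --pid=host -e TINI_SUBREAPER=true jupyter/all-spark-notebook
```

A Scala connection to the standalone master might then look like this sketch, where `10.10.10.10:7077` is a placeholder for your master's URL:

```
// Sketch: build a SparkContext pointed at a standalone master.
// Assumes the image and the cluster run the same Spark version.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://10.10.10.10:7077")
  .setAppName("Standalone Test")
val sc = new SparkContext(conf)

// Same smoke test as the example above
val rdd = sc.parallelize(0 to 99999999)
rdd.sum()
```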
## Notebook Options
...
@@ -82,6 +82,16 @@ To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON`
Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :)
## Connecting to a Spark Cluster in Standalone Mode

Connecting to a Spark cluster in standalone mode requires the following steps:

0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
1. [Deploy Spark in standalone mode](http://spark.apache.org/docs/latest/spark-standalone.html).
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
    * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details, and the example launch command below.
3. Follow the language-specific instructions above for Mesos; the only difference is that the master URL now takes a form like `spark://10.10.10.10:7077`, as in the sketch below.
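For example, a container launch that satisfies step 2 and its note might look like the following sketch (the image name is illustrative; substitute the image for this stack):

```
docker run -d --net=host --pid=host -e TINI_SUBREAPER=true jupyter/pyspark-notebook
```

A connection from a Python notebook might then look like this sketch, where `10.10.10.10:7077` is a placeholder for your master's URL:

```
# Sketch: build a SparkContext pointed at a standalone master.
# Assumes the image and the cluster run the same Spark version.
import pyspark

conf = pyspark.SparkConf()
conf.setMaster("spark://10.10.10.10:7077")
conf.setAppName("Standalone Test")
sc = pyspark.SparkContext(conf=conf)

# Quick smoke test against the cluster
rdd = sc.parallelize(range(1000))
print(rdd.sum())
```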
## Notebook Options
You can pass [Jupyter command line options](http://jupyter.readthedocs.org/en/latest/config.html#command-line-arguments) through the [`start-notebook.sh` command](https://github.com/jupyter/docker-stacks/blob/master/minimal-notebook/start-notebook.sh#L15) when launching the container. For example, to set the base URL of the notebook server you might do the following:
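A minimal sketch of such a launch, assuming this stack's image and the standard `--NotebookApp.base_url` Jupyter option:

```
docker run -d -p 8888:8888 jupyter/pyspark-notebook start-notebook.sh --NotebookApp.base_url=/some/path
```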
...