Merge pull request #615 from parente/doc-contributing

Some contributor documentation

Commit 6fd049e1 authored May 20, 2018 by Peter Parente Committed by GitHub May 20, 2018
8 changed files
--- a/.github/issue_template.md
+++ b/.github/issue_template.md
-Hi! Thanks for using Jupyter's docker-stacks images.
+Hi! Thanks for using the Jupyter Docker Stacks.
-If you are requesting a library upgrade or addition in one of the existing images, please state the desired library name and version here and disregard the remaining sections.
+If you are looking to contribute to the images, please see the [Contributor's Guide](http://jupyter-docker-stacks.readthedocs.io/en/latest/#) in the documentation for our preferred processes.
 If you are reporting an issue with one of the existing images, please answer the questions below to help us troubleshoot the problem. Please be as thorough as possible.

--- a/docs/contributing/features.md
+++ b/docs/contributing/features.md
+# New Features
+Thank you for contributing to docker-stacks! We review pull requests of new features (e.g., new packages, new scripts, new flags) to balance the value of the images to the Jupyter community with the cost of maintaining the images over time.
+## Suggesting a New Feature
+Please follow the process below to suggest a new feature for inclusion in one of the core stacks:
+1. [Open a GitHub issue](https://github.com/jupyter/docker-stacks/issues) describing the feature you'd like to contribute.
+2. Discuss with the maintainers whether the addition makes sense in [one of the core stacks](../using/selecting.html#Core-Stacks), as a [recipe in the documentation](recipes.html), as a [community stack](stacks.html), or as something else entirely.
+## Selection Criteria
+Roughly speaking, we evaluate new features based on the following criteria:
+* **Usefulness to Jupyter users**: Is the feature generally applicable across domains? Does it work with Jupyter Notebook, JupyterLab, JupyterHub, etc.?
+* **Fit with the image purpose**: Does the feature match the theme of the stack in which it will be added? Would it fit better in a new, community stack?
+* **Complexity of build / runtime configuration**: How many lines of code does the feature require in one of the Dockerfiles or startup scripts? Does it require new scripts entirely? Do users need to adjust how they use the images?
+* **Impact on image metrics**: How many bytes does the feature and its dependencies add to the image(s)? How many minutes do they add to the build time?
+* **Ability to support the addition**: Can existing maintainers answer user questions and address future build issues? Are the contributors interested in helping with long-term maintenance? Can we write tests to ensure the feature continues to work over time?
+## Submitting a Pull Request
+If there's agreement that the feature belongs in one or more of the core stacks:
+1. Implement the feature in a local clone of the `jupyter/docker-stacks` project.
+2. Please build the image locally before submitting a pull request. Building the image locally shortens the debugging cycle by taking some load off [Travis CI](http://travis-ci.org/), which graciously provides free build services for open source projects like this one.  If you use `make`, call:
+```
+make image/somestack-notebook
+```
+3. [Submit a pull request](https://github.com/PointCloudLibrary/pcl/wiki/A-step-by-step-guide-on-preparing-and-submitting-a-pull-request) (PR) with your changes.
+4. Watch for Travis to report a build success or failure for your PR on GitHub.
+5. Discuss changes with the maintainers and address any build issues.
--- a/docs/contributing/packages.md
+++ b/docs/contributing/packages.md
-# Packages
+# Package Updates
-## Package Updates
+We are actively seeking pull requests which update packages already included in the project Dockerfiles. This is a great way for first-time contributors to participate in developing docker-stacks.
-## New Packages
+Please follow the process below to update a package version:
\ No newline at end of file
+1. Locate the Dockerfile containing the library you wish to update (e.g., [base-notebook/Dockerfile](https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile), [scipy-notebook/Dockerfile](https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile))
+2. Adjust the version number for the package. We prefer to pin the major and minor version number of packages so as to minimize rebuild side-effects when users submit pull requests (PRs). For example, you'll find the Jupyter Notebook package, `notebook`, installed using conda with  `notebook=5.4.*`.
+3.  Please build the image locally before submitting a pull request. Building the image locally shortens the debugging cycle by taking some load off [Travis CI](http://travis-ci.org/), which graciously provides free build services for open source projects like this one.  If you use `make`, call:
+```
+make image/somestack-notebook
+```
+4. [Submit a pull request](https://github.com/PointCloudLibrary/pcl/wiki/A-step-by-step-guide-on-preparing-and-submitting-a-pull-request) (PR) with your changes.
+5. Watch for Travis to report a build success or failure for your PR on GitHub.
+6. Discuss changes with the maintainers and address any build issues. Version conflicts are the most common problem. You may need to upgrade additional packages to fix build failures.
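The major/minor pinning convention from step 2 can be illustrated with a small check. This is a minimal sketch, not conda's actual version matcher: the helper `satisfies_pin` is hypothetical, and it approximates a pin such as `5.4.*` with shell-style wildcard matching from Python's standard library.

```python
from fnmatch import fnmatchcase


def satisfies_pin(version: str, pin: str) -> bool:
    """Return True if `version` matches a wildcard pin such as '5.4.*'.

    Hypothetical helper for illustration only; conda's own matching
    rules are richer than this shell-style wildcard comparison.
    """
    return fnmatchcase(version, pin)


# A patch release stays inside the pin; a minor-version bump does not.
print(satisfies_pin("5.4.1", "5.4.*"))
print(satisfies_pin("5.5.0", "5.4.*"))
```

Under this approximation, a rebuild can pick up a new patch release without a Dockerfile change, while moving to the next minor version requires an explicit edit and a pull request.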
--- a/docs/contributing/recipes.md
+++ b/docs/contributing/recipes.md
-# Recipes
+# New Recipes
\ No newline at end of file
+We welcome contributions of [recipes](../using/recipes.html), short examples of using, configuring, or extending the Docker Stacks, for inclusion in the documentation site. Follow the process below to add a new recipe:
+1. Open the `docs/using/recipes.md` source file.
+2. Add a second-level Markdown heading naming your recipe at the bottom of the file (e.g., `## Add the RISE extension`)
+3. Write the body of your recipe under the heading, including whatever command line, Dockerfile, links, etc. you need.
+4. [Submit a pull request](https://github.com/PointCloudLibrary/pcl/wiki/A-step-by-step-guide-on-preparing-and-submitting-a-pull-request) (PR) with your changes.
+5. Discuss changes with the maintainers and address any formatting or content issues.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -31,12 +31,14 @@ Table of Contents
   using/running
   using/common
   using/specifics
+   using/recipes
 .. toctree::
   :maxdepth: 2
   :caption: Contributor Guide
   contributing/packages
+   contributing/features
   contributing/recipes
   contributing/tests
   contributing/stacks

--- a/docs/using/recipes.md
+++ b/docs/using/recipes.md
+# Contributed Recipes
+Users sometimes share interesting ways of using the Jupyter Docker Stacks. We encourage users to [contribute these recipes](../contributing/recipes.html) to the documentation by submitting a pull request to `docs/using/recipes.md`, in case they prove useful to other members of the community. The sections below capture this knowledge.
+## Using `pip install` in a Child Docker image
+Create a new Dockerfile like the one shown below. 
+```dockerfile
+# Start from a core stack version
+FROM jupyter/datascience-notebook:9f9e5ca8fe5a
+# Install in the default python3 environment
+RUN pip install 'ggplot==0.6.8'
+```
+Then build a new image.
+```bash
+docker build --rm -t jupyter/my-datascience-notebook .
+```
+Ref: [docker-stacks/commit/79169618d571506304934a7b29039085e77db78c](https://github.com/jupyter/docker-stacks/commit/79169618d571506304934a7b29039085e77db78c#commitcomment-15960081)
+## Add a Python 2.x environment
+Python 2.x was removed from all images on August 10th, 2017, starting in tag `cc9feab481f7`. You can add a Python 2.x environment by defining your own Dockerfile inheriting from one of the images like so:
+```
+# Choose your desired base image
+FROM jupyter/scipy-notebook:latest
+# Create a Python 2.x environment using conda including at least the ipython kernel
+# and the kernda utility. Add any additional packages you want available for use
+# in a Python 2 notebook to the first line here (e.g., pandas, matplotlib, etc.)
+RUN conda create --quiet --yes -p $CONDA_DIR/envs/python2 python=2.7 ipython ipykernel kernda && \
+    conda clean -tipsy
+USER root
+# Create a global kernelspec in the image and modify it so that it properly activates
+# the python2 conda environment.
+RUN $CONDA_DIR/envs/python2/bin/python -m ipykernel install && \
+    $CONDA_DIR/envs/python2/bin/kernda -o -y /usr/local/share/jupyter/kernels/python2/kernel.json
+USER $NB_USER
+```
+Ref: [https://github.com/jupyter/docker-stacks/issues/440](https://github.com/jupyter/docker-stacks/issues/440)
+## Run JupyterLab
+JupyterLab is preinstalled as a notebook extension starting in tag [c33a7dc0eece](https://github.com/jupyter/docker-stacks/wiki/Docker-build-history).
+Run JupyterLab using a command such as `docker run -it --rm -p 8888:8888 jupyter/datascience-notebook start.sh jupyter lab`.
+## Let's Encrypt a Notebook server
+See the README for the simple automation here [https://github.com/jupyter/docker-stacks/tree/master/examples/make-deploy](https://github.com/jupyter/docker-stacks/tree/master/examples/make-deploy) which includes steps for requesting and renewing a Let's Encrypt certificate.
+Ref: [https://github.com/jupyter/docker-stacks/issues/78](https://github.com/jupyter/docker-stacks/issues/78)
+## Slideshows with Jupyter and RISE
+[RISE](https://github.com/damianavila/RISE) is an extension that turns your notebooks into live Reveal.js slideshows, with no conversion step:
+```
+# Add Live slideshows with RISE
+RUN conda install -c damianavila82 rise
+```
+Credit: [Paolo D.](https://github.com/pdonorio) based on [docker-stacks/issues/43](https://github.com/jupyter/docker-stacks/issues/43)
+## xgboost
+You need to install conda's gcc for Python xgboost to work properly. Otherwise, you'll get an exception about libgomp.so.1 missing GOMP_4.0.
+```
+%%bash
+conda install -y gcc
+pip install xgboost
+# Then, in a separate Python cell, run: import xgboost
+```
+## Running behind a nginx proxy
+Sometimes it is useful to run the Jupyter instance behind a nginx proxy, for instance:
+- you would prefer to access the notebook at a server URL with a path (`https://example.com/jupyter`) rather than a port (`https://example.com:8888`)
+- you may have many different services in addition to Jupyter running on the same server, and want nginx to help improve server performance by managing the connections
+Here is a [quick example NGINX configuration](https://gist.github.com/cboettig/8643341bd3c93b62b5c2) to get started.  You'll need a server, a `.crt` and `.key` file for your server, and `docker` & `docker-compose` installed.  Then just download the files at that gist and run `docker-compose up -d` to test it out.  Customize the `nginx.conf` file to set the desired paths and add other services.
+## Host volume mounts and notebook errors
+If you are mounting a host directory as `/home/jovyan/work` in your container and you receive permission errors or connection errors when you create a notebook, be sure that the `jovyan` user (UID=1000 by default) has read/write access to the directory on the host. Alternatively, specify the UID of the `jovyan` user on container startup using the `-e NB_UID` option described in the [Common Features, Docker Options section](../using/common.html#Docker-Options).
+Ref: [https://github.com/jupyter/docker-stacks/issues/199](https://github.com/jupyter/docker-stacks/issues/199)
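The `-e NB_UID` alternative can be sketched as a small command builder. This is an illustration under stated assumptions, not the project's tooling: `notebook_run_command` is a hypothetical helper, the image name and port mapping are placeholders, and the `--user root` flag reflects the assumption that the container must start as root before the startup script can change the `jovyan` UID (check the Docker Options section to confirm this for your image tag).

```python
import os
import shlex


def notebook_run_command(host_dir, image="jupyter/base-notebook", uid=None):
    """Build a `docker run` argument list that mounts `host_dir` as the work directory.

    Hypothetical helper: it only assembles arguments and never invokes Docker.
    """
    if uid is None:
        uid = os.getuid()  # UID of the current host user (POSIX only)
    return [
        "docker", "run", "--rm", "-p", "8888:8888",
        "--user", "root",          # assumption: needed for NB_UID to take effect
        "-e", f"NB_UID={uid}",     # make jovyan's UID match the host directory owner
        "-v", f"{os.path.abspath(host_dir)}:/home/jovyan/work",
        image,
    ]


# Print the command so it can be reviewed before running it in a shell.
print(shlex.join(notebook_run_command("./notebooks", uid=1001)))
```

Matching the container UID to the owner of the host directory avoids loosening permissions on the host side.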
+## JupyterHub
+We also have contributed recipes for using JupyterHub.
+### Use JupyterHub's dockerspawner
+In most cases, to use an image that already has a notebook stack set up with DockerSpawner, you only need to:
+1. Install the jupyterhub-singleuser script (for the right Python)
+2. Change the command to launch the single-user server
+Swapping out the `FROM` line in the `jupyterhub/singleuser` Dockerfile should be enough for most cases.
+Credit: [Justin Tyberg](https://github.com/jtyberg), [quanghoc](https://github.com/quanghoc), and [Min RK](https://github.com/minrk) based on [docker-stacks/issues/124](https://github.com/jupyter/docker-stacks/issues/124) and [docker-stacks/pull/185](https://github.com/jupyter/docker-stacks/pull/185)
+### Containers with a specific version of JupyterHub
+To use a specific version of JupyterHub, the version of `jupyterhub` in your image should match the version in the Hub itself.
+```
+FROM  jupyter/base-notebook:5ded1de07260
+RUN pip install jupyterhub==0.8.0b1
+```
+Credit: [MinRK](https://github.com/jupyter/docker-stacks/issues/423#issuecomment-322767742)
+Ref: [https://github.com/jupyter/docker-stacks/issues/177](https://github.com/jupyter/docker-stacks/issues/177)
+## Spark
+A few suggestions have been made regarding using Docker Stacks with Spark.
+### Using PySpark with AWS S3
+```
+import os
+os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
+import pyspark
+sc = pyspark.SparkContext("local[*]")
+from pyspark.sql import SQLContext
+sqlContext = SQLContext(sc)
+hadoopConf = sc._jsc.hadoopConfiguration()
+myAccessKey = input() 
+mySecretKey = input()
+hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
+hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
+hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)
+df = sqlContext.read.parquet("s3://myBucket/myKey")
+```
+Ref: [https://github.com/jupyter/docker-stacks/issues/127](https://github.com/jupyter/docker-stacks/issues/127)
+### Using Local Spark JARs
+```
+import os
+os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell'
+import pyspark
+from pyspark.streaming.kafka import KafkaUtils
+from pyspark.streaming import StreamingContext
+sc = pyspark.SparkContext()
+ssc = StreamingContext(sc,1)
+broker = "<my_broker_ip>"
+directKafkaStream = KafkaUtils.createDirectStream(ssc, ["test1"], {"metadata.broker.list": broker})
+directKafkaStream.pprint()
+ssc.start()
+```
+Ref: [https://github.com/jupyter/docker-stacks/issues/154](https://github.com/jupyter/docker-stacks/issues/154)
+### Using spark-packages.org
+If you'd like to use packages from [spark-packages.org](https://spark-packages.org/), see [https://gist.github.com/parente/c95fdaba5a9a066efaab](https://gist.github.com/parente/c95fdaba5a9a066efaab) for an example of how to specify the package identifier in the environment before creating a SparkContext.
+Ref: [https://github.com/jupyter/docker-stacks/issues/43](https://github.com/jupyter/docker-stacks/issues/43)
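For readers who do not want to open the gist, the idea can be sketched with the same `PYSPARK_SUBMIT_ARGS` pattern used in the S3 recipe above. This is a minimal sketch, not the gist's code: `set_spark_packages` is a hypothetical helper, and the package coordinate is only an illustrative placeholder to be replaced with the identifier listed on spark-packages.org for the package you need.

```python
import os


def set_spark_packages(*coordinates):
    """Point PySpark at one or more Maven-style package coordinates.

    Hypothetical helper; it only sets the environment variable that
    PySpark reads when it launches.
    """
    args = "--packages " + ",".join(coordinates) + " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args


# Call this before creating a SparkContext, or the setting is ignored.
# The coordinate below is an illustrative placeholder.
set_spark_packages("com.databricks:spark-csv_2.10:1.3.0")
```

After this runs, creating `pyspark.SparkContext()` in the same kernel should resolve the listed packages at startup, as in the other Spark recipes in this section.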
+### Use jupyter/all-spark-notebook with an existing Spark/YARN cluster
+```
+FROM jupyter/all-spark-notebook
+# Set env vars for pydoop
+ENV HADOOP_HOME /usr/local/hadoop-2.7.3
+ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
+ENV HADOOP_CONF_HOME /usr/local/hadoop-2.7.3/etc/hadoop
+ENV HADOOP_CONF_DIR  /usr/local/hadoop-2.7.3/etc/hadoop
+USER root
+# Add proper open-jdk-8 not just the jre, needed for pydoop
+RUN echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && \
+    apt-get -y update && \
+    apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && \
+    rm /etc/apt/sources.list.d/jessie-backports.list && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/ && \
+# Add hadoop binaries
+    wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && \
+    tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && \
+    chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && \
+    rm -f hadoop-2.7.3.tar.gz && \
+# Install os dependencies required for pydoop, pyhive
+    apt-get update && \
+    apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* && \
+# Remove the example hadoop configs and replace
+# with those for our cluster.
+# Alternatively this could be mounted as a volume
+    rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*
+# Download this from ambari / cloudera manager and copy here
+COPY example-hadoop-conf/ /usr/local/hadoop-2.7.3/etc/hadoop/
+# Spark-Submit doesn't work unless I set the following
+RUN echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf  && \
+    echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
+    echo "spark.master=yarn" >>  /usr/local/spark/conf/spark-defaults.conf && \
+    echo "spark.hadoop.yarn.timeline-service.enabled=false" >> /usr/local/spark/conf/spark-defaults.conf && \
+    chown -R $NB_USER:users /usr/local/spark/conf/spark-defaults.conf && \
+    # Create an alternative HADOOP_CONF_HOME so we can mount as a volume and repoint
+    # using ENV var if needed
+    mkdir -p /etc/hadoop/conf/ && \
+    chown $NB_USER:users /etc/hadoop/conf/
+USER $NB_USER
+# Install useful jupyter extensions and python libraries like :
+# - Dashboards
+# - PyDoop
+# - PyHive
+RUN pip install jupyter_dashboards faker && \
+    jupyter dashboards quick-setup --sys-prefix && \
+    pip2 install pyhive pydoop thrift sasl thrift_sasl faker
+USER root
+# Ensure we overwrite the kernel config so that toree connects to cluster
+RUN jupyter toree install --sys-prefix --spark_opts="--master yarn --deploy-mode client --driver-memory 512m  --executor-memory 512m  --executor-cores 1 --driver-java-options -Dhdp.version=2.5.3.0-37 --conf spark.hadoop.yarn.timeline-service.enabled=false"
+USER $NB_USER
+```
+Credit: [britishbadger](https://github.com/britishbadger) from [docker-stacks/issues/369](https://github.com/jupyter/docker-stacks/issues/369)
--- a/docs/using/running.md
+++ b/docs/using/running.md
@@ -9,7 +9,7 @@ This section provides details about the second.
 ## Using the Docker CLI
-You can launch a local Docker container from the Jupyter Docker Stacks using the [Docker command line interface](https://docs.docker.com/engine/reference/commandline/cli/). There are numerous ways to configure containers using the CLI. The following are a couple common patterns.
+You can launch a local Docker container from the Jupyter Docker Stacks using the [Docker command line interface](https://docs.docker.com/engine/reference/commandline/cli/). There are numerous ways to configure containers using the CLI. The following are some common patterns.
 **Example 1** This command pulls the `jupyter/scipy-notebook` image tagged `2c80cf3537ca` from Docker Hub if it is not already present on the local host. It then starts a container running a Jupyter Notebook server and exposes the server on host port 8888. The server logs appear in the terminal and include a URL to the notebook server.

--- a/docs/using/selecting.md
+++ b/docs/using/selecting.md
@@ -63,7 +63,7 @@ The Jupyter team maintains a set of Docker image definitions in the [https://git
 `jupyter/scipy-notebook` includes popular packages from the scientific Python ecosystem.
 * Everything in `jupyter/minimal-notebook` and its ancestor images
-* [pandas](https://pandas.pydata.org/), [numexpr](https://github.com/pydata/numexpr), [matplotlib](https://matplotlib.org/), [scipy](https://www.scipy.org/), [seaborn](https://seaborn.pydata.org/), [scikit-learn(http://scikit-learn.org/stable/)], [scikit-image](http://scikit-image.org/), [sympy](http://www.sympy.org/en/index.html), [cython](http://cython.org/), [patsy](https://patsy.readthedocs.io/en/latest/), [statsmodel](http://www.statsmodels.org/stable/index.html), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [dill](https://pypi.python.org/pypi/dill), [numba](https://numba.pydata.org/), [bokeh](https://bokeh.pydata.org/en/latest/), [sqlalchemy](https://www.sqlalchemy.org/), [hdf5](http://www.h5py.org/), [vincent](http://vincent.readthedocs.io/en/latest/), [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/), [protobuf](https://developers.google.com/protocol-buffers/docs/pythontutorial), and [xlrd](http://www.python-excel.org/) packages
+* [pandas](https://pandas.pydata.org/), [numexpr](https://github.com/pydata/numexpr), [matplotlib](https://matplotlib.org/), [scipy](https://www.scipy.org/), [seaborn](https://seaborn.pydata.org/), [scikit-learn](http://scikit-learn.org/stable/), [scikit-image](http://scikit-image.org/), [sympy](http://www.sympy.org/en/index.html), [cython](http://cython.org/), [patsy](https://patsy.readthedocs.io/en/latest/), [statsmodel](http://www.statsmodels.org/stable/index.html), [cloudpickle](https://github.com/cloudpipe/cloudpickle), [dill](https://pypi.python.org/pypi/dill), [numba](https://numba.pydata.org/), [bokeh](https://bokeh.pydata.org/en/latest/), [sqlalchemy](https://www.sqlalchemy.org/), [hdf5](http://www.h5py.org/), [vincent](http://vincent.readthedocs.io/en/latest/), [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/), [protobuf](https://developers.google.com/protocol-buffers/docs/pythontutorial), and [xlrd](http://www.python-excel.org/) packages
 * [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/) for interactive visualizations in Python notebooks
 * [Facets](https://github.com/PAIR-code/facets) for visualizing machine learning datasets