Commit 55d5ca6b authored by Kyle Kelley

Merge pull request #112 from parente/fix-python2-path

Set PYSPARK_PYTHON path in python2 kernelspec
parents 5836ee46 232d6fc4
@@ -6,6 +6,9 @@ MAINTAINER Jupyter Project <jupyter@googlegroups.com>
USER root
# Util to help with kernel spec later
RUN apt-get -y update && apt-get -y install jq
# Spark dependencies
ENV APACHE_SPARK_VERSION 1.5.1
RUN apt-get -y update && \
@@ -90,12 +93,13 @@ RUN conda install --yes \
RUN mkdir -p /opt/conda/share/jupyter/kernels/scala
COPY kernel.json /opt/conda/share/jupyter/kernels/scala/
USER root
# Install Python 2 kernel spec globally to avoid permission problems when NB_UID
# switching at runtime.
RUN $CONDA_DIR/envs/python2/bin/python \
$CONDA_DIR/envs/python2/bin/ipython \
kernelspec install-self
USER jovyan
# Install Python 2 kernel spec into the Python 3 conda environment which
# runs the notebook server
RUN bash -c '. activate python2 && \
python -m ipykernel.kernelspec --prefix=$CONDA_DIR && \
. deactivate'
# Set PYSPARK_PYTHON in the python2 spec
RUN jq --arg v "$CONDA_DIR/envs/python2/bin/python" \
'.["env"]["PYSPARK_PYTHON"]=$v' \
$CONDA_DIR/share/jupyter/kernels/python2/kernel.json > /tmp/kernel.json && \
mv /tmp/kernel.json $CONDA_DIR/share/jupyter/kernels/python2/kernel.json
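For readers unfamiliar with `jq`, the filter above adds a `PYSPARK_PYTHON` entry to the `env` object of the Python 2 `kernel.json`, writes the result to a temporary file, and moves it over the original. A rough Python equivalent is sketched below. The function name `set_kernel_env` is hypothetical and is not part of this commit; it only illustrates the transformation.

```python
import json
import os
import tempfile


def set_kernel_env(kernel_json_path, name, value):
    """Set env[name] = value in a kernel.json file (hypothetical helper)."""
    with open(kernel_json_path) as f:
        spec = json.load(f)

    # Like the jq path assignment, create "env" when it is missing or null.
    env = spec.get("env") or {}
    env[name] = value
    spec["env"] = env

    # Write to a temporary file first, then move it into place, mirroring
    # the "> /tmp/kernel.json && mv" pattern in the Dockerfile.
    directory = os.path.dirname(os.path.abspath(kernel_json_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(spec, f, indent=2)
    os.replace(tmp_path, kernel_json_path)
    return spec
```

Because Jupyter applies the `env` entries of a kernel spec to the kernel process it launches, a Python 2 kernel started from this spec should see `PYSPARK_PYTHON` already set.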
@@ -32,7 +32,7 @@ This configuration is nice for using Spark on small, local data.
1. Open a Python 2 or 3 notebook.
2. Create a `SparkContext` configured for local mode.
For example, the first few cells in a Python 3 notebook might read:
For example, the first few cells in a notebook might read:
```python
import pyspark
@@ -43,15 +43,6 @@ rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
```
In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well.
```python
import os
os.environ['PYSPARK_PYTHON'] = 'python2'
# include pyspark cells from above here ...
```
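With `PYSPARK_PYTHON` coming from the kernel spec, the manual `os.environ` step should no longer be needed. One way to confirm this from inside a notebook is sketched below. The helper name `worker_python_matches_kernel` is hypothetical, and the check assumes `PYSPARK_PYTHON` holds a path to an interpreter rather than a bare command name such as `python2`.

```python
import os
import sys


def worker_python_matches_kernel(environ=None, executable=None):
    """Return True if PYSPARK_PYTHON points at the running interpreter.

    Hypothetical helper: both arguments default to the live process values
    and exist mainly so the check can be exercised without Spark.
    """
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable

    worker = environ.get("PYSPARK_PYTHON")
    if worker is None:
        # Not set, so PySpark would fall back to its default interpreter.
        return False

    # Resolve symlinks so that python and python2.7 in the same
    # environment compare equal.
    return os.path.realpath(worker) == os.path.realpath(executable)
```

Calling `worker_python_matches_kernel()` in the first cell of a Python 2 notebook gives a quick sanity check before creating the `SparkContext`.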
### In an R Notebook
0. Run the container as shown above.
......
@@ -6,6 +6,9 @@ MAINTAINER Jupyter Project <jupyter@googlegroups.com>
USER root
# Util to help with kernel spec later
RUN apt-get -y update && apt-get -y install jq
# Spark dependencies
ENV APACHE_SPARK_VERSION 1.5.1
RUN apt-get -y update && \
@@ -52,13 +55,13 @@ RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \
pyzmq \
&& conda clean -yt
USER root
# Install Python 2 kernel spec globally to avoid permission problems when NB_UID
# switching at runtime.
RUN $CONDA_DIR/envs/python2/bin/python \
$CONDA_DIR/envs/python2/bin/ipython \
kernelspec install-self
USER jovyan
# Install Python 2 kernel spec into the Python 3 conda environment which
# runs the notebook server
RUN bash -c '. activate python2 && \
python -m ipykernel.kernelspec --prefix=$CONDA_DIR && \
. deactivate'
# Set PYSPARK_PYTHON in the python2 spec
RUN jq --arg v "$CONDA_DIR/envs/python2/bin/python" \
'.["env"]["PYSPARK_PYTHON"]=$v' \
$CONDA_DIR/share/jupyter/kernels/python2/kernel.json > /tmp/kernel.json && \
mv /tmp/kernel.json $CONDA_DIR/share/jupyter/kernels/python2/kernel.json
@@ -27,7 +27,7 @@ This configuration is nice for using Spark on small, local data.
2. Open a Python 2 or 3 notebook.
3. Create a `SparkContext` configured for local mode.
For example, the first few cells in a Python 3 notebook might read:
For example, the first few cells in the notebook might read:
```python
import pyspark
@@ -38,15 +38,6 @@ rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
```
In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well.
```python
import os
os.environ['PYSPARK_PYTHON'] = 'python2'
# include pyspark cells from above here ...
```
## Connecting to a Spark Cluster on Mesos
This configuration allows your compute cluster to scale with your data.
......