nanahira / Jupyter Docker Stacks
Commit 55d5ca6b, authored Feb 03, 2016 by Kyle Kelley
Merge pull request #112 from parente/fix-python2-path
Set PYSPARK_PYTHON path in python2 kernelspec
Parents: 5836ee46, 232d6fc4
Showing 4 changed files with 32 additions and 43 deletions:
all-spark-notebook/Dockerfile (+13 -9)
all-spark-notebook/README.md (+3 -12)
pyspark-notebook/Dockerfile (+13 -10)
pyspark-notebook/README.md (+3 -12)
all-spark-notebook/Dockerfile
@@ -6,6 +6,9 @@ MAINTAINER Jupyter Project <jupyter@googlegroups.com>
 USER root
+# Util to help with kernel spec later
+RUN apt-get -y update && apt-get -y install jq
 # Spark dependencies
 ENV APACHE_SPARK_VERSION 1.5.1
 RUN apt-get -y update && \
@@ -90,12 +93,13 @@ RUN conda install --yes \
 RUN mkdir -p /opt/conda/share/jupyter/kernels/scala
 COPY kernel.json /opt/conda/share/jupyter/kernels/scala/
-USER root
-# Install Python 2 kernel spec globally to avoid permission problems when NB_UID
-# switching at runtime.
-RUN $CONDA_DIR/envs/python2/bin/python \
-    $CONDA_DIR/envs/python2/bin/ipython \
-    kernelspec install-self
-USER jovyan
+# Install Python 2 kernel spec into the Python 3 conda environment which
+# runs the notebook server
+RUN bash -c '. activate python2 && \
+    python -m ipykernel.kernelspec --prefix=$CONDA_DIR && \
+    . deactivate'
+# Set PYSPARK_HOME in the python2 spec
+RUN jq --arg v "$CONDA_DIR/envs/python2/bin/python" \
+    '.["env"]["PYSPARK_PYTHON"]=$v' \
+    $CONDA_DIR/share/jupyter/kernels/python2/kernel.json > /tmp/kernel.json && \
+    mv /tmp/kernel.json $CONDA_DIR/share/jupyter/kernels/python2/kernel.json
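As a rough illustration of what the `jq` step in this hunk does to the kernel spec, the Python sketch below applies the same kind of edit: it sets `env.PYSPARK_PYTHON` in a `kernel.json`, writes the result to a temporary file, and then moves that file over the original. Note that the Dockerfile comment says `PYSPARK_HOME`, while the filter itself sets `PYSPARK_PYTHON`. The function name `set_kernel_env` and the demo paths are hypothetical and exist only for illustration; the Dockerfile uses `jq`, not Python.

```python
import json
import os
import tempfile


def set_kernel_env(kernel_json_path, name, value):
    """Set env[name] = value in a Jupyter kernel.json, replacing the file."""
    with open(kernel_json_path) as f:
        spec = json.load(f)

    # Like the jq filter '.["env"]["PYSPARK_PYTHON"]=$v': the "env" object
    # is created when the spec does not have one yet.
    spec.setdefault("env", {})[name] = value

    # Write to a temporary file first and move it into place afterwards,
    # mirroring the "> /tmp/kernel.json && mv ..." pattern in the Dockerfile.
    directory = os.path.dirname(os.path.abspath(kernel_json_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(spec, f, indent=1)
    os.replace(tmp_path, kernel_json_path)
    return spec


if __name__ == "__main__":
    # Hypothetical demo on a throwaway spec, not the real image paths.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "kernel.json")
        with open(path, "w") as f:
            json.dump({"display_name": "Python 2", "language": "python"}, f)
        # "/opt/conda" is an assumed value for $CONDA_DIR.
        print(set_kernel_env(path, "PYSPARK_PYTHON",
                             "/opt/conda/envs/python2/bin/python"))
```

Writing to a temporary file before moving it matters in the shell version because redirecting `jq` output straight onto its own input file would truncate the file before `jq` reads it.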
all-spark-notebook/README.md
@@ -32,7 +32,7 @@ This configuration is nice for using Spark on small, local data.
 1. Open a Python 2 or 3 notebook.
 2. Create a `SparkContext` configured for local mode.
-For example, the first few cells in a Python 3 notebook might read:
+For example, the first few cells in a notebook might read:
 ```python
 import pyspark
@@ -43,15 +43,6 @@ rdd = sc.parallelize(range(1000))
 rdd.takeSample(False, 5)
 ```
-In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well.
-```python
-import os
-os.environ['PYSPARK_PYTHON'] = 'python2'
-# include pyspark cells from above here ...
-```
 ### In a R Notebook
 0. Run the container as shown above.
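With `PYSPARK_PYTHON` set in the python2 kernel spec, the manual `os.environ` prefix removed from this README should no longer be needed. One way to confirm that from inside a notebook is sketched below. The helper name `describe_pyspark_python` is hypothetical, and the check only reads environment state the kernel already has; it does not start Spark.

```python
import os
import sys


def describe_pyspark_python(environ=None, executable=None):
    """Report whether PYSPARK_PYTHON points at the interpreter running the kernel."""
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable

    configured = environ.get("PYSPARK_PYTHON")
    if configured is None:
        return "PYSPARK_PYTHON is not set in this kernel"
    # Compare resolved paths so that symlinked interpreters still match.
    if os.path.realpath(configured) == os.path.realpath(executable):
        return "PYSPARK_PYTHON matches the kernel interpreter: " + configured
    return "PYSPARK_PYTHON (%s) differs from the kernel interpreter (%s)" % (
        configured, executable)


print(describe_pyspark_python())
```

In a Python 2 notebook started from an image built with this change, the expectation is that the first branch is not taken, since the kernel spec now carries the variable.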
pyspark-notebook/Dockerfile
@@ -6,6 +6,9 @@ MAINTAINER Jupyter Project <jupyter@googlegroups.com>
 USER root
+# Util to help with kernel spec later
+RUN apt-get -y update && apt-get -y install jq
 # Spark dependencies
 ENV APACHE_SPARK_VERSION 1.5.1
 RUN apt-get -y update && \
@@ -52,13 +55,13 @@ RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \
     pyzmq \
     && conda clean -yt
-USER root
-# Install Python 2 kernel spec globally to avoid permission problems when NB_UID
-# switching at runtime.
-RUN $CONDA_DIR/envs/python2/bin/python \
-    $CONDA_DIR/envs/python2/bin/ipython \
-    kernelspec install-self
-USER jovyan
+# Install Python 2 kernel spec into the Python 3 conda environment which
+# runs the notebook server
+RUN bash -c '. activate python2 && \
+    python -m ipykernel.kernelspec --prefix=$CONDA_DIR && \
+    . deactivate'
+# Set PYSPARK_HOME in the python2 spec
+RUN jq --arg v "$CONDA_DIR/envs/python2/bin/python" \
+    '.["env"]["PYSPARK_PYTHON"]=$v' \
+    $CONDA_DIR/share/jupyter/kernels/python2/kernel.json > /tmp/kernel.json && \
+    mv /tmp/kernel.json $CONDA_DIR/share/jupyter/kernels/python2/kernel.json
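To check the result of the two new `RUN` steps in this hunk, one could read the installed spec back and look at its `env` entry. The sketch below assumes the layout shown in the diff, `<prefix>/share/jupyter/kernels/python2/kernel.json`. The function names are hypothetical, and the `/opt/conda` fallback for `CONDA_DIR` is an assumption based on the Scala kernel path in the all-spark-notebook Dockerfile.

```python
import json
import os


def kernel_spec_path(prefix, kernel_name="python2"):
    """Path where a kernel spec installed with --prefix=<prefix> is expected."""
    return os.path.join(prefix, "share", "jupyter", "kernels",
                        kernel_name, "kernel.json")


def read_kernel_env(prefix, kernel_name="python2"):
    """Return the "env" mapping of an installed kernel spec, or {} if it has none."""
    with open(kernel_spec_path(prefix, kernel_name)) as f:
        return json.load(f).get("env", {})


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_DIR", "/opt/conda")  # assumed default
    try:
        print(read_kernel_env(prefix).get("PYSPARK_PYTHON"))
    except IOError as exc:  # no spec installed at this prefix
        print("no python2 kernel spec found: %s" % exc)
```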
pyspark-notebook/README.md
@@ -27,7 +27,7 @@ This configuration is nice for using Spark on small, local data.
 2. Open a Python 2 or 3 notebook.
 3. Create a `SparkContext` configured for local mode.
-For example, the first few cells in a Python 3 notebook might read:
+For example, the first few cells in the notebook might read:
 ```python
 import pyspark
@@ -38,15 +38,6 @@ rdd = sc.parallelize(range(1000))
 rdd.takeSample(False, 5)
 ```
-In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well.
-```python
-import os
-os.environ['PYSPARK_PYTHON'] = 'python2'
-# include pyspark cells from above here ...
-```
 ## Connecting to a Spark Cluster on Mesos
 This configuration allows your compute cluster to scale with your data.