Skip to content

ENH: Create DockerFile and devcontainer.json files to work with Docker and VS Code in Containers #30638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 19, 2020
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions .devcontainer.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// For format details, see https://aka.ms/vscode-remote/devcontainer.json or the definition README at
// https://github.com/microsoft/vscode-dev-containers/tree/master/containers/python-3-miniconda
{
"name": "pandas",
"context": ".",
"dockerFile": "Dockerfile",

// Use 'settings' to set *default* container specific settings.json values on container create.
// You can edit these settings after create using File > Preferences > Settings > Remote.
"settings": {
"terminal.integrated.shell.linux": "/bin/bash",
"python.condaPath": "/opt/conda/bin/conda",
"python.pythonPath": "/opt/conda/bin/python",
"python.formatting.provider": "black",
"python.linting.enabled": true,
"python.linting.flake8Enabled": true,
"python.linting.pylintEnabled": false,
"python.linting.mypyEnabled": true,
"python.testing.pytestEnabled": true
},

// Add the IDs of extensions you want installed when the container is created in the array below.
"extensions": [
"ms-python.python",
"ms-vscode.cpptools"
]
}
46 changes: 46 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
FROM continuumio/miniconda3

# if you forked pandas, you can pass in your own GitHub username to use your fork
# i.e. gh_username=myname
ARG gh_username=pandas-dev
ARG pandas_home="/home/pandas"

# Avoid warnings by switching to noninteractive
ENV DEBIAN_FRONTEND=noninteractive

# Configure apt and install packages
RUN apt-get update \
&& apt-get -y install --no-install-recommends apt-utils dialog 2>&1 \
#
# Verify git, process tools, lsb-release (common in install instructions for CLIs) installed
&& apt-get -y install git iproute2 procps iproute2 lsb-release \
#
# Install C compilers (gcc not enough, so just went with build-essential which admittedly might be overkill),
# needed to build pandas C extensions
&& apt-get -y install build-essential \
#
# cleanup
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*

# Switch back to dialog for any ad-hoc use of apt-get
ENV DEBIAN_FRONTEND=dialog

# Clone pandas repo
RUN mkdir "$pandas_home" \
&& git clone "https://github.com/$gh_username/pandas.git" "$pandas_home" \
&& cd "$pandas_home" \
&& git remote add upstream "https://github.com/pandas-dev/pandas.git"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull from upstream master after this step? Should keep up with the latest changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just did

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm don't see this in the diff. Should be a git pull upstream master in there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought you meant on my changes in the PR, lol, not in the DockerFile. got it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @WillAyd is requesting that the Dockerfile has a git pull upstream master?

Though I'm not sure that would achieve the desired effect, since it only happens when the docker image is built. How often does that happen? How does a person starting up a container ensure they're working with a fresh copy of pandas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, the main use case is setting up a fresh env for some dev work, using this DockerFile and the devcontainer.json. Once you are using that env, if you use it for a while, it's up to the user to keep the code up to date with master. This is just intended for first time set up.

But the real value of this is that's it's dead simple to delete and create a new env in VS Online or with containers, so the usual use case is just to create a new env and not use an existing one for a while.

So I guess this line is indeed unnecessary, and I should remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you agree before I revert that commit

Copy link
Member

@WillAyd WillAyd Jan 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either this should stay or should remove the build step later. Doesn't make sense to build an outdated version for development

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow you there. this is the same steps as the instructions in the main docs. Even if we remove git pull upstream master, how is this an outdated version? We just cloned the repo 2 lines before this, like 5 seconds ago


# Because it is surprisingly difficult to activate a conda environment inside a DockerFile
# (from personal experience and per https://github.com/ContinuumIO/docker-images/issues/89),
# we just update the base/root one from the 'environment.yml' file instead of creating a new one.
#
# Set up environment
RUN conda env update -n base -f "$pandas_home/environment.yml"

# Build C extensions and pandas
RUN cd "$pandas_home" \
&& python setup.py build_ext --inplace -j 4 \
&& python -m pip install -e .
11 changes: 11 additions & 0 deletions doc/source/development/contributing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,17 @@ requires a C compiler and Python environment. If you're making documentation
changes, you can skip to :ref:`contributing.documentation` but you won't be able
to build the documentation locally before pushing your changes.

Using a Docker Container
~~~~~~~~~~~~~~~~~~~~~~~~

Instead of manually setting up a development environment, you can use Docker to
automatically create the environment with just several commands. Pandas provides a `DockerFile`
in the root directory to build a Docker image with a full pandas development environment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth linking the docker installation guide in this section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's that? I never saw a docker section.


Even easier, you can use the DockerFile to launch a remote session with Visual Studio Code,
a popular free IDE, using the `.devcontainer.json` file.
See https://code.visualstudio.com/docs/remote/containers for details.

.. _contributing.dev_c:

Installing a C compiler
Expand Down