Commit eb8d179

Merge pull request #677 from SixtyCapital/allow-pandas-to-ds-constructor
Dataset constructor can take pandas objects
2 parents c93d156 + 3e56789 commit eb8d179

4 files changed

+50
-26
lines changed

doc/data-structures.rst

Lines changed: 31 additions & 22 deletions
@@ -74,9 +74,11 @@ in index values in the same way.
7474
Coordinates can take the following forms:
7575

7676
- A list of ``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions
77-
- A dictionary of ``{coord_name: coord}`` where the values are scaler values,
78-
1D arrays or tuples (tuples in the same form as above). This form lets you supply other
79-
coordinates than those corresponding to dimensions (more on these later).
77+
- A dictionary of ``{coord_name: coord}`` where the values are each a scalar value,
78+
a 1D array or a tuple. Tuples are in the same form as above, and
79+
multiple dimensions can be supplied with the form ``(dims, data[, attrs])``.
80+
Supplying as a tuple allows other coordinates than those corresponding to
81+
dimensions (more on these later).
8082

8183
As a list of tuples:
8284

@@ -92,6 +94,14 @@ As a dictionary:
9294
'ranking': ('space', [1, 2, 3])},
9395
dims=['time', 'space'])
9496
97+
As a dictionary with coords across multiple dimensions:
98+
99+
.. ipython:: python
100+
101+
xray.DataArray(data, coords={'time': times, 'space': locs, 'const': 42,
102+
'ranking': (('space', 'time'), np.arange(12).reshape(4,3))},
103+
dims=['time', 'space'])
104+
95105
If you create a ``DataArray`` by supplying a pandas
96106
:py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or
97107
:py:class:`~pandas.Panel`, any non-specified arguments in the
@@ -194,8 +204,7 @@ to access any variable in a dataset, datasets have four key properties:
194204
each dimension (e.g., ``{'x': 6, 'y': 6, 'time': 8}``)
195205
- ``data_vars``: a dict-like container of DataArrays corresponding to variables
196206
- ``coords``: another dict-like container of DataArrays intended to label points
197-
used in ``data_vars`` (e.g., 1-dimensional arrays of numbers, datetime
198-
objects or strings)
207+
used in ``data_vars`` (e.g., arrays of numbers, datetime objects or strings)
199208
- ``attrs``: an ``OrderedDict`` to hold arbitrary metadata
200209

201210
The distinction between whether a variable falls in data or coordinates
@@ -223,18 +232,16 @@ Creating a Dataset
223232
~~~~~~~~~~~~~~~~~~
224233

225234
To make an :py:class:`~xray.Dataset` from scratch, supply dictionaries for any
226-
variables, coordinates and attributes you would like to insert into the
227-
dataset.
235+
variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``).
228236

229-
For the ``data_vars`` and ``coords`` arguments, keys should be the name of the
230-
variable and values should be scalars, 1d arrays or tuples of the form
231-
``(dims, data[, attrs])`` sufficient to label each array:
237+
``data_vars`` are supplied as a dictionary with each key as the name of the variable and each
238+
value as one of:
239+
- A :py:class:`~xray.DataArray`
240+
- A tuple of the form ``(dims, data[, attrs])``
241+
- A pandas object
232242

233-
- ``dims`` should be a sequence of strings.
234-
- ``data`` should be a numpy.ndarray (or array-like object) that has a
235-
dimensionality equal to the length of ``dims``.
236-
- ``attrs`` is an arbitrary Python dictionary for storing metadata associated
237-
with a particular array.
243+
``coords`` are supplied as a dictionary of ``{coord_name: coord}`` where the values are scalar values,
244+
arrays or tuples in the form of ``(dims, data[, attrs])``.
238245

239246
Let's create some fake data for the example we show above:
240247

@@ -259,8 +266,8 @@ Notice that we did not explicitly include coordinates for the "x" or "y"
259266
dimensions, so they were filled in with an array of ascending integers of the proper
260267
length.
261268

262-
We can also pass :py:class:`xray.DataArray` objects or a pandas object as values
263-
in the dictionary instead of tuples:
269+
Here we pass :py:class:`xray.DataArray` objects or a pandas object as values
270+
in the dictionary:
264271

265272
.. ipython:: python
266273
@@ -271,13 +278,15 @@ in the dictionary instead of tuples:
271278
272279
xray.Dataset({'bar': foo.to_pandas()})
273280
274-
Where a pandas object is supplied, the names of its indexes are used as dimension
281+
Where a pandas object is supplied as a value, the names of its indexes are used as dimension
275282
names, and its data is aligned to any existing dimensions.
276283

277-
You can also create an dataset from a :py:class:`pandas.DataFrame` with
278-
:py:meth:`Dataset.from_dataframe <xray.Dataset.from_dataframe>` or from a
279-
netCDF file on disk with :py:func:`~xray.open_dataset`. See
280-
:ref:`pandas` and :ref:`io`.
284+
You can also create a dataset from:
285+
- A :py:class:`pandas.DataFrame` or :py:class:`pandas.Panel` along its columns and items
286+
respectively, by passing it into the :py:class:`xray.Dataset` directly
287+
- A :py:class:`pandas.DataFrame` with :py:meth:`Dataset.from_dataframe <xray.Dataset.from_dataframe>`,
288+
which will additionally handle MultiIndexes. See :ref:`pandas`.
289+
- A netCDF file on disk with :py:func:`~xray.open_dataset`. See :ref:`io`.
281290

282291
Dataset contents
283292
~~~~~~~~~~~~~~~~

doc/whats-new.rst

Lines changed: 4 additions & 2 deletions
@@ -94,8 +94,10 @@ Enhancements
9494
9595
Notice that ``shift`` moves data independently of coordinates, but ``roll``
9696
moves both data and coordinates.
97-
- Assigning a ``pandas`` object to a ``Dataset`` directly is now permitted. Its
98-
index names correspond to the `dims`` of the ``Dataset``, and its data is aligned
97+
- Assigning a ``pandas`` object to a variable of a ``Dataset`` directly is now permitted. Its
98+
index names correspond to the ``dims`` of the ``Dataset``, and its data is aligned
99+
- Passing a :py:class:`pandas.DataFrame` or :py:class:`pandas.Panel` to a Dataset constructor
100+
is now permitted
99101
- New function :py:func:`~xray.broadcast` for explicitly broadcasting
100102
``DataArray`` and ``Dataset`` objects against each other. For example:
101103

xray/core/dataset.py

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ def __init__(self, data_vars=None, coords=None, attrs=None,
205205
data_vars = {}
206206
if coords is None:
207207
coords = set()
208-
if data_vars or coords:
208+
if data_vars is not None or coords is not None:
209209
self._set_init_vars_and_dims(data_vars, coords, compat)
210210
if attrs is not None:
211211
self.attrs = attrs
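
A likely motivation for replacing the truthiness check with explicit ``is not None`` comparisons is that pandas objects such as ``DataFrame`` raise ``ValueError`` when evaluated in a boolean context, so ``if data_vars or coords:`` would fail as soon as a ``DataFrame`` is passed as ``data_vars``. The sketch below illustrates the difference with a small stand-in class; ``AmbiguousFrame``, ``should_init_truthy`` and ``should_init_explicit`` are hypothetical names used only for this illustration and are not part of xray or pandas.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a pandas object: it refuses truth-value testing."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __bool__(self):
        # Mimics the way pandas containers reject ambiguous truth tests.
        raise ValueError("The truth value of an AmbiguousFrame is ambiguous")


def should_init_truthy(data_vars, coords):
    # Old-style guard: relies on the truthiness of the arguments.
    return bool(data_vars or coords)


def should_init_explicit(data_vars, coords):
    # New-style guard: only asks whether an argument was supplied at all.
    return data_vars is not None or coords is not None


frame = AmbiguousFrame({"a": [1, 2, 3]})

try:
    should_init_truthy(frame, None)
    truthy_outcome = "ok"
except ValueError:
    truthy_outcome = "ValueError"

explicit_outcome = should_init_explicit(frame, None)
print(truthy_outcome, explicit_outcome)
```

The explicit guard never calls ``bool()`` on its arguments, so it works for any container type. The trade-off is that empty containers such as ``{}`` now pass the guard as well, whereas the truthiness check skipped them.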

xray/test/test_dataset.py

Lines changed: 14 additions & 1 deletion
@@ -199,7 +199,7 @@ def test_constructor_auto_align(self):
199199
with self.assertRaisesRegexp(ValueError, 'conflicting sizes'):
200200
Dataset({'a': a, 'b': b, 'e': e})
201201

202-
def test_constructor_pandas(self):
202+
def test_constructor_pandas_sequence(self):
203203

204204
ds = self.make_example_math_dataset()
205205
pandas_objs = OrderedDict(
@@ -214,6 +214,19 @@ def test_constructor_pandas(self):
214214
ds_based_on_pandas = Dataset(variables=pandas_objs, coords=ds.coords, attrs=ds.attrs)
215215
self.assertDatasetEqual(ds, ds_based_on_pandas)
216216

217+
def test_constructor_pandas_single(self):
218+
219+
das = [
220+
DataArray(np.random.rand(4,3), dims=['a', 'b']), # df
221+
DataArray(np.random.rand(4,3,2), dims=['a','b','c']), # panel
222+
]
223+
224+
for da in das:
225+
pandas_obj = da.to_pandas()
226+
ds_based_on_pandas = Dataset(pandas_obj)
227+
for dim in ds_based_on_pandas.data_vars:
228+
self.assertArrayEqual(ds_based_on_pandas[dim], pandas_obj[dim])
229+
217230

218231
def test_constructor_compat(self):
219232
data = OrderedDict([('x', DataArray(0, coords={'y': 1})),
