This repository was archived by the owner on Apr 10, 2024. It is now read-only.

Commit 94d0281

Add section about logical to physical correspondence
1 parent 7a44f8f commit 94d0281

4 files changed: +93 -14 lines changed

source/conf.py

Lines changed: 5 additions & 5 deletions
@@ -48,9 +48,9 @@
 master_doc = 'index'
 
 # General information about the project.
-project = "Wes's pandas 2.0 Design Docs"
-copyright = '2016, Wes McKinney'
-author = 'Wes McKinney'
+project = "pandas 2.0 Design Docs"
+copyright = '2016, pandas Development Team'
+author = 'pandas Development Team'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
@@ -143,7 +143,7 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+html_static_path = []
 
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
@@ -229,7 +229,7 @@
 # author, documentclass [howto, manual, or own class]).
 latex_documents = [
     (master_doc, 'pandas20DesignDocs.tex', 'pandas 2.0 Design Docs Documentation',
-     'Wes McKinney', 'manual'),
+     'pandas Development Team', 'manual'),
 ]
 
 # The name of an image file (relative to this directory) to place at the top of

source/index.rst

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-Wes's pandas 2.0 Design Documents
-=================================
+pandas 2.0 Design Documents
+===========================
 
 These are a set of documents, based on discussions started in December 2015, to
 assist with discussions around changes to Python pandas's internal design

source/internal-architecture.rst

Lines changed: 83 additions & 4 deletions
@@ -8,9 +8,9 @@
    np.set_printoptions(precision=4, suppress=True)
    pd.options.display.max_rows = 100
 
-===============================
- Internal Architecture Changes
-===============================
+===================================
+ Internals: Data structure changes
+===================================
 
 Logical types and Physical Storage Decoupling
 =============================================
@@ -203,6 +203,85 @@ we've chosen for pandas, and elsewhere we can invoke pandas-specific code.
 A major concern here based on these ideas is **preserving NumPy
 interoperability**, so I'll examine this topic in some detail next.
 
+Correspondence between logical and physical types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* **Floating point numbers**
+
+  - Logical: ``Float16/32/64``
+  - Physical: ``numpy.float16/32/64``, with ``NaN`` for null (for backwards
+    compatibility)
+
+* **Signed Integers**
+
+  - Logical: ``Int8/16/32/64``
+  - Physical: ``numpy.int8/16/32/64`` array plus nullness bitmap
+
+* **Unsigned Integers**
+
+  - Logical: ``UInt8/16/32/64``
+  - Physical: ``numpy.uint8/16/32/64`` array plus nullness bitmap
+
+* **Boolean**
+
+  - Logical: ``Boolean``
+  - Physical: ``np.bool_`` array (one byte per value, like ``np.uint8``) plus
+    nullness bitmap. We may also explore bit storage (versus bytes).
+
+* **Categorical**
+
+  - Logical: ``Categorical[T]``, where ``T`` is any other logical type
+  - Physical: this type is a composition of an ``Int8`` through ``Int64``
+    (depending on the cardinality of the categories) plus the categories
+    array. These have the same physical representation as
+
+* **String and Binary**
+
+  - Logical: ``String`` and ``Binary``
+  - Physical: Dictionary-encoded representation for UTF-8 and general binary
+    data as described in the :doc:`string section <strings>`.
+
+* **Timestamp**
+
+  - Logical: ``Timestamp[unit]``, where unit is the resolution. Nanoseconds can
+    continue to be the default unit for now.
+  - Physical: ``numpy.int64``, with ``INT64_MIN`` as the null value.
+
+* **Timedelta**
+
+  - Logical: ``Timedelta[unit]``, where unit is the resolution
+  - Physical: ``numpy.int64``, with ``INT64_MIN`` as the null value.
+
+* **Period**
+
+  - Logical: ``Period[unit]``, where unit is the resolution
+  - Physical: ``numpy.int64``, with ``INT64_MIN`` as the null value.
+
+* **Interval**
+
+  - Logical: ``Interval``
+  - Physical: two arrays of ``Timestamp[U]`` -- these may need to be forced to
+    both be the same resolution
+
+* **Python objects** (catch-all for other data types)
+
+  - Logical: ``Object``
+  - Physical: ``numpy.object_`` array, with ``None`` for null values (perhaps
+    with ``np.NaN`` also for backwards compatibility)
+
+* **Complex numbers**
+
+  - Logical: ``Complex64/128``
+  - Physical: ``numpy.complex64/128``, with ``NaN`` for null (for backwards
+    compatibility)
+
+Some notes on these:
+
+- While a pandas (logical) type may map onto one or more physical
+  representations, in general NumPy types will map directly onto a pandas
+  type. Thus, existing code involving ``numpy.dtype``-like objects (such as
+  ``'f8'`` or ``numpy.float64``) will continue to work.
+
 Preserving NumPy interoperability
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
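
The note in the added section says that NumPy dtype-like objects such as ``'f8'`` map directly onto a logical type. A minimal sketch of what such a mapping could look like follows. The function name `logical_type_name`, the lookup table, and the `UInt` prefix for unsigned integers are assumptions made for illustration, not part of pandas or of this commit. The last lines check that NumPy's own `NaT` is stored as `INT64_MIN`, the sentinel the section proposes for `Timestamp`, `Timedelta`, and `Period`.

```python
import numpy as np

# Hypothetical table from NumPy dtype "kind" codes to logical type prefixes.
# The names mirror the list in the diff; "UInt" for unsigned is an assumption.
_LOGICAL_PREFIX = {"f": "Float", "i": "Int", "u": "UInt", "c": "Complex"}


def logical_type_name(dtype_like):
    """Return an illustrative logical type name for a NumPy dtype-like object."""
    dtype = np.dtype(dtype_like)
    if dtype.kind == "b":
        return "Boolean"
    if dtype.kind == "O":
        return "Object"
    if dtype.kind not in _LOGICAL_PREFIX:
        raise TypeError(f"no logical type in this sketch for {dtype!r}")
    # itemsize is in bytes; logical names use the width in bits.
    return f"{_LOGICAL_PREFIX[dtype.kind]}{dtype.itemsize * 8}"


# 'f8' and numpy.float64 name the same dtype, so they map to the same type.
print(logical_type_name("f8"), logical_type_name(np.float64))

# NumPy stores NaT for datetime64 as the minimum int64 value.
nat = np.array(["NaT"], dtype="datetime64[ns]")
print(nat.view("i8")[0] == np.iinfo(np.int64).min)
```

Keying the lookup on `dtype.kind` and `dtype.itemsize` means any spelling that `numpy.dtype` accepts resolves to the same logical name, which is the backwards-compatibility property the note describes.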

@@ -318,7 +397,7 @@ bitmap** (which the user never sees). This has numerous benefits:
 Notably, this is the way that PostgreSQL handles null values. For example, we
 might have:
 
-.. code-block::
+.. code-block:: text
 
    [0, 1, 2, NA, NA, 5, 6, NA]
 
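
The example array in this hunk can be modeled as a values buffer plus a one-bit-per-value bitmap. The sketch below is only an illustration under stated assumptions: a set bit means "value present", bits are packed least-significant first, and NA slots hold a placeholder of 0. The design document does not fix any of these details, and the helper names `is_null` and `to_list` are hypothetical.

```python
import numpy as np

# Values from the example; NA slots carry an arbitrary placeholder (0 here).
values = np.array([0, 1, 2, 0, 0, 5, 6, 0], dtype=np.int64)
is_valid = np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)

# Pack eight validity flags into one byte. Bit order (least-significant bit
# first) and "set bit means valid" are assumptions of this sketch.
bitmap = np.packbits(is_valid, bitorder="little")


def is_null(bitmap, i):
    """Return True when the bit for position i is unset."""
    return bool(((bitmap[i // 8] >> (i % 8)) & 1) == 0)


def to_list(values, bitmap):
    """Materialize the logical array, using None for null slots."""
    return [None if is_null(bitmap, i) else int(v) for i, v in enumerate(values)]


print(bitmap.nbytes)            # one byte covers all eight values
print(to_list(values, bitmap))
```

The values buffer keeps its plain `int64` dtype, so NumPy code can still operate on it directly, while the nulls cost one extra bit per element.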

source/strings.rst

Lines changed: 3 additions & 3 deletions
@@ -8,9 +8,9 @@
    np.set_printoptions(precision=4, suppress=True)
    pd.options.display.max_rows = 100
 
-==================================
- Enhanced string / UTF-8 handling
-==================================
+=============================================
+ Internals: Enhanced string / UTF-8 handling
+=============================================
 
 There are some things we can do to make pandas use less memory and perform
 computations significantly faster on string data.
