@@ -179,20 +179,16 @@ We've enhanced the :class:`StringDtype`, an extension type dedicated to string d
(:issue:`39908`)

It is now possible to specify a ``storage`` keyword option to :class:`StringDtype`, use
- pandas options or specify the dtype using ``dtype='string[pyarrow]'``
+ pandas options or specify the dtype using ``dtype='string[pyarrow]'`` to allow the
+ StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.
+
+ The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

.. warning::

   ``string[pyarrow]`` is currently considered experimental. The implementation
   and parts of the API may change without warning.

- The ``'string[pyarrow]'`` extension type solves several issues with NumPy backed arrays:
-
- 1.
- 2.
- 3.
-
-
.. ipython:: python

   pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
@@ -212,8 +208,8 @@ You can also create a PyArrow backed string array using pandas options.
   s = pd.Series(['abc', None, 'def'], dtype="string")
   s

- The usual string accessor methods work. Where appropriate, the return type
- of the Series or columns of a DataFrame will also have string dtype.
+ The usual string accessor methods work. Where appropriate, the return type of the Series
+ or columns of a DataFrame will also have string dtype.

.. ipython:: python

@@ -226,7 +222,61 @@ String accessor methods returning integers will return a value with :class:`Int6

   s.str.count("a")

- See :ref:`text.types` for more.
+ Some string accessor methods use native PyArrow string kernels operating directly on the
+ PyArrow memory; others fall back to converting to a NumPy array of Python objects and
+ using the native Python string functions. String methods using PyArrow kernels are
+ generally much more performant.
+
+ Some PyArrow string kernels are implemented in later versions of pyarrow than the
+ minimum version required to create a PyArrow backed StringArray. In these cases, the
+ string accessor will fall back to the Python implementations.
+
+ Some string accessor methods accept arguments controlling their behaviour which are not
+ supported by the PyArrow kernels. These cases will also fall back to object mode.
+
+ +--------------------------------+----------+------------------------------------------+
+ | Accessor                       | Minimum  | Limitations (otherwise fall back to      |
+ | Method                         | PyArrow  | object mode)                             |
+ |                                | Version  |                                          |
+ +================================+==========+==========================================+
+ | :meth:`~Series.str.contains`   | 1.0.0    | The ``flags`` argument is not supported. |
+ |                                |          | If ``regex=True``, pyarrow 4.0.0 is      |
+ |                                |          | required and ``case=False`` is not       |
+ |                                |          | supported.                               |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.startswith` | 4.0.0    |                                          |
+ | :meth:`~Series.str.endswith`   |          |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.replace`    | 4.0.0    | The ``flags`` argument, ``case=False``,  |
+ |                                |          | passing a callable for the ``repl``      |
+ |                                |          | argument or passing a compiled regex is  |
+ |                                |          | not supported.                           |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.match`      | 4.0.0    |                                          |
+ | :meth:`~Series.str.fullmatch`  |          |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.isalnum`    | 1.0.0    |                                          |
+ | :meth:`~Series.str.isalpha`    |          |                                          |
+ | :meth:`~Series.str.isdecimal`  |          |                                          |
+ | :meth:`~Series.str.isdigit`    |          |                                          |
+ | :meth:`~Series.str.islower`    |          |                                          |
+ | :meth:`~Series.str.isnumeric`  |          |                                          |
+ | :meth:`~Series.str.istitle`    |          |                                          |
+ | :meth:`~Series.str.isupper`    |          |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.isspace`    | 2.0.0    |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.len`        | 4.0.0    |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.lower`      | 1.0.0    |                                          |
+ | :meth:`~Series.str.upper`      |          |                                          |
+ +--------------------------------+----------+------------------------------------------+
+ | :meth:`~Series.str.strip`      | 4.0.0    |                                          |
+ | :meth:`~Series.str.lstrip`     |          |                                          |
+ | :meth:`~Series.str.rstrip`     |          |                                          |
+ +--------------------------------+----------+------------------------------------------+
278
+
279
+
230
280
231
281
Centered Datetime-Like Rolling Windows
232
282
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
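The fallback rules that the added ``StringDtype`` text and its accessor table describe can be read as a small decision procedure. The sketch below is only an illustration of that reading, not pandas code: ``MIN_PYARROW``, ``parse_version`` and ``uses_pyarrow_kernel`` are hypothetical names introduced here, the version numbers are copied from the table in the diff, and the keyword defaults (``regex=True``, ``flags=0``, case-sensitive matching) are assumptions about the accessor methods.

```python
import re

# Hypothetical lookup table, copied from the accessor table in the diff.
# pandas does not expose this mapping; it exists only for illustration.
MIN_PYARROW = {
    "contains": (1, 0, 0),
    "startswith": (4, 0, 0), "endswith": (4, 0, 0),
    "replace": (4, 0, 0),
    "match": (4, 0, 0), "fullmatch": (4, 0, 0),
    "isalnum": (1, 0, 0), "isalpha": (1, 0, 0), "isdecimal": (1, 0, 0),
    "isdigit": (1, 0, 0), "islower": (1, 0, 0), "isnumeric": (1, 0, 0),
    "istitle": (1, 0, 0), "isupper": (1, 0, 0),
    "isspace": (2, 0, 0),
    "len": (4, 0, 0),
    "lower": (1, 0, 0), "upper": (1, 0, 0),
    "strip": (4, 0, 0), "lstrip": (4, 0, 0), "rstrip": (4, 0, 0),
}


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def uses_pyarrow_kernel(method, pyarrow_version, **kwargs):
    """Return True if, per the table, a PyArrow kernel would handle the call.

    False means the call would fall back to object mode.
    """
    installed = parse_version(pyarrow_version)
    minimum = MIN_PYARROW.get(method)
    if minimum is None or installed < minimum:
        return False  # method not listed, or installed pyarrow is too old

    if method == "contains":
        if kwargs.get("flags", 0):
            return False  # the flags argument is not supported
        if kwargs.get("regex", True):
            # regex matching needs pyarrow 4.0.0 and case-sensitive matching
            if installed < (4, 0, 0) or kwargs.get("case", True) is False:
                return False

    if method == "replace":
        if kwargs.get("flags", 0) or kwargs.get("case", True) is False:
            return False
        if callable(kwargs.get("repl")):
            return False  # callable replacement is not supported
        if isinstance(kwargs.get("pat"), re.Pattern):
            return False  # compiled regex is not supported

    return True
```

Under these assumptions, ``uses_pyarrow_kernel("contains", "1.0.0", regex=False)`` returns ``True``, while ``uses_pyarrow_kernel("len", "3.0.0")`` returns ``False`` because the table lists 4.0.0 as the minimum for ``len``. The helper only handles plain ``X.Y.Z`` version strings.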