|
3 | 3 |
|
4 | 4 | [](https://travis-ci.org/ibab/root_pandas)
|
5 | 5 |
|
6 |
| -A convenience wrapper around the `root_numpy` library that allows you to load and save pandas DataFrames in the ROOT format used in high energy phyics. |
| 6 | +`root_pandas` is a convenience package built around the `root_numpy` library. |
| 7 | +It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics. |
7 | 8 |
|
| 9 | +It's modeled closely after the existing pandas API for reading and writing HDF5 files. |
| 10 | +This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa. |
| 11 | + |
| 12 | +On top of that, `root_pandas` offers several features that go beyond what pandas offers with `read_hdf` and `to_hdf`. |
| 13 | + |
| 14 | +These include |
| 15 | + |
| 16 | + - Specifying multiple input filenames, in which case they are read as if they were one continuous file. |
| 17 | + - Selecting several columns at once using `*` globbing and `{A,B}` shell patterns. |
| 18 | + - Flattening source files containing arrays by storing one array element each in the DataFrame, duplicating any scalar variables. |
| 19 | + |
| 20 | +## Reading ROOT files |
| 21 | + |
| 22 | +This is how you can read the contents of a ROOT file into a DataFrame: |
8 | 23 | ```python
|
9 |
| -from pandas import DataFrame |
10 | 24 | from root_pandas import read_root
|
11 | 25 |
|
12 |
| -data = [1, 2, 3] |
| 26 | +df = read_root('myfile.root') |
| 27 | +``` |
13 | 28 |
|
14 |
| -df = DataFrame({'AAA': data, 'ABA': data, 'ABB': data}) |
| 29 | +If there are several ROOT trees in the input file, you have to specify the tree key: |
| 30 | +```python |
| 31 | +df = read_root('myfile.root', 'mykey') |
| 32 | +``` |
15 | 33 |
|
16 |
| -df.to_root('test.root', 'tree') |
| 34 | +Specific columns can be selected like this: |
| 35 | +```python |
| 36 | +df = read_root('myfile.root', columns=['variable1', 'variable2']) |
| 37 | +``` |
17 | 38 |
|
18 |
| -df_new = read_root('test.root', 'tree', columns=['A{A,B}A']) |
| 39 | +You can also use `*` in the column names to read in any matching branch: |
| 40 | +```python |
| 41 | +df = read_root('myfile.root', columns=['variable*']) |
| 42 | +``` |
19 | 43 |
|
20 |
| -# DataFrames are extremely convenient |
21 |
| -df_new['answer'] = 42 |
| 44 | +In addition, you can use shell brace patterns as in |
| 45 | +```python |
| 46 | +df = read_root('myfile.root', columns=['variable{1,2}']) |
| 47 | +``` |
22 | 48 |
|
23 |
| -df_new.to_root('new.root') |
24 |
| -# The file contains a tree called 'tree' with the 'AAA', 'ABA' and 'answer' branches |
25 |
| -# There is also an 'index' branch that persists the DataFrame's index |
| 49 | +You can also use `*` and `{a,b}` simultaneously, and several times per string. |
| 50 | + |
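The combination of `*` and `{a,b}` patterns described above can be illustrated with a small standard-library sketch. This is not how `root_pandas` implements column selection; `expand_braces` and `select_columns` are hypothetical helpers written only to show what such patterns are expected to match, assuming shell-like semantics.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand `{a,b}` groups one at a time, innermost first, like a shell would."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def select_columns(available, patterns):
    """Return the names in `available` matched by any pattern, in file order."""
    globs = [g for p in patterns for g in expand_braces(p)]
    return [name for name in available
            if any(fnmatch.fnmatchcase(name, g) for g in globs)]


# Hypothetical branch names standing in for the contents of a ROOT tree.
branches = ["variable1", "variable2", "variable3", "othervariable", "index"]
selected = select_columns(branches, ["variable{1,2}", "other*"])
print(selected)
```

Here the brace pattern picks `variable1` and `variable2` but not `variable3`, while `other*` picks up `othervariable`.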
| 51 | +Working with stored arrays can be a bit inconvenient in pandas. |
| 52 | +`root_pandas` makes it easy to flatten your input data, providing you with a DataFrame containing only scalars: |
| 53 | +```python |
| 54 | +df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=True) |
26 | 55 | ```
|
27 | 56 |
|
| 57 | +Assuming the ROOT file contains the array `[1, 2, 3]` in the first `arrayvariable` column, flattening |
| 58 | +will expand this into three entries, where each contains one of the array elements. |
| 59 | +All other scalar entries are duplicated. |
| 60 | +The automatically created `__array_index` column also allows you to get the index that each array element had in its array before flattening. |
| 61 | + |
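To make the flattening rule concrete, here is a minimal plain-Python sketch of the behaviour described above: one output row per array element, scalars duplicated, and an `__array_index` entry recording the element's original position. `flatten_rows` is a hypothetical helper operating on dictionaries, not the `root_pandas` implementation, and the sample values are made up. In this sketch a row with an empty array produces no output rows, which is an assumption rather than documented behaviour.

```python
def flatten_rows(rows, array_column):
    """Emit one output row per array element, copying the scalar fields."""
    flat = []
    for row in rows:
        for position, element in enumerate(row[array_column]):
            new_row = dict(row)              # duplicate the scalar values
            new_row[array_column] = element  # replace the array by one element
            new_row["__array_index"] = position
            flat.append(new_row)
    return flat


# Two hypothetical entries: the first holds the array [1, 2, 3].
rows = [
    {"arrayvariable": [1, 2, 3], "othervariable": 0.5},
    {"arrayvariable": [4], "othervariable": 1.5},
]
flat = flatten_rows(rows, "arrayvariable")
for row in flat:
    print(row)
```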
28 | 62 | There is also support for working with files that don't fit into memory:
|
29 | 63 | If the `chunksize` parameter is specified, `read_root` returns an iterator that yields DataFrames, each containing up to `chunksize` rows.
|
30 | 64 | ```python
|
31 |
| -for df in read_root('bigfile.root', 'tree', chunksize=100000): |
| 65 | +for df in read_root('bigfile.root', chunksize=100000): |
32 | 66 |     # process df here
|
33 |
| - df.to_root('finished.root', mode='a') |
34 | 67 | ```
|
35 |
| -By default, `to_root` erases the existing contents of the file. Use `mode='a'` to append. |
36 | 68 |
|
37 |
| -## Installation |
38 |
| -The package is currently not on PyPI. |
39 |
| -To install it into your home directory with pip, run |
40 |
| -```bash |
41 |
| -pip install --user git+https://github.com/ibab/root_pandas |
| 69 | +You can also combine any of the above options at the same time. |
| 70 | + |
| 71 | +## Writing ROOT files |
| 72 | + |
| 73 | +`root_pandas` patches the pandas DataFrame to have a `to_root` method that allows you to save it into a ROOT file: |
| 74 | +```python |
| 75 | +df.to_root('out.root', key='mytree') |
42 | 76 | ```
|
| 77 | +You can also call the `to_root` function and specify the DataFrame as the first argument: |
| 78 | +```python |
| 79 | +to_root(df, 'out.root', key='mytree') |
| 80 | +``` |
| 81 | + |
| 82 | +By default, `to_root` erases the existing contents of the file. Use `mode='a'` to append: |
| 83 | +```python |
| 84 | +for df in read_root('bigfile.root', chunksize=100000): |
| 85 | +    df.to_root('out.root', mode='a') |
| 86 | +``` |
| 87 | +When doing this, remember to `os.remove` the file first; otherwise, each run of your program appends more data to it. |
| 88 | + |
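The remove-then-append pattern can be sketched as follows. To keep the example runnable without ROOT installed, `append_chunk` is a hypothetical stand-in that appends lines to a text file; in real code its body would presumably be the `df.to_root(path, mode='a')` call shown above. `reset_output` and `run_job` are likewise illustrative names.

```python
import os
import tempfile


def reset_output(path):
    """Delete a leftover output file from a previous run, if there is one."""
    if os.path.exists(path):
        os.remove(path)


def append_chunk(path, chunk):
    """Stand-in for `df.to_root(path, mode='a')`: append rows to a text file."""
    with open(path, "a") as handle:
        for value in chunk:
            handle.write(f"{value}\n")


def run_job(path, chunks):
    reset_output(path)          # start from a clean file on every run
    for chunk in chunks:
        append_chunk(path, chunk)


out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
chunks = [[1, 2], [3]]
run_job(out_path, chunks)
run_job(out_path, chunks)       # a second run must not double the contents
with open(out_path) as handle:
    line_count = len(handle.readlines())
print(line_count)
```

Checking for existence before calling `os.remove` avoids a `FileNotFoundError` on the very first run, when no output file exists yet.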
| 89 | + |