This repository was archived by the owner on Jan 9, 2023. It is now read-only.

Commit 1b6f98c

committed
Update README.md
1 parent 6abd81d commit 1b6f98c

1 file changed: +66 -19 lines changed

README.md

Lines changed: 66 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -3,40 +3,87 @@
33

44
[![Build Status](https://travis-ci.org/ibab/root_pandas.svg?branch=master)](https://travis-ci.org/ibab/root_pandas)
55

6-
A convenience wrapper around the `root_numpy` library that allows you to load and save pandas DataFrames in the ROOT format used in high energy phyics.
6+
`root_pandas` is a convenience package built around the `root_numpy` library.
7+
It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics.
78

9+
It's modeled closely after the existing pandas API for reading and writing HDF5 files.
10+
This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa.
11+
12+
On top of that, `root_pandas` offers several features that go beyond what pandas offers with `read_hdf` and `to_hdf`.
13+
14+
These include
15+
16+
- Specifying multiple input filenames, in which case they are read as if they were one continuous file.
17+
- Selecting several columns at once using `*` globbing and `{A,B}` shell patterns.
18+
- Flattening source files containing arrays by storing one array element per DataFrame row, duplicating any scalar variables.
19+
20+
## Reading ROOT files
21+
22+
This is how you can read the contents of a ROOT file into a DataFrame:
823
```python
9-
from pandas import DataFrame
1024
from root_pandas import read_root
1125

12-
data = [1, 2, 3]
26+
df = read_root('myfile.root')
27+
```
1328

14-
df = DataFrame({'AAA': data, 'ABA': data, 'ABB': data})
29+
If there are several ROOT trees in the input file, you have to specify the tree key:
30+
```python
31+
df = read_root('myfile.root', 'mykey')
32+
```
1533

16-
df.to_root('test.root', 'tree')
34+
Specific columns can be selected like this:
35+
```python
36+
df = read_root('myfile.root', columns=['variable1', 'variable2'])
37+
```
1738

18-
df_new = read_root('test.root', 'tree', columns=['A{A,B}A'])
39+
You can also use `*` in the column names to read in any matching branch:
40+
```python
41+
df = read_root('myfile.root', columns=['variable*'])
42+
```
1943

20-
# DataFrames are extremely convenient
21-
df_new['answer'] = 42
44+
In addition, you can use shell brace patterns as in
45+
```python
46+
df = read_root('myfile.root', columns=['variable{1,2}'])
47+
```
2248

23-
df_new.to_root('new.root')
24-
# The file contains a tree called 'tree' with the 'AAA', 'ABA' and 'answer' branches
25-
# There is also an 'index' branch that persists the DataFrame's index
49+
You can also use `*` and `{a,b}` simultaneously, and several times per string.
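
To illustrate how `*` globbing and `{a,b}` brace patterns can combine, here is a minimal sketch using only the Python standard library. It is not the `root_pandas` implementation; the helper names `expand_braces` and `select_columns` and the branch names are hypothetical and chosen only for this example.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand shell-style {a,b} groups into a list of plain patterns."""
    # Find the first brace group that contains no nested braces.
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that several groups per string are handled.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def select_columns(branches, patterns):
    """Return the branches matching any pattern, without duplicates."""
    selected = []
    for pattern in patterns:
        for plain in expand_braces(pattern):
            # fnmatch handles the `*` wildcard.
            for name in fnmatch.filter(branches, plain):
                if name not in selected:
                    selected.append(name)
    return selected


# Hypothetical branch names, for illustration only.
branches = ["variable1", "variable2", "variable3", "other1", "other2"]
print(select_columns(branches, ["variable{1,2}"]))
print(select_columns(branches, ["{variable,other}*2"]))
```

Expanding the braces first and globbing afterwards keeps the two mechanisms independent, which is why they can be mixed freely within one string.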
50+
51+
Working with stored arrays can be a bit inconvenient in pandas.
52+
`root_pandas` makes it easy to flatten your input data, providing you with a DataFrame containing only scalars:
53+
```python
54+
df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=True)
2655
```
2756

57+
Assuming the ROOT file contains the array `[1, 2, 3]` in the first entry of the `arrayvariable` column, flattening
58+
will expand this into three entries, where each contains one of the array elements.
59+
All other scalar entries are duplicated.
60+
The automatically created `__array_index` column also allows you to get the index that each array element had in its array before flattening.
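
The effect of flattening can be imitated in plain pandas, which may help to picture the resulting layout. The sketch below does not call `root_pandas`; the sample values are made up, and it assumes every array has at least one element.

```python
import pandas as pd

# Hypothetical unflattened data: one row per entry, one array per cell.
df = pd.DataFrame({
    "arrayvariable": [[1, 2, 3], [4, 5]],
    "othervariable": [0.5, 1.5],
})

# Position of each element inside its original array.
# Assumes no empty arrays, so the length matches the exploded frame.
array_index = [i for arr in df["arrayvariable"] for i in range(len(arr))]

# One row per array element; scalar columns are duplicated.
flat = df.explode("arrayvariable").reset_index(drop=True)
flat["__array_index"] = array_index
print(flat)
```

Here the first entry becomes three rows that share the same `othervariable` value, and `__array_index` records where each element sat in its array.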
61+
2862
There is also support for working with files that don't fit into memory:
2963
If the `chunksize` parameter is specified, `read_root` returns an iterator that yields DataFrames, each containing up to `chunksize` rows.
3064
```python
31-
for df in read_root('bigfile.root', 'tree', chunksize=100000):
65+
for df in read_root('bigfile.root', chunksize=100000):
3266
    # process df here
33-
    df.to_root('finished.root', mode='a')
3467
```
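
As a sketch of what processing chunk by chunk can look like, the following computes a column mean while keeping only running totals in memory. The function `mean_over_chunks` and the column name are hypothetical; the chunks here are produced from an in-memory DataFrame as a stand-in for the iterator that `read_root` returns when `chunksize` is given.

```python
import pandas as pd


def mean_over_chunks(chunks, column):
    """Compute the mean of one column from an iterable of DataFrames."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += chunk[column].sum()    # sum() skips missing values
        count += chunk[column].count()  # count() counts non-missing values
    return total / count if count else float("nan")


# Stand-in for read_root('bigfile.root', chunksize=2).
full = pd.DataFrame({"variable1": [1.0, 2.0, 3.0, 4.0, 5.0]})
chunks = (full.iloc[start:start + 2] for start in range(0, len(full), 2))
print(mean_over_chunks(chunks, "variable1"))
```

Because the function only relies on iterating over DataFrames, the generator could be swapped for a chunked `read_root` call without other changes.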
35-
By default, `to_root` erases the existing contents of the file. Use `mode='a'` to append.
3668

37-
## Installation
38-
The package is currently not on PyPI.
39-
To install it into your home directory with pip, run
40-
```bash
41-
pip install --user git+https://github.com/ibab/root_pandas
69+
You can also combine any of the above options at the same time.
70+
71+
## Writing ROOT files
72+
73+
`root_pandas` patches the pandas DataFrame to have a `to_root` method that allows you to save it into a ROOT file:
74+
```python
75+
df.to_root('out.root', key='mytree')
4276
```
77+
You can also call the `to_root` function and specify the DataFrame as the first argument:
78+
```python
79+
from root_pandas import to_root
to_root(df, 'out.root', key='mytree')
80+
```
81+
82+
By default, `to_root` erases the existing contents of the file. Use `mode='a'` to append:
83+
```python
84+
for df in read_root('bigfile.root', chunksize=100000):
85+
    df.to_root('out.root', mode='a')
86+
```
87+
When doing this, remember to `os.remove` the output file before the loop; otherwise each run of your program appends more data to it.
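
One way to make that cleanup safe on the first run, when the file does not exist yet, is a small helper like the one below. The name `remove_if_exists` is hypothetical, and the append loop is shown only as a comment because it needs `root_pandas` and an input file.

```python
import os


def remove_if_exists(path):
    """Delete path so that appending starts from an empty file."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing to clean up on the first run


remove_if_exists('out.root')
# The append loop would follow here, for example:
# for df in read_root('bigfile.root', chunksize=100000):
#     df.to_root('out.root', mode='a')
```

Catching `FileNotFoundError` avoids a separate existence check and keeps the script rerunnable.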
88+
89+
