
Add new optional "separator" argument to json_normalize #14891

Closed · wants to merge 52 commits

Changes from 3 commits · 52 commits total
457019b
added 'separator' argument to json_normalize
jowens Dec 15, 2016
c345d6d
test for json_normalize argument 'separator'
jowens Dec 16, 2016
def361d
added new enhancement: json_normalize now takes 'separator' as an opt…
jowens Dec 16, 2016
fac9ac1
rename json_normalize arg separator to sep, simpler test, add version…
jowens Dec 16, 2016
5f777f4
DOC: fixed typo (#14892)
smsaladi Dec 16, 2016
992dfbc
BUG: regression in DataFrame.combine_first with integer columns (GH14…
jorisvandenbossche Dec 16, 2016
2083f0d
DOC: Add documentation about cpplint (#14890)
gfyoung Dec 16, 2016
d1b1720
BLD: swap 3.6-dev and 3.4 builds, reorg build order (#14899)
jreback Dec 16, 2016
e7df751
ENH: merge_asof() has type specializations and can take multiple 'by'…
Dec 16, 2016
2566223
TST: to_json keeps column info with empty dataframe (#7445)
mroeschke Dec 16, 2016
6f4e36a
API: map() on Index returns an Index, not array
nateyoder Dec 16, 2016
dd8cba2
BUG: Patch read_csv NA values behaviour
gfyoung Dec 16, 2016
73bc6cf
Groupby tests restructure
aileronajay Dec 17, 2016
f5c8d54
Catch warning introduced by GH14432 in test case
Dec 17, 2016
e80a2b9
DOC for refactored compression (GH14576) + BUG: bz2-compressed URL wi…
dhimmel Dec 17, 2016
906b51a
TST: Test datetime array assignment with different units (#7492) (#14…
mroeschke Dec 17, 2016
bdbebc4
BUG: Prevent addition overflow with TimedeltaIndex (#14816)
gfyoung Dec 17, 2016
e503d40
Clean up construction of Series with dictionary and datetime index
nateyoder Dec 17, 2016
f3c5a42
BUG: .fillna() for datetime64 with tz is passing thru floats
opensourceworkAR Dec 18, 2016
37b22c7
TST: Test timedelta arithmetic (#9396) (#14906)
mroeschke Dec 18, 2016
a718962
TST: Groupby/transform with grouped NaN (#9941) (#14907)
mroeschke Dec 18, 2016
f1cfe5b
CLN: remove simple _DATELIKE_DTYPES test and replace with is_datetime…
jreback Dec 18, 2016
8b98104
ENH: select_dtypes now allows 'datetimetz' for generically selecting …
jreback Dec 19, 2016
8c798c0
TST:Test to_sparse with nan dataframe (#10079) (#14913)
mroeschke Dec 19, 2016
dc4b070
COMPAT/REF: Use s3fs for s3 IO
TomAugspurger Dec 19, 2016
39efbbc
CLN: move unique1d to algorithms from nanops (#14919)
jreback Dec 19, 2016
0ac3d98
BUG: Don't convert uint64 to object in DataFrame init (#14917)
gfyoung Dec 19, 2016
f11501a
MAINT: Only output errors in C style check (#14924)
gfyoung Dec 19, 2016
8e630b6
BUG: Fixed DataFrame.describe percentiles are ndarray w/ no median
pbreach Dec 19, 2016
3ccb501
CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing…
clham Dec 19, 2016
5faf32a
BUG: Fix to numeric on decimal fields
Dec 20, 2016
b35c689
BUG: Prevent uint64 overflow in Series.unique
gfyoung Dec 20, 2016
0c52813
BUG: Convert uint64 in maybe_convert_objects
gfyoung Dec 20, 2016
3ab0e55
PERF: make all inference routines cpdef bint
jreback Dec 20, 2016
02906ce
TST: Test empty input for read_csv (#14867) (#14920)
jeffcarey Dec 20, 2016
50930a9
API/BUG: Fix inconsistency in Partial String Index with 'second' reso…
ischurov Dec 20, 2016
24fb26d
BUG: bug in Series construction from UTC
jreback Dec 20, 2016
708792a
DOC: cleanup of timeseries.rst
jreback Dec 20, 2016
3ab369c
TST: Groupby.filter dropna=False with empty group (#10780) (#14926)
mroeschke Dec 20, 2016
1678f14
DOC: small edits in timeseries.rst
jreback Dec 21, 2016
4c3d4d4
cache and remove boxing (#14931)
MaximilianR Dec 21, 2016
0a7cd97
DOC: whatsnew 0.20 and timeseries doc fixes
jreback Dec 21, 2016
07c83ee
PERF: fix getitem unique_check / initialization issue
jreback Dec 21, 2016
73e2829
BUG: Properly read Categorical msgpacks (#14918)
gfyoung Dec 21, 2016
f79bc7a
DOC: Pandas Cheat Sheet
Dr-Irv Dec 21, 2016
a06e32a
added 'separator' argument to json_normalize
jowens Dec 15, 2016
dcc4632
test for json_normalize argument 'separator'
jowens Dec 16, 2016
2363314
added new enhancement: json_normalize now takes 'separator' as an opt…
jowens Dec 16, 2016
8e0faa8
rename json_normalize arg separator to sep, simpler test, add version…
jowens Dec 16, 2016
521720d
json_normalize's separator is now sep, also does a check for string_t…
jowens Dec 21, 2016
74c4285
simpler and better tests for json_normalize with separator (default, …
jowens Dec 21, 2016
8b72b12
Merge branch 'json_normalize-separator' of github.com:jowens/pandas i…
jowens Dec 21, 2016
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -84,6 +84,7 @@ Other enhancements
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``pandas.io.json.json_normalize()`` gained the option ``separator=string``; the default is ``separator='.'`` which is backward compatible. (:issue:`14883`)
Review comment (Member):
How about, ...gained separator option which accepts str, default is "."
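As a concrete illustration of the whatsnew entry, a minimal sketch. Note that the argument was ultimately renamed to ``sep`` (see the rename commit in the log above), and current pandas exposes the function as ``pd.json_normalize``:

```python
import pandas as pd

# Flatten a nested record, joining nested keys with a custom separator.
# The PR initially names the argument `separator`; the merged spelling is `sep`.
data = [{"state": "Florida", "info": {"governor": "Rick Scott"}}]

flat = pd.json_normalize(data, sep="_")
assert sorted(flat.columns) == ["info_governor", "state"]
assert flat.loc[0, "info_governor"] == "Rick Scott"
```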



.. _whatsnew_0200.api_breaking:
11 changes: 7 additions & 4 deletions pandas/io/json.py
@@ -24,8 +24,8 @@ def to_json(path_or_buf, obj, orient=None, date_format='epoch',
default_handler=None, lines=False):

if lines and orient != 'records':
raise ValueError(
"'lines' keyword only valid when 'orient' is records")
raise ValueError(
"'lines' keyword only valid when 'orient' is records")

if isinstance(obj, Series):
s = SeriesWriter(
@@ -726,8 +726,8 @@ def nested_to_record(ds, prefix="", level=0):
def json_normalize(data, record_path=None, meta=None,
meta_prefix=None,
record_prefix=None,
separator='.',
errors='raise'):

"""
"Normalize" semi-structured JSON data into a flat table

@@ -744,6 +744,9 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
separator : string, default '.'
Review comment (Contributor):
Can you add a .. versionadded:: directive here.

Also, might be better to make separator the last keyword argument. That way it won't break people using all positional arguments.

Review comment (Contributor):
call it sep

add a version added tag

Nested records will generate names separated by separator,
e.g., for separator='.', { 'foo' : { 'bar' : 0 } } -> foo.bar
errors : {'raise', 'ignore'}, default 'raise'
* ignore : will ignore KeyError if keys listed in meta are not
always present
@@ -828,7 +831,7 @@ def _pull_field(js, spec):
lengths = []

meta_vals = defaultdict(list)
-    meta_keys = ['.'.join(val) for val in meta]
+    meta_keys = [separator.join(val) for val in meta]
Review comment (Member):
validate whether separator is compat.string_types


def _recursive_extract(data, path, seen_meta, level=0):
if len(path) > 1:
30 changes: 30 additions & 0 deletions pandas/io/tests/json/test_json_norm.py
@@ -133,6 +133,36 @@ def test_shallow_nested(self):
expected = DataFrame(ex_data, columns=result.columns)
tm.assert_frame_equal(result, expected)

def test_shallow_nested_with_separator(self):
Review comment (Contributor):
I think this test can be simplified a lot. Could you do something like

result = json_normalize({"A": {"A": 1, "B": 2}}, separator='_')
expected = pd.DataFrame([[1, 2]], columns={"A_A", "A_B"})
assert_frame_equal(result, expected)

That way you're directly testing your change.
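A runnable version of this suggestion, assuming the merged keyword ``sep`` and a list for ``columns`` (the set literal in the snippet above would leave column order unspecified):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Directly tests the separator change on a minimal nested record.
result = pd.json_normalize({"A": {"A": 1, "B": 2}}, sep="_")
expected = pd.DataFrame([[1, 2]], columns=["A_A", "A_B"])
assert_frame_equal(result, expected)
```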

Review comment (Contributor):
Also add test with the default separator, and ensure that the columns are A.A, A.B.

data = [{'state': 'Florida',
'shortname': 'FL',
'info': {
'governor': 'Rick Scott'
},
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': {
'governor': 'John Kasich'
},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]

result = json_normalize(data, 'counties',
['state', 'shortname',
['info', 'governor']],
separator='_')
ex_data = {'name': ['Dade', 'Broward', 'Palm Beach', 'Summit',
[Review comment (Member): can u also add unicode tests?]
'Cuyahoga'],
'state': ['Florida'] * 3 + ['Ohio'] * 2,
'shortname': ['FL', 'FL', 'FL', 'OH', 'OH'],
'info_governor': ['Rick Scott'] * 3 + ['John Kasich'] * 2,
'population': [12345, 40000, 60000, 1234, 1337]}
expected = DataFrame(ex_data, columns=result.columns)
tm.assert_frame_equal(result, expected)
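The reviewers above also ask for a default-separator test and unicode tests; a sketch of both against the merged ``sep`` keyword:

```python
import pandas as pd

nested = {"A": {"A": 1, "B": 2}}

# The default separator stays '.' for backward compatibility.
assert list(pd.json_normalize(nested).columns) == ["A.A", "A.B"]

# A non-ASCII separator works too, since it is only fed to str.join.
assert list(pd.json_normalize(nested, sep="\u30fb").columns) == [
    "A\u30fbA", "A\u30fbB"]
```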

def test_meta_name_conflict(self):
data = [{'foo': 'hello',
'bar': 'there',