Commit 98ea53b

Merge pull request #8915 from tshauck/generate_bq_schema
ENH: adds ability to generate bq schema from df
2 parents 1781ee0 + efcf227 commit 98ea53b

4 files changed, 48 additions and 0 deletions

doc/source/io.rst

Lines changed: 8 additions & 0 deletions
@@ -3651,6 +3651,14 @@ data quickly, but it is not a direct replacement for a transactional database.
 You can access the management console to determine project id's by:
 <https://code.google.com/apis/console/b/0/?noredirect>
 
+As of 0.15.2, the gbq module has a function ``generate_bq_schema`` which
+will produce the dictionary representation of the schema.
+
+.. code-block:: python
+
+    df = pandas.DataFrame({'A': [1.0]})
+    gbq.generate_bq_schema(df, default_type='STRING')
+
 .. warning::
 
    To use this module, you will need a valid BigQuery account. See

doc/source/whatsnew/v0.15.2.txt

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ Enhancements
 - ``Timedelta`` arithmetic returns ``NotImplemented`` in unknown cases, allowing extensions by custom classes (:issue:`8813`).
 - ``Timedelta`` now supports arithemtic with ``numpy.ndarray`` objects of the appropriate dtype (numpy 1.8 or newer only) (:issue:`8884`).
 - Added ``Timedelta.to_timedelta64`` method to the public API (:issue:`8884`).
+- Added ``gbq.generate_bq_schema`` function to the gbq module (:issue:`8325`).
 
 .. _whatsnew_0152.performance:
 
pandas/io/gbq.py

Lines changed: 28 additions & 0 deletions
@@ -444,3 +444,31 @@ def to_gbq(dataframe, destination_table, project_id=None, chunksize=10000,
     dataset_id, table_id = destination_table.rsplit('.',1)
 
     connector.load_data(dataframe, dataset_id, table_id, chunksize, verbose)
+
+def generate_bq_schema(df, default_type='STRING'):
+    """ Given a passed df, generate the associated big query schema.
+
+    Parameters
+    ----------
+    df : DataFrame
+    default_type : string
+        The default big query type in case the type of the column
+        does not exist in the schema.
+    """
+
+    type_mapping = {
+        'i': 'INTEGER',
+        'b': 'BOOLEAN',
+        'f': 'FLOAT',
+        'O': 'STRING',
+        'S': 'STRING',
+        'U': 'STRING',
+        'M': 'TIMESTAMP'
+    }
+
+    fields = []
+    for column_name, dtype in df.dtypes.iteritems():
+        fields.append({'name': column_name,
+                       'type': type_mapping.get(dtype.kind, default_type)})
+
+    return {'fields': fields}
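
The function added above maps each column's NumPy ``dtype.kind`` code to a BigQuery type name. The following is a minimal standalone sketch of the same lookup, not the code from this commit: the name ``generate_bq_schema_sketch`` and the sample frame are made up for illustration, and it assumes a recent pandas, where ``Series.items()`` is used in place of the ``iteritems()`` call shown in the diff.

```python
import pandas as pd

# Same dtype.kind -> BigQuery type table as in the diff above.
TYPE_MAPPING = {
    "i": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}


def generate_bq_schema_sketch(df, default_type="STRING"):
    """Hypothetical re-statement of the commit's lookup logic."""
    fields = []
    # df.dtypes is a Series indexed by column name; items() yields
    # (column_name, dtype) pairs in column order.
    for column_name, dtype in df.dtypes.items():
        fields.append({
            "name": column_name,
            # Kinds missing from the table fall back to default_type.
            "type": TYPE_MAPPING.get(dtype.kind, default_type),
        })
    return {"fields": fields}


# Illustrative data only; not taken from the pandas test suite.
df = pd.DataFrame({
    "A": [1.0, 2.0],
    "B": [1, 2],
    "C": ["foo", "bar"],
    "D": pd.to_datetime(["2009-01-01", "2009-01-02"]),
    "E": [True, False],
})
schema = generate_bq_schema_sketch(df)
print(schema)
```

The result is a plain dictionary with a single ``fields`` key, which is the shape the documentation change in this commit describes as "the dictionary representation of the schema".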

pandas/io/tests/test_gbq.py

Lines changed: 11 additions & 0 deletions
@@ -277,6 +277,17 @@ def test_google_upload_errors_should_raise_exception(self):
         with tm.assertRaises(gbq.UnknownGBQException):
             gbq.to_gbq(bad_df, 'pydata_pandas_bq_testing.new_test', project_id = PROJECT_ID)
 
+    def test_generate_bq_schema(self):
+
+        df = tm.makeMixedDataFrame()
+        schema = gbq.generate_bq_schema(df)
+
+        test_schema = {'fields': [{'name': 'A', 'type': 'FLOAT'},
+                                  {'name': 'B', 'type': 'FLOAT'},
+                                  {'name': 'C', 'type': 'STRING'},
+                                  {'name': 'D', 'type': 'TIMESTAMP'}]}
+
+        self.assertEqual(schema, test_schema)
 
     @classmethod
     def tearDownClass(cls):
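
The test added in this commit exercises float, string, and datetime columns only. As a rough illustration of the ``default_type`` fallback, the sketch below feeds in columns whose ``dtype.kind`` codes (``'u'``, ``'m'``, ``'c'``) are absent from the mapping in the diff. The helper ``schema_types`` and the sample columns are assumptions made for this example, and it assumes a recent pandas with ``Series.items()``.

```python
import numpy as np
import pandas as pd

# Mapping copied from the generate_bq_schema diff.
TYPE_MAPPING = {
    "i": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}


def schema_types(df, default_type="STRING"):
    """Hypothetical helper: column name -> BigQuery type name."""
    return {
        name: TYPE_MAPPING.get(dtype.kind, default_type)
        for name, dtype in df.dtypes.items()
    }


# Columns chosen because their kind codes are not keys of TYPE_MAPPING.
df = pd.DataFrame({
    "unsigned": np.array([1, 2], dtype="uint8"),        # kind 'u'
    "delta": pd.to_timedelta(["1 days", "2 days"]),     # kind 'm'
    "complex": np.array([1 + 2j, 3 + 4j]),              # kind 'c'
})

kinds = {name: dtype.kind for name, dtype in df.dtypes.items()}
types = schema_types(df, default_type="STRING")
print(kinds)
print(types)
```

Under this mapping, unsigned integer columns are not treated as ``INTEGER``; they take whatever ``default_type`` is passed, so callers with such columns may want to cast them or choose the default deliberately.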
