ArcGIS to Pandas Data Frame v2.0
One of the most popular posts on my blog discusses loading data from an ArcGIS table, either spatial (a Feature Class) or nonspatial, into a Pandas DataFrame to take advantage of the powerful data analysis tools available in Python, such as scikit-learn or Keras paired with TensorFlow. That previous post, ArcGIS to Pandas DataFrame, details how to use the ArcPy FeatureClassToNumPyArray function to get the data into a NumPy array, which is then loaded into a Pandas DataFrame. Unfortunately, as I recently discovered, this approach does not scale very well.
The SearchCursor in the ArcPy Data Access module offers a much more efficient route: iterate over the cursor in a list comprehension to build a list of rows, then load that list into a Pandas DataFrame. This will not scale indefinitely, since every row is still loaded into a single data structure, a list, before it reaches the data frame. It does scale much better, though, because the field list restricts the cursor to only the fields of interest.
import arcpy
import pandas as pd
table = r'C:\path\to\table'
field_list = ['field01', 'field02']
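# Read only the fields of interest and use their names as the DataFrame column labels
with arcpy.da.SearchCursor(table, field_list) as cursor:
    df = pd.DataFrame([row for row in cursor], columns=field_list)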
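If even the filtered rows are too many to hold in one list, one possible workaround is to pull rows from the cursor in fixed-size chunks and build a DataFrame per chunk. The sketch below is only an illustration of that idea and is not part of the method above: `iter_chunks` is a hypothetical helper, and `fake_cursor` is a made-up stand-in that yields one tuple per row so the example runs without ArcPy installed. It assumes the real cursor can be consumed like any other row iterator.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterable (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls the next chunk_size rows without reading the rest
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_cursor(n_rows):
    """Made-up stand-in for a search cursor: yields one tuple per row."""
    for i in range(n_rows):
        yield (i, "name_{}".format(i))


# Ten rows read four at a time
chunk_sizes = [len(chunk) for chunk in iter_chunks(fake_cursor(10), 4)]
print(chunk_sizes)  # prints [4, 4, 2]

# With arcpy and pandas available, the same helper could wrap a real cursor
# (the chunk size of 50000 is an arbitrary assumption):
# with arcpy.da.SearchCursor(table, field_list) as cursor:
#     for chunk in iter_chunks(cursor, 50000):
#         df_chunk = pd.DataFrame(chunk, columns=field_list)
#         # summarize or write out df_chunk before reading the next chunk
```

Chunking only pays off when each piece can be summarized, filtered, or written out on its own. If the full table is needed in memory at once, the single list comprehension above is simpler.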
References:
- Pandas DataFrame: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
- ArcGIS Data Access Search Cursor: http://pro.arcgis.com/en/pro-app/arcpy/data-access/searchcursor-class.htm