One of the most popular posts on my blog discusses loading data from an ArcGIS table, whether spatial (a feature class) or nonspatial, into a Pandas DataFrame to take advantage of the powerful data analysis tools available in Python, such as scikit-learn, or Keras paired with TensorFlow. That previous post, ArcGIS to Pandas DataFrame, details how to use the ArcGIS FeatureClassToNumPyArray function to get the data into a NumPy array, which is then loaded into a Pandas DataFrame. Unfortunately, as I recently discovered, this approach does not scale very well.
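For context, the last step of that earlier approach relies on the fact that FeatureClassToNumPyArray returns a NumPy structured array, which Pandas can turn into a DataFrame directly. The sketch below shows only that step. It assumes a small hand-built structured array (hypothetical fields `OID`, `NAME`, `VALUE`) standing in for the real output, so it runs without ArcGIS installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the structured array returned by
# arcpy.da.FeatureClassToNumPyArray: one named field per column.
arr = np.array(
    [(1, 'alpha', 2.5), (2, 'beta', 3.0)],
    dtype=[('OID', 'i4'), ('NAME', 'U10'), ('VALUE', 'f8')],
)

# The field names in the structured dtype become the DataFrame's column names
df = pd.DataFrame(arr)
print(df)
```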
The SearchCursor in the ArcPy Data Access module (arcpy.da) offers a much more efficient route: a list comprehension over the cursor builds a list of rows, restricted to the fields of interest, which is then loaded into a Pandas DataFrame. This will not scale indefinitely, since all rows are still loaded into a single data structure (a list) before the DataFrame is created, but it scales much better because only the fields named in the field list are read.
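```python
import arcpy
import pandas as pd

# Path to the feature class or table, and the fields to load
table = r'C:\path\to\table'
field_list = ['field01', 'field02']

# Read only the requested fields; the with block releases the cursor when done.
# columns=field_list keeps the field names instead of default 0, 1, ... labels.
with arcpy.da.SearchCursor(table, field_list) as cursor:
    df = pd.DataFrame([row for row in cursor], columns=field_list)
```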
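A SearchCursor yields each row as a tuple whose values follow the order of the field list, so the resulting DataFrame carries no field names unless they are supplied. The sketch below illustrates this without ArcGIS: `fake_cursor` is a hypothetical generator standing in for `arcpy.da.SearchCursor(table, field_list)`, and the row values are made up for illustration.

```python
import pandas as pd

field_list = ['field01', 'field02']


def fake_cursor():
    """Hypothetical stand-in for arcpy.da.SearchCursor(table, field_list):
    yields one tuple per row, with values in the same order as field_list."""
    yield (1, 'a')
    yield (2, 'b')
    yield (3, 'c')


# Without column names, pandas labels the columns 0, 1, ...
unnamed = pd.DataFrame([row for row in fake_cursor()])

# Passing columns=field_list keeps the original field names
named = pd.DataFrame([row for row in fake_cursor()], columns=field_list)

print(list(unnamed.columns))  # [0, 1]
print(list(named.columns))    # ['field01', 'field02']
```

The real SearchCursor also accepts an optional `where_clause` argument, so rows can be filtered at read time in addition to limiting the fields.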
- Pandas DataFrame: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
- ArcGIS Data Access Search Cursor: http://pro.arcgis.com/en/pro-app/arcpy/data-access/searchcursor-class.htm