ArcGIS to Pandas Data Frame Using a Search Cursor
Yesterday I tried loading data with the FeatureClassToNumPyArray and TableToNumPyArray functions in ArcPy's Data Access module (arcpy.da). Unfortunately, I kept running into memory errors that repeatedly crashed both ArcMap and ArcCatalog. Finally, just to check, I tried running it from the command prompt with the Python 3.4 that ships with Pro. That did not work either. A workaround was in order, and this is what I came up with.
import arcpy
from pandas import DataFrame


def get_field_names(table):
    """
    Get a list of field names not inclusive of the geometry and object id fields.
    :param table: Table readable by ArcGIS
    :return: List of field names.
    """
    # list to store values
    field_list = []

    # iterate the fields
    for field in arcpy.ListFields(table):

        # if the field is not geometry nor object id, add it as is
        if field.type != 'Geometry' and field.type != 'OID':
            field_list.append(field.name)

        # if geometry is present, add the SHAPE@XY token, which returns the centroid as an (x, y) tuple
        elif field.type == 'Geometry':
            field_list.append('SHAPE@XY')

    # return the field list
    return field_list

def table_to_pandas_dataframe(table, field_names=None):
    """
    Load data into a Pandas Data Frame for subsequent analysis.
    :param table: Table readable by ArcGIS.
    :param field_names: List of fields.
    :return: Pandas DataFrame object.
    """
    # if field names are not specified
    if not field_names:

        # get a list of field names
        field_names = get_field_names(table)

    # use a search cursor to iterate rows, collecting each row tuple in a list
    with arcpy.da.SearchCursor(table, field_names) as search_cursor:
        rows = [row for row in search_cursor]

    # build the pandas data frame once from the collected rows, which avoids
    # copying the whole frame every time a row is added
    dataframe = DataFrame(rows, columns=field_names)

    # return the pandas data frame
    return dataframe
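To see what each row looks like before it reaches pandas, here is a minimal sketch that runs without ArcGIS. The function rows_to_records and the sample field names and values are made up for illustration; a plain list of tuples stands in for what the search cursor would yield. It also splits the SHAPE@XY centroid into separate x and y keys, which is an optional extra rather than something the functions above do.

```python
def rows_to_records(field_names, rows):
    """Turn cursor-style row tuples into a list of dicts, one per row.

    A 'SHAPE@XY' value is split into separate 'x' and 'y' keys.
    """
    records = []
    for row in rows:
        record = {}
        for name, value in zip(field_names, row):
            if name == 'SHAPE@XY':
                # the centroid arrives as an (x, y) tuple; fall back to
                # empty coordinates if no value is present (an assumption
                # about how missing geometry might show up)
                record['x'], record['y'] = value if value else (None, None)
            else:
                record[name] = value
        records.append(record)
    return records


# hypothetical stand-in for the rows a search cursor would yield
sample_fields = ['NAME', 'COUNT', 'SHAPE@XY']
sample_rows = [
    ('Site A', 10, (1.0, 2.0)),
    ('Site B', 20, (3.0, 4.0)),
]
records = rows_to_records(sample_fields, sample_rows)
```

A list of dicts like this can be passed directly to the DataFrame constructor, giving one column per key.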
If you look closely, I am just using the centroid of the geometry (the SHAPE@XY token) when the input table happens to be a feature class. While this largely ignores the shape of line and polygon geometries, it is the fastest solution I came up with, and unfortunately today is about efficiency, not being comprehensive. Hopefully this provides a useful tool you can either use as is or extend if you run into the same type of challenge.
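One possible extension, sketched here under a few assumptions, is to make the geometry token a parameter so the caller can ask for something other than the centroid, for example SHAPE@WKT for the well-known text of the full geometry. The name select_field_names is hypothetical, and the function takes field objects (anything with name and type attributes, such as the output of arcpy.ListFields) rather than a table so that it can be tried without ArcGIS. Expect full geometry tokens to be slower than the centroid.

```python
from collections import namedtuple


def select_field_names(fields, geometry_token='SHAPE@XY'):
    """List field names, swapping the geometry field for a cursor token.

    :param fields: Objects with .name and .type attributes.
    :param geometry_token: Token to request in place of the geometry field.
    :return: List of field names.
    """
    field_list = []
    for field in fields:
        if field.type == 'Geometry':
            # request the chosen token instead of the raw geometry field
            field_list.append(geometry_token)
        elif field.type != 'OID':
            # keep every other field except the object id
            field_list.append(field.name)
    return field_list


# hypothetical stand-ins for arcpy Field objects
Field = namedtuple('Field', ['name', 'type'])
sample = [
    Field('OBJECTID', 'OID'),
    Field('NAME', 'String'),
    Field('Shape', 'Geometry'),
]
centroid_fields = select_field_names(sample)
wkt_fields = select_field_names(sample, 'SHAPE@WKT')
```

With ArcGIS available, the result of select_field_names(arcpy.ListFields(table), 'SHAPE@WKT') could be passed as the field_names argument of table_to_pandas_dataframe.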