ArcGIS to Pandas Data Frame using a Search Cursor

Joel McCune

Apr 8, 2016 • 1 min read

Yesterday I tried loading data with the FeatureClassToNumPyArray and TableToNumPyArray functions included in the Data Access module (arcpy.da) of ArcPy. Unfortunately, I kept running into memory errors that repeatedly crashed both ArcMap and ArcCatalog. Finally, I even tried running from the raw command prompt with the Python 3.4 included with Pro, just to check. That did not work, either. A workaround was in order. This is what I came up with.

import arcpy
from pandas import DataFrame


def get_field_names(table):
    """
    Get a list of field names, excluding the object id field and using the
    SHAPE@XY centroid token in place of the geometry field.
    :param table: Table readable by ArcGIS
    :return: List of field names.
    """
    # list to store values
    field_list = []

    # iterate the fields
    for field in arcpy.ListFields(table):

        # if the field is not geometry nor object id, add it as is
        if field.type != 'Geometry' and field.type != 'OID':
            field_list.append(field.name)

        # if geometry is present, add the centroid as an (x, y) tuple
        elif field.type == 'Geometry':
            field_list.append('SHAPE@XY')

    # return the field list
    return field_list


def table_to_pandas_dataframe(table, field_names=None):
    """
    Load data into a Pandas Data Frame for subsequent analysis.
    :param table: Table readable by ArcGIS.
    :param field_names: List of fields.
    :return: Pandas DataFrame object.
    """
    # if field names are not specified
    if not field_names:

        # get a list of field names
        field_names = get_field_names(table)

    # use a search cursor to collect every row as a tuple
    with arcpy.da.SearchCursor(table, field_names) as search_cursor:
        rows = [row for row in search_cursor]

    # build the data frame once from all rows; DataFrame.append was removed
    # in pandas 2.0 and copied the whole frame on every call
    return DataFrame(rows, columns=field_names)

If you look closely, I am just using the centroid of the geometry (the SHAPE@XY token) if the input table happens to be a feature class. While this discards the full shape of line and polygon geometries, it is the fastest solution I came up with, and unfortunately today is about efficiency, not being comprehensive. Hopefully this provides a useful tool you can either use or extend if you run into the same type of challenge.
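One possible extension: the centroid arrives as a single (x, y) tuple per row, which is awkward to filter or plot on. Below is a minimal sketch of splitting it into two plain values before the data frame is built. The helper name split_centroid and the column names x and y are my own placeholders, not part of ArcPy or Pandas, and the handling of a null geometry as None is an assumption.

```python
def split_centroid(field_names, rows, xy_field='SHAPE@XY'):
    """
    Expand the (x, y) centroid tuple into separate x and y values.
    :param field_names: List of field names matching each row.
    :param rows: Iterable of row tuples, such as those read from a search cursor.
    :param xy_field: Name of the field holding the (x, y) tuple.
    :return: Tuple of (new field names, list of flattened rows).
    """
    # nothing to split when no centroid field is present
    if xy_field not in field_names:
        return list(field_names), [tuple(row) for row in rows]

    # position of the centroid tuple within each row
    index = field_names.index(xy_field)

    # swap the single centroid field for two columns (hypothetical names)
    new_names = list(field_names[:index]) + ['x', 'y'] + list(field_names[index + 1:])

    flat_rows = []
    for row in rows:
        # assumption: a null geometry may come back as None instead of a tuple
        x, y = row[index] if row[index] is not None else (None, None)
        flat_rows.append(tuple(row[:index]) + (x, y) + tuple(row[index + 1:]))

    return new_names, flat_rows
```

The returned names and rows can then be handed to DataFrame(flat_rows, columns=new_names) in place of the raw cursor rows.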