ArcGIS to Pandas Data Frame

With the release of ArcGIS 10.4, a few of the included packages have come to my attention. While numpy has been in there for quite some time, scipy and pandas are now included as well. GIS data scientists the world over are rejoicing.

While I am still a while away from digging into scipy, I regularly need some pretty hard-core data manipulation to clean things up when a customer sends me data to use in a demonstration. Based on what I have been reading, Pandas appears to be a much better solution than the mass of list comprehensions and dictionaries I am currently building.

Getting started with Pandas means loading data into its native in-memory object for tabular data, the DataFrame. Since Pandas is built to play nice with numpy, a numpy array can be used to build a Pandas DataFrame. Fortunately, the ArcGIS Data Access module (arcpy.da) includes a function that produces exactly such an array: FeatureClassToNumPyArray. Wrapping all this up into a single function is a pretty straightforward process.

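import arcpy
from pandas import DataFrame

def feature_class_to_pandas_data_frame(feature_class, field_list):
    """
    Load data from a feature class into a Pandas DataFrame for subsequent analysis.
    :param feature_class: Input ArcGIS feature class.
    :param field_list: List of field names to load.
    :return: Pandas DataFrame object.
    """
    # FeatureClassToNumPyArray returns a NumPy structured array with one named
    # field per entry in field_list; DataFrame turns each field into a column.
    return DataFrame(
        arcpy.da.FeatureClassToNumPyArray(
            in_table=feature_class,
            field_names=field_list,
            # Keep records that contain nulls, substituting -99999 for each null.
            skip_nulls=False,
            null_value=-99999
        )
    )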
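To see what the function hands back without needing ArcGIS installed, here is a minimal sketch. It assumes a small, hypothetical structured array standing in for the output of FeatureClassToNumPyArray (the field names OBJECTID, SPECIES and HEIGHT are made up for illustration), with nulls already replaced by the -99999 sentinel. Because Pandas will otherwise treat -99999 as a real number, the sketch turns it back into NaN in the numeric columns before computing anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for arcpy.da.FeatureClassToNumPyArray output:
# a structured array with one named field per requested column.
# The null HEIGHT has already been replaced by the -99999 sentinel.
records = np.array(
    [(1, "Oak", 12.5), (2, "Maple", -99999.0), (3, "Pine", 8.0)],
    dtype=[("OBJECTID", "<i4"), ("SPECIES", "<U10"), ("HEIGHT", "<f8")],
)

# Each field of the structured array becomes a DataFrame column.
df = pd.DataFrame(records)

# Convert the sentinel back into a real missing value, but only in numeric
# columns, so NaN-aware methods (mean, dropna, fillna) skip it.
numeric_columns = df.select_dtypes("number").columns
df[numeric_columns] = df[numeric_columns].replace(-99999, np.nan)

print(df)
print(df["HEIGHT"].mean())  # averages only the non-null heights
```

Restoring NaN right after loading keeps the sentinel from quietly skewing sums and averages later in the analysis.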

From here, the next logical step is getting data out of the Pandas DataFrame after analysis and back into ArcGIS, something I have not worked out yet. I will let you know when I do.
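One possible starting point for that step, not tested against ArcGIS here, is to reverse the trip: turn the DataFrame back into a NumPy structured array. The sketch below uses a hypothetical helper, data_frame_to_structured_array, that casts text columns (which Pandas stores as Python objects) to fixed-width Unicode, since a structured array destined for a table needs concrete field types. Handing the result to arcpy.da.NumPyArrayToTable is an assumption left as a comment so the block runs without arcpy.

```python
import numpy as np
import pandas as pd


def data_frame_to_structured_array(df):
    """Hypothetical helper: convert a DataFrame into a NumPy structured array.

    Text columns come out of pandas as Python objects, so each one is cast
    to a fixed-width Unicode type that NumPy sizes to the longest value.
    In this sketch, missing text values would become the string "nan".
    """
    arrays = []
    for name in df.columns:
        values = np.asarray(df[name])
        if values.dtype == object:
            values = values.astype(str)
        arrays.append(values)
    # One named field per DataFrame column; the pandas index is dropped.
    return np.rec.fromarrays(arrays, names=[str(name) for name in df.columns])


df = pd.DataFrame({"SPECIES": ["Oak", "Maple"], "HEIGHT": [12.5, 8.0]})
arr = data_frame_to_structured_array(df)
print(arr.dtype)

# Untested assumption: inside ArcGIS the array could then be written out with
# arcpy.da.NumPyArrayToTable(arr, out_table), where out_table is a path you choose.
```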