Converting Pandas DataFrame To GeoDataFrame


Answer:

Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.


from geopandas import GeoDataFrame
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:


          Date/Time   ID  geometry
0  4/1/2014 0:11:00  140  POINT (-73.95489999999999 40.769)
1  4/1/2014 0:17:00  NaN  POINT (-74.03449999999999 40.7267)



Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:


import geopandas as gpd
import shapely.wkt

geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
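
As a possible alternative to mapping shapely.wkt.loads over the column, recent GeoPandas versions (0.9 or later, as far as I know) provide GeoSeries.from_wkt, which parses the whole column in one call. The sketch below assumes such a version is installed; the sample rows and the "name" column are made up for illustration, and only 'wktcolumn' comes from the snippet above.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample data; 'wktcolumn' matches the column name used above.
df = pd.DataFrame({
    "name": ["a point", "a triangle"],
    "wktcolumn": [
        "POINT (-73.9549 40.769)",
        "POLYGON ((0 0, 1 0, 1 1, 0 0))",
    ],
})

# Parse every WKT string in one vectorized call and attach the CRS.
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")

# Drop the raw text column and build the GeoDataFrame.
gdf = gpd.GeoDataFrame(df.drop(columns="wktcolumn"), geometry=geometry)

print(gdf.geometry.geom_type.tolist())
```

Mixed geometry types in one column are fine here; each row is parsed independently.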


Update (December 2019): The official documentation at https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html does this succinctly using geopandas.points_from_xy, like so:



import geopandas

gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)


You can also pass a crs (coordinate reference system) to GeoDataFrame, or a z value (e.g. elevation) to points_from_xy, if you want.
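
To make the crs and z options concrete, here is a minimal sketch. The sample values and the Elevation column are assumptions for illustration; Latitude and Longitude follow the column names used in the snippet above.

```python
import pandas as pd
import geopandas

# Hypothetical sample data; the Elevation column is an assumption.
df = pd.DataFrame({
    "City": ["A", "B"],
    "Latitude": [40.769, 40.7267],
    "Longitude": [-73.9549, -74.0345],
    "Elevation": [10.0, 25.0],
})

gdf = geopandas.GeoDataFrame(
    df,
    # z is optional; passing it produces 3D points (x, y, z).
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    # WGS 84 longitude/latitude, matching the earlier examples.
    crs="EPSG:4326",
)

print(gdf.geometry.iloc[0].has_z)
print(gdf.crs)
```

Unlike the first answer, this keeps the original coordinate columns in the result; drop them afterwards if they are no longer needed.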






Old Method: Using shapely



One-liners! Plus some performance pointers for big-data people.



Given a pandas.DataFrame that has x (longitude) and y (latitude) columns like so:



df.head()
            x          y
0  229.617902 -73.133816
1  229.611157 -73.141299
2  229.609825 -73.142795
3  229.607159 -73.145782
4  229.605825 -73.147274


Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:



Library imports and shapely speedups:



import geopandas as gpd
import shapely.geometry
import shapely.speedups
shapely.speedups.enable()  # enabled by default from version 1.6.0; deprecated and unnecessary in Shapely 2.0


Code + benchmark times on a test dataset I have lying around:



# Martin's original version:
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])



# Pandas apply method:
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))


Using DataFrame.apply is, surprisingly, slower, but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
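
If you want to repeat the comparison on your own machine, a small timing harness along these lines might help. It is only a sketch: the random data is synthetic (not the dataset timed above), the function names are made up for illustration, and geopandas.points_from_xy is included as a third candidate. Absolute numbers will differ from the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Synthetic test data (an assumption; not the dataset used for the timings above).
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({"x": rng.uniform(-180, 180, n), "y": rng.uniform(-90, 90, n)})


def with_list_comprehension(frame):
    # One Point per row, built in a plain Python loop.
    return gpd.GeoSeries([Point(xy) for xy in zip(frame.x, frame.y)], crs="EPSG:4326")


def with_apply(frame):
    # Row-wise DataFrame.apply; each row is materialized as a Series first.
    points = frame.apply(lambda row: Point(row.x, row.y), axis=1)
    return gpd.GeoSeries(points, crs="EPSG:4326")


def with_points_from_xy(frame):
    # GeoPandas' built-in helper for coordinate columns.
    return gpd.GeoSeries(gpd.points_from_xy(frame.x, frame.y), crs="EPSG:4326")


for func in (with_list_comprehension, with_apply, with_points_from_xy):
    # Average over three calls to smooth out noise.
    seconds = timeit.timeit(lambda: func(df), number=3) / 3
    print(f"{func.__name__}: {seconds:.3f} s per call")
```

All three functions return a GeoSeries with the same points, so the timings compare construction cost only.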



Credits to:




  • Making shapefile from Pandas dataframe? (for the pandas apply method)

  • Speed up row-wise point in polygon with Geopandas (for the speedup hint)



Some Work-In-Progress references (as of 2017) for handling big dask datasets:




  • http://matthewrocklin.com/blog/work/2017/09/21/accelerating-geopandas-1

  • https://github.com/geopandas/geopandas/issues/461

  • https://github.com/mrocklin/dask-geopandas


