Calculate Pandas DataFrame Time Difference Between Two Columns In Hours And Minutes
Answer :
Pandas timestamp differences returns a datetime.timedelta object. This can easily be converted into hours by using the *as_type* method, like so
import pandas df = pandas.DataFrame(columns=['to','fr','ans']) df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')] df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')] (df.fr-df.to).astype('timedelta64[h]') to yield,
0    58 1     3 2     8 dtype: float64 This was driving me bonkers as the .astype() solution above didn't work for me.  But I found another way.  Haven't timed it or anything, but might work for others out there:
t1 = pd.to_datetime('1/1/2015 01:00') t2 = pd.to_datetime('1/1/2015 03:30')  print pd.Timedelta(t2 - t1).seconds / 3600.0 ...if you want hours. Or:
print pd.Timedelta(t2 - t1).seconds / 60.0 ...if you want minutes.
- How do I convert my results to only hours and minutes - The accepted answer only returns days + hours. Minutes are not included.
 
- The accepted answer only returns 
- To provide a column that has hours and minutes, as hh:mmorx hours y minutes, would require additional calculations and string formatting.
- This answer shows how to get either total hours or total minutes as a float, using timedeltamath, and is faster than using.astype('timedelta64[h]')
- Pandas Time Deltas User Guide
- Pandas Time series / date functionality User Guide
- python timedeltaobjects: See supported operations.
- The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted usingpandas.to_datetime().
import pandas as pd  # test data from OP, with values already in a datetime format data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],         'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}  # test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed df = pd.DataFrame(data)  # add a timedelta column if wanted. It's added here for information only # df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works df['time_delta'] = (df.from_date - df.to_date)  # create a column with timedelta as total hours, as a float type df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)  # create a colume with timedelta as total minutes, as a float type df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)  # display(df)                   to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff 0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667 1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667 2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000 Other methods
- An item of note from the podcast in Other Resources, .total_seconds()was added and merged when the core developer was on vacation, and would not have been approved.- This is also why there aren't other .total_xxmethods.
 
- This is also why there aren't other 
# convert the entire timedelta to seconds # this is the same as td / timedelta(seconds=1) (df.from_date - df.to_date).dt.total_seconds() [out]: 0    211089.82 1     13264.30 2     31373.76 dtype: float64  # get the number of days (df.from_date - df.to_date).dt.days [out]: 0    2 1    0 2    0 dtype: int64  # get the seconds for hours + minutes + seconds, but not days # note the difference from total_seconds (df.from_date - df.to_date).dt.seconds [out]: 0    38289 1    13264 2    31373 dtype: int64 Other Resources
- Talk Python to Me: Episode #271: Unlock the mysteries of time, Python's datetime that is! - Timedelta begins at 31 minutes
- As per Python core developer Paul Ganssle and python dateutilmaintainer:- Use (df.from_date - df.to_date) / pd.Timedelta(hours=1)
- Don't use (df.from_date - df.to_date).dt.total_seconds() / 3600- pandas.Series.dt.total_seconds
- .dtaccessor
 
 
- Use 
 
- Real Python: Using Python datetime to Work With Dates and Times
- The dateutilmodule provides powerful extensions to the standarddatetimemodule.
%%timeit test
 import pandas as pd  # dataframe with 2M rows data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]} df = pd.DataFrame(data) df = pd.concat([df] * 1000000).reset_index(drop=True)  %%timeit (df.from_date - df.to_date) / pd.Timedelta(hours=1) [out]: 43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  %%timeit (df.from_date - df.to_date).astype('timedelta64[h]') [out]: 59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 
Comments
Post a Comment