Column arithmetic in pandas dataframe using dates

I think this should be easy but I’m hitting a bit of a wall. I have a dataset that was imported into a pandas dataframe from a Stata .dta file. Several of the columns contain date data. The dataframe contains 100,000+ rows but a sample is given:

   cat  event_date  total
0   G2  2006-03-08     16
1   G2         NaT    NaN
2   G2         NaT    NaN
3   G3  2006-03-10     16
4   G3  2006-08-04     12
5   G3  2006-12-28     13
6   G3  2007-05-25     10
7   G4  2006-03-10     13
8   G4  2006-08-06     19
9   G4  2006-12-30     16

The data is stored as a datetime64 format:

>>> mydata[['cat','event_date','total']].dtypes
cat                    object
event_date     datetime64[ns]
total                 float64
dtype: object

All I would like to do is create a new column which gives the difference in days (rather than ‘us’ or ‘ns’!!!) between the event_date and a start date, say 2006-01-01. I’ve tried the following:

>>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01')

… but I get the message:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I’ve also tried a lambda function but that doesn’t work either.

However, if I wanted to simply add on one day to each date I can successfully use:

>>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D')

That works fine.

Am I missing something straightforward here?

Thanks in advance for any help.

Best answer

Not sure why the numpy datetime64 is incompatible with pandas dtypes but using datetime objects worked fine for me:

In [39]:

import datetime as dt
mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1)
mydata
Out[39]:
      cat event_date  total      new
Index                               
0      G2 2006-03-08     16  66 days
1      G2        NaT    NaN      NaT
2      G2        NaT    NaN      NaT
3      G3 2006-03-10     16  68 days
4      G3 2006-08-04     12 215 days
5      G3 2006-12-28     13 361 days
6      G3 2007-05-25     10 509 days
7      G4 2006-03-10     13  68 days
8      G4 2006-08-06     19 217 days
9      G4 2006-12-30     16 363 days