I am trying to add two pandas Series together. The first Series is very large and has a MultiIndex. The index of the second series is a small subset of the index of the first.

```
df1 = pd.DataFrame(np.ones((1000,5000)),dtype=int).stack()
df1 = pd.DataFrame(df1, columns = ['total'])
df2 = pd.concat([df1.iloc[50:55],df1.iloc[2000:2005]]) # df2 is tiny subset of df1
```

Using the regular Series.add function takes about 9 seconds the first time, and 2 seconds on subsequent tries (maybe because pandas optimizes how the df is stored in memory?).

```
starttime = time.time()
df1.total.add(df2.total,fill_value=0).sum()
print "Method 1 took %f seconds" % (time.time() - starttime)
```

Manually iterating over rows takes about 2/3 as long as Series.add the first time, and about 1/100 as long as Series.add on subsequent tries.

```
starttime = time.time()
result = df1.total.copy()
for row_index, row in df2.iterrows():
result[row_index] += row
print "Method 2 took %f seconds" % (time.time() - starttime)
```

The speed difference is particularly noticeable when (as here) the Index is a MultiIndex.

Why does Series.add not work well here? Any suggestions for speeding this up? Is there a more efficient alternative to iterating over each element of the Series?

Also, how do I sort or structure the data frame to improve the performance of either method? The second time either of these methods is run is appreciably faster. How do I get this performance on the first time? Sorting using sort_index helps only marginally.

Best answer

You donâ€™t need for loop:

`df1.total[df2.index] += df2.total`