I have the following piece of code, which does exactly what I want (it is part of a kriging method). The problem is that it is too slow, and I would like to know whether the for loop can be pushed down into numpy. If I move the numpy.sum out of the loop and use its axis argument, it speeds up a little, but apparently that is not the bottleneck. Any ideas on how to push the for loop down into numpy, or other ways to speed it up?

```
# n = 2116
print GRZVV.shape # (16309, 2116)
print GinvVV.shape # (2117, 2117)
VVg = numpy.empty((GRZVV.shape[0]))
for k in xrange(GRZVV.shape[0]):
    GRVV = numpy.empty((n+1, 1))
    GRVV[n, 0] = 1
    GRVV[:n, 0] = GRZVV[k, :]
    EVV = numpy.array(GinvVV * GRVV) # GinvVV is numpy.matrix
    VVg[k] = numpy.sum(EVV[:n, 0] * VV)
```

I have posted the dimensions of the ndarrays and the matrix to make things clearer.

Edit: the shape of VV is (2116,).

Best answer

You could do the following in place of your loop over k (runtime ~3s):

```
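# Assumes `import numpy as np`
# Append a column of ones so that each row of tmp is the full GRVV vector for one k
tmp = np.concatenate((GRZVV, np.ones((GRZVV.shape[0], 1), dtype=np.double)), axis=1)
# A single matrix product replaces the per-row products done inside the loop
EVV1 = np.dot(GinvVV, tmp.T)
# Changed line below based on *askewchan's* recommendation
VVg1 = np.sum(np.multiply(EVV1[:n,:],VV[:,np.newaxis]), axis=0)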
```
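
To check that the vectorized version matches the original loop, here is a minimal self-contained sketch on small random data. It assumes plain ndarrays rather than `numpy.matrix` (convert with `np.asarray(GinvVV)` if needed) and Python 3 syntax. The function names `vvg_loop`, `vvg_vectorized` and `vvg_weights` are made up for illustration. The third function is not part of the answer above: it is a possible further rearrangement that uses the fact that each `VVg[k]` is linear in row `k` of `GRZVV`, so the weights `VV @ GinvVV[:n, :]` can be computed once and the large matrix-matrix product is avoided.

```python
import numpy as np

def vvg_loop(GRZVV, GinvVV, VV):
    """Reference: the per-row loop from the question, on plain ndarrays."""
    m, n = GRZVV.shape
    out = np.empty(m)
    for k in range(m):
        GRVV = np.append(GRZVV[k, :], 1.0)   # row k with a trailing 1
        EVV = GinvVV @ GRVV                  # shape (n+1,)
        out[k] = np.sum(EVV[:n] * VV)
    return out

def vvg_vectorized(GRZVV, GinvVV, VV):
    """Same computation with one matrix product, as in the answer."""
    m, n = GRZVV.shape
    tmp = np.concatenate((GRZVV, np.ones((m, 1))), axis=1)  # (m, n+1)
    EVV = GinvVV @ tmp.T                                    # (n+1, m)
    return np.sum(EVV[:n, :] * VV[:, np.newaxis], axis=0)

def vvg_weights(GRZVV, GinvVV, VV):
    """Alternative (an assumption, not from the answer): precompute weights."""
    n = GRZVV.shape[1]
    w = VV @ GinvVV[:n, :]          # shape (n+1,): VV applied to the first n rows
    return GRZVV @ w[:n] + w[n]     # matrix-vector product plus a constant

# Small random stand-ins for the real arrays (the real sizes are 16309 x 2116)
rng = np.random.default_rng(0)
m, n = 50, 8
GRZVV = rng.standard_normal((m, n))
GinvVV = rng.standard_normal((n + 1, n + 1))
VV = rng.standard_normal(n)

reference = vvg_loop(GRZVV, GinvVV, VV)
print(np.allclose(reference, vvg_vectorized(GRZVV, GinvVV, VV)))
print(np.allclose(reference, vvg_weights(GRZVV, GinvVV, VV)))
```

With ndarrays, all three functions return a 1-D array of length `m`. If `GinvVV` stays a `numpy.matrix`, the summed result is a 2-D matrix of shape `(1, m)` instead, so converting to an ndarray first keeps the shapes simpler.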