Vectorized spherical bessel functions in python?

I noticed that scipy.special Bessel functions of order n and argument x jv(n,x) are vectorized in x:

In [14]: import scipy.special as sp
In [16]: sp.jv(1, range(3)) # n=1, [x=0,1,2]
Out[16]: array([ 0., 0.44005059, 0.57672481])

But there’s no corresponding vectorized form of the spherical Bessel functions, sp.sph_jn:

In [19]: sp.sph_jn(1,range(3)) 

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-ea59d2f45497> in <module>()
----> 1 sp.sph_jn(1,range(3)) #n=1, 3 value array

/home/glue/anaconda/envs/fibersim/lib/python2.7/site-packages/scipy/special/basic.pyc in sph_jn(n, z)
    262     """
    263     if not (isscalar(n) and isscalar(z)):
--> 264         raise ValueError("arguments must be scalars.")
    265     if (n != floor(n)) or (n < 0):
    266         raise ValueError("n must be a non-negative integer.")

ValueError: arguments must be scalars.

Furthermore, the spherical Bessel function compute all orders of N in one pass. So if I wanted the n=5 Bessel function for argument x=10, it returns n=1,2,3,4,5. It actually returns jn and its derivative in one pass:

In [21]: sp.sph_jn(5,10)
Out[21]: 
(array([-0.05440211,  0.07846694,  0.07794219, -0.03949584, -0.10558929,
        -0.05553451]),
 array([-0.07846694, -0.0700955 ,  0.05508428,  0.09374053,  0.0132988 ,
        -0.07226858]))

Why does this asymmetry exist in the API, and does anyone know of a library that will return spherical Bessel functions vectorized, or at least more quickly (ie in cython)?

Best answer

You can write a cython function to speedup the calculation, the first thing you must to do is to get the address of the fortran function SPHJ, here is how to do this in Python:

from scipy import special as sp
sphj = sp.specfun.sphj

import ctypes
addr = ctypes.pythonapi.PyCObject_AsVoidPtr(ctypes.py_object(sphj._cpointer))

Then you can call the fortran function directly in Cython, note I use prange() to use multicore to speedup the calculation:

%%cython -c-Ofast -c-fopenmp --link-args=-fopenmp
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from cython.parallel import prange
import numpy as np
import cython
from cpython cimport PyCObject_AsVoidPtr
from scipy import special

ctypedef void (*sphj_ptr) (const int *n, const double *x, 
                            const int *nm, const double *sj, const double *dj) nogil

cdef sphj_ptr _sphj=<sphj_ptr>PyCObject_AsVoidPtr(special.specfun.sphj._cpointer)


@cython.wraparound(False)
@cython.boundscheck(False)
def cython_sphj2(int n, double[::1] x):
    cdef int count = x.shape[0]
    cdef double * sj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
    cdef double * dj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
    cdef int * mn    = <int *>PyMem_Malloc(count * sizeof(int))
    cdef double[::1] res = np.empty(count)
    cdef int i
    if count < 100:
        for i in range(x.shape[0]):
            _sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
            res[i] = sj[i*(n+1) + n]    #choose the element you want here        
    else:
        for i in prange(count,  nogil=True):
            _sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
            res[i] = sj[i*(n+1) + n]    #choose the element you want here
    PyMem_Free(sj)
    PyMem_Free(dj)
    PyMem_Free(mn)
    return res.base

To compare, here is the Python function that call sphj() in a forloop:

import numpy as np
def python_sphj(n, x):
    sphj = special.specfun.sphj
    res = np.array([sphj(n, v)[1][n] for v in x])
    return res

Here is the %timit results for 10 elements:

x = np.linspace(1, 2, 10)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)

the result:

10000 loops, best of 3: 21.5 µs per loop
10000 loops, best of 3: 28.1 µs per loop

Here is the results for 100000 elements:

x = np.linspace(1, 2, 100000)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)

the result:

10 loops, best of 3: 44.7 ms per loop
1 loops, best of 3: 231 ms per loop