Numpy 两个矩阵部分维度相乘，有没有很快的方法？

推荐学习书目

› Learn Python the Hard Way

Python Sites

› PyPI - Python Package Index

› http://diveintopython.org/toc/index.html

› Pocoo

值得关注的项目

› PyPy

› Celery

› Jinja2

› Read the Docs

› gevent

› pyenv

› virtualenv

› Stackless Python

› Beautiful Soup

› 结巴中文分词

› Green Unicorn

› Sentry

› Shovel

› Pyflakes

› pytest

Python 编程

› pep8 Checker

Styles

› PEP 8

› Google Python Style Guide

› Code Style from The Hitchhiker's Guide

This topic created in 2811 days ago, the information mentioned may be changed or developed.

例如，有两个三维矩阵 m1，m2，shape 都是 (10000, 2, 2)

m1 = array([x1, x2, ..., x9999])
m2 = array([y1, y2, ..., y9999])

xn, yn 都是 2x2 的矩阵。

希望得到的结果是：

res = array([dot(x1, y1), dot(x2, y2), ..., dot(x9999, y9999)])

目前我试过最快的方法是 map（循环、列表推导都试过，numpy.vectorize 没成功），但感觉还是不够快。各位有没有更好的方法？

array

dot

矩阵

x9999

13 replies • 2018-09-28 07:42:31 +08:00

JeffKing

Sep 27, 2018 via iPhone

多进程试试

Philippa

Sep 27, 2018 via iPhone

Python 已经很快了底层做了很多优化，不行把矩阵拆开多进程大批量的情况下可以加快最后再组合回来。再不行试一下用 cpp。

justou

Sep 27, 2018

放到 cython 里面用 prange 多线程算

Alan1312

Sep 27, 2018 via iPhone

https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl

试试这个，再不行就只能用 Matlab 了，不要尝试手写 cpp，几乎不可能比 MKL 更快的了

GenkunAbe

Sep 27, 2018

不知道是不是我理解的有问题，numpy.matmul 是可以实现多个矩阵对应元素相乘的。

按照我对 lz 的描述的理解，直接 np.matmul(m1, m2) 就可以达到目的。

wizardforcel

Sep 27, 2018 via Android

https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

意思是，如果输入大于 2d，会当做矩阵的数组 /矩阵...

所以直接乘就行了

nooper

Sep 27, 2018 via iPad

有的有的，直接选择矩阵特定的切片元素相乘就好因为直接是矩阵计算不需要循环，性能可以的

necomancer

Sep 27, 2018

matmul(...)
matmul(a, b, out=None)

Matrix product of two arrays.

The behavior depends on the arguments in the following way.

- If both arguments are 2-D they are multiplied like conventional
matrices.
- If either argument is N-D, N > 2, it is treated as a stack of
matrices residing in the last two indexes and broadcast accordingly.
...

所以 np.matmul(m1, m2) 就行了。

In [1]: a = np.random.random((100000,2,2))

In [2]: b = np.random.random((100000,2,2))

In [3]: c = np.matmul(a, b)

In [4]: d = np.array([np.dot(i,j) for i,j in zip(a,b)])

In [5]: np.allclose(c,d)
Out[5]: True

In [6]: %timeit np.matmul(a,b)
41.1 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit np.array([np.dot(i,j) for i, j in zip(a,b)])
231 ms ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

necomancer

Sep 27, 2018

加上 vectorize 的：

In [28]: def mydot(a, b):
...: return np.dot(a,b)
...:
...:

In [29]:

In [29]: vmydot = np.vectorize(mydot, signature='(n,m),(m,n)->(n,n)')

In [30]: e = vmydot(a, b)

In [31]: np.allclose(d,e)
Out[31]: True

In [32]: %timeit vmydot(a,b)
467 ms ± 4.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

xinhangliu

Sep 27, 2018 via Android

@necomancer 非常感谢！看来我还是学艺不精

necomancer

Sep 28, 2018

不客气。我很久没鼓捣过这玩意儿了，我记得 numba 的各路 jit 处理这个问题是很嗨的，你要是经常遇到这种问题，去看看 numba 吧，有 GPU 加速更开心 ^_^

In [1]: from numba import guvectorize, float64

In [2]: a = np.random.random((100000,2,2))

In [3]: b = np.random.random((100000,2,2))

In [4]: c = np.matmul(a, b)

In [5]: d = np.array([np.dot(i,j) for i,j in zip(a,b)])

In [6]: @guvectorize([(float64[:,:],float64[:,:], float64[:,:])], '(n,m),(m,n)->(n,n)', target='parallel') #target='cpu','gpu'
...: def mydot(a,b,res):
...: for i in range(res.shape[0]):
...: for j in range(res.shape[1]):
...: tmp = 0.
...: for k in range(a.shape[1]):
...: tmp += a[i, k] * b[k, j]
...: res[i, j] = tmp
...:

In [7]: e = mydot(a, b)

In [8]: np.allclose(c,e)
Out[8]: True

In [9]: np.allclose(c,d)
Out[9]: True

In [10]: %timeit mydot(a,b)
234 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [11]: %timeit np.array(list(map(np.dot, a, b)))
210 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [12]: %timeit np.array([np.dot(i,j) for i, j in zip(a,b)])
235 ms ± 5.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit np.matmul(a,b)
41.1 ms ± 90 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vanquish5419

Sep 28, 2018 via iPhone

no.tensordot

Vanquish5419

Sep 28, 2018 via iPhone

@zhucegeqiu 打错，np.tensordot