High-Performance Python Extensions (Part 1)


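Introduction

Python is often said not to be a high-performance language, and in a sense that is true. However, thanks to the ecosystem of mathematical and scientific packages that has grown up around NumPy, reasonable performance is usually not hard to achieve.

When performance does become a problem, run time is usually dominated by just a few functions. Rewriting those functions in C can often improve performance dramatically.

In this first part of the series, we look at how to write a Python extension in C using NumPy's C API to improve the performance of a model. In later articles, we will build on the solution presented here to improve its performance further.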

Files

The files referenced in this article are available on GitHub.

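The Simulation

As a starting point for this exercise, we consider a two-dimensional N-body simulation in which the bodies interact through a gravity-like force.

The following class stores the state of our world, along with some temporary variables.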

# lib/sim.py

import numpy as np


class World(object):
    """World is a structure that holds the state of N bodies and
    additional variables.

    threads : (int) The number of threads to use for multithreaded
              implementations.

    STATE OF THE WORLD:

    N : (int) The number of bodies in the simulation.
    m : (1D ndarray) The mass of each body.
    r : (2D ndarray) The position of each body.
    v : (2D ndarray) The velocity of each body.
    F : (2D ndarray) The force on each body.

    TEMPORARY VARIABLES:

    Ft : (3D ndarray) A 2D force array for each thread's local storage.
    s  : (2D ndarray) The vectors from one body to all others.
    s3 : (1D ndarray) The cubed norm of each s vector.

    NOTE: Ft is used by parallel algorithms for thread-local
          storage. s and s3 are only used by the Python
          implementation.
    """
    def __init__(self, N, threads=1,
                 m_min=1, m_max=30.0, r_max=50.0, v_max=4.0, dt=1e-3):
        self.threads = threads
        self.N = N
        # Random initial masses, positions and velocities.
        self.m = np.random.uniform(m_min, m_max, N)
        self.r = np.random.uniform(-r_max, r_max, (N, 2))
        self.v = np.random.uniform(-v_max, v_max, (N, 2))
        # Force array plus scratch space reused on every step.
        self.F = np.zeros_like(self.r)
        self.Ft = np.zeros((threads, N, 2))
        self.s = np.zeros_like(self.r)
        self.s3 = np.zeros_like(self.m)
        self.dt = dt

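At the start of the simulation, the N bodies are assigned random masses m, positions r, and velocities v. Each time step then performs the following computations:

The net force F on each body, computed from all other bodies.
The velocity v of each body, updated by the force acting on it.
The position r of each body, updated by its velocity.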
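The three steps above can be written compactly as equations. This is only a sketch under stated assumptions: an attractive inverse-square (gravity-like) force with the gravitational constant set to 1, and a simple Euler-style update with time step \Delta t (dt in the World class). The exact form of the position update in particular is an assumption.

```latex
\begin{align}
\mathbf{F}_i &= \sum_{j \neq i}
  \frac{m_i m_j \,(\mathbf{r}_j - \mathbf{r}_i)}
       {\lvert \mathbf{r}_j - \mathbf{r}_i \rvert^{3}} \\
\mathbf{v}_i &\leftarrow \mathbf{v}_i + \frac{\mathbf{F}_i}{m_i}\,\Delta t \\
\mathbf{r}_i &\leftarrow \mathbf{r}_i + \mathbf{v}_i\,\Delta t
\end{align}
```

The cube in the denominator is the squared distance from the inverse-square law multiplied by one more factor of distance, which normalizes the direction vector.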

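The first step, computing the net force F, is our bottleneck. The force on a single body is the sum of the forces exerted by every other body in the world, so this step has O(N^2) complexity. The velocity v and position r updates are both O(N).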
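To make the O(N^2) cost concrete, here is a minimal sketch of the force sum as an explicit double loop. The function name pairwise_forces and the pair counter are hypothetical and exist only for illustration; the force law assumes a gravitational constant of 1.

```python
import numpy as np


def pairwise_forces(m, r):
    """Naive force sum: one evaluation per ordered pair (i, j), i != j."""
    N = len(m)
    F = np.zeros_like(r, dtype=float)
    pairs = 0  # counts how many pair interactions were evaluated
    for i in range(N):
        for j in range(N):
            if i == j:
                continue  # a body exerts no force on itself
            s = r[j] - r[i]                 # vector from body i to body j
            dist = np.hypot(s[0], s[1])     # Euclidean distance
            F[i] += m[i] * m[j] * s / dist**3
            pairs += 1
    return F, pairs


# Two bodies one unit apart on the x axis.
m = np.array([1.0, 2.0])
r = np.array([[0.0, 0.0], [1.0, 0.0]])
F, pairs = pairwise_forces(m, r)
```

For N bodies the inner statement runs N * (N - 1) times, while a velocity or position update touches each body only once.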

If you are interested, this Wikipedia article describes several approximation methods that can speed up the force calculation.

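Pure Python

An implementation of the time-evolution function in pure Python, using NumPy arrays, provides a starting point for optimization and a reference for testing other implementations.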

# lib/sim.py

def compute_F(w):
    """Compute the force on each body in the world, w."""
    for i in range(w.N):
        # Vectors from body i to every body, and their cubed lengths.
        w.s[:] = w.r - w.r[i]
        w.s3[:] = (w.s[:,0]**2 + w.s[:,1]**2)**1.5
        w.s3[i] = 1.0 # This makes the self-force zero.
        w.F[i] = (w.m[i] * w.m[:,None] * w.s / w.s3[:,None]).sum(0)

def evolve(w, steps):
    """Evolve the world, w, through the given number of steps."""
    for _ in range(steps):
        compute_F(w)
        # Update velocities from the forces, then positions from the velocities.
        w.v += w.F * w.dt / w.m[:,None]
        w.r += w.v * w.dt
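Because the pure-Python version serves as a reference for testing other implementations, a quick cross-check can be sketched as follows. compute_F_loop mirrors the per-body loop above but takes plain arrays instead of a World object. compute_F_broadcast is a hypothetical alternative, not part of the article's code, that evaluates all pairs at once with NumPy broadcasting at the cost of O(N^2) memory.

```python
import numpy as np


def compute_F_loop(m, r):
    """Per-body loop over NumPy arrays, as in the article's compute_F."""
    F = np.zeros_like(r)
    for i in range(len(m)):
        s = r - r[i]                              # vectors from body i to all bodies
        s3 = (s[:, 0]**2 + s[:, 1]**2)**1.5       # cubed distances
        s3[i] = 1.0                               # avoids 0/0; self-force stays zero
        F[i] = (m[i] * m[:, None] * s / s3[:, None]).sum(0)
    return F


def compute_F_broadcast(m, r):
    """Hypothetical variant: build every pair at once via broadcasting."""
    s = r[None, :, :] - r[:, None, :]             # s[i, j] = r[j] - r[i]
    s3 = (s**2).sum(axis=2)**1.5                  # cubed distance for each pair
    np.fill_diagonal(s3, 1.0)                     # self-pairs: numerator is already 0
    mm = m[:, None, None] * m[None, :, None]      # m[i] * m[j], shape (N, N, 1)
    return (mm * s / s3[:, :, None]).sum(axis=1)  # sum over j for each i


# Random test state in the same ranges as World's defaults.
rng = np.random.default_rng(0)
m = rng.uniform(1.0, 30.0, 50)
r = rng.uniform(-50.0, 50.0, (50, 2))

F_loop = compute_F_loop(m, r)
F_bcast = compute_F_broadcast(m, r)
print(np.allclose(F_loop, F_bcast))
```

The same pattern, computing forces for one random state with two implementations and comparing with np.allclose, is one way a C extension could be checked against the Python reference.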
