cs131 hw2
part 1 canny Edge Detector
- canny 算子可以分为以下五步:
- Smoothing
- Finding gradients
- Non-maximum supperession
- Double thredsholding
- Edge tracking by hysterisis
1.1 Smoothing
平滑一张图片,可以使用(Gaussian kernel)高斯核来卷积;
def conv(image, kernel):
Hi, Wi = image.shape
Hk, Wk = kernel.shape
out = np.zeros((Hi, Wi))
.
pad_width0 = Hk // 2
pad_width1 = Wk // 2
pad_width = ((pad_width0,pad_width0),(pad_width1,pad_width1))
padded = np.pad(image, pad_width, mode='edge')
# 卷积过程
kernel = np.flip(np.flip(kernel, 0), 1) # 上下翻转,在左右翻转
for i in range(Hi):
for j in range(Wi):
out[i, j] = np.sum(padded[i:(i + Hk), j:(j + Wk)] * kernel)
return out
def gaussian_kernel(size, sigma):
kernel = np.zeros((size, size))
# size = 2k+1
k = size // 2
for i in range(size):
for j in range(size):
kernel[i, j] = np.exp(-(((i-k)**2 + (j-k)**2))/2*sigma**2) / (2 * np.pi * (sigma**2))
return kernel
输出为:
- [问题] What is the effect of the kernel_size and sigma?
- 高斯核的sigma和size越大,就越模糊;
1.2 Finding gradients
1.2.1 偏导数
- 图片的偏导数可以计算为:
- 对于一张图片,我们需要给图片添加一层为0 的padding:
def partial_x(img):
out = None
# Hi, Wi = img.shape
# padd = np.zeros((Hi, Wi+2))
# padd[:, 1:Wi+1] = img
#
# out = np.zeros((Hi, Wi))
# for i in range(Wi):
# out[:, i] = (padd[:, i+2] - padd[:, i]) / 2
# 使用卷积
kernel = np.array([[1, 0, -1]]) / 2
out = conv(img, kernel)
return out
def partial_y(img):
out = None
# Hi, Wi = img.shape
# padd = np.zeros((Hi + 2, Wi))
# padd[1:Hi+1, :] = img
#
# out = np.zeros((Hi, Wi))
# for i in range(Hi):
# out[i, :] = (padd[i+2, :] - padd[i, :]) / 2
kernel = np.array([[1, 0, -1]]).T / 2
out = conv(img, kernel)
return out
结果为:
- [问题]:What is the reason for performing smoothing prior to computing the gradients?
- 在计算导数之前先进行smoothing,是为了降低噪声;
1.2.2 导数
- Hint: Use np.arctan2 to compute
- the “y-coordinate” is the first function parameter, the “x-coordinate” is the second.)
def gradient(img):
G = np.zeros(img.shape)
theta = np.zeros(img.shape)
gx = partial_x(img)
gy = partial_y(img)
G = np.sqrt(gx**2 + gy**2)
theta = np.arctan2(gy, gx) # (-pi/2, pi/2)
theta = (np.rad2deg(np.arctan2(gy, gx)) + 180) % 360
return G, theta
1.3 Non - maximum suppression
- 在八个方向上[0,45,90,135,180,225,270,315]比较;
def non_maximum_suppression(G, theta):
H, W = G.shape
out = np.zeros((H, W))
# Round the gradient direction to the nearest 45 degrees
theta = np.floor((theta + 22.5) / 45) * 45 # 方向定位
# 添加一层padding
padd = np.zeros((H+2,W +2))
padd[1:H+1, 1:W+1] = G
for m in range(1, H+1):
for n in range(1, W+1):
# 题目定义为顺时针方向,和逆时针相反,y方向相反
rad = np.deg2rad(theta[m-1, n-1])
i =int(np.around(np.sin(rad))) # 行
j =int(np.around(np.cos(rad))) # 列
p1 = padd[m+i, n+j]
p2 = padd[m-i, n-j]
if(padd[m, n] > p1 and padd[m, n] > p2): # 一个方向上
out[m-1, n-1] = padd[m, n]
else:
out[m-1, n-1] = 0
return out
结果为:
Thetas: 0
[[0. 0. 0.6]
[0. 0. 0.7]
[0. 0. 0.6]]
Thetas: 45
[[0. 0. 0.6]
[0. 0. 0.7]
[0.4 0.5 0.6]]
Thetas: 90
[[0.4 0. 0. ]
[0. 0. 0.7]
[0.4 0. 0. ]]
Thetas: 135
[[0.4 0.5 0.6]
[0. 0. 0.7]
[0. 0. 0.6]]
1.4 Double Thresholding
- 阈值假设:
- 定义两个阈值:low High
- 如果 小于low,不是一个边
- 如果大于High, 就是一条强边
- 如果在low和High之间,弱边
使用 np.where
:np.where
def double_thresholding(img, high, low):
strong_edges = np.zeros(img.shape)
weak_edges = np.zeros(img.shape)
strong_edges = np.where(img>=high,1,0)
weak_edges = np.where(img>=low,1,0)
weak_edges = weak_edges-strong_edges
return strong_edges, weak_edges
1.5 Edge tracking
- strong edges : 确定的边
- weak edges: 只有和strong edges相连的边才是正确的边, 其他的就是噪点;
def get_neighbors(y, x, H, W):
# 返回(y,x)周围等于(y,x)的邻居的坐标列表[(i,j),(i2,j2)..]
neighbors = []
for i in (y-1, y, y+1):
for j in (x-1, x, x+1):
if i >= 0 and i < H and j >= 0 and j < W:
if (i == y and j == x):
continue
neighbors.append((i, j))
return neighbors
def link_edges(strong_edges, weak_edges):
# 寻找强边像素相连的弱边像素
H, W = strong_edges.shape
indices = np.stack(np.nonzero(strong_edges)).T
edges = np.zeros((H, W))
### YOUR CODE HERE
edges = np.copy(strong_edges)
for i in range(1, H-1):
for j in range(1, W-1):
neighbors = get_neighbors(j, i, H, W)
if weak_edges[i, j] and np.any(edges[x, y] for x, y in neighbors):
edges[i, j] = True
### END YOUR CODE
return edges
1.6 Canny edge detector
- 实现canny算子
def canny(img, kernel_size=5, sigma=1.4, high=20, low=15):
### YOUR CODE HERE
kernel = gaussian_kernel(kernel_size, sigma) # 1. smoothing
smoothed = conv(img,kernel)
G, theta = gradient(smoothed) # 2. 梯度计算
nms = non_maximum_suppression(G,theta) # 3. non-maximum_suppression
strong_edges, weak_edges = double_thresholding(nms, low,high) #double thresholding
edge = link_edges(strong_edges,weak_edges) # link
### END YOUR CODE
return edge
Part 2 Lane Detection
- 检测平面
算法: - 使用canny算子检测边
- 提取兴趣区域的边
- 运行Hough transform 来检测平面
2.1 边检测
2.2 Extracting region of interest (ROI)
- 提取兴趣区域
- 使用一个二位掩码来定义兴趣区域;然后提取兴趣区域的边;
2.3 Fitting lines using Hough transform
- 边检测的输出为连接点的集合;
使用hough变换来检测一条直线;
直线 y =ax + b 被表示为:(a,b),这种方式不能表示垂直线段;
- 因此,使用极方程表示:
def hough_transform(img):
""" Transform points in the input image into Hough space.
Use the parameterization:
rho = x * cos(theta) + y * sin(theta)
to transform a point (x,y) to a sine-like function in Hough space.
Args:
img: binary image of shape (H, W)
Returns:
accumulator: numpy array of shape (m, n)
rhos: numpy array of shape (m, )
thetas: numpy array of shape (n, )
"""
# Set rho and theta ranges
W, H = img.shape
diag_len = int(np.ceil(np.sqrt(W * W + H * H))) # 对角线长度,np.ceil 向大取整
rhos = np.linspace(-diag_len, diag_len, diag_len * 2.0 + 1) # 2倍,等差数列
thetas = np.deg2rad(np.range(-90.0, 90.0))
# Cache some reusable values
cos_t = np.cos(thetas)
sin_t = np.sin(thetas)
num_thetas = len(thetas)
# Initialize accumulator in the Hough space
accumulator = np.zeros((2 * diag_len + 1, num_thetas), dtype=np.uint64)
ys, xs = np.nonzero(img)
# Transform each point (x, y) in image
# Find rho corresponding to values in thetas
# and increment the accumulator in the corresponding coordiate.
### YOUR CODE HERE
for i, j in zip(ys, xs):
for idx in range(thetas.shape[0]):
r = j * cos_t[idx] + i * sin_t[idx]
accumulator[int(r + diag_len), idx] += 1
### END YOUR CODE
return accumulator, rhos, thetas
结果为: