CS131 HW3
Contents
- Harris corner detector
- RANSAC
- HoG descriptor
Related: Panorama Stitching
The general pipeline for stitching a panorama is:
- Use the Harris detector to find keypoints
- Build a descriptor for each keypoint, then compare the descriptors of the two images to find keypoint pairs
- On the keypoint pairs, use the least-squares method to solve for the affine transformation matrix
- Refine the matrix with RANSAC, then warp the images to obtain the panorama
Extra task:
- Implement the HoG descriptor
1. Harris Corner Detector
The basic Harris algorithm consists of the following steps, implemented in the code below.
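The response at each pixel comes from the second-moment matrix M accumulated over a local window; the formulas below simply restate what the code computes:

$$
M = \sum_{(x,y)\in W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},
\qquad
R = \det(M) - k\,\mathrm{trace}(M)^2
$$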
import numpy as np
from scipy.ndimage import convolve
from skimage import filters

def harris_corners(img, window_size=3, k=0.04):
    H, W = img.shape
    window = np.ones((window_size, window_size))
    response = np.zeros((H, W))
    # Step 1: partial derivatives
    dx = filters.sobel_v(img)
    dy = filters.sobel_h(img)
    # Step 2: products of the derivatives
    dxx = dx * dx
    dyy = dy * dy
    dxy = dx * dy
    # Step 3: sum the products over the window (entries of the matrix M)
    mxx = convolve(dxx, window)
    mxy = convolve(dxy, window)
    myy = convolve(dyy, window)
    # Step 4: compute the response R = det(M) - k * trace(M)^2 at each pixel
    for i in range(H):
        for j in range(W):
            M = np.array([[mxx[i, j], mxy[i, j]], [mxy[i, j], myy[i, j]]])
            response[i, j] = np.linalg.det(M) - k * np.trace(M) ** 2
    return response
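A minimal usage sketch. The image path is a hypothetical example; corner_peaks from skimage.feature performs non-maximum suppression on the response map, and the parameter values here are just illustrative:

from skimage import io
from skimage.color import rgb2gray
from skimage.feature import corner_peaks

img = rgb2gray(io.imread('example.jpg'))                 # hypothetical input image
response = harris_corners(img, window_size=3, k=0.04)
keypoints = corner_peaks(response, threshold_rel=0.05, exclude_border=8)  # (row, col) corner locations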
2. Describing and Matching Keypoints
2.1 Creating Descriptors
Function: simple_descriptor()
Arguments:
- patch: a patch of a grayscale image, of size (H, W)
Returns:
- features: a 1D array of length H * W
Implementation:
- Flatten the H * W patch to 1D, then normalize it by standardization, (x - μ) / σ
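In other words, with μ and σ the mean and standard deviation of the patch pixels, each feature value is:

$$
f_i = \frac{x_i - \mu}{\sigma}
$$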
def simple_descriptor(patch):
    patch = patch.reshape(-1)              # flatten to 1D
    mean = np.mean(patch)                  # mean
    std = np.std(patch)                    # standard deviation
    if std > 0.0:
        patch = (patch - mean) / std       # standardize: zero mean, unit variance
    else:
        patch = patch - mean               # constant patch: just subtract the mean
    feature = np.asarray(patch)
    return feature
2.2 Matching Descriptors
- Compare two descriptors by their Euclidean distance
- For a point A, let B be its nearest neighbor and C its second-nearest neighbor
- If dist(A, B) / dist(A, C) is below some threshold, A and B are accepted as a match (for example, distances 0.3 and 0.9 give a ratio of 0.33, which passes a 0.5 threshold)
Function: match_descriptors(desc1, desc2, threshold=0.5)
Arguments:
- desc1, desc2: two arrays of descriptors
- threshold: ratio-test threshold
Returns:
- matches: array of matched index pairs
from scipy.spatial.distance import cdist

def match_descriptors(desc1, desc2, threshold=0.5):
    matches = []
    N = desc1.shape[0]
    dists = cdist(desc1, desc2)          # pairwise Euclidean distances, shape N x M
    idx = np.argsort(dists, axis=1)      # column indices sorted by increasing distance
    for i in range(N):
        closest_dist = dists[i, idx[i, 0]]
        second_dist = dists[i, idx[i, 1]]
        if closest_dist < threshold * second_dist:   # ratio test
            matches.append([i, idx[i, 0]])
    matches = np.array(matches)
    return matches
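A sketch of how the pieces fit together. describe_keypoints here is a hypothetical helper (its name, the patch_size value, and the input images are assumptions, not part of this note) that cuts a patch around each keypoint and applies simple_descriptor:

def describe_keypoints(image, keypoints, desc_func, patch_size=16):
    # Hypothetical helper: extract a patch_size x patch_size patch around each keypoint
    # and stack the resulting descriptors into an (N, D) array.
    image = np.pad(image, patch_size // 2, mode='edge')
    descriptors = []
    for kp in keypoints:
        y, x = kp + patch_size // 2          # offset keypoint coordinates for the padding
        patch = image[y - patch_size // 2: y + (patch_size + 1) // 2,
                      x - patch_size // 2: x + (patch_size + 1) // 2]
        descriptors.append(desc_func(patch))
    return np.array(descriptors)

desc1 = describe_keypoints(img1, keypoints1, simple_descriptor)
desc2 = describe_keypoints(img2, keypoints2, simple_descriptor)
matches = match_descriptors(desc1, desc2, threshold=0.7)   # rows of [index_in_desc1, index_in_desc2]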
3. Transformation Estimation
- Estimate the affine transformation between the two images from the matched points, using the least-squares method
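In homogeneous coordinates, with points stored as row vectors, this amounts to the least-squares problem

$$
\hat{H} = \arg\min_H \lVert P_2 H - P_1 \rVert_F^2
$$

where P1 and P2 are the N×3 matrices of padded matched points, and the last column of H is afterwards reset to (0, 0, 1)^T.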
Function: fit_affine_matrix(p1, p2)
Arguments:
- p1, p2: two sets of corresponding points
Returns:
- H: the affine transformation matrix
def fit_affine_matrix(p1, p2):
    assert (p1.shape[0] == p2.shape[0]),\
        'Different number of points in p1 and p2'
    p1 = pad(p1)
    p2 = pad(p2)                                   # homogeneous coordinates
    H = np.linalg.lstsq(p2, p1, rcond=None)[0]
    # Sometimes numerical issues cause least-squares to produce the last
    # column which is not exactly [0, 0, 1]
    H[:, 2] = np.array([0, 0, 1])
    return H
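pad() is used above but never defined in this note; a common definition (an assumption here) simply appends a column of ones to turn the points into homogeneous coordinates:

def pad(points):
    # Assumed helper: append a column of ones, (N, 2) -> (N, 3) homogeneous coordinates
    return np.hstack([points, np.ones((points.shape[0], 1))])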
4. Refining Matches with RANSAC
The general steps are:
- Pick random matched pairs
- Compute the affine transformation matrix
- Find the inliers of the fitted model
- Repeat until the model with the most inliers is found
- Recompute the least-squares estimate on the inliers
Function: ransac(keypoints1, keypoints2, matches, n_iters=200, threshold=20)
Arguments:
- keypoints1, keypoints2: keypoints
- matches: indices of corresponding points
- n_iters: number of iterations
- threshold: inlier threshold
Returns:
- H: the affine transformation matrix
Implementation:
- RANSAC looks for the H that makes as many matched pairs as possible agree.
- In each iteration, randomly pick some correspondences, solve for a candidate transform H with least squares, and count how many correspondences are consistent with this H (see the criterion below).
- After n_iters iterations, keep the transformation matrix with the largest number of consistent correspondences and refit it on those inliers.
- For an alternative implementation, see [cs231A_Homework_3.2].
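With points kept as homogeneous row vectors, a match i is counted as an inlier for a candidate H when

$$
\lVert p_{2,i} H - p_{1,i} \rVert^2 < \text{threshold}
$$

which is exactly the test applied inside the loop below.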
def ransac(keypoints1, keypoints2, matches, n_iters=200, threshold=20):
    # Copy matches array, to avoid overwriting it
    orig_matches = matches.copy()
    matches = matches.copy()
    N = matches.shape[0]
    n_samples = int(N * 0.2)                       # number of pairs sampled per iteration
    matched1 = pad(keypoints1[matches[:, 0]])      # points indexed by the first column, homogeneous
    matched2 = pad(keypoints2[matches[:, 1]])      # points indexed by the second column, homogeneous
    max_inliers = np.zeros(N, dtype=bool)
    n_inliers = 0
    # RANSAC iteration start
    for i in range(n_iters):
        idx = np.random.choice(N, n_samples, replace=False)   # randomly sample n_samples pairs
        p1 = matched1[idx, :]
        p2 = matched2[idx, :]
        H = np.linalg.lstsq(p2, p1, rcond=None)[0]            # candidate transform H
        H[:, 2] = np.array([0, 0, 1])
        temp_max = np.linalg.norm(matched2.dot(H) - matched1, axis=1) ** 2 < threshold  # inlier mask for this H
        temp_n = np.sum(temp_max)
        if temp_n > n_inliers:                     # keep the largest inlier set
            max_inliers = temp_max.copy()
            n_inliers = temp_n
    # Refit H on all inliers of the best model
    H = np.linalg.lstsq(matched2[max_inliers], matched1[max_inliers], rcond=None)[0]
    H[:, 2] = np.array([0, 0, 1])
    return H, matches[max_inliers]
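A short usage sketch; keypoints1, keypoints2, and matches are assumed to come from the earlier steps:

H, robust_matches = ransac(keypoints1, keypoints2, matches, n_iters=200, threshold=20)
print(H)                      # 3x3 affine matrix (row-vector convention: p2_h @ H ≈ p1_h)
print(robust_matches.shape)   # inlier matches kept by RANSAC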
5. HoG
The HoG computation generally proceeds as follows:
- Compute the partial derivatives of the image in the x and y directions, using skimage.filters
- Divide the image into blocks and each block into cells, and compute a gradient orientation histogram for each cell
- Flatten each block into a feature vector
- Normalize
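The per-pixel gradient magnitude and orientation (folded into [0°, 180°)) used to build the histograms are:

$$
G = \sqrt{G_x^2 + G_y^2},
\qquad
\theta = \left(\mathrm{arctan2}(G_y, G_x)\cdot \frac{180}{\pi}\right) \bmod 180
$$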
Function: hog_descriptor(patch, pixels_per_cell=(8,8))
Arguments:
- patch: image patch of size (H, W)
- pixels_per_cell: the size of each cell, (M, N)
Returns:
- block: the feature vector of one block, of length (H * W * n_bins) / (M * N)
Implementation:
- Divide the patch into cells, compute the gradient of each cell, then build the orientation histogram
- Finally, normalize
from skimage.util.shape import view_as_blocks

def hog_descriptor(patch, pixels_per_cell=(8,8)):
    assert (patch.shape[0] % pixels_per_cell[0] == 0),\
        'Heights of patch and cell do not match'
    assert (patch.shape[1] % pixels_per_cell[1] == 0),\
        'Widths of patch and cell do not match'
    n_bins = 9
    degrees_per_bin = 180 // n_bins                # 9 orientation bins, 20 degrees each
    # Gradients from the Sobel filters
    Gx = filters.sobel_v(patch)
    Gy = filters.sobel_h(patch)
    # Gradient magnitude and orientation
    G = np.sqrt(Gx**2 + Gy**2)
    theta = (np.arctan2(Gy, Gx) * 180 / np.pi) % 180
    # Group entries of G and theta into cells of shape pixels_per_cell, (M, N)
    # G_cells.shape = theta_cells.shape = (H//M, W//N)
    # G_cells[0, 0].shape = theta_cells[0, 0].shape = (M, N)
    G_cells = view_as_blocks(G, block_shape=pixels_per_cell)
    theta_cells = view_as_blocks(theta, block_shape=pixels_per_cell)
    rows = G_cells.shape[0]
    cols = G_cells.shape[1]
    # For each cell, keep track of a gradient histogram of size n_bins
    cells = np.zeros((rows, cols, n_bins))
    # Compute histogram per cell
    for i in range(rows):
        for j in range(cols):
            for m in range(pixels_per_cell[0]):
                for n in range(pixels_per_cell[1]):
                    idx = int(theta_cells[i, j, m, n] // degrees_per_bin)  # which of the n_bins this pixel falls into
                    if idx == 9:                     # handle theta exactly equal to 180 degrees
                        idx = 8
                    cells[i, j, idx] += G_cells[i, j, m, n]  # accumulate the gradient magnitude
    cells = (cells - np.mean(cells)) / np.std(cells)         # normalize
    block = cells.reshape(-1)
    return block
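A quick sanity check of the output size; the 16×16 patch here is just an example, and with 8×8 cells and 9 bins it yields a 2·2·9 = 36-dimensional vector:

patch = np.random.rand(16, 16)          # example patch whose sides are multiples of the cell size
block = hog_descriptor(patch, pixels_per_cell=(8, 8))
print(block.shape)                      # (36,) = (16 * 16 * 9) / (8 * 8)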
6. Image Blending
When two warped images are merged, an obvious seam is visible; a technique called linear blending smooths it away.
The basic steps are:
- Define the left and right margins
- Define a weight matrix for image 1:
  - weight equal to 1 to the left of the left margin
  - from the left margin to the right margin, the weight decreases linearly from 1 to 0
- Define a weight matrix for image 2:
  - weight equal to 1 to the right of the right margin
  - from the left margin to the right margin, the weight increases linearly from 0 to 1
- Apply each weight matrix to its corresponding image
- Combine the images
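As a quick illustration of the ramp that np.linspace and np.tile produce (the sizes here are arbitrary examples):

ramp = np.linspace(1, 0, 5)         # array([1.  , 0.75, 0.5 , 0.25, 0.  ])
weights = np.tile(ramp, (3, 1))     # repeat the ramp for 3 rows -> shape (3, 5)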
def linear_blend(img1_warped, img2_warped):
    """
    Linearly blend img1_warped and img2_warped by following the steps:
    1. Define left and right margins (already done for you)
    2. Define weight matrices for img1_warped and img2_warped
       np.linspace and np.tile functions will be useful
    3. Apply the weight matrices to their corresponding images
    4. Combine the images
    Args:
        img1_warped: Reference image warped into output space
        img2_warped: Transformed image warped into output space
    Returns:
        merged: Merged image in output space
    """
    out_H, out_W = img1_warped.shape            # Height and width of output space
    img1_mask = (img1_warped != 0)              # Mask == 1 inside the image
    img2_mask = (img2_warped != 0)              # Mask == 1 inside the image
    # Find column of middle row where warped image 1 ends
    # This is where to end weight mask for warped image 1
    # np.fliplr flips left-right; np.argmax returns the index of the first maximum
    right_margin = out_W - np.argmax(np.fliplr(img1_mask)[out_H//2, :].reshape(1, out_W), 1)[0]
    # Find column of middle row where warped image 2 starts
    # This is where to start weight mask for warped image 2
    left_margin = np.argmax(img2_mask[out_H//2, :].reshape(1, out_W), 1)[0]
    left_matrix = np.array(img1_mask, dtype=np.float64)   # important: convert to float
    right_matrix = np.array(img2_mask, dtype=np.float64)
    # Linearly ramped blending region
    left_matrix[:, left_margin: right_margin] = np.tile(np.linspace(1, 0, right_margin - left_margin), (out_H, 1))
    right_matrix[:, left_margin: right_margin] = np.tile(np.linspace(0, 1, right_margin - left_margin), (out_H, 1))
    img1 = left_matrix * img1_warped
    img2 = right_matrix * img2_warped
    merged = img1 + img2
    return merged
The method above has a problem: the margins are estimated from a single middle row, so the true boundaries of the two images may be missed and the merged result can contain artifacts.
An alternative is the approach below. First compute:
merged = img1_warped + img2_warped
img1_mask = img1_mask * 1.0
img2_mask = img2_mask * 1.0
overlap = img1_mask + img2_mask        # number of images covering each pixel
### YOUR CODE HERE
# Find the left boundary of warped image 2, i.e. the first column whose pixels are not all zero
for col in range(img2_warped.shape[1]):
    if not np.all(img2_mask[:, col] == False):
        break
left_region = col
# Find the right boundary of warped image 1, i.e. the last column whose pixels are not all zero
for col in range(img1_warped.shape[1] - 1, -1, -1):
    if not np.all(img1_mask[:, col] == False):
        break
right_region = col
width_region = right_region - left_region + 1
for col in range(left_region, right_region + 1):
    for row in range(img2_warped.shape[0]):
        alpha = 1 - (col - left_region) / width_region
        if img1_mask[row, col] and img2_mask[row, col]:
            merged[row, col] = alpha * img1_warped[row, col] + (1 - alpha) * img2_warped[row, col]
output = merged
### END YOUR CODE
7. Panorama Stitching
- Omitted