Image Processing 100 Knocks Answers
Q1. Channel Swap
Read an image and convert it from BGR to RGB.
Answer
import cv2
img = cv2.imread("./img/imori.jpeg")
rgb_img = img[:, :, [2,1,0]].copy()
This can be done using numpy array operations.
cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
can also perform the conversion.
Q2. Grayscale Conversion
Convert the image to grayscale. Grayscale expresses luminance and is computed as Y = 0.2126 R + 0.7152 G + 0.0722 B.
Answer
img = cv2.imread("./img/imori.jpeg")
gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126
Using OpenCV functionality:
cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
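can perform the conversion, although OpenCV uses the weights 0.299 R + 0.587 G + 0.114 B, so the result differs slightly from the formula above. It could also be done with matrix computation.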
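The matrix form mentioned above can be sketched as a dot product between each pixel's (B, G, R) vector and a weight vector. This is only an illustration: the tiny array stands in for an image loaded with cv2.imread, and the name weights_bgr is made up for this example.

```python
import numpy as np

# Hypothetical 2x2 BGR image standing in for the output of cv2.imread.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Weights in B, G, R order to match OpenCV's channel layout.
weights_bgr = np.array([0.0722, 0.7152, 0.2126])

# (H, W, 3) @ (3,) -> (H, W): one weighted sum per pixel.
gray = img @ weights_bgr

# The channel-by-channel expression from the answer, for comparison.
gray_ref = img[:, :, 0] * 0.0722 + img[:, :, 1] * 0.7152 + img[:, :, 2] * 0.2126
```

Both expressions produce a float array, so a cast to uint8 is still needed before passing the result to functions that expect 8-bit input.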
Q3. Binarization
Binarize a grayscale image. The threshold is 128.
Answer
img = cv2.imread("./img/imori.jpeg")
gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126
ret, gray_img = cv2.threshold(gray_img,128,255,cv2.THRESH_BINARY)
Applying conditional branching to each pixel individually would be computationally expensive.
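For reference, the same thresholding can be written without a per-pixel branch by using a vectorized comparison. This is a minimal sketch on a made-up array; it assumes the same convention as cv2.THRESH_BINARY, where only values strictly greater than the threshold become 255.

```python
import numpy as np

# Hypothetical grayscale values (float, as produced by the weighted sum above).
gray = np.array([[0.0, 127.9],
                 [128.0, 255.0]])
th = 128

# One comparison over the whole array instead of an if statement per pixel.
binary = np.where(gray > th, 255, 0).astype(np.uint8)
```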
Q4. Otsu's Binarization
A method that automatically determines the threshold for binarization. The goal is to maximize the inter-class variance.
ret, gray_img = cv2.threshold(gray_img,0,255,cv2.THRESH_OTSU)
did not work. The cause was the preceding step:
gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126
It does not work when gray_img is of type float; THRESH_OTSU expects an 8-bit single-channel image.
Answer
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR, not RGB
ret, th = cv2.threshold(gray_img, 0, 255, cv2.THRESH_OTSU)
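To make the "maximize the inter-class variance" idea concrete, here is a rough NumPy sketch that tries every threshold and keeps the best one. The function name otsu_threshold and the sample data are made up for illustration, and the threshold convention (class 0 is gray < t) is an assumption that may differ by one from the value cv2.threshold returns.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the t that maximizes inter-class variance; class 0 is gray < t."""
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        c0 = gray[gray < t]
        c1 = gray[gray >= t]
        if c0.size == 0 or c1.size == 0:
            continue  # one class is empty, so the variance is undefined
        w0 = c0.size / gray.size
        w1 = c1.size / gray.size
        var_between = w0 * w1 * (c0.mean() - c1.mean()) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Hypothetical bimodal data: a dark cluster and a bright cluster.
gray = np.array([10, 12, 11, 13, 200, 205, 198, 202], dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray < t, 0, 255).astype(np.uint8)
```

The loop over 255 candidate thresholds is cheap compared with a loop over pixels, so this stays reasonably fast even without further vectorization.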
Q5. HSV Conversion
HSV conversion is a method of representing colors using Hue, Saturation, and Value (brightness).
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hsv_img[:,:,0] = (hsv_img[:,:,0] + 180) % 360
hsv_img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR)
The colors seem slightly off. Apparently the Hue range of an 8-bit OpenCV image is [0, 179], so I halved the shift:
hsv_img[:,:,0] = (hsv_img[:,:,0] + 90) % 180
Still different, so I implemented the conversion without using OpenCV.
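One possible reason the second attempt still looked off (this is a guess; the notes do not identify the cause) is that the hue channel is uint8, so adding 90 wraps around at 256 before the modulo is applied. The small array below is a stand-in for the hue channel.

```python
import numpy as np

# Hypothetical hue values in OpenCV's 8-bit range [0, 179].
hue = np.array([0, 89, 179], dtype=np.uint8)

# uint8 arithmetic wraps at 256: 179 + 90 = 269 is stored as 13.
wrapped = (hue + 90) % 180

# Widening to a larger integer type first avoids the wraparound.
shifted = ((hue.astype(np.int32) + 90) % 180).astype(np.uint8)
```

Only hue values above 165 are affected, which would match colors being wrong in a few places rather than everywhere.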
Answer
import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")
h, w, c = img.shape

# BGR -> HSV (hue in degrees [0, 360), sat = max - min, val = max)
hsv = img / 255
for i in range(h):        # rows (height)
    for j in range(w):    # columns (width)
        b, g, r = img[i, j, :] / 255
        max_val = max(b, g, r)
        min_val = min(b, g, r)
        val = max_val
        sat = max_val - min_val
        if max_val == min_val:
            hue = 0
        elif min_val == b:
            hue = 60 * (g - r) / sat + 60
        elif min_val == r:
            hue = 60 * (b - g) / sat + 180
        else:
            hue = 60 * (r - b) / sat + 300
        hsv[i, j, :] = [hue, sat, val]
cv2.imwrite("./img/hsv_moto.jpeg", hsv)

# Invert the hue
hsv[:, :, 0] = (hsv[:, :, 0] + 180) % 360

# HSV -> RGB
revers_img = img / 255
for i in range(h):
    for j in range(w):
        hue, sat, val = hsv[i, j, :]
        c = sat
        h_dot = hue / 60
        x = c * (1 - abs(h_dot % 2 - 1))
        if 0 <= h_dot < 1:
            add_h = [c, x, 0]
        elif 1 <= h_dot < 2:
            add_h = [x, c, 0]
        elif 2 <= h_dot < 3:
            add_h = [0, c, x]
        elif 3 <= h_dot < 4:
            add_h = [0, x, c]
        elif 4 <= h_dot < 5:
            add_h = [x, 0, c]
        elif 5 <= h_dot < 6:
            add_h = [c, 0, x]
        else:
            add_h = [0, 0, 0]
        revers_img[i, j, :] = (val - c) + np.array(add_h)
revers_img = revers_img * 255
revers_img = revers_img[:, :, [2, 1, 0]]  # RGB -> BGR for cv2.imwrite
cv2.imwrite("./img/hsv.jpeg", revers_img)
I need to be more conscious of the (height, width, channel) ordering. It would be nice to make the code shorter; there are too many if branches.
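One way to cut down the if branches in the HSV to RGB step is a small lookup table indexed by the hue sector. This is only a sketch: SECTOR_TABLE and hsv_to_rgb_pixel are names invented here, and sat is assumed to mean max - min, as in the code above.

```python
import numpy as np

# Row k lists which of (c, x, 0) goes to (R, G, B) in hue sector k.
SECTOR_TABLE = np.array([[0, 1, 2],   # (c, x, 0)
                         [1, 0, 2],   # (x, c, 0)
                         [2, 0, 1],   # (0, c, x)
                         [2, 1, 0],   # (0, x, c)
                         [1, 2, 0],   # (x, 0, c)
                         [0, 2, 1]])  # (c, 0, x)

def hsv_to_rgb_pixel(hue, sat, val):
    """hue in degrees [0, 360); sat is max - min; val is max. Returns (R, G, B)."""
    c = sat
    h_dot = hue / 60
    x = c * (1 - abs(h_dot % 2 - 1))
    parts = np.array([c, x, 0.0])
    sector = int(h_dot) % 6
    # Pick the (c, x, 0) permutation for this sector, then add the offset.
    return parts[SECTOR_TABLE[sector]] + (val - c)

rgb = hsv_to_rgb_pixel(120.0, 0.5, 0.8)
```

The six branches collapse into one indexing operation, and the same table could be applied to whole arrays with fancy indexing if the per-pixel loop also needs to go.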
Q6. Color Reduction
Reduce the colors so that each of R, G, and B takes one of four values: {32, 96, 160, 224}.
Answer
img = cv2.imread("./img/imori.jpeg")
# Same mapping as (img // 64 + 1) * 64 - 32, but without exceeding the uint8 range along the way
img = img // 64 * 64 + 32
cv2.imwrite("./img/result_img.jpeg",img)
Implemented without using if statements.
Q7. Average Pooling
Divide the image into a grid (partition into fixed-size regions) and fill each region (cell) with the average value of its pixels. imori.jpg is 128x128, so divide into 8x8 grids and apply average pooling.
Answer
import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")

def average_pooling(img, karnel):
    pool_img = img.copy()
    height, width, channel = img.shape
    for i in range(0, height, karnel[0]):
        for j in range(0, width, karnel[1]):
            # Average over rows first, then over columns
            ave = np.mean(img[i:i+karnel[0], j:j+karnel[1], :], axis=0)
            ave = np.mean(ave, axis=0)
            pool_img[i:i+karnel[0], j:j+karnel[1], :] = ave
    return pool_img

kar = (8, 8)
img = average_pooling(img, kar)
cv2.imwrite("./img/pool_img.jpeg", img)
I ended up using two for loops and could not think of a way to reduce them further. I also computed the row and column averages in two separate steps; it would have been better to do it in one step.
average_pooling(img, karnel)
img is the image; karnel is the (height, width) size of each grid cell.
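As a possible way to address both points, the grid can be exposed as extra axes with reshape, and np.mean accepts a tuple of axes, so the average is taken in one call with no Python loops. This is a sketch under the assumption that the height and width are exact multiples of the cell size; the function name average_pooling_reshape is made up.

```python
import numpy as np

def average_pooling_reshape(img, cell):
    """Average pooling without Python loops (height/width must be multiples of cell)."""
    h, w, ch = img.shape
    kh, kw = cell
    # Split each spatial axis into (number of cells, cell size).
    blocks = img.reshape(h // kh, kh, w // kw, kw, ch)
    # Average over both cell axes at once: result is (h//kh, w//kw, ch).
    means = blocks.mean(axis=(1, 3))
    # Expand each cell's mean back to the original resolution.
    pooled = np.repeat(np.repeat(means, kh, axis=0), kw, axis=1)
    return pooled.astype(img.dtype)

# Hypothetical 4x4 three-channel image standing in for imori.
img = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
out = average_pooling_reshape(img, (2, 2))
```

For the 128x128 image in the question, the call would be average_pooling_reshape(img, (8, 8)).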
Q8. Max Pooling
Apply pooling using the maximum value instead of the average.
Answer
import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")

def max_pooling(img, karnel):
    pool_img = img.copy()
    height, width, channel = img.shape
    for i in range(0, height, karnel[0]):
        for j in range(0, width, karnel[1]):
            # Maximum over rows first, then over columns
            mx = np.max(img[i:i+karnel[0], j:j+karnel[1], :], axis=0)
            mx = np.max(mx, axis=0)
            pool_img[i:i+karnel[0], j:j+karnel[1], :] = mx
    return pool_img

kar = (8, 8)
img = max_pooling(img, kar)
cv2.imwrite("./img/poolmax_img.jpeg", img)
Simply changed the averaging part to compute the maximum value.
Q9. Gaussian Filter
Implement a Gaussian filter (3x3, standard deviation 1.3) and remove noise from imori_noise.jpg.
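A Gaussian filter smooths the pixels surrounding a target pixel using weights drawn from a Gaussian distribution. Such weights are called a kernel or filter. The weight at offset (x, y) from the center is g(x, y) = 1 / (2 * pi * sigma^2) * exp(-(x^2 + y^2) / (2 * sigma^2)), and the kernel is then normalized so that its weights sum to 1.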
Answer
import cv2
import numpy as np

img = cv2.imread("./img/imori_noise.jpeg")

def gausu_filter(img, karnel, sigma):
    height, width, channel = img.shape
    pad = karnel // 2
    # Zero padding
    pad_img = np.zeros((height + pad * 2, width + pad * 2, channel))
    pad_img[pad:pad+height, pad:pad+width] = img
    weight = gausu(sigma, karnel, pad)
    gausu_img = img.copy()
    for i in range(height):
        for j in range(width):
            for ch in range(channel):
                gausu_img[i, j, ch] = np.sum(pad_img[i:i+pad*2+1, j:j+pad*2+1, ch] * weight)
    return gausu_img

def gausu(sigma, karnel, pading):
    # Build a karnel x karnel Gaussian kernel and normalize it to sum to 1
    filt = np.zeros((karnel, karnel))
    for x in range(-pading, pading + 1):
        for y in range(-pading, pading + 1):
            filt[x+pading, y+pading] = 1 / (2 * np.pi * sigma * sigma) * np.exp(-(x*x + y*y) / (2 * sigma**2))
    filt /= filt.sum()
    return filt

kar = 3
sig = 1.3
img = gausu_filter(img, kar, sig)
cv2.imwrite("./img/gausu_img.jpeg", img)
Used a separate function to create the filter.
Q10. Median Filter
Implement a median filter (3x3) and remove noise from imori_noise.jpg. This filter outputs the median value within a 3x3 region around the target pixel. Apply zero padding as well.
Used this site as a reference to write with fewer for loops.
Answer
import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling
    Parameters:
        A: input array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg' or 'med'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
    # Window view of A
    output_shape = ((A.shape[0] - kernel_size) // stride + 1,
                    (A.shape[1] - kernel_size) // stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    # Note: scaling the channel stride by `stride` is only correct when stride == 1
    A_w = as_strided(A, shape=output_shape + kernel_size,
                     strides=(stride * A.strides[0],
                              stride * A.strides[1],
                              stride * A.strides[2]
                              ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1, 2)).reshape(output_shape)

img = cv2.imread("./img/imori_noise.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
medhian = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='med')
cv2.imwrite("./img/medhian_img.jpeg", medhian)
By referencing the site, I was able to implement it without using for loops. However, I do not fully understand the role of the strides argument in as_strided. From what I found, it gives the number of bytes to step in memory along each axis.
strides = (stride*A.strides[0], stride*A.strides[1], stride*A.strides[2]) + A.strides[0:2]
evaluates to
(390, 3, 1, 390, 3)
The first part (390, 3, 1) corresponds to (height, width, channel), and it seems the (height, width) strides are appended again for the two window axes?
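A small 2D experiment may help with the strides question. As far as I understand it, the leading strides say how far to move in memory to reach the next window, and the trailing strides say how far to move inside a window; both reuse the array's own row and column strides, which is why (height, width) appears twice. The array and variable names below are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical 4x4 uint8 array: one byte per element, so strides are (4, 1).
A = np.arange(16, dtype=np.uint8).reshape(4, 4)
k = 3
out_shape = (A.shape[0] - k + 1, A.shape[1] - k + 1)

# Axes 0-1 select the window position, axes 2-3 index inside the window.
windows = as_strided(A, shape=out_shape + (k, k), strides=A.strides + A.strides)

# windows[i, j] is the same data as the slice A[i:i+k, j:j+k].
first = windows[0, 1]
```

For the padded 130x130x3 uint8 image, one row is 130 * 3 = 390 bytes, which is where the 390 comes from. NumPy also provides numpy.lib.stride_tricks.sliding_window_view, which builds this kind of view without manual stride arithmetic.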
Q11. Smoothing Filter
Implement a smoothing filter (3x3).
A smoothing filter outputs the average of the pixel values within the filter window.
Answer
import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling
    Parameters:
        A: input array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg' or 'med'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
    # Window view of A
    output_shape = ((A.shape[0] - kernel_size) // stride + 1,
                    (A.shape[1] - kernel_size) // stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    A_w = as_strided(A, shape=output_shape + kernel_size,
                     strides=(stride * A.strides[0],
                              stride * A.strides[1],
                              stride * A.strides[2]
                              ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1, 2)).reshape(output_shape)

img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
mean = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='avg')
cv2.imwrite("./img/mean_img.jpeg", mean)
Simply changed the last part of the median filter to use the average.
Q12. Motion Filter
Implement a motion filter (3x3).
A motion filter computes the average along the diagonal direction and is defined by the following formula:
[[1/3,0,0]
[0,1/3,0]
[0,0,1/3]]
Answer
import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling
    Parameters:
        A: input array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg', 'med' or 'motion'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
    # Window view of A
    output_shape = ((A.shape[0] - kernel_size) // stride + 1,
                    (A.shape[1] - kernel_size) // stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    A_w = as_strided(A, shape=output_shape + kernel_size,
                     strides=(stride * A.strides[0],
                              stride * A.strides[1],
                              stride * A.strides[2]
                              ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    # 3x3 motion (diagonal average) kernel, broadcast over all windows
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)
    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w * weight, axis=(1, 2)).reshape(output_shape)

img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
motion = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='motion')
cv2.imwrite("./img/motion_img.jpeg", motion)
Created the weights and multiplied each element by the corresponding value.
Q13. MAX-MIN Filter
The MAX-MIN filter outputs the difference between the maximum and minimum pixel values within the filter window and is one of the edge detection filters.
Answer
import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling
    Parameters:
        A: input 2D array
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'min', 'avg', 'med', 'motion' or 'max_min'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding)), mode='constant')
    # Window view of A
    output_shape = ((A.shape[0] - kernel_size) // stride + 1,
                    (A.shape[1] - kernel_size) // stride + 1)
    kernel_size = (kernel_size, kernel_size)
    A_w = as_strided(A, shape=output_shape + kernel_size,
                     strides=(stride * A.strides[0],
                              stride * A.strides[1],
                              ) + A.strides)
    A_w = A_w.reshape(-1, *kernel_size)
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)
    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'min':
        return A_w.min(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w * weight, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "max_min":
        max_pool = A_w.max(axis=(1, 2)).reshape(output_shape)
        min_pool = A_w.min(axis=(1, 2)).reshape(output_shape)
        return max_pool - min_pool

img = cv2.imread("./img/imori.jpeg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
height, width = img.shape
karn = 3
padding = karn // 2
max_min = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='max_min')
cv2.imwrite("./img/min_max_img.jpeg", max_min)
I converted to grayscale beforehand, but would the result be the same if I created it from a BGR image and then converted to grayscale?
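On the question of ordering: since max and min are not linear operations, the two orders do not have to agree. The tiny example below is a made-up window of two pixels, using the grayscale weights from Q2, and it shows one case where they differ.

```python
import numpy as np

# Grayscale weights in B, G, R order (same coefficients as in Q2).
weights_bgr = np.array([0.0722, 0.7152, 0.2126])

# Hypothetical window holding two pixels: pure blue and pure green (B, G, R).
window = np.array([[255, 0, 0],
                   [0, 255, 0]], dtype=np.float64)

# Order 1: grayscale first, then MAX-MIN.
gray = window @ weights_bgr
gray_then_filter = gray.max() - gray.min()

# Order 2: MAX-MIN per channel first, then grayscale.
per_channel = window.max(axis=0) - window.min(axis=0)
filter_then_gray = per_channel @ weights_bgr
```

Here the first order gives about 164 and the second about 201, so converting to grayscale beforehand is not interchangeable with converting afterwards.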
Q14. Differential Filter
Implement a differential filter (3x3).
A differential filter extracts edges where rapid intensity changes occur by computing differences between adjacent pixels.
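Vertical filter (difference between horizontally adjacent pixels; weight_w in the code)
[[0,0,0],
[-1,1,0],
[0,0,0]]
Horizontal filter (difference between vertically adjacent pixels; weight_h in the code)
[[0,-1,0],
[0,1,0],
[0,0,0]]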
Answer
import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling
    Parameters:
        A: input array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'min', 'avg', 'med', 'motion', 'max_min', 'diff_w' or 'diff_h'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
    # Window view of A
    output_shape = ((A.shape[0] - kernel_size) // stride + 1,
                    (A.shape[1] - kernel_size) // stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    A_w = as_strided(A, shape=output_shape + kernel_size,
                     strides=(stride * A.strides[0],
                              stride * A.strides[1],
                              stride * A.strides[2]
                              ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)
    # Differential kernels along the width and height directions
    weight_w = [[0, 0, 0], [-1, 1, 0], [0, 0, 0]]
    weight_h = [[0, -1, 0], [0, 1, 0], [0, 0, 0]]
    weight_w = np.array(weight_w).reshape(-1, 3, 3)
    weight_h = np.array(weight_h).reshape(-1, 3, 3)
    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'min':
        return A_w.min(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w * weight, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "max_min":
        max_pool = A_w.max(axis=(1, 2)).reshape(output_shape)
        min_pool = A_w.min(axis=(1, 2)).reshape(output_shape)
        return max_pool - min_pool
    elif pool_mode == "diff_w":
        return np.sum(A_w * weight_w, axis=(1, 2)).reshape(output_shape)
    elif pool_mode == "diff_h":
        return np.sum(A_w * weight_h, axis=(1, 2)).reshape(output_shape)

img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
diff_w_img = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='diff_w')
diff_h_img = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='diff_h')
# The differences are signed integers; clip negatives to 0 and cast to uint8 before writing
diff_w_img = np.clip(diff_w_img, 0, 255).astype(np.uint8)
diff_h_img = np.clip(diff_h_img, 0, 255).astype(np.uint8)
cv2.imwrite("./img/diff_h_img.jpeg", diff_h_img)
cv2.imwrite("./img/diff_w_img.jpeg", diff_w_img)
I found that separating the filters makes the processing straightforward.
Q20. Histogram Display
Display the histogram of imori_dark.jpg using matplotlib.
Answer
import numpy as np
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("./img/imori_dark.jpeg")
gaso = np.array(img).flatten()
plt.hist(gaso,bins=255,range=(0,255),rwidth=0.8)
plt.show()
The data must be flattened to 1D before creating a histogram.
Q21. Histogram Normalization
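Implement histogram normalization. Convert an image with pixel values in the range [c, d] to the range [a, b] using out = (b - a) / (d - c) * (in - c) + a, clamping results below a to a and results above b to b.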
Answer
import numpy as np
import cv2
import matplotlib.pyplot as plt

def gray_scale_trans(img, a=0, b=255):
    c = img.min()
    d = img.max()
    out = (b - a) / (d - c) * (img.astype(np.float64) - c) + a
    # np.where returns a new array, so the result must be assigned back
    out = np.where(out < a, a, out)
    out = np.where(b < out, b, out)
    return out.astype(np.uint8)

img = cv2.imread("./img/imori_dark.jpeg")
trans_img = gray_scale_trans(img)
gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img.jpeg", trans_img)
plt.show()
Implemented using np.where. Expressed histogram normalization as a function.
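Q22. Histogram Manipulation
Manipulate the histogram so that the mean becomes m0 = 128 and the standard deviation becomes s0 = 52, using out = s0 / s * (in - m) + m0, where m and s are the mean and standard deviation of the input. This is an operation to flatten the histogram.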
Answer
import numpy as np
import cv2
import matplotlib.pyplot as plt

def hist_heitan(img, m0=128, s0=52):
    out = img.copy()
    s = np.std(img)
    m = np.average(img)
    out = s0 / s * (out - m) + m0
    return out

img = cv2.imread("./img/imori_dark.jpeg")
trans_img = hist_heitan(img)
gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img_1.jpeg", trans_img)
plt.show()
Implemented a function to flatten the histogram.
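As a quick sanity check of the transform, the snippet below applies the same expression to a handful of made-up values and looks at the resulting mean and standard deviation. The input values are arbitrary and only serve as an illustration.

```python
import numpy as np

# Hypothetical dark pixel values.
img = np.array([10, 20, 30, 40, 50], dtype=np.float64)
m0, s0 = 128, 52

# Same expression as in hist_heitan: rescale the spread, then shift the mean.
out = s0 / img.std() * (img - img.mean()) + m0

new_mean = out.mean()
new_std = out.std()
```

new_mean and new_std come out at m0 and s0 up to floating-point error, although on a real image some values can land outside [0, 255] and would be saturated when saved.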
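Q23. Histogram Equalization
Implement histogram equalization. Histogram equalization flattens the histogram without requiring values such as the mean or standard deviation; it evens out the histogram values. Each pixel value z is mapped to z' = z_max / S * (h(0) + ... + h(z)), where S is the total number of values and h(i) is the count of value i.
Answer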
import numpy as np
import cv2
import matplotlib.pyplot as plt

def hist_heitan_function(img, z_max=255):
    out = img.copy()
    height, width, channel = img.shape
    S = height * width * channel
    sum_h = 0
    # Cover every value 0..255 so that the cumulative count includes 0 and 255
    for i in range(256):
        ind = np.where(img == i)
        sum_h += len(img[ind])
        z_prime = z_max / S * sum_h
        out[ind] = z_prime
    return out

img = cv2.imread("./img/imori.jpeg")
trans_img = hist_heitan_function(img)
gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img_3.jpeg", trans_img)
plt.show()
I tried to eliminate the for loop using np.where but could not get it to work.
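One possible loop-free version, instead of np.where, builds a 256-entry lookup table from the cumulative histogram and indexes it with the image. This is a sketch rather than a drop-in replacement: the function name hist_equalize_lut is made up, and like the code above it pools all channels into a single histogram.

```python
import numpy as np

def hist_equalize_lut(img, z_max=255):
    """Histogram equalization via a lookup table built from the cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)   # h(i) for i = 0..255
    cdf = np.cumsum(hist)                            # h(0) + ... + h(z)
    lut = (z_max / img.size * cdf).astype(np.uint8)  # z' for every possible z
    return lut[img]                                  # apply the table to each pixel

# Hypothetical tiny image with six values.
img = np.array([[0, 0, 1],
                [1, 2, 255]], dtype=np.uint8)
out = hist_equalize_lut(img)
```

Indexing lut[img] replaces the per-value assignment, so the only work left is one histogram, one cumulative sum, and one table lookup.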