
Image Processing 100 Knocks Answers

Q1. Channel Swap

Read an image and convert it from BGR to RGB.

Answer

import cv2
img = cv2.imread("./img/imori.jpeg")
rgb_img = img[:, :, [2,1,0]].copy()

This can be done using numpy array operations.

cv2.cvtColor(img,cv2.COLOR_BGR2RGB)

can also perform the conversion.

Q2. Grayscale Conversion

Convert the image to grayscale. The luminance is expressed as Y = 0.2126 R + 0.7152 G + 0.0722 B.

Answer

img = cv2.imread("./img/imori.jpeg")
gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126

Using OpenCV functionality:

cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

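can perform the conversion. Note that OpenCV uses the weights 0.299 R + 0.587 G + 0.114 B, so its result differs slightly from the formula above. It could also be done with matrix computation.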
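As a rough sketch of the matrix-computation idea, the weighted sum can be written as a single product between the image and a weight vector. The tiny 2x2 array `bgr` below is a made-up stand-in for a real image, and the rounding step is one possible choice rather than anything the original answer prescribes.

```python
import numpy as np

# Hypothetical 2x2 BGR image: blue, green, red and white pixels.
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Weights in B, G, R order to match the channel order returned by cv2.imread.
weights = np.array([0.0722, 0.7152, 0.2126])

# (H, W, 3) @ (3,) -> (H, W); the result is float64.
gray = bgr @ weights

# Round and cast back to 8-bit so the result can be used like a normal image.
gray_u8 = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

Casting back to uint8 also matters later, because some OpenCV functions only accept 8-bit input.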

Q3. Binarization

Binarize a grayscale image. The threshold is 128.

Answer

img = cv2.imread("./img/imori.jpeg")
gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126
ret, gray_img = cv2.threshold(gray_img,128,255,cv2.THRESH_BINARY)

Applying conditional branching to each pixel individually would be computationally expensive.
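For comparison, the per-pixel branching can be avoided in plain NumPy with a single vectorized comparison. This is only a sketch on a made-up 2x2 array; it uses the convention that values of 128 or more become 255, whereas cv2.threshold with THRESH_BINARY maps only values strictly greater than the threshold to the maximum.

```python
import numpy as np

# Hypothetical grayscale values around the threshold.
gray = np.array([[0, 127], [128, 255]], dtype=np.uint8)

# The comparison is evaluated for all pixels at once, with no Python-level loop.
binary = np.where(gray >= 128, 255, 0).astype(np.uint8)
```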

Q4. Otsu's Binarization

Otsu's method determines the binarization threshold automatically by choosing the threshold that maximizes the between-class variance.

ret, gray_img = cv2.threshold(gray_img,0,255,cv2.THRESH_OTSU)

did not work. The cause was the preceding step:

gray_img = img[:,:,0] * 0.0722 + img[:,:,1] * 0.7152 + img[:,:,2] * 0.2126

cv2.THRESH_OTSU expects an 8-bit single-channel image, so it does not work when gray_img is of type float.

Answer

gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, th = cv2.threshold(gray_img, 0, 255, cv2.THRESH_OTSU)
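To make "maximize the between-class variance" concrete, here is a minimal NumPy sketch that tries every threshold and keeps the best one. The function name `otsu_threshold` and the toy input are illustrative assumptions, and the convention used here (class 0 is `< t`, class 1 is `>= t`) may differ by one from the value OpenCV reports.

```python
import numpy as np

def otsu_threshold(gray):
    """Return t maximizing w0 * w1 * (m0 - m1)^2 for classes (< t) and (>= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        n0 = hist[:t].sum()          # pixel count below the threshold
        n1 = total - n0              # pixel count at or above the threshold
        if n0 == 0 or n1 == 0:
            continue                 # one class is empty, variance undefined
        m0 = (hist[:t] * levels[:t]).sum() / n0
        m1 = (hist[t:] * levels[t:]).sum() / n1
        var_between = (n0 / total) * (n1 / total) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy 8-bit image with two clearly separated intensity groups.
gray = np.array([[50] * 4, [200] * 4], dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray >= t, 255, 0).astype(np.uint8)
```

The input must be an integer image for np.bincount to work, which mirrors the 8-bit requirement noted above.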

Q5. HSV Conversion

HSV conversion is a method of representing colors using Hue, Saturation, and Value (brightness).

hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hsv_img[:,:,0] = (hsv_img[:,:,0] + 180) % 360
hsv_img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR)

The colors seem slightly off. Apparently, for 8-bit images OpenCV stores Hue in the range [0, 179].

hsv_img[:,:,0] = (hsv_img[:,:,0] + 90) % 180

The result was still different, so I implemented the conversion without using OpenCV.
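One possible contributor to the mismatch (an assumption, not something verified against the original image) is 8-bit overflow: the hue channel is uint8, so adding 90 to values above 165 wraps around at 256 before the modulo is applied. The short sketch below shows the effect on a few made-up hue values and one way to avoid it by widening the type first.

```python
import numpy as np

# Hypothetical OpenCV-style hue values in [0, 179], stored as uint8.
h = np.array([0, 100, 170, 179], dtype=np.uint8)

# uint8 arithmetic: 170 + 90 = 260 wraps to 4 before "% 180" is applied.
wrapped = (h + 90) % 180

# Widen to a larger integer type first, then cast back after the modulo.
safe = ((h.astype(np.int32) + 90) % 180).astype(np.uint8)
```

The two results agree for small hues and differ only where the sum exceeds 255.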

Answer

import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape

# BGR -> HSV (hue in [0, 360), saturation and value in [0, 1])
hsv = img / 255
for i in range(height):        # rows
    for j in range(width):     # columns
        b, g, r = img[i, j, :] / 255
        max_val = max(b, g, r)
        min_val = min(b, g, r)
        val = max_val
        sat = max_val - min_val
        if max_val == min_val:
            hue = 0
        elif min_val == b:
            hue = 60 * (g - r) / sat + 60
        elif min_val == r:
            hue = 60 * (b - g) / sat + 180
        else:
            hue = 60 * (r - b) / sat + 300
        hsv[i, j, :] = [hue, sat, val]
cv2.imwrite("./img/hsv_moto.jpeg", hsv)

# Invert the hue
hsv[:, :, 0] = (hsv[:, :, 0] + 180) % 360

# HSV -> RGB
revers_img = img / 255
for i in range(height):
    for j in range(width):
        hue, sat, val = hsv[i, j, :]
        c = sat
        h_dot = hue / 60
        x = c * (1 - abs(h_dot % 2 - 1))
        if 0 <= h_dot < 1:
            add_h = [c, x, 0]
        elif 1 <= h_dot < 2:
            add_h = [x, c, 0]
        elif 2 <= h_dot < 3:
            add_h = [0, c, x]
        elif 3 <= h_dot < 4:
            add_h = [0, x, c]
        elif 4 <= h_dot < 5:
            add_h = [x, 0, c]
        elif 5 <= h_dot < 6:
            add_h = [c, 0, x]
        else:
            add_h = [0, 0, 0]
        revers_img[i, j, :] = (val - c) + np.array(add_h)

revers_img = revers_img * 255
revers_img = revers_img[:, :, [2, 1, 0]]  # RGB -> BGR for cv2.imwrite
cv2.imwrite("./img/hsv.jpeg", revers_img)

Need to be more conscious of the height, width, channel ordering. It would be nice to make the code shorter. There are too many if branches.

Q6. Color Reduction

Reduce the colors so that each of R, G, and B takes one of four values: 32, 96, 160, or 224.

Answer

img = cv2.imread("./img/imori.jpeg")
img = img // 64 * 64 + 32  # stays within uint8: 32, 96, 160, 224
cv2.imwrite("./img/result_img.jpeg",img)

Implemented without using if statements.

Q7. Average Pooling

Divide the image into a grid (partition into fixed-size regions) and fill each region (cell) with the average value of its pixels. imori.jpg is 128x128, so divide into 8x8 grids and apply average pooling.

Answer

import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")

def average_pooling(img, karnel):
    pool_img = img.copy()
    height, width, channel = img.shape
    for i in range(0, height, karnel[0]):
        for j in range(0, width, karnel[1]):
            # average over the rows of the cell, then over its columns
            ave = np.mean(img[i:i+karnel[0], j:j+karnel[1], :], axis=0)
            ave = np.mean(ave, axis=0)
            pool_img[i:i+karnel[0], j:j+karnel[1], :] = ave
    return pool_img


kar = (8, 8)
img = average_pooling(img, kar)
cv2.imwrite("./img/pool_img.jpeg", img)

I ended up using two for loops. I could not think of a way to reduce them further. Also, I computed the row and column averages separately in two steps; it would have been better to find a way to do it in one step.

def average_pooling(img,karnel)

img is the image, karnel is the grid division range.
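One way to drop both loops and take the average in a single step is to reshape the image into blocks and average over the two in-block axes at once. The sketch below is an alternative under the assumption that the image height and width are exact multiples of the cell size; the name `average_pooling_reshape` is made up for illustration.

```python
import numpy as np

def average_pooling_reshape(img, kernel=(8, 8)):
    """Loop-free average pooling; assumes H and W are multiples of the cell size."""
    kh, kw = kernel
    h, w, c = img.shape
    assert h % kh == 0 and w % kw == 0, "image size must be divisible by the cell size"
    # Split rows into (h // kh, kh) and columns into (w // kw, kw).
    blocks = img.reshape(h // kh, kh, w // kw, kw, c)
    # Average over both in-block axes in one call.
    means = blocks.mean(axis=(1, 3), keepdims=True)
    # Repeat each cell mean over its cell and restore the original layout.
    out = np.broadcast_to(means, blocks.shape).reshape(h, w, c)
    return out.astype(img.dtype)

# Small float example so the averages are not truncated.
demo = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
pooled = average_pooling_reshape(demo, (2, 2))
```

With a uint8 image the final cast truncates the averages, which matches what assigning into a uint8 copy does in the loop version.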

Q8. Max Pooling

Apply pooling using the maximum value instead of the average.

Answer

import cv2
import numpy as np

img = cv2.imread("./img/imori.jpeg")

def max_pooling(img, karnel):
    pool_img = img.copy()
    height, width, channel = img.shape
    for i in range(0, height, karnel[0]):
        for j in range(0, width, karnel[1]):
            # maximum over the rows of the cell, then over its columns
            max_val = np.max(img[i:i+karnel[0], j:j+karnel[1], :], axis=0)
            max_val = np.max(max_val, axis=0)
            pool_img[i:i+karnel[0], j:j+karnel[1], :] = max_val
    return pool_img


kar = (8, 8)
img = max_pooling(img, kar)
cv2.imwrite("./img/poolmax_img.jpeg", img)

Simply changed the averaging part to compute the maximum value.

Q9. Gaussian Filter

Implement a Gaussian filter (3x3, standard deviation 1.3) and remove noise from imori_noise.jpg.

A Gaussian filter smooths a target pixel by taking a weighted average of its surrounding pixels, with weights drawn from a Gaussian distribution defined by the following formula. Such weights are called a kernel or filter.

g(x, y) = 1 / (2πσ²) · exp(-(x² + y²) / (2σ²))

Answer

import cv2
import numpy as np

img = cv2.imread("./img/imori_noise.jpeg")

def gausu_filter(img, karnel, sigma):
    height, width, channel = img.shape
    pad = karnel // 2
    # zero padding around the image
    pad_img = np.zeros((height + pad * 2, width + pad * 2, channel))
    pad_img[pad:pad+height, pad:pad+width] = img
    weight = gausu(sigma, karnel, pad)
    gausu_img = img.copy()
    for i in range(height):
        for j in range(width):
            for ch in range(channel):
                gausu_img[i, j, ch] = np.sum(pad_img[i:i+pad*2+1, j:j+pad*2+1, ch] * weight)
    return gausu_img

def gausu(sigma, karnel, pading):
    filt = np.zeros((karnel, karnel))
    for x in range(-pading, pading + 1):
        for y in range(-pading, pading + 1):
            filt[x+pading, y+pading] = 1 / (2*np.pi*sigma*sigma) * np.exp(-(x*x + y*y) / (2 * sigma**2))
    # normalize so that the weights sum to 1
    filt /= filt.sum()
    return filt


kar = 3
sig = 1.3
img = gausu_filter(img, kar, sig)
cv2.imwrite("./img/gausu_img.jpeg", img)

Used a separate function to create the filter.

Q10. Median Filter

Implement a median filter (3x3) and remove noise from imori_noise.jpg. This filter outputs the median value within a 3x3 region around the target pixel. Apply zero padding as well.

Used this site as a reference to write with fewer for loops.

Answer

import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling

    Parameters:
        A: input image array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg' or 'med'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')

    # Window view of A
    output_shape = ((A.shape[0] - kernel_size)//stride + 1,
                    (A.shape[1] - kernel_size)//stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)

    A_w = as_strided(A, shape = output_shape + kernel_size,
                     strides = (stride*A.strides[0],
                                stride*A.strides[1],
                                stride*A.strides[2]
                                ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)

    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1,2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1,2)).reshape(output_shape)


img = cv2.imread("./img/imori_noise.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
medhian = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='med')

cv2.imwrite("./img/medhian_img.jpeg", medhian)

By referencing the site, I was able to implement it without using for loops. However, I do not fully understand the role of the strides argument in as_strided. Based on research, it appears to represent memory stride distances.

strides = (stride*A.strides[0], stride*A.strides[1], stride*A.strides[2]) + A.strides[0:2]

evaluates to

(390, 3, 1, 390, 3)

The first part (390, 3, 1) represents (height, width, channel), and it seems like the (height, width) dimensions are appended again?
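A small experiment may help with the question above. As far as I understand it, the tuple holds byte offsets rather than sizes: for a 130x130x3 uint8 array (the 128x128 image plus one pixel of padding on each side), moving one row skips 130 * 3 = 390 bytes, one column skips 3 bytes, and one channel skips 1 byte. The last two entries repeat the row and column steps because the window axes also walk along rows and columns. The 5x5x3 array below is a made-up stand-in used only to check the indexing.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Tiny stand-in for the padded image: 5 rows, 5 columns, 3 channels.
A = np.arange(5 * 5 * 3, dtype=np.uint8).reshape(5, 5, 3)
k = 3

# One 3x3 window per (row, column, channel) position that fits inside A.
out_shape = (A.shape[0] - k + 1, A.shape[1] - k + 1, A.shape[2])

# Byte steps for one row, one column, one channel: (15, 3, 1) here.
s_row, s_col, s_ch = A.strides

# Axes of the view: (i, j, channel, di, dj). The window axes di and dj reuse
# the row and column byte steps, which is why they appear twice.
windows = as_strided(A, shape=out_shape + (k, k),
                     strides=(s_row, s_col, s_ch, s_row, s_col))

# windows[i, j, c, di, dj] refers to A[i + di, j + dj, c].
sample = windows[1, 2, 0, 2, 1]
```

Recent NumPy versions (1.20 and later, as far as I know) also provide `sliding_window_view` in the same module, which builds an equivalent view with bounds checking and may be a safer choice than calling as_strided directly.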

Q11. Smoothing Filter

Implement a smoothing filter (3x3).

A smoothing filter outputs the average of the pixel values within the filter window.

Answer

import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling

    Parameters:
        A: input image array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg' or 'med'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')

    # Window view of A
    output_shape = ((A.shape[0] - kernel_size)//stride + 1,
                    (A.shape[1] - kernel_size)//stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    print((stride*A.strides[0], stride*A.strides[1], stride*A.strides[2]) + A.strides[0:2])

    A_w = as_strided(A, shape = output_shape + kernel_size,
                     strides = (stride*A.strides[0],
                                stride*A.strides[1],
                                stride*A.strides[2]
                                ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)

    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1,2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1,2)).reshape(output_shape)


img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
mean = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='avg')

cv2.imwrite("./img/mean_img.jpeg", mean)

Simply changed the last part of the median filter to use the average.

Q12. Motion Filter

Implement a motion filter (3x3).

A motion filter computes the average along the diagonal direction and is defined by the following formula:

[[1/3, 0, 0],
 [0, 1/3, 0],
 [0, 0, 1/3]]

Answer

import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling

    Parameters:
        A: input image array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max', 'avg', 'med' or 'motion'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')

    # Window view of A
    output_shape = ((A.shape[0] - kernel_size)//stride + 1,
                    (A.shape[1] - kernel_size)//stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    print((stride*A.strides[0], stride*A.strides[1], stride*A.strides[2]) + A.strides[0:2])

    A_w = as_strided(A, shape = output_shape + kernel_size,
                     strides = (stride*A.strides[0],
                                stride*A.strides[1],
                                stride*A.strides[2]
                                ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    # Motion filter weights: average along the diagonal
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)

    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1,2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w*weight, axis=(1,2)).reshape(output_shape)


img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
motion = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='motion')
print(motion.shape)

cv2.imwrite("./img/motion_img.jpeg", motion)

Created the weights and multiplied each element by the corresponding value.

Q13. MAX-MIN Filter

The MAX-MIN filter outputs the difference between the maximum and minimum pixel values within the filter window and is one of the edge detection filters.

Answer

import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling

    Parameters:
        A: input 2D array
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, 'max' or 'avg'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding)), mode='constant')

    # Window view of A
    output_shape = ((A.shape[0] - kernel_size)//stride + 1,
                    (A.shape[1] - kernel_size)//stride + 1)
    kernel_size = (kernel_size, kernel_size)
    # print((stride*A.strides[0],stride*A.strides[1],stride*A.strides[2]) + A.strides[0:2])

    A_w = as_strided(A, shape = output_shape + kernel_size,
                     strides = (stride*A.strides[0],
                                stride*A.strides[1],
                                ) + A.strides)
    A_w = A_w.reshape(-1, *kernel_size)
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)

    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'min':
        return A_w.min(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1,2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w*weight, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "max_min":
        max_pool = A_w.max(axis=(1,2)).reshape(output_shape)
        min_pool = A_w.min(axis=(1,2)).reshape(output_shape)
        return max_pool - min_pool


img = cv2.imread("./img/imori.jpeg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
height, width = img.shape
karn = 3
padding = karn // 2
max_min = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='max_min')
print(max_min.shape)

cv2.imwrite("./img/min_max_img.jpeg", max_min)

I converted to grayscale beforehand, but would the result be the same if I created it from a BGR image and then converted to grayscale?
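A quick check on a made-up two-pixel window suggests the answer is no in general: taking the maximum and minimum is not a linear operation, so it does not commute with the weighted sum used for grayscale conversion. The sketch below uses the Q2 weights and a hypothetical window holding one pure-blue and one pure-red pixel.

```python
import numpy as np

# Hypothetical 1x2 BGR window: one pure-blue pixel and one pure-red pixel.
window = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.float64)
weights = np.array([0.0722, 0.7152, 0.2126])  # B, G, R order

# Order 1: convert to grayscale first, then apply MAX-MIN.
gray = window @ weights
gray_then_maxmin = gray.max() - gray.min()

# Order 2: apply MAX-MIN to each channel, then convert to grayscale.
per_channel = window.max(axis=(0, 1)) - window.min(axis=(0, 1))
maxmin_then_gray = per_channel @ weights
```

Here the first order gives about 35.8 and the second about 72.6, so the two pipelines can produce noticeably different edge strengths.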

Q14. Differential Filter

Implement a differential filter (3x3).

A differential filter extracts edges where rapid intensity changes occur by computing differences between adjacent pixels.

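Vertical filter (takes the difference between horizontally adjacent pixels, so it responds to vertical edges; weight_w in the code)

[[0, 0, 0],
 [-1, 1, 0],
 [0, 0, 0]]

Horizontal filter (takes the difference between vertically adjacent pixels, so it responds to horizontal edges; weight_h in the code)

[[0, -1, 0],
 [0, 1, 0],
 [0, 0, 0]]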

Answer

import numpy as np
import cv2
from numpy.lib.stride_tricks import as_strided

def pool2d(A, kernel_size, stride, padding, pool_mode='max'):
    '''
    2D Pooling

    Parameters:
        A: input image array of shape (height, width, channel)
        kernel_size: int, the size of the window
        stride: int, the stride of the window
        padding: int, implicit zero paddings on both sides of the input
        pool_mode: string, e.g. 'max', 'avg', 'diff_w' or 'diff_h'
    '''
    # Padding
    A = np.pad(A, ((padding, padding), (padding, padding), (0, 0)), mode='constant')

    # Window view of A
    output_shape = ((A.shape[0] - kernel_size)//stride + 1,
                    (A.shape[1] - kernel_size)//stride + 1,
                    A.shape[2])
    kernel_size = (kernel_size, kernel_size)
    print((stride*A.strides[0], stride*A.strides[1], stride*A.strides[2]) + A.strides[0:2])

    A_w = as_strided(A, shape = output_shape + kernel_size,
                     strides = (stride*A.strides[0],
                                stride*A.strides[1],
                                stride*A.strides[2]
                                ) + A.strides[0:2])
    A_w = A_w.reshape(-1, *kernel_size)
    weight = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]
    weight = np.array(weight).reshape(-1, 3, 3)
    # Differential filter weights along the width and height directions
    weight_w = [[0, 0, 0], [-1, 1, 0], [0, 0, 0]]
    weight_h = [[0, -1, 0], [0, 1, 0], [0, 0, 0]]
    weight_w = np.array(weight_w).reshape(-1, 3, 3)
    weight_h = np.array(weight_h).reshape(-1, 3, 3)

    # Return the result of pooling
    if pool_mode == 'max':
        return A_w.max(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'min':
        return A_w.min(axis=(1,2)).reshape(output_shape)
    elif pool_mode == 'avg':
        return A_w.mean(axis=(1,2)).reshape(output_shape)
    elif pool_mode == "med":
        return np.median(A_w, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "motion":
        return np.sum(A_w*weight, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "max_min":
        max_pool = A_w.max(axis=(1,2)).reshape(output_shape)
        min_pool = A_w.min(axis=(1,2)).reshape(output_shape)
        return max_pool - min_pool
    elif pool_mode == "diff_w":
        return np.sum(A_w*weight_w, axis=(1,2)).reshape(output_shape)
    elif pool_mode == "diff_h":
        return np.sum(A_w*weight_h, axis=(1,2)).reshape(output_shape)


img = cv2.imread("./img/imori.jpeg")
height, width, channel = img.shape
karn = 3
padding = karn // 2
diff_w_img = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='diff_w')
diff_h_img = pool2d(img, kernel_size=karn, stride=1, padding=padding, pool_mode='diff_h')

cv2.imwrite("./img/diff_h_img.jpeg", diff_h_img)
cv2.imwrite("./img/diff_w_img.jpeg", diff_w_img)

I found that separating the filters makes the processing straightforward.

Q20. Histogram Display

Display the histogram of imori_dark.jpg using matplotlib.

Answer

import numpy as np
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("./img/imori_dark.jpeg")
gaso = np.array(img).flatten()
plt.hist(gaso,bins=255,range=(0,255),rwidth=0.8)
plt.show()

The data must be flattened to 1D before creating a histogram.

Q21. Histogram Normalization

Implement histogram normalization. Convert an image with pixel values in the range [c, d] to the range [a, b].

Answer

import numpy as np
import cv2
import matplotlib.pyplot as plt

def gray_scale_trans(img, a=0, b=255):
    out = img.copy()
    c = img.min()
    d = img.max()

    # Map [c, d] linearly onto [a, b]
    out = (b - a) / (d - c) * (out - c) + a

    # Clamp to [a, b]; np.where returns a new array, so the result must be assigned
    out = np.where(out < a, a, out)
    out = np.where(b < out, b, out)

    return out

img = cv2.imread("./img/imori_dark.jpeg")

trans_img = gray_scale_trans(img)

gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img.jpeg", trans_img)
plt.show()

Implemented using np.where. Expressed histogram normalization as a function.

Q22. Histogram Manipulation

Manipulate the histogram so that the mean becomes m0=128 and the standard deviation becomes s0=52. This is an operation to flatten the histogram.

Answer

import numpy as np
import cv2
import matplotlib.pyplot as plt

def hist_heitan(img, m0=128, s0=52):
    out = img.copy()
    s = np.std(img)
    m = np.average(img)
    # Shift and scale so that the mean becomes m0 and the standard deviation s0
    out = s0 / s * (out - m) + m0
    return out

img = cv2.imread("./img/imori_dark.jpeg")

trans_img = hist_heitan(img)
gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img_1.jpeg", trans_img)
plt.show()

Implemented a function to flatten the histogram.

Q23. Histogram Equalization

Implement histogram equalization. Histogram equalization is an operation that flattens the histogram without requiring values like the mean or standard deviation -- it equalizes the histogram values.

import numpy as np
import cv2
import matplotlib.pyplot as plt

def hist_heitan_function(img, z_max=255):
    out = img.copy()
    height, width, channel = img.shape
    S = height * width * channel

    sum_h = 0

    # Accumulate the histogram over every intensity from 0 to 255
    for i in range(256):
        ind = np.where(img == i)
        sum_h += len(img[ind])
        z_prime = z_max / S * sum_h
        out[ind] = z_prime

    return out


img = cv2.imread("./img/imori.jpeg")

trans_img = hist_heitan_function(img)
gaso = np.array(trans_img).flatten()
plt.hist(gaso, bins=255, range=(0, 255), rwidth=0.8)
cv2.imwrite("./img/trans_img_3.jpeg", trans_img)
plt.show()

I tried to eliminate the for loop using np.where but could not get it to work.
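One possible way to remove the loop is to build a 256-entry lookup table from the cumulative histogram and index it with the image. This is a sketch assuming an 8-bit input; the function name `hist_equalize` and the toy image are made up for illustration.

```python
import numpy as np

def hist_equalize(img, z_max=255):
    """Loop-free histogram equalization for a uint8 image (any number of channels)."""
    # Count how many pixels take each intensity 0..255.
    hist = np.bincount(img.ravel(), minlength=256)
    # Cumulative count: number of pixels with intensity <= z.
    cdf = np.cumsum(hist)
    # Lookup table: z' = z_max / S * (cumulative count up to z).
    lut = (z_max * cdf / img.size).astype(np.uint8)
    # Fancy indexing applies the table to every pixel at once.
    return lut[img]

# Toy image: two pixels at 0, one at 128, one at 255.
toy = np.array([[0, 0], [128, 255]], dtype=np.uint8)
equalized = hist_equalize(toy)
```

The cast to uint8 truncates the fractional part, which is the same thing that happens when float values are assigned into a uint8 array in the loop version.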