SeaFloor Sampling and Data Demo Center jupyter{book} Migration — GSoC’21 @ IOOS

Lohithmunakala
6 min read · Aug 23, 2021
GSoC-21-IOOS

Overview

IOOS is a national-regional partnership working to provide new tools and forecasts to improve safety, enhance the economy, and protect our environment. As a part of GSoC’21, I worked under IOOS on two projects.

  1. Migrating IOOS Data Demo Center to jupyter{book}. Website. Link to Code.
  2. Benthic Sea Floor Sampling. Link to Code.

Migrating IOOS Data Demo Center to jupyter{book}

The main goal of this project was to migrate the whole website to jupyter{book}. jupyter{book} provides a clean interface for viewing notebooks. The notebooks mainly demonstrate various uses of IOOS’s open-source tools. To have a peek at what these tools are, look here.

As always, the project looked easy to do, but the devil was in the details. With the help of my mentors, I was able to successfully complete this project to their requirements.

The website has been deployed here: https://ioos.github.io/notebooks_demos/

The link to the code is here: https://github.com/ioos/notebooks_demos/tree/master/jupyterbook

Here are some comparison images before and after the code migration to jupyter{book}.

The previous IOOS Data Demo Center
The new IOOS Code Lab

An in-depth version of the jupyter{book} code can be found in these two blog posts.

  1. Creating websites using Jupyter{Book}
  2. Deploying Jupyter{Book}

Contributions made during GSoC

Benthic Sea Floor Sampling

The main idea behind the problem statement can be divided into 4 parts.

  1. Blur Detection for images from a particular transect.
  2. Finding if the image is usable.
  3. Finding the area of the sea floor being looked at.
  4. Detecting the kind of habitat in the frame.

The code for the following can be found here: https://github.com/ioos/seafloor-sampling-ml

I explain below in detail how I solved these problems using Computer Vision and Deep Learning.

Blur Detection

When the sled moves along the sea floor, it has a camera with lasers attached to it and is pulled by a boat. If the boat moves too fast or the floor is uneven, the extracted images are often too blurred. No analysis is possible on these images, and they have to be deleted manually. This wastes a lot of time that could be spent on other tasks.

We solve this problem using the Fast Fourier Transform (FFT). The code to do this is written below.

import numpy as np

def detect_blur_fft(image, size=40):
    # get the dimensions of the (grayscale) image
    (height, width) = image.shape
    # find the center of the image
    (X_center, Y_center) = (int(width / 2.0), int(height / 2.0))
    # perform a 2D FFT on the grayscale image
    fft = np.fft.fft2(image)
    # shift the zero-frequency component to the center of the spectrum
    fftShift = np.fft.fftshift(fft)
    # zero out the low-frequency region around the center
    fftShift[Y_center - size:Y_center + size, X_center - size:X_center + size] = 0
    # invert the previously performed shift
    fftShift = np.fft.ifftshift(fftShift)
    # invert the FFT to reconstruct the image from the remaining high frequencies
    recon = np.fft.ifft2(fftShift)
    # compute the magnitude of the reconstruction and take its mean
    magnitude = 20 * np.log(np.abs(recon))
    mean = np.mean(magnitude)
    # the mean acts as a sharpness score: blurry frames score low
    return mean

We pass all the images in a transect through this detector and generate a set of values that are saved in a CSV file for further analysis. Any frame that is too blurry gets a low score and is then deleted. This saves time before any kind of analysis is performed on the images.
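A minimal sketch of that scoring loop is below; the directory layout, threshold value, and column names are illustrative rather than the exact ones used in the repository.

import glob
import cv2
import pandas as pd

rows = []
# score every frame extracted from one transect (path is illustrative)
for path in sorted(glob.glob("transect_01/*.jpg")):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    rows.append({"image": path, "blur_score": detect_blur_fft(gray)})

df = pd.DataFrame(rows)
# frames scoring below an empirically chosen threshold are flagged for deletion
df["too_blurry"] = df["blur_score"] < 10
df.to_csv("transect_01_blur_scores.csv", index=False)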

Representational Image of Benthic Explorations

Finding if the images are usable

All the images from the transect have laser points in them, which help us calculate how far or close the sled (camera) is to the sea floor. We can use this data to eliminate more images that the blur detector did not catch.

We do this by isolating the lasers and extracting them from the frame we are analyzing. We use HSV segmentation for this process, and it proves effective for more than 80% of the images analyzed.

The following code is run through all the images, and the laser points are extracted.

import cv2

# function to extract the laser points from a frame
def laser_detection(image):
    # crop the frame to the region where the lasers appear
    image = image[:, 2000:3000]
    # swap the channel order from BGR to RGB; with the channels swapped,
    # the red lasers end up in the hue range used below
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # convert the (channel-swapped) image to HSV
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # HSV range used to isolate the laser points
    cell_hsvmin = (100, 80, 150)
    cell_hsvmax = (150, 255, 255)
    # threshold the image, keeping only the laser pixels
    color_thresh = cv2.inRange(hsv, cell_hsvmin, cell_hsvmax)
    return color_thresh

We then perform morphological operations on the extracted laser mask to make sure we eliminate any noise that exists. The code for it is as follows.

kernel = np.ones((5, 5), np.uint8)  # structuring element (the exact kernel size used may differ)
image_closing = cv2.morphologyEx(color_thresh, cv2.MORPH_CLOSE, kernel, iterations=5)
image_opening = cv2.morphologyEx(image_closing, cv2.MORPH_OPEN, kernel, iterations=2)

We then skeletonize the mask and measure the distance between the lasers by finding which points are parallel to each other, eliminating any other noise present in the extracted image. The code for it is as follows.

from statistics import median
from skimage.morphology import skeletonize

# wrapped in a helper function here for readability; the function name is illustrative
def laser_pixel_distance(image_opening):
    # normalize the binary mask to 0/1 before skeletonizing
    image_opening = image_opening / 255
    # skeletonize the mask so only the thin laser traces remain
    image_skeleton = skeletonize(image_opening)
    # stack the coordinates of the skeleton pixels
    points = np.column_stack(np.where(image_skeleton == True))
    # list to store the horizontal distances between parallel laser points
    y_coordinates_parallel = []
    # compare every pair of points that lie on the same row
    for i in range(0, len(points) - 1):
        for j in range(i, len(points) - 1):
            if points[i][0] == points[j][0]:
                # keep only plausible laser separations (between 40 and 200 pixels)
                if 40 < points[j][1] - points[i][1] < 200:
                    y_coordinates_parallel.append(points[j][1] - points[i][1])
    # if no pair was found on the same row, relax the check to adjacent rows
    if len(y_coordinates_parallel) == 0:
        for i in range(0, len(points) - 1):
            for j in range(i, len(points) - 1):
                if points[i][0] + 1 == points[j][0]:
                    if 40 < points[j][1] - points[i][1] < 200:
                        y_coordinates_parallel.append(points[j][1] - points[i][1])
    # return the median separation in pixels, or 0 if no lasers were found
    if len(y_coordinates_parallel) != 0:
        return median(y_coordinates_parallel)
    else:
        return 0

We then take the median of all these pixel distances, which gives us the distance between the lasers in pixels.

Finding the Area of the Sea Floor being Analyzed

After we get the laser distance in pixels, we perform HSV segmentation on the image to extract its bright parts. We draw a contour around this region and find the area inside the contour; this area is in pixels.

We need to convert this into m². We know from the setup that the lasers are 2.5 cm apart. Using this together with the laser distance in pixels, we work out how many cm² each pixel covers and record the area.
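A minimal sketch of that conversion is below. The HSV thresholds for the bright region and the helper name are illustrative; the 2.5 cm laser spacing and the pixel distance come from the steps above.

import cv2

def seafloor_area_cm2(image, laser_px_distance, laser_cm_distance=2.5):
    # segment the bright part of the frame in HSV space
    # (these threshold values are illustrative, not the ones used in the repository)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    bright = cv2.inRange(hsv, (0, 0, 120), (180, 80, 255))
    # take the largest contour around the bright region and its area in pixels
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours or laser_px_distance == 0:
        return 0.0
    area_px = cv2.contourArea(max(contours, key=cv2.contourArea))
    # each pixel covers (laser_cm_distance / laser_px_distance)² square centimetres
    cm_per_px = laser_cm_distance / laser_px_distance
    return area_px * cm_per_px ** 2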

Habitat Classification for the Sea Floor Being Analyzed

More than 50% of each transect that is analyzed is pure sand/mud. Reviewing these repetitive frames is a waste of human time and resources; a project could save more than a week by automating this single step.

To solve this, we train a model that can classify images as pure mud/sand or not mud/sand. We train a transfer-learning model on the collected and labelled images, using a ResNet50 backbone trained for close to 50 epochs.

This resulted in a classification accuracy of ~81%.
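The repository holds the actual training code; the sketch below shows what such a ResNet50 transfer-learning setup might look like in Keras, with the head architecture, image size, and hyperparameters as illustrative assumptions.

from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

# ResNet50 backbone pre-trained on ImageNet, with the classifier head removed
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False  # train only the new head

# binary head: pure sand/mud vs. everything else
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would come from the labelled frames, e.g. via
# tf.keras.utils.image_dataset_from_directory, and training would run for ~50 epochs:
# model.fit(train_ds, validation_data=val_ds, epochs=50)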

This project was one of the most challenging projects I have worked on to date. It involved integrating a completely new field with computer vision, and it was certainly fun to work on.

Contributions made during GSoC

It was a lot of fun to build a repository from the ground up and to maintain it.

Work after GSoC

The habitat classification model (part 4 above) that I developed is an MVP. It could be improved with better access to data and new imagery, which is the next goal for the project. With new imagery from GoPros, the images become substantially better in quality, which would result in better training of the model.

Acknowledgements

I’d like to thank Filipe, Mathew, Alex, Ben and Dalton, without whom these projects would not have been possible. They were instrumental in helping me solve the necessary problem statements. This summer was filled with a lot of fun, new learning and excitement.

Looking forward to contributing more to IOOS ❤.
