Preprocessing the Image Dataset for Left Ventricle Segmentation


The human heart, a complex and vital organ, has been the subject of numerous studies, breakthroughs, and innovations in the field of medical research. One such innovation is echocardiography, a non-invasive imaging technique that has revolutionized how we visualize and assess heart function. With the advent of advanced machine learning algorithms, extracting crucial information from these images has become an area of active research. In this blog post, we'll delve into the world of biomedical image segmentation, focusing on the left ventricle of the heart, a critical component of our circulatory system. Join me as I preprocess the Cardiac Acquisitions for Multi-structure Ultrasound Segmentation (CAMUS) dataset, walking you through each step in Python to ensure your segmentation model has a strong foundation to build upon.


Learning Objective:

Explore the process of preprocessing the Cardiac Acquisitions for Multi-structure Ultrasound Segmentation (CAMUS) dataset. The CAMUS dataset was designed for evaluating left ventricle segmentation and ejection fraction assessment algorithms in echocardiography, and it is publicly available. It consists of 2D echocardiographic images acquired from different views, such as the four-chamber (4ch) view. Preprocessing is essential in building an accurate segmentation model: it improves the quality of the input data and ensures that the model is trained on consistent, normalized data. This tutorial uses Python and several libraries to preprocess the images and their corresponding masks.

This article was published as a part of the Data Science Blogathon.


Dataset Overview

The Cardiac Acquisitions for Multi-structure Ultrasound Segmentation dataset is publicly available for download. It contains 500 image sequences with corresponding expert-drawn contours of the left ventricle. This tutorial focuses on the 4ch view images and masks. The images are provided in MetaImage (.mhd) format, which requires specialized libraries like SimpleITK for reading and processing.

Preprocessing Steps

  1. Mount Google Drive to access the dataset.
  2. Install required libraries (SimpleITK, h5py).
  3. Set dataset paths.
  4. Define helper functions for data normalization, reading image data, and resizing.
  5. Visualize random images and masks from the dataset.
  6. Calculate image dimensions (width and length).
  7. Resize images and masks to consistent dimensions.
  8. Normalize image pixel values.
  9. Save preprocessed images and masks in batches.

Here is an overview of the steps in the form of a flowchart for the preprocessing of the CAMUS image dataset:

Code Walkthrough

First, we mount Google Drive to access the dataset and install the required libraries (SimpleITK, h5py) using the !pip install command.

Mount Google Drive to Access the Dataset

from google.colab import drive
drive.mount('/content/drive')

The line from google.colab import drive imports the drive module from the google.colab package. This package provides tools for working with Google Colaboratory, a free cloud-based platform for coding and data analysis.

The next line, drive.mount('/content/drive'), calls the mount() function from the drive module to mount your Google Drive account. This lets you access files and folders stored in your Google Drive directly from your Colab notebook.

Running this code prompts you to authorize access to your Google Drive account by following a URL and entering an authorization code. Once this step is complete, your Google Drive is mounted, and you can access its files under the path /content/drive/ within your Colab notebook.

Overall, this code sets up the configuration you need to access files in your Google Drive from the Colab environment, which is useful when your data is stored in the cloud.

Install Required Libraries (SimpleITK, h5py)

# Install the packages first so that the imports below succeed on a fresh runtime
!pip install SimpleITK
!pip install h5py

import os
import numpy as np
import pandas as pd
import time
import random
from contextlib import contextmanager
from functools import partial
import seaborn as sns
import SimpleITK as sitk
import matplotlib.pyplot as plt
%matplotlib inline
import cv2
from tqdm.notebook import tqdm
import h5py
from skimage.transform import resize

The first two lines install the SimpleITK and h5py libraries using pip, so that they can be imported in the notebook.

The remaining lines import the Python modules used in this tutorial: os, numpy, pandas, time, random, contextlib, functools, seaborn, SimpleITK, matplotlib, cv2, tqdm, h5py, and skimage. These modules provide functions and classes for working with arrays, dataframes, plotting, image processing, and more.

Overall, this code installs and imports everything needed for the cardiac image analysis task. The paths to the data directories and a helper for timing code blocks are set up in the next step.

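Set Dataset Paths

data_path = "/content/drive/MyDrive/CAM/LVEF/CAMUS/original_data/data/training/4ch/"

# Report whether the dataset folder is reachable from the notebook
if os.path.exists(data_path):
    print(f"Path exists: {data_path}")
else:
    print(f"Path not found: {data_path}")

# Default figure size for all plots (width, height in inches)
fig_size = plt.rcParams["figure.figsize"]
fig_size[0] = 7
fig_size[1] = 9
plt.rcParams["figure.figsize"] = fig_size

@contextmanager
def timer(name):
    # Measure the time taken by the code inside the "with" block
    t0 = time.time()
    yield
    print(f'[{name}] done in {time.time() - t0:.0f} s')

ROOT_PATH = '/content/drive/MyDrive/CAM/LVEF/CAMUS/original_data/data/'
TRAIN_PATH = ROOT_PATH + 'training/'  # matches the folder used in data_path
TEST_PATH = ROOT_PATH + 'testing/'    # assumed folder name; adjust to your layout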

This block of code sets the data_path variable to the location of the 4ch training data. It checks whether the specified path exists using the os.path.exists() function and prints a message to the console indicating whether the path was found.

Next, the code sets the default size of the plot figure using plt.rcParams["figure.figsize"] and defines a timer function as a context manager. The timer measures the time taken to run a block of code.

Finally, the code defines variables with paths to different directories within the original data folder, which is located in a Google Drive account (ROOT_PATH, TRAIN_PATH, and TEST_PATH). These variables are used later in the code to load and process the data.
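As a rough illustration of how a timer built with contextlib.contextmanager is used, here is a minimal, self-contained sketch. The workload inside the with block is a hypothetical stand-in for a slow preprocessing step, and the try/finally is an optional addition so the time is still reported if the block raises.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(name):
    """Print how long the wrapped block took to run."""
    t0 = time.time()
    try:
        yield  # the body of the "with" block runs here
    finally:
        print(f"[{name}] done in {time.time() - t0:.2f} s")


# Hypothetical workload standing in for a slow preprocessing step
with timer("dummy workload"):
    total = sum(i * i for i in range(100_000))
```

Wrapping each stage (reading, resizing, saving) in its own with timer(...) block is one simple way to see where the time goes.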

Define Helper Functions for Data Normalization, Reading Image Data, and Resizing

def data_norm(input):
    # Zero-mean, unit-variance normalization; 1e-12 avoids division by zero
    input = np.array(input, dtype=np.float32)
    input = input - np.mean(input)
    output = input / (np.std(input) + 1e-12)
    return output

def mhd_to_array(path):
    # Read a MetaImage (.mhd) file and return it as a NumPy array
    return sitk.GetArrayFromImage(sitk.ReadImage(path, sitk.sitkFloat32))

def read_info(data_file):
    # Parse "key: value" lines into a dictionary
    info = {}
    with open(data_file, 'r') as f:
        for line in f.readlines():
            info_type, info_details = line.strip('\n').split(': ')
            info[info_type] = info_details
    return info

def plot_histogram(image, title):
    plt.hist(image.ravel(), bins=256)
    plt.title(title)
    plt.xlabel('Pixel Intensity')
    plt.ylabel('Frequency')
    plt.show()

def plot_random_image_and_mask(image_folder, mask_folder, image_files, mask_files):
    # Pick one index so that the image and the mask belong together
    index = random.randint(0, len(image_files) - 1)
    img_path = os.path.join(image_folder, image_files[index])
    mask_path = os.path.join(mask_folder, mask_files[index])
    img = sitk.GetArrayFromImage(sitk.ReadImage(img_path, sitk.sitkFloat32))
    mask = sitk.GetArrayFromImage(sitk.ReadImage(mask_path, sitk.sitkFloat32))
    fig, ax = plt.subplots(1, 2, figsize=(10, 10))
    ax[0].imshow(img[0], cmap='gray')
    ax[0].set_title('Image')
    ax[1].imshow(mask[0], cmap='gray')
    ax[1].set_title('Mask')
    plt.show()

This code defines several functions for image processing and visualization:

  • data_norm(input): This function takes an input image as an array, normalizes it by subtracting the mean and dividing by the standard deviation, and returns the normalized image.
  • mhd_to_array(path): This function reads a .mhd image file from the specified path using SimpleITK and returns the image as a NumPy array.
  • read_info(data_file): This function reads information about the image from the specified file and returns it as a dictionary.
  • plot_histogram(image, title): This function plots a histogram of pixel intensities for the specified image with the given title.
  • plot_random_image_and_mask(image_folder, mask_folder, image_files, mask_files): This function selects a random image and mask from the specified folders and files, reads them using SimpleITK, and displays them side by side in a plot.

These functions are used in the rest of the tutorial to preprocess and visualize the medical image data.
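To make the effect of the normalization concrete, the sketch below applies the same subtract-the-mean, divide-by-the-standard-deviation step to a synthetic array. The random "frame" is a made-up stand-in for an ultrasound image, not CAMUS data.

```python
import numpy as np


def data_norm(input):
    """Zero-mean, unit-variance normalization, as described above."""
    input = np.array(input, dtype=np.float32)
    input = input - np.mean(input)
    return input / (np.std(input) + 1e-12)


# Synthetic stand-in for one ultrasound frame (values 0..255)
rng = np.random.default_rng(0)
fake_frame = rng.integers(0, 256, size=(64, 48)).astype(np.float32)

normalized = data_norm(fake_frame)
print("mean:", float(normalized.mean()), "std:", float(normalized.std()))

# A constant image has zero standard deviation; the small epsilon in the
# denominator keeps the division well defined and the result is all zeros
flat = data_norm(np.full((4, 4), 7.0))
```

After normalization the mean is close to 0 and the standard deviation close to 1, whatever the original intensity range was.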

Visualize Random Images and Masks From the Dataset

image_files = sorted([f for f in os.listdir(TRAIN_PATH + "4ch/frames") if f.endswith('.mhd')])
mask_files = sorted([f for f in os.listdir(TRAIN_PATH + "4ch/masks") if f.endswith('.mhd')])

plot_random_image_and_mask(TRAIN_PATH + "4ch/frames", TRAIN_PATH + "4ch/masks", image_files, mask_files)

This code block creates two lists, image_files and mask_files, containing the names of all .mhd files in the training set for the 4ch (four-chamber) view of the heart. The sorted function sorts the file names in ascending order.

Then, the plot_random_image_and_mask function is called with the paths to the image and mask folders (TRAIN_PATH + "4ch/frames" and TRAIN_PATH + "4ch/masks", respectively) and the lists of file names (image_files and mask_files) as arguments. This function selects a random image and mask from the specified folders using the random module, reads them using SimpleITK, and displays them side by side in a plot using Matplotlib.

The purpose of this code block is to visualize a random image and its corresponding mask from the training set for the 4ch view, which helps to verify that the data is being read and processed correctly.

Calculate Image Dimensions (Width and Length)

widths = []
lengths = []
clst = ["4ch"]  # views to scan; only the four-chamber view is used here
for c in clst:
    file_list = os.listdir(os.path.join(TRAIN_PATH, c + "/frames"))
    for i in file_list:
        if "mhd" in i:
            path = TRAIN_PATH + c + "/frames/" + i
            arr = mhd_to_array(path)  # read once; shape is (frames, length, width)
            w = arr.shape[2]
            l = arr.shape[1]
            widths.append(w)
            lengths.append(l)
print('Max width : ', max(widths))
print('Min width : ', min(widths))
print('Max length : ', max(lengths))
print('Min length : ', min(lengths))

This code computes the maximum and minimum width and length of the images in the specified directory.

The list of folders to consider is stored in the variable clst. In this case, it contains only "4ch".

The code then iterates through all the files in the specified directory for each folder in clst and checks whether the file name contains "mhd". If so, it reads the file using the mhd_to_array() function and retrieves its width and length from .shape[2] and .shape[1], respectively. The width and length are then appended to the widths and lengths lists.

Finally, we print the maximum and minimum values of the widths and lengths lists using the max() and min() functions.

Resize Images and Masks to Consistent Dimensions

def resize_image(image, width, height):
    return resize(image, (height, width), preserve_range=True, mode="reflect", anti_aliasing=True)

def preprocess_images_and_masks(image_folder, mask_folder, width, height, image_files, mask_files):
    preprocessed_images = []
    preprocessed_masks = []

    for img_file, mask_file in tqdm(zip(image_files, mask_files), total=len(image_files)):
        img_path = os.path.join(image_folder, img_file)
        mask_path = os.path.join(mask_folder, mask_file)

        img = mhd_to_array(img_path)
        mask = mhd_to_array(mask_path)

        # Allocate output arrays with the target size for every frame
        img_resized = np.zeros((img.shape[0], height, width), dtype=np.float32)
        mask_resized = np.zeros((mask.shape[0], height, width), dtype=np.float32)

        for i in range(img.shape[0]):
            img_resized[i] = resize_image(img[i], width, height)
            mask_resized[i] = resize_image(mask[i], width, height)

        # Only the images are normalized; the masks keep their label values
        img_normalized = data_norm(img_resized)

        preprocessed_images.append(img_normalized)
        preprocessed_masks.append(mask_resized)

    return preprocessed_images, preprocessed_masks

This code defines a function called resize_image that resizes an image to a specified width and height using the resize function from the skimage library. The function takes three arguments: the image you want to resize, the desired width, and the desired height. We set the preserve_range argument to True so that the pixel values of the resized image stay within the same range as the original image. We set the mode argument to 'reflect' to handle the edges of the image, and we set anti_aliasing to True to smooth the image.

The preprocess_images_and_masks function takes a folder containing images and a folder containing the corresponding masks, as well as the desired width and height for resizing. It also takes lists of image and mask files. The function loops through each pair of image and mask files, reads them using the mhd_to_array function, resizes them using the resize_image function, and normalizes the resized images using the data_norm function defined earlier. The function appends the preprocessed images and masks to two separate lists and then returns them.
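One thing to keep in mind: interpolating a label mask with anti-aliasing can produce fractional values along the contour, so the resized mask may no longer contain only the original labels. The sketch below is an alternative, not the method used in this tutorial. It resizes a mask by nearest-neighbour sampling with plain NumPy, so the output contains only values already present in the input. The function name resize_mask_nearest and the toy mask are hypothetical; with skimage, passing order=0 and anti_aliasing=False to resize is the analogous option.

```python
import numpy as np


def resize_mask_nearest(mask, width, height):
    """Resize a 2D label mask by nearest-neighbour sampling (no new label values)."""
    src_h, src_w = mask.shape
    # Map the centre of every output pixel back to a source row / column index
    rows = np.minimum(((np.arange(height) + 0.5) * src_h / height).astype(int), src_h - 1)
    cols = np.minimum(((np.arange(width) + 0.5) * src_w / width).astype(int), src_w - 1)
    return mask[rows[:, None], cols[None, :]]


# Toy mask: a small rectangle labelled 1 on a background of 0
toy_mask = np.zeros((8, 6), dtype=np.uint8)
toy_mask[2:6, 1:4] = 1

resized = resize_mask_nearest(toy_mask, width=12, height=16)
print(resized.shape, np.unique(resized))
```

If you keep the interpolating resize for masks, thresholding or rounding the result before training is another way to get clean labels back.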

Normalize Image Pixel Values

# Example values (assumptions); choose sizes that suit your model and memory
RESIZED_WIDTH = 256
RESIZED_LENGTH = 256
BATCH_SIZE = 100

image_files = sorted([f for f in os.listdir(TRAIN_PATH + "4ch/frames") if f.endswith('.mhd')])
mask_files = sorted([f for f in os.listdir(TRAIN_PATH + "4ch/masks") if f.endswith('.mhd')])

preprocessed_data_path = "/content/drive/MyDrive/CAM/CAM1/preprocessed_data/"

# Create the output folder if it is missing
if not os.path.exists(preprocessed_data_path):
    os.makedirs(preprocessed_data_path)

for batch_start in range(0, len(image_files), BATCH_SIZE):
    batch_end = min(batch_start + BATCH_SIZE, len(image_files))
    X_batch, y_batch = preprocess_images_and_masks(
        TRAIN_PATH + "4ch/frames", TRAIN_PATH + "4ch/masks",
        RESIZED_WIDTH, RESIZED_LENGTH,
        image_files[batch_start:batch_end], mask_files[batch_start:batch_end]
    )

This code preprocesses the images and masks for a deep learning model by resizing them to a fixed size and normalizing the pixel values.

The RESIZED_WIDTH and RESIZED_LENGTH variables define the width and height of the resized images, respectively. The BATCH_SIZE variable determines how many images are processed at a time.

The image_files and mask_files variables are lists of the file names of the input images and masks, respectively. We use the sorted function to ensure that the images and masks are in the same order.

If the directory specified in the preprocessed_data_path variable does not exist, the code creates it using os.makedirs. This is where we save the preprocessed data.

The for loop iterates over the input images and masks in batches of size BATCH_SIZE. For each batch, the preprocess_images_and_masks function is called to resize and normalize the images and masks.

Save Preprocessed Images and Masks in Batches

    # Still inside the batch loop: write the current batch to disk
    np.savez(
        preprocessed_data_path + f"preprocessed_data_batch_{batch_start}_{batch_end}.npz",
        X=X_batch, y=y_batch
    )

We save the resulting preprocessed data to a NumPy archive file using np.savez. The name of each archive file includes the batch start and end indices, which makes it easy to keep track of which images and masks were processed in that batch.
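For completeness, here is a hedged sketch of how such batch files could be read back later with np.load and stacked into single arrays. It writes two tiny synthetic batches to a temporary folder so that it runs on its own. The helper names load_batches and batch_start_index are hypothetical, and the sketch assumes that all stored arrays share the same shape apart from the first axis.

```python
import glob
import os
import tempfile

import numpy as np


def batch_start_index(path):
    """Extract the start index from a name like preprocessed_data_batch_0_100.npz."""
    return int(os.path.basename(path).split("_")[3])


def load_batches(folder):
    """Load every saved batch in numeric order and stack them into two arrays."""
    paths = sorted(
        glob.glob(os.path.join(folder, "preprocessed_data_batch_*.npz")),
        key=batch_start_index,
    )
    X_parts, y_parts = [], []
    for path in paths:
        with np.load(path) as batch:
            X_parts.append(batch["X"])
            y_parts.append(batch["y"])
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Demo with two tiny synthetic batches written to a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    for start in (0, 2):
        np.savez(
            os.path.join(tmp_dir, f"preprocessed_data_batch_{start}_{start + 2}.npz"),
            X=np.full((2, 1, 4, 4), start, dtype=np.float32),
            y=np.zeros((2, 1, 4, 4), dtype=np.float32),
        )
    X_all, y_all = load_batches(tmp_dir)

print(X_all.shape, y_all.shape)
```

Sorting by the numeric start index rather than by file name keeps the batches in their original order even when the indices have different numbers of digits.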

In medical image analysis, preprocessing plays a pivotal role in enhancing the quality and interpretability of images. It makes the images easier for human experts to read, and it can also significantly improve the performance of ML algorithms. Let's now look at the effect of preprocessing on the Cardiac Acquisitions for Multi-structure Ultrasound Segmentation dataset with a side-by-side comparison of the original and preprocessed images.

Next, we turn to image histograms to examine the differences between the original and preprocessed medical images. Comparing the histograms shows how preprocessing changes the pixel intensity distribution of the images in the dataset.


Finally, let's look at the original images, the preprocessed images, and their corresponding masks side by side.


Aspects to Consider

Here are some important aspects to consider when working with the Cardiac Acquisitions for Multi-structure Ultrasound Segmentation dataset and image segmentation in general:

  1. Data Augmentation: Data augmentation is a technique for increasing the amount of training data. It involves applying various transformations to the existing dataset, which helps to improve the generalization capabilities of a model. For echocardiographic images, you can use techniques such as rotation, scaling, flipping, and brightness/contrast adjustments. Make sure to apply the same geometric transformations to both the images and their corresponding masks.
  2. Train-Validation Split: Divide your dataset into training and validation sets. This helps to monitor the model's performance during training and prevent overfitting. A typical ratio is 80% for training and 20% for validation. Make sure that you perform the split randomly and in a stratified manner, so that the distribution of classes is similar in both sets.
  3. Choice of Model Architecture: The choice of model architecture plays a significant role in the performance of the segmentation task. U-Net is a popular convolutional neural network architecture for biomedical image segmentation, and various applications have demonstrated its effectiveness. We can also consider other architectures like DeepLabv3 and Mask R-CNN for segmentation tasks.
  4. Loss Functions: The choice of loss function is crucial for training a segmentation model. Commonly used loss functions for segmentation tasks are Dice loss, Jaccard/Intersection over Union (IoU) loss, and Binary Cross-Entropy loss. You can also experiment with a combination of these loss functions to achieve better performance.
  5. Evaluation Metrics: Use appropriate evaluation metrics to measure the performance of your segmentation model. Common metrics for segmentation tasks are the Dice coefficient, Jaccard/Intersection over Union (IoU) score, sensitivity, specificity, and accuracy. Track these metrics during training to make sure that your model learns the desired patterns from the data.
  6. Post-Processing: We can apply post-processing techniques to the output of the segmentation model to improve the final results. Some common post-processing techniques include morphological operations (e.g., dilation, erosion), hole filling, and contour smoothing. These techniques can help refine the segmentation output and produce better contours.
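As an illustration of the evaluation metrics mentioned above, here is a minimal NumPy sketch of the Dice coefficient and the IoU score for binary masks. The function names, the eps smoothing term, and the toy masks are assumptions made for this example. A training framework would typically use a differentiable version of the same formulas as a loss.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def iou_score(pred, target, eps=1e-7):
    """IoU (Jaccard) = |A and B| / |A or B| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)


# Toy example: the ground truth covers 4 pixels, the prediction covers 6,
# and the two overlap on 4 pixels
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1

dice = dice_coefficient(pred, target)  # 2 * 4 / (6 + 4)
iou = iou_score(pred, target)          # 4 / 6
print(f"Dice: {dice:.3f}, IoU: {iou:.3f}")
```

The eps term keeps both scores defined when the prediction and the ground truth are both empty.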


In conclusion, this blog discussed the importance of preprocessing the CAMUS dataset for efficient use in cardiovascular imaging analysis. Researchers and practitioners can optimize the dataset by applying various preprocessing techniques, which can help them develop and test models in the medical imaging field.

Key takeaways:

  • Preprocessing the CAMUS dataset is crucial for effective use in cardiovascular imaging analysis.
  • Techniques such as image resizing, normalization, and data augmentation can improve the dataset's usability.
  • Preprocessed data helps researchers and practitioners develop and test more accurate and efficient models in medical imaging.

Follow me to stay updated on the next steps toward achieving promising results in LV segmentation and visualizing performance metrics.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
