Divebomb Functions

The following are the primary functions used by divebomb to process the dives. The main function is profile_dives(), and the other functions (display_dive(), cluster_dives(), and export_dives()) are used as helper functions inside profile_dives().

divebomb.clean_dive_data(data, columns={'depth': 'depth', 'time': 'time'})
Parameters:
  • data – a Pandas DataFrame consisting of a time and a depth column
  • columns – column renaming dictionary if needed
Returns:

a Pandas DataFrame with time in seconds since 1970-01-01 and depth
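
As an illustration of the output described above, the following plain-Python sketch renames fields with a mapping and converts a timestamp to seconds since the Unix epoch (1970-01-01 UTC). It does not call divebomb. The helpers to_epoch_seconds and rename_record and the sample field names are hypothetical, and the direction of the columns mapping (expected name to source name) is an assumption, since the source does not state it.

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp):
    """Convert an ISO 8601 timestamp string (treated as UTC) to epoch seconds."""
    moment = datetime.fromisoformat(timestamp).replace(tzinfo=timezone.utc)
    return moment.timestamp()


def rename_record(record, columns):
    """Return a record keyed by the expected names ('depth', 'time').

    Assumption: keys of `columns` are the expected names and values are
    the names used in the source data.
    """
    return {expected: record[source] for expected, source in columns.items()}


# Hypothetical source field names, for illustration only.
columns = {"depth": "Depth (m)", "time": "Timestamp"}
raw = {"Timestamp": "1970-01-02T00:00:00", "Depth (m)": 12.5}

clean = rename_record(raw, columns)
clean["time"] = to_epoch_seconds(clean["time"])
```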

divebomb.cluster_dives(dives, pca_components=8, n_clusters=None, attributes=None)

This function uses sklearn to reduce the dimensionality with Principal Component Analysis, finds the optimal number of clusters (n_clusters) using Gaussian Mixture Models and the Bayesian Information Criterion, then applies Agglomerative Clustering to the dive profiles to group them.

Parameters:
  • dives – a pandas DataFrame of dive attributes
  • pca_components – the number of components for dimensionality reduction. Should be fewer than the number of columns in the dataset.
  • n_clusters – an override for the number of clusters to find when clustering
  • attributes – a list of variables/columns to use during the process. This can be a subset of the columns in the data.
Returns:

the clustered dives, the PCA loadings matrix, and the PCA output matrix
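
To make the model-selection step concrete, here is a minimal sketch of choosing a cluster count by the Bayesian Information Criterion, where the candidate with the lowest score is preferred. It is not divebomb's implementation: the bic helper is hypothetical and the log-likelihoods and parameter counts in fits are made-up numbers standing in for fitted mixture models.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Made-up fit results for illustration:
# candidate n_clusters -> (log-likelihood, number of free parameters)
fits = {2: (-520.0, 11), 3: (-480.0, 17), 4: (-476.0, 23)}
n_samples = 200

scores = {k: bic(ll, p, n_samples) for k, (ll, p) in fits.items()}

# The candidate with the lowest BIC balances fit against model complexity.
best_n_clusters = min(scores, key=scores.get)
```

In this made-up case the four-cluster model fits slightly better than the three-cluster one, but not by enough to offset the penalty for its extra parameters.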

divebomb.display_dive(index, data, starts, type='dive', surface_threshold=0, at_depth_threshold=0.15)

This function takes the index, the data, and the dive starts, and displays the dive using plotly. It is used as a helper for viewing dives when ipython_display_mode is True in profile_dives().

Parameters:
  • index – the index of the dive profile to plot
  • data – the dataframe of the original dive data
  • starts – the dataframe of the dive starts
  • type – a string that indicates using either the Dive or DeepDive class
  • surface_threshold – the calculated surface threshold based on animal length
  • at_depth_threshold – a value from 0 - 1 indicating distance from the bottom of the dive at which the animal is considered to be at depth
Returns:

a dive plot from plotly
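
The at_depth_threshold parameter can be read as a fraction of the maximum dive depth. The sketch below shows one such reading and is an assumption, not divebomb's code: a sample counts as "at depth" when it lies within that fraction of the deepest point. The helper at_depth_mask is hypothetical and depths are taken as positive downward.

```python
def at_depth_mask(depths, at_depth_threshold=0.15):
    """Flag samples within at_depth_threshold (a fraction of the maximum
    depth) of the deepest point of the dive."""
    max_depth = max(depths)
    cutoff = max_depth * (1.0 - at_depth_threshold)
    return [depth >= cutoff for depth in depths]


# Illustrative depth samples in metres, positive downward.
depths = [0.0, 40.0, 90.0, 100.0, 86.0, 30.0, 0.0]
mask = at_depth_mask(depths)
```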

divebomb.export_dives(dives, data, folder, is_surface_events=False)

This function exports each dive to its own netCDF file, grouped by cluster.

Parameters:
  • dives – a Pandas DataFrame of dive profiles to export
  • data – a Pandas dataframe of the original dive data
  • folder – a string indicating the parent folder for the files and sub folders
  • is_surface_events – a boolean indicating if the dive profiles are entirely surface events

divebomb.export_to_csv(folder, dives, loadings, pca_output_matrix, insufficient_dives=None)

Outputs the dive profiles, loadings, PCA matrix, and insufficient dives into the indicated folder as CSVs.

Parameters:
  • folder – the path to export all files to; the folder will be overwritten
  • dives – a Pandas DataFrame of the dive profiles and clusters, usually generated from cluster_dives()
  • loadings – a Pandas DataFrame of the Principal Component Analysis loadings from cluster_dives()
  • pca_output_matrix – a Pandas DataFrame of the Principal Component Analysis results from cluster_dives()
  • insufficient_dives – a Pandas DataFrame of dives that could not be profiled, from profile_dives()

divebomb.export_to_netcdf(folder, data, dives, loadings, pca_output_matrix, insufficient_dives=None)

Outputs the dive profiles, loadings, PCA matrix, and insufficient dives into the indicated folder as netCDF files. Additionally, subfolders are output by cluster, with a separate file for each dive.

Parameters:
  • folder – the path to export all files to; the folder will be overwritten
  • data – a Pandas DataFrame of the original dive data
  • dives – a Pandas DataFrame of the dive profiles and clusters, usually generated from cluster_dives()
  • loadings – a Pandas DataFrame of the Principal Component Analysis loadings from cluster_dives()
  • pca_output_matrix – a Pandas DataFrame of the Principal Component Analysis results from cluster_dives()
  • insufficient_dives – a Pandas DataFrame of dives that could not be profiled, from profile_dives()

divebomb.get_dive_starting_points(data, dive_detection_sensitivity, is_surfacing_animal=True, minimal_time_between_dives=120, surface_threshold=0, columns={'depth': 'depth', 'time': 'time'})
Parameters:
  • data – a dataframe needing a time and a depth column
  • is_surfacing_animal – a boolean indicating whether it’s an animal that is guaranteed to surface between dives
  • dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
  • minimal_time_between_dives – the minimum time in seconds that needs to occur before there can be a new dive segment
  • surface_threshold – the threshold at which a surfacing animal is considered to be at the surface; default is 0
  • columns – column renaming dictionary if needed
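
The effect of minimal_time_between_dives can be pictured with a small filter over candidate start times. This is only a sketch of the idea described in the parameter list, not the detection logic divebomb uses. The helper enforce_min_gap and the sample times are hypothetical.

```python
def enforce_min_gap(candidate_starts, minimal_time_between_dives=120):
    """Keep a candidate start only if it falls at least
    minimal_time_between_dives seconds after the last kept start."""
    kept = []
    for start in sorted(candidate_starts):
        if not kept or start - kept[-1] >= minimal_time_between_dives:
            kept.append(start)
    return kept


# Illustrative candidate dive starts, in seconds since the first sample.
candidates = [0, 45, 130, 200, 260, 500]
starts = enforce_min_gap(candidates)
```

Candidates at 45 s and 200 s are dropped because each falls less than 120 s after the previously kept start.
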
divebomb.profile_cluster_export(data, folder=None, columns={'depth': 'depth', 'time': 'time'}, is_surfacing_animal=True, dive_detection_sensitivity=None, minimal_time_between_dives=120, surface_threshold=0, at_depth_threshold=0.15)

Calls profile_dives(), cluster_dives(), and export_to_netcdf().

Parameters:
  • data – a dataframe needing a time and a depth column
  • folder – a parent folder to write out to
  • columns – column renaming dictionary if needed
  • is_surfacing_animal – a boolean indicating whether it’s an animal that is guaranteed to surface between dives
  • dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
  • minimal_time_between_dives – the minimum time in seconds that needs to occur before there can be a new dive segment
  • surface_threshold – the threshold at which a surfacing animal is considered to be at the surface; default is 0
  • at_depth_threshold – a value from 0 - 1 indicating distance from the bottom of the dive at which the animal is considered to be at depth
Returns:

two dataframes for the dive profiles and the original data

divebomb.profile_dives(data, columns={'depth': 'depth', 'time': 'time'}, is_surfacing_animal=True, dive_detection_sensitivity=None, minimal_time_between_dives=120, surface_threshold=0, ipython_display_mode=False, at_depth_threshold=0.15)

Calls the other functions to split and profile each dive. This function uses the divebomb.Dive or divebomb.DeepDive class to profile the dives.

Parameters:
  • data – a dataframe needing a time and a depth column
  • columns – column renaming dictionary if needed
  • is_surfacing_animal – a boolean indicating whether it’s an animal that is guaranteed to surface between dives
  • dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
  • minimal_time_between_dives – the minimum time in seconds that needs to occur before there can be a new dive segment
  • surface_threshold – the threshold at which a surfacing animal is considered to be at the surface; default is 0
  • ipython_display_mode – whether or not to display the dives
  • at_depth_threshold – a value from 0 - 1 indicating distance from the bottom of the dive at which the animal is considered to be at depth
Returns:

three dataframes: the dive profiles, the insufficient dives, and the original data
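
The signatures documented on this page suggest how the functions might be chained by hand. The sketch below makes several assumptions: the return values unpack in the order they are listed in each Returns description, the wrapper name profile_cluster_and_export is hypothetical, and 0.98 is only an illustrative value for dive_detection_sensitivity.

```python
def profile_cluster_and_export(data, folder):
    """Profile, cluster, and export dives to CSV with the documented functions."""
    # Imported lazily so this sketch can be defined without divebomb installed.
    import divebomb

    # Assumption: results unpack in the order the documentation lists them.
    dives, insufficient_dives, data = divebomb.profile_dives(
        data,
        columns={"depth": "depth", "time": "time"},
        is_surfacing_animal=True,
        dive_detection_sensitivity=0.98,  # illustrative value between 0 and 1
        minimal_time_between_dives=120,
        surface_threshold=0,
    )

    # Assumption: clustered dives, PCA loadings, PCA output matrix, in that order.
    clustered, loadings, pca_output_matrix = divebomb.cluster_dives(dives)

    divebomb.export_to_csv(
        folder,
        clustered,
        loadings,
        pca_output_matrix,
        insufficient_dives=insufficient_dives,
    )
    return clustered
```

Calling the wrapper needs divebomb installed and a DataFrame with time and depth columns. Because the export overwrites the target folder, it is safest to point it at a dedicated output directory.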