About Algal blooms:
An algal bloom is a rapid increase or accumulation in the population of algae, such as phytoplankton, in freshwater or seawater, and it often disrupts the ecosystem. Freshwater algal blooms result from an excess of nutrients, which may originate from fertilizers applied to agricultural land. Algae are short-lived, so a bloom leaves behind a high concentration of dead organic matter, which starts to decay. The decay process consumes oxygen in the water, resulting in hypoxic conditions. Without sufficient oxygen in the water, animals and plants may die off in large numbers. In seawater, besides causing hypoxia, algal blooms may produce neurotoxins that are lethal to fish, seabirds, sea turtles, and marine mammals. Human illness or death from consuming seafood or drinking water contaminated by toxic algae is also possible.
The severity of this problem can be seen in the news.
List of toxic microalgae: http://www.marinespecies.org/hab/aphia.php?p=taxlist
Our Solution
Our algorithm is based on the ID3 decision tree algorithm (invented by J. Ross Quinlan). We normalize the data by bucketing it and combine the resulting trees in a random forest, which is an ensemble learning method.
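To make the two ideas above more concrete, here is a minimal Java sketch of bucketing followed by the split criterion that ID3 uses (information gain). It is not the project's actual code: the class and method names, the equal-width bucketing scheme, the bucket count, and the sample values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and the equal-width scheme are assumptions.
class Id3BucketingSketch {

    // Equal-width bucketing: map a continuous value to a bucket index in [0, buckets - 1].
    static int bucket(double value, double min, double max, int buckets) {
        if (max <= min) {
            return 0; // degenerate range: everything falls into one bucket
        }
        int index = (int) ((value - min) / (max - min) * buckets);
        return Math.max(0, Math.min(buckets - 1, index)); // clamp out-of-range values
    }

    // Shannon entropy, in bits, of a list of class labels.
    static double entropy(List<Integer> labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of splitting the labels by a bucketed feature:
    // entropy before the split minus the size-weighted entropy of each group.
    static double informationGain(int[] feature, int[] labels) {
        Map<Integer, List<Integer>> groups = new HashMap<>();
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            all.add(labels[i]);
            groups.computeIfAbsent(feature[i], k -> new ArrayList<>()).add(labels[i]);
        }
        double remainder = 0.0;
        for (List<Integer> group : groups.values()) {
            remainder += (double) group.size() / labels.length * entropy(group);
        }
        return entropy(all) - remainder;
    }

    public static void main(String[] args) {
        // Made-up values, not taken from the dataset.
        double[] temperature = {12.1, 13.4, 18.9, 19.6};
        int[] bloom = {0, 0, 1, 1}; // hypothetical bucketed target
        int[] bucketed = new int[temperature.length];
        for (int i = 0; i < temperature.length; i++) {
            bucketed[i] = bucket(temperature[i], 10.0, 20.0, 2);
        }
        System.out.println("Information gain: " + informationGain(bucketed, bloom));
    }
}
```

At each node, ID3 picks the factor with the highest information gain, which is why continuous measurements such as water temperature need to be bucketed into discrete values first.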
Technically, our tool predicts the abundance of toxic phytoplankton populations from the many different factors that affect water quality. It then visualizes, in the form of decision trees (with the help of an API), which factors are most important for that prediction. It also reports the accuracy of these decision trees as a percentage.
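In a random forest, the prediction typically comes from many trees, each trained on a bootstrap sample of the rows, with the final answer decided by majority vote. The Java sketch below shows only those two steps. The names are hypothetical, and the tie-breaking rule (smallest bucket wins) is an assumption, not a description of how our tool behaves.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch only: names and the tie-breaking rule are assumptions.
class ForestVoteSketch {

    // Bootstrap sample: rowCount row indices drawn with replacement.
    static int[] bootstrapIndices(int rowCount, Random rng) {
        int[] indices = new int[rowCount];
        for (int i = 0; i < rowCount; i++) {
            indices[i] = rng.nextInt(rowCount);
        }
        return indices;
    }

    // Majority vote over the bucket predicted by each tree.
    // Ties go to the smallest bucket because TreeMap iterates keys in ascending order.
    static int majorityVote(int[] treePredictions) {
        if (treePredictions.length == 0) {
            throw new IllegalArgumentException("at least one prediction is required");
        }
        Map<Integer, Integer> votes = new TreeMap<>();
        for (int prediction : treePredictions) {
            votes.merge(prediction, 1, Integer::sum);
        }
        int best = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sample = bootstrapIndices(10, new Random(42));
        System.out.println("Bootstrap sample size: " + sample.length);
        // Hypothetical predictions from five trees for one water sample.
        System.out.println("Forest prediction: " + majorityVote(new int[]{2, 1, 2, 0, 2}));
    }
}
```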
We believe that analyzing data with our algorithm can reveal links between factors that have not been found before and that merit further research, provided we regularly add new variables, even from data we previously thought to be unrelated. In this way, human actions that create the ideal environment for algal blooms can be detected more easily and therefore prevented. As a result, our ecosystem can be preserved more effectively.
For example, using our software, which is written in Java, we found a direct link:
Between Akashiwo sanguinea and Phaeophytin
https://drive.google.com/file/d/1rFfIyij0Mfrjid4VD...
Between Prorocentrum spp. and Water Temperature
https://drive.google.com/file/d/1rFfIyij0Mfrjid4VD...
And between Alexandrium spp. and Water Temperature
https://drive.google.com/file/d/1K7hIWuHfjZoRbc5Kp...
The dataset included 1,930 rows and 22 columns.
The algorithm learned from one part of the dataset and predicted the other part with an accuracy greater than 80%.
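A hold-out evaluation of this kind can be sketched in Java as follows. The split ratio is not stated here, so the 70% training fraction below is purely an assumption, as are the class and method names and the sample predictions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: names and the 70% training fraction are assumptions.
class HoldOutSketch {

    // Row indices in random order; the first trainSize(...) entries form the training part.
    static List<Integer> shuffledRows(int rowCount, long seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, new Random(seed));
        return rows;
    }

    // Number of rows used for training, given the fraction of the dataset to learn from.
    static int trainSize(int rowCount, double trainFraction) {
        return (int) Math.round(rowCount * trainFraction);
    }

    // Percentage of held-out rows whose predicted bucket matches the actual bucket.
    static double accuracyPercent(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++;
            }
        }
        return 100.0 * correct / actual.length;
    }

    public static void main(String[] args) {
        int rowCount = 1930; // dataset size given in the text
        List<Integer> rows = shuffledRows(rowCount, 7L);
        int train = trainSize(rowCount, 0.7); // assumed split ratio
        System.out.println("Training rows: " + rows.subList(0, train).size());
        System.out.println("Held-out rows: " + rows.subList(train, rowCount).size());
        // Hypothetical predictions versus actual buckets for four held-out rows.
        System.out.println("Accuracy: " + accuracyPercent(new int[]{1, 0, 2, 2}, new int[]{1, 0, 1, 2}) + " %");
    }
}
```

Measuring accuracy only on rows the trees never saw during training is what makes the percentage meaningful as an estimate of predictive power.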
The factors we tested are:
https://drive.google.com/file/d/1bDLfAGX9J-fahZpBl...
The dataset used in our algorithm included measurements from Cal Poly Pier, Santa Cruz Wharf, and Scripps Pier on the California coast.
Datasets retrieved from:
http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=INSITU_GLO_NRT_OBSERVATIONS_013_030
https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MY1DMM_CHLORA
http://www.sccoos.org/query/?project=Harmful%20Algal%20Blooms&study[]=Scripps%20Pier
https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MYD28M
https://modis.gsfc.nasa.gov/data/dataprod/chlor_a.ph
https://open.canada.ca/data/en/dataset/ccbffcfc-d81d-438a-ab1c-08df0e7525d8
Other references:
https://dblp.uni-trier.de/pers/hd/q/Quinlan:J=_Ros...
https://earth.esa.int/web/guest/missions/esa-opera...
https://www.epa.gov/enviroatlas/forms/enviroatlas-...
https://www.epa.gov/water-research/cyanobacteria-a...
https://www.nasa.gov/feature/goddard/2019/nasa-hel...
https://earthdata.nasa.gov/learn/sensing-our-plane...
https://earthobservatory.nasa.gov/images/88311/blo...
https://myfwc.com/research/redtide/general/cyanobacteria/