Welcome to episode 10 of New voices in AI. This time we hear from Srija Chakraborty about her work using ML with large data sets to understand what happens on Earth at night.
I am a researcher at the Earth from Space Institute, Universities Space Research Association, working on applied machine learning techniques for remote sensing datasets with the Black Marble Science team. Before this, I completed my doctoral studies at Arizona State University, studying machine learning and statistical signal processing approaches for remote sensing applications, and then held a NASA Postdoctoral Program Fellowship at Goddard Space Flight Center, working on machine learning techniques for nighttime remote sensing.
I currently work with the Black Marble dataset, which captures the Earth at night from space with the VIIRS instrument (Visible Infrared Imaging Radiometer Suite) onboard the Suomi-NPP and NOAA-20 satellites. With Black Marble, we can see nighttime snapshots of the Earth daily in seven different bands or channels.
The Day/Night Band (DNB), in particular, captures nighttime lights from both natural and artificial sources, creating a unique record of nocturnal phenomena ranging from city lights, nighttime fires and smoke, gas flaring, and shipping vessels to clouds, aurorae, and gravity waves. Nighttime lights are used to monitor and study the impact of disasters on electric grids, to study urbanization, and to detect nighttime emissions; they are also used to build repositories of natural phenomena such as clouds, aurorae, and aerosols. The science team also creates several derived, higher-level products that are better suited to targeted analyses. Black Marble data records start in 2012, and we now have a decade-long dataset mapping nighttime processes globally, every day.
As a result, conducting targeted scientific analyses that require detecting and extracting an event of interest is like searching for a needle in a haystack. On top of this, geographic and seasonal variation means that any targeted event can show a high degree of variation in its signature. I am currently exploring machine learning techniques that let us leverage the information embedded in this large data volume and monitor these occurrences, with an emphasis on anomaly detection and time-series analysis. In remote sensing we also have to tackle the scarcity of labeled datasets for training machine learning algorithms, so we are creating methods that automate data labeling to reduce the annotation effort required from domain experts. At present, I am exploring these approaches for detecting combustion and for analyzing nighttime light time series to study power outages, electrification, urbanization, and other changes in urban processes.
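To make the time-series anomaly-detection idea more concrete, here is a minimal Python sketch of one simple way to spot a possible power outage in a single location's nighttime radiance record. It is a toy illustration under stated assumptions, not the Black Marble team's method: it assumes a daily list of radiance values with `None` marking cloudy or missing nights, and it flags any night whose radiance falls below a fixed fraction of the median of recent valid nights. The function name `flag_outages` and its `window` and `drop` parameters are hypothetical. Real analyses must also contend with the geographic and seasonal variability described above, which this simple baseline does not model.

```python
from statistics import median
from typing import List, Optional


def flag_outages(
    radiance: List[Optional[float]], window: int = 7, drop: float = 0.5
) -> List[int]:
    """Return the indices of nights whose radiance falls below (1 - drop)
    times the median of the preceding `window` valid, non-anomalous nights.

    Hypothetical helper for illustration only: `radiance` is assumed to be a
    daily series for one location, with None for cloudy or missing nights.
    """
    flagged: List[int] = []
    history: List[float] = []  # valid, non-anomalous observations, oldest first

    for day, value in enumerate(radiance):
        if value is None:
            # Cloud-contaminated or missing night: skip it, baseline unchanged.
            continue
        if len(history) >= window:
            baseline = median(history[-window:])
            if value < (1.0 - drop) * baseline:
                flagged.append(day)
                # Keep anomalous nights out of the baseline so a long outage
                # is not gradually absorbed as the "new normal".
                continue
        history.append(value)

    return flagged


if __name__ == "__main__":
    # Made-up example series: steady lights, a two-night drop, then recovery.
    series = [10.0, 10.5, 9.8, None, 10.2, 10.1, 9.9, 10.3, 3.0, 2.5, None, 9.7, 10.0]
    print(flag_outages(series, window=5, drop=0.5))  # [8, 9]
```

Because anomalous nights are excluded from the running baseline, a multi-day outage is flagged for its full duration; a learned model would replace the fixed `drop` threshold with something that adapts to each location's normal variability.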
From a remote sensing perspective, AI/ML capabilities are crucial for effectively monitoring the large quantities of data collected by the ever-increasing fleet of satellites, and we will likely become more reliant on ML inferences for near real-time insights and decision making.
As in any other subfield, in Earth observation these inferences depend heavily on the machine learning workflow, dataset curation, label quality, and other assumptions. Transparently reporting all such design decisions as metadata, and creating standards for doing so, will become increasingly important as ML-informed analyses inform scientists, policy, and planning. Earth observation is also plagued by a scarcity of labeled datasets. Research directions in ML that minimize the need for expert supervision will make it possible to monitor the growing data volume promptly. Keeping scientists in the loop will also be essential to incorporate domain context. Deriving meaningful representations of domain knowledge from experts can be challenging, but it may offer pathways to physically interpretable results that are also more useful to stakeholders. Finally, evaluating the impact of studies on science and society may be essential, so that the emphasis is not just on what a model is trained to do but also on why it is trained to do so. This helps ensure that study objectives are ethical and reduces the scope for misuse of machine learning capabilities.
I am studying applied machine learning techniques for nighttime remote sensing, which measures nighttime lights from a variety of natural and artificial sources. These observations provide timely insights into events and changes that can be seen or measured from space, and they create a global, daily record of the Earth at night for use in both near real-time and long-term studies. Satellites have a unique vantage point for monitoring because of how often we get data, even from remote locations. My research focuses on using machine learning methods to extract relevant signals, map events of interest, inform domain scientists, and accelerate derived analyses. This includes improving the detection of possible emission sources using nighttime lights, to better inform emission studies and to track adherence to mitigation policies.
We are also using daily global nighttime light time-series records to study city lights and track changes in urban areas due to disaster-induced outages, inform recovery efforts, and study urbanization, electricity access, and socio-economic trends, to name a few. The expected outcome of our research is higher-level, targeted datasets extracted from highly variable daily observations, which can then inform further scientific analyses, planning, and stakeholders who may not have the expertise to extract these patterns themselves. With increasing climate-related uncertainty, remote sensing datasets will play a vital role in studying changes and adapting to shifting weather regimes, which makes this an exciting area of study with broad impact on science and society.
I particularly enjoy working on machine learning for scientific applications, especially with remotely sensed datasets. The exciting aspect of AI/ML for science is the impact it can have on discoverability, especially for relatively less explored datasets. Often, when looking at scientific datasets, we have little idea what to look for; we may not have a very good representation of the interesting signals that are buried in the data. ML is uniquely suited to such situations: it can parse datasets, often with many dimensions, and point domain experts to interesting and potentially unknown instances of high scientific significance. ML inferences from Earth observation datasets are also informative for long-term studies, monitoring changes from space, and assisting with decision making and policy development. They have tremendous scope for social good, for example in studying remote areas and the impact of a changing climate, particularly on vulnerable communities. The implications of ML and Earth observation for science and society are the most exciting and rewarding aspect for me.
I would be interested in using remote sensing observations and physical models for forecasting studies that analyze likely future pathways, to inform planning and adaptation strategies.
I hope that my work producing satellite-derived insights to study global environmental change is accessible not just to scientists but also to decision-makers and stakeholders, so that the growing volume of satellite observations can be used effectively in a wide range of applications.