In work presented at the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), Kshitiz, Sonu Shreshtha, Ramy Mounir, Mayank Vatsa, Richa Singh, Saket Anand, Sudeep Sarkar and Sevaram Mali Parihar investigate using computer vision techniques to monitor large flocks of birds. In this interview, Kshitiz tells us more about this research.
In our work, Long-term Monitoring of Bird Flocks in the Wild, published in IJCAI 2023, we delve into developing and applying computer vision techniques and datasets tailored for non-invasive monitoring and analysis of migratory bird flocks in their natural habitats. The aim is to understand the behavior and ecology of migratory birds through automated video analysis with minimal human intervention, thereby bolstering conservation initiatives.
The core technical challenges in wildlife monitoring arise from the uncontrolled, outdoor nature of the imagery (both images and videos), which captures large flocks of migratory birds over several months. The footage reflects true monitoring conditions, with inherent variability in illumination, weather, complex motion, and the birds’ poses on the ground and in flight. To address these challenges, we collected extensive high-resolution video data at several sites, primarily Khichan, which hosts thousands of Demoiselle Cranes over the winter months. We also collected additional sample data featuring diverse bird species from Keoladeo National Park, a bird sanctuary and UNESCO World Heritage site.
The research aims to curate bird samples labeled under human supervision to aid researchers in this area. To analyze the annotated imagery, the paper benchmarks, and seeks to improve, computer vision techniques such as crowd counting, segmentation, detection, and tracking. The preliminary work produced a new annotated image and video dataset that poses unique challenges compared to existing ones. Experiments exposed the limitations of contemporary methods, especially on densely populated flocks, which highlights the need for specialized techniques suited to real-world wildlife monitoring. We plan to expand the video dataset annotations and develop new algorithms inspired by self-supervised learning, active learning, and cognitive science, with the aim of better understanding bird behavior and interactions over time.
The research aims to develop techniques that automatically analyze bird behaviors from imagery collected in natural environments. This capability can enable large-scale ethogramming (cataloguing the behaviors) of wildlife without human intervention, providing unbiased insights into intricate ecological behaviors. However, creating comprehensive ethograms requires tracking the birds’ movements without relying on manual labeling, which is challenging given their complex motion, postures, occlusions, cluttered backgrounds, and lighting variability. Further, acquiring manually labeled datasets is time-intensive and demands considerable human effort, making the process inefficient and often impractical for large-scale applications. Understanding specific behaviors from automated monitoring helps gain insights into:
This research is crucial given the alarming worldwide decline in bird populations, driven by threats such as habitat loss, climate change, and urbanization. However, building such monitoring systems requires advanced vision techniques that can extract nuanced contextual information from complex natural environments.
The research methodology combines data collection, advanced computer vision techniques, and collaboration with local experts to monitor and analyze the behavior of migratory birds in their natural habitats. The study aims to address the existing challenges of non-invasive wildlife monitoring and provide critical insights for developing informed conservation and mitigation policies. A substantial component of the methodology involved collecting extensive data representing true monitoring conditions across several months. The objective was to compile footage (images and videos) that could subsequently be analyzed to derive meaningful conclusions. A key consideration in developing the algorithms is that occlusion poses a significant challenge that must be mitigated to obtain better results. We therefore curated a unique high-resolution image and video dataset of the migratory cranes that travel to western India every winter, with images of up to 4K quality showcasing flock density under diverse real-world conditions such as variable lighting and perspectives. The research also introduces an end-to-end pipeline that accepts images as input and analyzes them through several tasks, including crowd counting/density estimation and segmentation, to obtain cues about the collective behavior of avian flocks. Additionally, to overcome manual annotation challenges, we aim to leverage active and self-supervised learning techniques for accurate flock estimation.
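To make the crowd counting/density estimation step more concrete, here is a minimal sketch of how a density map is commonly built in the crowd-counting literature: each annotated bird (one point per bird) is replaced by a small Gaussian that sums to 1, so summing the whole map recovers the bird count. This is an illustration under stated assumptions, not the pipeline from the paper: the function names `gaussian_kernel` and `density_map`, the fixed `sigma`, and the example coordinates are all hypothetical.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size x size) Gaussian kernel normalised to sum to 1."""
    axis = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(axis ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int, sigma: float = 4.0) -> np.ndarray:
    """Turn point annotations [(x, y), ...] into a density map.

    Assumption: a fixed sigma for every bird. Real crowd-counting
    methods often adapt the spread to the local density instead.
    """
    radius = int(3 * sigma)
    kernel = gaussian_kernel(2 * radius + 1, sigma)
    density = np.zeros((height, width), dtype=np.float64)

    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # ignore annotations that fall outside the frame

        # Clip the kernel window to the image borders.
        top, bottom = max(row - radius, 0), min(row + radius + 1, height)
        left, right = max(col - radius, 0), min(col + radius + 1, width)
        patch = kernel[top - row + radius: bottom - row + radius,
                       left - col + radius: right - col + radius]

        # Renormalise the clipped patch so each bird still contributes 1.
        density[top:bottom, left:right] += patch / patch.sum()

    return density


if __name__ == "__main__":
    # Three hypothetical bird annotations on a 48 x 64 frame.
    birds = [(0, 0), (10, 20), (63, 47)]
    dmap = density_map(birds, height=48, width=64)
    print(dmap.shape, round(float(dmap.sum()), 6))
```

Renormalising the clipped patch is a design choice that keeps the map's sum equal to the number of in-frame annotations even when birds sit at the image border. A counting model would then be trained to regress such a map from the image.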
One major outcome of our research was a novel bird monitoring dataset comprising high-resolution images and videos with point annotations. The dataset contains highly dense flocks of birds in their natural habitat. We found that existing tools, such as the MegaDetector toolkit, which was trained on the largest public and private diverse wildlife datasets, struggled to detect all the birds in images from the new dataset. This underscores the need to develop specialized computer vision techniques tailored for wildlife datasets, especially those with high-density subjects like flocks of birds. Upon conducting experiments for several vision tasks on the proposed dataset, we observed that pre-training models improved performance in specific tasks such as bird counting, as the models could learn relevant features that were then refined during fine-tuning. The experiments also highlighted the shortcomings of several state-of-the-art algorithms when applied to the proposed dataset. For segmentation, results from the recent Segment Anything Model (SAM) also showed limitations on our data, underscoring the dataset’s challenging nature and the difficulties inherent in wildlife datasets.
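Comparisons of counting performance like the ones described above are usually quantified in the crowd-counting literature with the mean absolute error (MAE) and root mean squared error (RMSE) between predicted and annotated per-image counts. The interview does not state which metrics the paper reports, so treat the sketch below as an assumption about a typical evaluation. The function name `counting_errors` and the counts in the example are hypothetical, not results from the dataset.

```python
import math


def counting_errors(predicted, actual):
    """Return (MAE, RMSE) between predicted and ground-truth per-image counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    # Made-up counts for three images, for illustration only.
    predicted_counts = [90, 210, 480]
    annotated_counts = [100, 200, 500]
    mae, rmse = counting_errors(predicted_counts, annotated_counts)
    print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

RMSE penalises large misses more heavily than MAE, which matters for dense flocks where a single badly miscounted image can dominate the error.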
The research is part of a broader project on Video Analytics for Wildlife Conservation under the Indo-US collaboration. By collaborating with local experts, especially those familiar with the birds at the Khichan sites, the team gains insights that are invaluable for understanding these birds’ ecological and evolutionary processes. The insights derived from this research can be pivotal in predicting future impacts on bird species and in formulating informed conservation and mitigation policies.
Our research team is exploring multiple approaches to designing novel algorithms that address the unique challenges of non-invasive wildlife monitoring. We are keen on improving crowd counting and density estimation techniques, and on enhancing semantic segmentation and species identification with fewer samples. To overcome the manual annotation challenges, especially with dense flocks, we are also using synthetic data generation and leveraging unlabeled data through self-supervised learning. Overall, the future work aims to push current boundaries, enabling a better understanding of bird behavior and ecology.
Kshitiz is a final-year Computer Science undergraduate student at IIT Jodhpur. His areas of interest include machine learning, deep learning, and computer vision.
Long-term Monitoring of Bird Flocks in the Wild, Kshitiz, Sonu Shreshtha, Ramy Mounir, Mayank Vatsa, Richa Singh, Saket Anand, Sudeep Sarkar, Sevaram Mali Parihar.