RESEARCH

In August 2022, I began working on a mentored research project titled ‘Deep Multimodal Learning for Surveillance’, which analyzed annotated security footage from public spaces to detect security threats in places such as shops, banks, and schools. The project was undertaken by a team of researchers as part of CS229: Machine Learning, the course convened by Professor Andrew Ng at Stanford University, with the aim of mitigating gun violence in the United States. I used the open-source neural network CLIP to write a program that classifies objects in the collected footage, which in turn would help identify potential security threats and alert the relevant authorities. The dataset consisted of 200 publicly available images of vandalism, riots, and shootings, with each class annotated with five human captions.

The primary objectives of the research project were to build and curate a dataset of surveillance images across categories and to establish a baseline by assessing state-of-the-art image captioning models on that dataset. Following this, the project introduced SIM SCORE, a new metric for comparing image captioning results, and improved on the baseline set by existing state-of-the-art models on this dataset for screening security footage. In December 2022, at the end of the project, I helped author a final report on the research.

I have submitted my paper to various outlets, including the National High School Data Science Competition by Veritas AI in January 2024. So far, the paper was accepted by the Synopsys Science Fair’s Review Committee in January 2024 and presented at the Synopsys Science Fair Championship in March 2024. It was also published in the International Journal of Science and Research (IJSR) in May 2024.