Do Van Minh

I am currently a Research Engineer at the Singapore-MIT Alliance for Research and Technology (SMART), a research center of the Massachusetts Institute of Technology (MIT) in Singapore. At SMART MIT, I work under the supervision of Professor Sanjay Sarma and Professor Archan Misra on the M3S project, where I develop AI models in computer vision and natural language processing (NLP) to enhance collaboration between AI, humans, and machines, focusing particularly on education and robotics.

Prior to joining SMART MIT, I worked as a Research Associate at the College of Computing and Data Science at Nanyang Technological University (NTU), where I focused on computer vision and deep learning for embedded computing platforms. I also earned a Master of Engineering in Computer Science from NTU, where my dissertation, supervised by Professor Lam Siew Kei, addressed multi-camera tracking for smart urban mobility.

Email: vmdo@mit.edu
CV / LinkedIn

Research

My research is centered on developing lightweight Computer Vision and Deep Learning algorithms tailored for resource-constrained embedded platforms, with a particular focus on enhancing Intelligent Transportation Systems in Smart Cities. This involves creating solutions that are not only innovative but also practical for real-world applications.

In this area, I have worked with a range of technical frameworks, including OpenCV, PyTorch, TensorFlow, ONNX, NCNN, and ROS. My expertise spans data preparation, model optimization techniques such as quantization and pruning, and learning methodologies including transfer learning, knowledge distillation, and supervised, self-supervised, and unsupervised learning. I have applied these to change detection, image classification, object detection, segmentation, and multiple object tracking, with a special emphasis on deploying the resulting models on embedded devices.

If you are interested in Computer Vision, Deep Learning, and Embedded Systems, feel free to contact me for mentorship in these areas.

Work Experience

Autonomous Drone in Maze

College of Computing and Data Science, NTU, Nov. 2023 - Dec. 2024

In the Autonomous Drone in Maze project, I built an autonomous drone system focused on real-time sensing, mapping, planning, and navigation within a maze environment. My work involved creating a simulation environment by integrating Hardware-in-the-loop (HIL) pipelines and implementing real-time algorithms. This setup allowed drone flight to be simulated on a workstation while the SLAM, mapping, planning, navigation, and control modules ran in real time on the Jetson. I optimized and allocated computing resources across the embedded CPU-DLA-GPU, which improved the performance of SLAM, mapping, planning, navigation, and object detection. These efforts enabled the drone to fly at efficient speeds and navigate through dense obstacle environments, demonstrating that the system can operate effectively in challenging conditions on the Jetson.

Infrastructure to Vehicle (I2V) Communication for Driving Assistance

School of Computer Science and Engineering, NTU, Nov. 2021 - Nov. 2023

In the Infrastructure to Vehicle (I2V) Communication for Driving Assistance project, I led the development of real-time, resource-efficient computer vision algorithms to facilitate urban traffic management through embedded systems, optimizing for both cost and power consumption. Leveraging object detection and tracking techniques, I spearheaded the analysis of traffic flows across 19 camera feeds, significantly contributing to smarter city infrastructure. My role also encompassed the design and refinement of a lightweight AI model capable of vehicle tracking and crowd counting. This innovation empowered four Odroid N2+ edge devices to simultaneously process live feeds from nine cameras and establish seamless communication with Autonomous Vehicles. Furthermore, I guided and mentored a team of 15 student assistants in the nuances of AI data handling, including meticulous preparation and annotation.

Sensing and Management for Agile Transport

School of Computer Science and Engineering, NTU, Nov. 2016 - Nov. 2021

In the Sensing and Management for Agile Transport project, I set up a camera network, configuring 27 cameras across 8 key locations to capture high-quality video data for traffic analytics. My work involved developing and refining lightweight computer vision algorithms, and training and optimizing AI models on embedded platforms for real-time applications such as traffic density estimation, illegal parking and entry detection, crowd counting, and vision-based smart traffic lights. I also supervised algorithmic enhancements and conducted extensive field testing on embedded platforms including the Odroid XU4 and Odroid N2+, contributing to traffic management advancements both at NTU and across Singapore.

Simultaneous Localization and Mapping (SLAM)

School of Computer Science and Engineering, NTU, Aug 2020 - Dec 2022

In my work on Simultaneous Localization and Mapping (SLAM), I enhanced the robustness of visual SLAM by advancing dynamic outlier removal techniques for RGB-D and stereo data, which significantly improved pose estimation and map precision. I also advanced loop closure detection for long-term SLAM by integrating location semantics, which reduced false matches, and deployed these improvements efficiently on embedded systems.

Projects

Autonomous Drone in Maze
NTU, Singapore, Nov 2023 - present

Teammates: Van Minh Do, Zhang Dongshuo, Siew-Kei Lam

This project aims to develop a comprehensive autonomous drone system capable of real-time sensing, mapping, planning, and navigation within a maze, avoiding obstacles and detecting individuals. The integrated system will operate in real time on an embedded computing platform, enabling the drone to fly at high speeds.

Shoplifting Behavior Detection
NTU, Singapore, May 2023 - present

Teammates: Van Minh Do, Wu Meiqing, Siew-Kei Lam, Thambipillai Srikanthan

This project uses computer vision and deep learning to detect shoplifting behaviors in retail environments in real time. By analyzing video streams, the system distinguishes between typical customer actions and potential theft activities. Embedded systems facilitate swift processing and response, allowing for immediate alerts or preventive measures. The objective is to minimize retail losses through an automated surveillance solution.

Infrastructure to Vehicle (I2V) Communication for Driving Assistance
NTU, Singapore, Nov 2021 - Nov 2023

Teammates: Van Minh Do, Wu Meiqing, Siew-Kei Lam

This project aims to study the viability of I2V communication for driving assistance in public transportation facilities. In particular, visual analytics performed on the edge computing devices mounted on the public transportation infrastructure will be communicated to vehicles (such as Autonomous Vehicles) to enhance their navigation in the vicinity.

Vision-based Smart Traffic Light
NTU, Singapore, Nov. 2021

Teammates: Van Minh Do, Alok Prakash, Siew-Kei Lam

The Smart Traffic Light field trial aims to reduce the waiting time of vehicles at traffic lights. Conducted at the company site of our industry collaborator, the project developed a lightweight computer vision algorithm and an optimized deep learning model for real-time vehicle detection and tracking. The system operates 24/7 on resource-constrained embedded computing platforms and maintains its efficiency under the changing illumination and diverse weather conditions in Singapore.

Virtual Right of Way
NTU, Singapore, Nov. 2020

Teammates: Van Minh Do, Nirmala Ramakrishnan, Alok Prakash, Siew-Kei Lam, Thambipillai Srikanthan

The project has successfully developed a lightweight computer vision algorithm and an optimized deep learning model tailored for real-time operation on a low-cost embedded computing platform. These advancements ensure dependable, real-time performance on resource-constrained computing platforms with low power consumption. The technology detects and tracks various bus types across diverse conditions, including different weather patterns, illumination fluctuations, and assorted traffic scenarios in Singapore.

Vision-based Crowd Counting
NTU, Singapore, Sept. 2019

Teammates: Wu Meiqing, Van Minh Do, Alok Prakash, Siew-Kei Lam, Thambipillai Srikanthan

This project is dedicated to accurately counting individuals waiting at bus stops, aiming to estimate bus demand and enhance the efficiency of the bus schedule system. Achieving this goal involves the development of a lightweight computer vision system and an optimized deep learning model. These enable real-time, resilient performance on resource-constrained embedded computing platforms. Notably, the project achieves significant reductions in power consumption, facilitating continuous 24/7 operation. This capability ensures precise people counting even under fluctuating illumination and varying crowd densities in Singapore.

Vision-based Illegal Parking Detection
NTU, Singapore, Sept. 2018

Teammates: Kratika Garg, Van Minh Do, Alok Prakash, Siew-Kei Lam, Thambipillai Srikanthan

The primary objective of this project is to identify instances of illegal vehicle parking in prohibited areas. To achieve this goal, the project has implemented a lightweight computer vision system and an optimized deep learning model. These innovations enable real-time detection, tracking, and evidence collection for illegal parking on resource-constrained embedded computing platforms. Operating 24/7, the system is resilient to changes in illumination and varying weather conditions in Singapore.

Vision-based Illegal Entry Detection
NTU, Singapore, Oct. 2017

Teammates: Wu Meiqing, Van Minh Do, Alok Prakash, Siew-Kei Lam, Thambipillai Srikanthan

This project is designed to identify instances of illegal vehicle entry into the bus lane during prohibited hours. To achieve this, the project has developed a lightweight computer vision system and an optimized deep learning model. These enable real-time detection, tracking, and evidence collection for unauthorized entries by non-bus vehicles. The system operates continuously on resource-constrained embedded computing platforms, ensuring 24/7 functionality even under the changing illumination and varied weather conditions in Singapore.

Vision-based Traffic Density Estimation
NTU, Singapore, Nov. 2016

Teammates: Kratika Garg, Nirmala Ramakrishnan, Van Minh Do, Alok Prakash, Siew-Kei Lam, Thambipillai Srikanthan

The primary goal of this project is to estimate traffic density across different lanes on the road. To achieve this, the project has developed a lightweight computer vision algorithm specifically designed to estimate the percentage of lane occupancy. This algorithm runs on resource-constrained embedded computing platforms, ensuring continuous 24/7 operation. It handles changes in illumination and varied weather conditions, providing reliable traffic density estimates on roads in Singapore.

Theme from Jon Barron. Developed using Jekyll.