Duration

Nov 2025 - Dec 2025

Overview

Developed a real-time stereo vision system that computes object distances using two synchronized ESP32-CAMs and YOLO11m-seg on a FastAPI server.

Instead of computing a traditional dense disparity map over the whole image, the system performs sparse stereo matching only on YOLO-detected object regions. This keeps depth estimation efficient and tied to semantically meaningful objects, and the pipeline runs at 8-9 FPS on a GTX 1650 GPU.
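To make the sparse approach concrete, here is a minimal sketch of how per-object depth could be derived once both views have been segmented. It is an illustration, not the project's actual code: it assumes a rectified stereo pair, uses each mask's centroid as the matching feature, and the names `Detection`, `depth_from_disparity`, `match_and_measure`, `focal_px`, `baseline_m`, and `max_row_diff` are all hypothetical. The only fixed ingredient is the standard pinhole relation Z = f * B / d.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """Hypothetical summary of one segmented object in one view."""
    label: str   # class name reported by the detector
    cx: float    # mask centroid x, in pixels
    cy: float    # mask centroid y, in pixels


def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> Optional[float]:
    """Pinhole stereo relation Z = f * B / d for a rectified pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero or negative disparity means a bad match or a point at infinity.
        return None
    return focal_px * baseline_m / disparity


def match_and_measure(left: List[Detection], right: List[Detection],
                      focal_px: float, baseline_m: float,
                      max_row_diff: float = 20.0) -> List[Tuple[str, float]]:
    """Pair same-class detections lying on roughly the same image row,
    then convert each pair's centroid offset into a distance in metres."""
    results = []
    used = set()
    for det in left:
        best = None
        for i, cand in enumerate(right):
            if i in used or cand.label != det.label:
                continue
            row_diff = abs(cand.cy - det.cy)
            if row_diff > max_row_diff:
                continue
            if best is None or row_diff < abs(right[best].cy - det.cy):
                best = i
        if best is None:
            continue
        z = depth_from_disparity(det.cx, right[best].cx, focal_px, baseline_m)
        if z is not None:
            used.add(best)
            results.append((det.label, z))
    return results
```

With an assumed focal length of 800 px and a 10 cm baseline, a 40 px centroid offset corresponds to a distance of about 2 m. Matching only a handful of object centroids per frame is what keeps the cost far below that of a dense disparity map.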

What was built

  • Dual ESP32-CAM synchronization pipeline
  • FastAPI-based inference and processing server
  • GPU segmentation with YOLO11m-seg
  • Sparse stereo disparity on segmented object regions
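As a rough illustration of the synchronization step, the sketch below pairs frames from the two cameras by capture time on the server side. How the project actually synchronizes the ESP32-CAMs is not stated here, so this is only one plausible approach: it assumes every frame arrives with a millisecond timestamp in ascending order, and `pair_frames` and `max_skew_ms` are hypothetical names.

```python
from typing import List, Tuple


def pair_frames(left_ts: List[int], right_ts: List[int],
                max_skew_ms: int = 30) -> List[Tuple[int, int]]:
    """Pair frame indices from two cameras whose capture times differ by
    at most max_skew_ms. Both timestamp lists must be sorted ascending."""
    pairs = []
    i = j = 0
    while i < len(left_ts) and j < len(right_ts):
        skew = left_ts[i] - right_ts[j]
        if abs(skew) <= max_skew_ms:
            pairs.append((i, j))
            i += 1
            j += 1
        elif skew < 0:
            i += 1  # left frame is too old to match anything, drop it
        else:
            j += 1  # right frame is too old to match anything, drop it
    return pairs


# Example: the second left frame and second right frame have no partner.
print(pair_frames([0, 100, 200, 300], [5, 140, 198, 305]))
```

Dropping unmatched frames rather than pairing them with the nearest neighbour matters for stereo: a large time skew on a moving object shows up as false disparity and therefore as a wrong distance.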

Outcome

Built a fully standalone, end-to-end pipeline that runs from embedded camera synchronization through GPU-based segmentation to disparity computation, enabling real-time spatial awareness on low-cost hardware.

Depth pipeline demo

Depth pipeline in action: synchronized capture, segmentation, and sparse disparity output.