Duration
Nov 2025 - Dec 2025
Overview
Developed a real-time stereo vision system that computes object distances using two synchronized ESP32-CAM modules and a YOLO11m-seg model running on a FastAPI server.
Instead of computing a traditional dense disparity map, the system performs sparse stereo matching only within YOLO-detected object regions, which yields efficient, semantically meaningful depth estimates at 8-9 FPS on a GTX 1650 GPU.
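To illustrate how a sparse disparity value becomes a distance, here is a minimal sketch using the standard pinhole stereo relation Z = f * B / d. It assumes rectified images and that each object contributes one horizontal coordinate per view (for example, a mask centroid). The function name, the parameters, and the calibration values are hypothetical and are not taken from the project.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Estimate distance (meters) to one object from a rectified stereo pair.

    x_left / x_right: horizontal pixel coordinate of the same object in the
    left and right image (assumed here to be a segmentation-mask centroid).
    focal_px: focal length in pixels (hypothetical calibration value).
    baseline_m: distance between the two camera centers in meters.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero or negative disparity means a mismatch or an object at infinity.
        return None
    return focal_px * baseline_m / disparity


# Example with made-up numbers: 40 px disparity, 800 px focal length, 10 cm baseline.
distance = depth_from_disparity(420.0, 380.0, focal_px=800.0, baseline_m=0.10)
print(distance)
```

Because only one disparity is computed per detected object, the cost scales with the number of detections rather than with image resolution, which is the efficiency argument made above.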
What was built
- Dual ESP32-CAM synchronization pipeline
- FastAPI-based inference and processing server
- GPU segmentation with YOLO11m-seg
- Sparse stereo disparity on segmented object regions
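The synchronization step in the list above could be approximated on the server side by pairing frames whose capture timestamps fall within a small tolerance. The sketch below is one possible approach, not the project's actual method: the `pair_frames` helper, the `(timestamp_ms, frame_id)` tuple layout, and the 20 ms tolerance are all assumptions made for illustration.

```python
def pair_frames(left, right, max_skew_ms=20):
    """Pair left/right frames whose timestamps differ by at most max_skew_ms.

    left, right: lists of (timestamp_ms, frame_id) tuples sorted by timestamp.
    Returns a list of (left_frame_id, right_frame_id) pairs.
    """
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        t_left, t_right = left[i][0], right[j][0]
        if abs(t_left - t_right) <= max_skew_ms:
            # Close enough in time: treat as one stereo pair.
            pairs.append((left[i][1], right[j][1]))
            i += 1
            j += 1
        elif t_left < t_right:
            i += 1  # Left frame is too old to match; drop it.
        else:
            j += 1  # Right frame is too old to match; drop it.
    return pairs


left_frames = [(0, "L0"), (100, "L1"), (200, "L2")]
right_frames = [(5, "R0"), (130, "R1"), (205, "R2")]
print(pair_frames(left_frames, right_frames))
```

Unmatched frames are simply dropped here, which trades a little frame rate for a bound on the time skew between the two views.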
Outcome
Built a standalone end-to-end pipeline, from embedded camera synchronization to GPU-based segmentation and disparity computation, that provides real-time spatial awareness on low-cost hardware.
Depth pipeline demo
Depth pipeline in action: synchronized capture, segmentation, and sparse disparity output.