2 min read · from r/MachineLearning

Self-calibrating cross-camera homography for real-time ghost prediction in multi-camera person tracking[P]

Our take

In multi-camera person tracking, accuracy suffers when one camera loses sight of a person, because each camera has its own pixel coordinate system. This post introduces self-calibrating cross-camera homography for real-time ghost prediction: whenever two cameras simultaneously observe the same person, the system records foot-point correspondences and uses cv2.findHomography() with RANSAC to dynamically learn the mapping between camera views. Three fallback paths keep tracking reliable across differing camera angles, and the full implementation is available on GitHub.

The problem: In multi-camera tracking, when camera A loses track of a person but camera B still sees them, naive approaches extrapolate pixel coordinates linearly. This fails immediately because cameras have completely different coordinate systems. A person at pixel (400, 300) on camera B might be at (800, 500) on camera A, depending on relative position and angle.

Approach: When both cameras simultaneously observe the same person (matched via 64-dim HSV appearance descriptors, L2-normalized, EMA-smoothed at alpha=0.3), we record foot-point correspondence pairs. The bottom-center of the bounding box in each view projects to the same physical ground-plane point.
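A minimal sketch of this descriptor step (the 8x8 hue/saturation bin layout and helper names are my assumptions; the post only specifies 64 dims, L2 normalization, and EMA alpha=0.3):

```python
import numpy as np

def hsv_descriptor(bgr_crop, h_bins=8, s_bins=8):
    """64-dim HSV histogram over a person crop, L2-normalized.
    Bin layout (8x8 hue/saturation) is an assumption."""
    import cv2  # imported lazily; only this function needs OpenCV
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])
    vec = hist.flatten().astype(np.float64)
    return vec / (np.linalg.norm(vec) + 1e-8)

def ema_update(prev_desc, new_desc, alpha=0.3):
    """EMA-smooth a track's appearance descriptor (alpha=0.3 per the post),
    re-normalizing so cosine comparisons stay well-scaled."""
    if prev_desc is None:
        return new_desc
    smoothed = alpha * new_desc + (1 - alpha) * prev_desc
    return smoothed / (np.linalg.norm(smoothed) + 1e-8)
```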

After 4+ such pairs, cv2.findHomography() + RANSAC gives a 3x3 matrix H mapping camera B pixel space to camera A. The system re-learns H every 5 new pairs and monitors reprojection error, flushing H if it spikes (i.e., a camera moved).
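The repo uses cv2.findHomography(..., cv2.RANSAC) directly; as a dependency-free illustration of what that call solves, here is a minimal direct-linear-transform fit from 4+ correspondences (no outlier rejection — RANSAC wraps exactly this least-squares core; function names are mine, not the repo's):

```python
import numpy as np

def fit_homography(pts_b, pts_a):
    """DLT fit: each (camB pixel, camA pixel) pair contributes two rows."""
    A = []
    for (x, y), (u, v) in zip(pts_b, pts_a):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (row of Vt for the smallest singular value)
    # holds the 9 entries of H, determined up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map one camera-B pixel into camera-A pixel space."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```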

Three fallback paths:

  • Path A (H-PROJ, green): homography projection from any source camera with valid H. Most accurate.
  • Path B (EXTRAP, red): pixel extrapolation with adaptive budget min(250px, 80 + 40*t). Last resort.
  • Path C (WORLD, orange): world-coordinate pinhole projection from fused 3D Kalman state. Always available.
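Selection between the three paths could look like the following hypothetical dispatcher. Only the budget formula min(250px, 80 + 40*t) and the path priorities (A most accurate, C always available, B last resort) come from the post; everything else is a sketch:

```python
import numpy as np

def project_h(H, pt):
    """Homogeneous projection of a pixel through a 3x3 homography."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

def extrapolation_budget(t):
    """Path B's adaptive pixel budget from the post: min(250px, 80 + 40*t)."""
    return min(250.0, 80.0 + 40.0 * t)

def predict_ghost(H, other_cam_pt, world_proj, last_pt, velocity, t):
    # Path A: homography projection from a source camera with valid H.
    if H is not None and other_cam_pt is not None:
        return "H-PROJ", project_h(H, other_cam_pt)
    # Path C: world-coordinate projection from the fused 3D Kalman state.
    if world_proj is not None:
        return "WORLD", np.asarray(world_proj, dtype=float)
    # Path B: linear pixel extrapolation, clamped to the adaptive budget.
    step = np.asarray(velocity, dtype=float) * t
    dist = float(np.linalg.norm(step))
    budget = extrapolation_budget(t)
    if dist > budget:
        step *= budget / dist
    return "EXTRAP", np.asarray(last_pt, dtype=float) + step
```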

Costs:

  • Homography re-estimation: < 0.1ms (called every 5 new pairs)
  • Per-prediction projection: < 0.001ms

Tracking: Hungarian assignment with 0.6 * IoU + 0.4 * cosine appearance cost. DeepSORT (MobileNet) as primary, falls back to Hungarian (scipy), then centroid.
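A sketch of the Hungarian fallback with the stated cost blend, using scipy's `linear_sum_assignment` (helper names and box format are my assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def match(track_boxes, det_boxes, track_feats, det_feats):
    """Assignment cost from the post: 0.6 * (1 - IoU) + 0.4 * (1 - cosine)."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i, (tb, tf) in enumerate(zip(track_boxes, track_feats)):
        for j, (db, df) in enumerate(zip(det_boxes, det_feats)):
            cos = float(np.dot(tf, df) /
                        (np.linalg.norm(tf) * np.linalg.norm(df) + 1e-8))
            cost[i, j] = 0.6 * (1.0 - iou(tb, db)) + 0.4 * (1.0 - cos)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```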

Sensor trust: Each camera earns trust [0.1, 1.0] via consistency. High-innovation measurements get down-weighted. Kalman measurement noise R scales per update based on confidence, bbox area, and sensor trust.
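One plausible form of that per-update scaling — the exact function is not given in the post, so the factors below are assumptions beyond "confidence, bbox area, and sensor trust" and the [0.1, 1.0] trust range:

```python
import numpy as np

def scaled_R(base_R, det_conf, bbox_area, trust, ref_area=10_000.0):
    """Inflate the Kalman measurement noise R for low-confidence detections,
    small boxes, and low-trust sensors, so their updates are down-weighted.
    ref_area and the specific scaling shape are illustrative assumptions."""
    trust = float(np.clip(trust, 0.1, 1.0))  # trust range stated in the post
    conf = float(np.clip(det_conf, 0.05, 1.0))
    size_factor = max(ref_area / max(bbox_area, 1.0), 1.0)
    return base_R * (1.0 / conf) * (1.0 / trust) * np.sqrt(size_factor)
```

A larger R shrinks the Kalman gain for that update, which is one standard way to realize "high-innovation measurements get down-weighted" without rejecting them outright.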

Full implementation: github.com/mandarwagh9/overwatch. 57 unit tests covering Kalman, homography, tracking. CI on GitHub Actions.

Limitations: ground-plane homography breaks for elevated cameras with steep angles. Re-ID via HSV histograms is weak for people in similar clothing at close spatial proximity.

Curious if anyone has tackled non-ground-plane cross-camera projection or used learned embeddings instead of HSV histograms for re-ID at this inference budget.

submitted by /u/Straight_Stable_6095
