Multi-Object Tracking in Complex Scenes

Extending human perception through innovative tracking algorithms

"The real voyage of discovery consists not in seeking new landscapes, but in having new eyes." - Marcel Proust

🧩 Problem Statement


Multi-Object Tracking (MOT) is a fundamental computer vision challenge that requires tracking multiple objects simultaneously throughout a video sequence. This is essential for applications like autonomous driving, surveillance systems, and sports analytics.


My research mission is to extend human perception through intelligent visual tracking systems: enabling machines to see, reason, and remain aware where human attention cannot.


πŸ‰ DragonTrack


DragonTrack: Transformer-Enhanced Graphical Multi-Person Tracking in Complex Scenarios

Published in WACV 2025

DragonTrack introduces a novel approach to multi-object tracking by combining transformer-based detection with graph modeling of inter-object relationships. Our framework achieves state-of-the-art performance on standard benchmarks while remaining computationally efficient.

Key innovations:

  • Transformer-based detection for spatial and appearance features
  • Graph modeling for inter-object relationships
  • Robust identity preservation in crowded scenes

🔎 MOTE (Multi-Object Tracking Enhancement)


MOTE: More Than Meets the Eye

Optical Flow-Based Multi-Object Tracking with Prolonged Occlusion Handling

Under Review, International Conference on Machine Learning (ICML), 2025

MOTE builds upon traditional tracking frameworks by incorporating optical flow and softmax splatting for disocclusion features. This approach significantly reduces identity switches in scenes with prolonged occlusions.

Key contributions:

  • Optical flow estimation for tracking through occlusions
  • Softmax splatting for disocclusion-aware representations
  • Enhanced track embedding module for identity continuity

🌐 UniTrack


UniTrack: A Differentiable Graph-Based Loss for Robust Multi-Object Tracking

Under Review, International Conference on Computer Vision (ICCV), 2025

UniTrack presents a unified tracking framework with a graph-based differentiable loss function that can be integrated with existing tracking architectures. This approach eliminates the need for scenario-specific tracking systems.

UniTrack delivers the following advantages:

  • Unified loss function for tracking optimization
  • Seamless integration with existing architectures
  • Addresses post-occlusion, temporal, and cross-subject errors

👀 Look Around and Pay Attention (LAPA)

LAPA is our current research focus. It introduces a novel attention mechanism for multi-object tracking that combines local and global context information to improve tracking accuracy.

Key innovations in LAPA include:

  • Dual-attention mechanism for context awareness
  • Long-range dependency modeling
  • Adaptive feature aggregation

Current Status: Under development for NeurIPS submission


πŸ† Research Progress

4 of 8 key milestones achieved

Achievements

Occlusion Handling

MOTE preserves tracking during occlusions using optical flow techniques

Crowded Scene Tracking

DragonTrack reduces identity switches in dense environments

Temporal Consistency

UniTrack maintains stable tracking over extended video sequences

Framework Integration

Plug-and-play optimization for MOTR, TrackFormer, and FairMOT

Current Challenges

Residual ID Switches

Exploring attention mechanisms for ambiguous re-identification cases

Long-Term Memory

Improving identity retention across extended temporal gaps

Similar Appearance

Integrating motion patterns to distinguish visually similar subjects

Multi-Camera Scalability

Robust identity association across multiple camera views with minimal calibration

🌟 Real-World Applications

👶

Multi-Toddler Tracking

Monitoring multiple toddlers in daycare environments to ensure safety and analyze play patterns. Our algorithms can distinguish between similar-looking children even during fast movements and complex interactions.

🧩

Early Autism Detection

Using movement and interaction tracking to identify early signs of autism in young children. Our fine-grained motion analysis can detect subtle behavioral patterns that may indicate neurodevelopmental differences.

💀

Infant Sleep Tracking

Non-invasive monitoring of infant sleep positions and movements to prevent SIDS and analyze sleep patterns. Our contactless tracking technology provides peace of mind to parents while collecting valuable health data.

🚦

Traffic Safety

Advanced intersection monitoring that predicts potential conflicts between vehicles, cyclists, and pedestrians. Our systems process complex urban traffic flows in real time to help prevent accidents.

⚽

Sports Analytics

Tracking player movements, team formations, and ball possession in team sports without requiring wearable sensors. Our technology brings new precision to coaching strategies and broadcast analytics.

🚗

Autonomous Driving

Enhancing pedestrian safety through robust tracking in challenging conditions like darkness, rain, and crowds. Our algorithms enable vehicles to anticipate human movement intentions for safer urban mobility.

Our research bridges cutting-edge computer vision technology and life-changing applications in healthcare, safety, and human potential.