I'm an incoming PhD student in the Electrical and Computer Engineering department at Northeastern University's College of Engineering. I work under the supervision of Prof. Sarah Ostadabbas in the Augmented Cognition Lab (ACLab). My research centers on motion-centric video understanding and reasoning — building systems that don't merely extrapolate visual patterns, but deduce causal structure from observed motion. I'm particularly interested in how vision-language models can move beyond statistical pattern completion toward genuine physical and spatial reasoning. This connects to broader work in multi-object tracking, human-machine interaction, and medical AI.
Bishoy M. Galoaa
PhD Student • Machine Learning Researcher • Engineer
"The real question is not whether machines think but whether men do. The mystery which surrounds a thinking machine already surrounds a thinking man."
– B.F. Skinner
About Me
Motion-Centric Reasoning & Vision-Language Models
A core challenge I'm pursuing: most current VLMs act as stochastic extrapolators. Trained on massive visual corpora, they are biased toward the statistically probable — a pendulum seen swinging to Point B is predicted to return exactly to Point A, because that completes a symmetric arc. But this ignores the physics: initial conditions, energy dissipation, non-conservative forces. The model pattern-matches; it doesn't reason.
The Extrapolator vs. The Visual Observer
Extrapolator (current VLMs): sees the arc → assumes a periodic function → predicts completion. Prioritizes visual symmetry over physical constraints.
Visual Observer (the goal): observes the initial state, recognizes constraints (fixed pivot, gravity, friction), accounts for hidden dissipative variables → deduces that the return height must satisfy h_final < h_initial.
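The observer's deduction can be checked numerically. Here is a minimal sketch of a damped pendulum integrated with semi-implicit Euler (all parameter values are illustrative, not from any specific experiment): with any dissipation present, the first return peak lands strictly below the release height.

```python
import math

# Damped pendulum: theta'' = -(g/L) * sin(theta) - c * theta'
# Integrate with small time steps and record the swing amplitude at each turning point.
g, L, c, dt = 9.81, 2.0, 0.05, 0.001   # illustrative parameters

theta, omega = math.radians(60.0), 0.0  # released from rest at 60 degrees
peaks = []
prev_omega = omega
for _ in range(int(20.0 / dt)):         # simulate 20 seconds
    alpha = -(g / L) * math.sin(theta) - c * omega
    omega += alpha * dt                 # semi-implicit Euler: update velocity first
    theta += omega * dt
    # A sign change in angular velocity marks a turning point (a swing peak)
    if prev_omega != 0.0 and prev_omega * omega < 0.0:
        peaks.append(abs(theta))
    prev_omega = omega

def h(t):
    """Height of the bob above its lowest point, for amplitude angle t."""
    return L * (1.0 - math.cos(t))

print(h(math.radians(60.0)))  # release height
print(h(peaks[0]))            # height at the first return: strictly lower
```

Each successive peak is lower than the last — the "symmetric arc" an extrapolator would complete never actually happens.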
Walter Lewin famously put his nose — and his life — on the line to demonstrate this principle:
Prof. Walter Lewin releases a 15 kg pendulum from his chin. He trusts the physics: released from rest, the bob can never return above its starting height, while even a slight push at release would smash his face. Energy is conserved, not extrapolated.
My work on motion-centric systems aims to bridge this gap: from pattern completion to causal deduction.
- Motion-Centric Video Understanding: Query-free motion discovery and description systems that autonomously identify and describe events in videos, and text-to-motion generation (Lang2Motion) enabling natural language control of motion synthesis.
- Multi-Object Tracking: Transformer-enhanced and graph-based algorithms for tracking in complex environments — including occlusions, crowded scenes, and multi-camera setups.
- Spatial Reasoning in VLMs: Learning structured spatial and counting reasoning from pedagogically organized video content.
- Human-Machine Interaction: Motion analysis and uncertainty-aware anomaly detection for exoskeleton control and rehabilitation.
- Medical AI: Personalized prognostic models for oncology via interpretable machine learning.
Publications
2026
- MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis
  X. Bai, H. Liang, B. Galoaa, et al. — CVPR 2026
- UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
  B. Galoaa, X. Bai, U. Nandi, S. Amraee, S. Ostadabbas — ICLR 2026 Nectar Track · Spotlight Oral · 3DV 2026
- K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices
  B. Galoaa, P. Closas, S. Ostadabbas — WACVW 2026
- LAPA: Look Around and Pay Attention — Multi-camera Point Tracking Reimagined with Transformers
  B. Galoaa, X. Bai, S. Moezzi, et al. — 3DV 2026 · Oral · Best Paper Nominee
2025
- SPARTAN: Spatiotemporal Pose-Aware Retrieval for Text-Guided Autonomous Navigation
  X. Bai, S. A. Sreeramagiri, B. Galoaa, et al. — BMVC 2025
- More Than Meets the Eye: Enhancing Multi-Object Tracking with Softmax Splatting and Optical Flow
  B. Galoaa, S. Amraee, S. Ostadabbas — ICML 2025
- Dragontrack: Transformer-Enhanced Graphical Multi-Person Tracking
  B. Galoaa, S. Amraee, S. Ostadabbas — WACV 2025
- Classification of Infant Sleep–Wake States from Natural Overnight In-Crib Videos
  S. Moezzi, M. Wan, B. Galoaa, et al. — WACVW 2025
- Advancing Prognostics in Oncology: ML Models for Predicting Survival in Undifferentiated Pleomorphic Sarcoma
  A. G. Girgis, B. M. Galoaa, et al. — Annals of Surgical Oncology, 2025
- Predicting Long-Term Survival in Myxofibrosarcoma
  S. Rampam, A. G. Girgis, B. M. Galoaa, et al. — Surgical Oncology, 2025
- Extraskeletal Osteosarcoma: MicroRNA Patterns
  S. A. Lozano-Calderon, B. M. Galoaa, et al. — CTOS 2025
- Real-Time Uncertainty Detection for Safe, Adaptive Exoskeleton Control
  B. Galoaa et al. — ICRA Workshops 2025
2024
- Multiple Toddler Tracking in Indoor Videos
  S. Amraee, B. Galoaa, et al. — WACVW 2024
- A Personalized Predictive Model for Salivary Gland Cancer
  A. Girgis, B. Galoaa, A. Devaiah — COSM 2024
- A Novel AI Model for Optimizing Treatment of Salivary Gland Malignancies
  A. Girgis, B. Galoaa, A. Devaiah — AAO-HNSF 2024
- Bias or Best Fit? SEER vs. NCDB in ML for Osteosarcoma Survival
  A. G. Girgis, B. M. Galoaa, et al. — Clinical Orthopaedics and Related Research, 2024
- Machine Learning–Assisted Decision Making in Orthopaedic Oncology
  P. A. Rizk, M. R. Gonzalez, B. M. Galoaa, et al. — accepted for publication
Preprints & Under Review
- Structured Over Scale: Learning Spatial Reasoning from Educational Video
  B. Galoaa, X. Bai, S. Ostadabbas — under review, 2026
- Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
  B. Galoaa, S. Ostadabbas — under review, 2026
- Lang2Motion: Bridging Language and Motion Through Joint Embedding Spaces
  B. Galoaa, X. Bai, S. Ostadabbas — under review, 2026
- Uncertainty-Aware Ankle Exoskeleton Control
  F. M. Tourk, B. Galoaa, S. Shajan, A. J. Young, M. Everett, M. K. Shepherd — arXiv:2508.21221, 2025
- Cognitive Learning through Hierarchical Prototypes and Dynamic Focus
  B. Galoaa, S. Ostadabbas — under review, 2025
- ML Algorithms for Survival Prediction in Synovial Sarcoma
  J. O. Werenski, S. Rampam, B. Galoaa, et al. — under review
30-Day Novel Ideas Challenge
A creative experiment: over one month, I set out to develop one novel idea per day.
- Inattention NotaBene – A regularization method that strategically "forgets" less important features through a stacked dropout mechanism, offering an alternative to traditional attention mechanisms.
- ROCKET – A path-planning system that identifies collision paths first, using inverse collision sampling to recover optimal trajectories in complex environments.
- Secretary Template Matching – An online template-matching algorithm inspired by the Secretary Problem, dynamically adjusting its acceptance threshold from observed data for improved real-time decision-making.
- 2F1B – An optimization technique that introduces controlled oscillation into neural-network training by alternating two forward steps with one backward step.
- Knock-Knock – An optimization algorithm inspired by bat echolocation, emitting "echo signals" to navigate complex loss landscapes.
Originally shared as a public challenge on LinkedIn.
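To give a flavor of one of these ideas, Secretary Template Matching can be sketched with classic secretary-style thresholding (my own minimal illustration; the original prototype's details may differ): observe the first n/e candidate match scores without committing, then accept the first later score that beats the best observed so far.

```python
import math

def secretary_pick(scores):
    """Online selection over a stream of match scores.

    Observe the first n/e scores to calibrate a threshold, then commit to
    the first later score exceeding it. Falls back to the final candidate
    if none qualifies (the standard secretary-problem rule).
    Returns the index of the chosen score.
    """
    n = len(scores)
    cutoff = max(1, int(n / math.e))        # observation phase length
    threshold = max(scores[:cutoff])        # best score seen while observing
    for i in range(cutoff, n):
        if scores[i] > threshold:
            return i                        # commit irrevocably
    return n - 1                            # never saw a better one

# n = 6 -> cutoff = 2, threshold = max(0.3, 0.1) = 0.3;
# the first later score above 0.3 is index 2 (0.4), so we commit early
# and miss the global best (0.9 at index 4) — the price of deciding online.
print(secretary_pick([0.3, 0.1, 0.4, 0.2, 0.9, 0.5]))  # → 2
```

The n/e cutoff is the classical choice: for the problem of picking the single best candidate from a uniformly random stream, it selects the best with probability approaching 1/e.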
Awards
- Best Paper Award Nominee — International Conference on 3D Vision (3DV 2026)
  For "Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers"
- Best Poster Presentation Award — COSM-AHNSF (Spring 2025)
- COE Outstanding Graduate Student Award — Northeastern University (2025)
- COE Outstanding Graduate Student Award — Northeastern University (2024)
- Best of Scientific Orals — AAO-HNSF Annual Meeting & OTO EXPO (2024)