This thesis developed a real-time system for detecting, classifying, and locating sound events using only audio data. Using a network of 16 microphones and deep learning techniques, the system achieved 96% classification accuracy and an average localization error of 1.4 meters, demonstrating that sound-based analysis can effectively replace vision in monitoring applications.