Difference between revisions of "Sound source localization for robots"


Revision as of 10:44, 10 December 2015


Title: Sound source localization for robots
Description: Analysis of the state-of-the-art of binaural systems for audio detection and localization in robotics
Tutor: MatteoMatteucci
Start: 2015/10/08
Number of students: 1
CFU: 20


Analysis of the state of the art of sound source localisation algorithms. Application and improvement of existing systems by developing a binaural model to apply on a robot, initially based on stereo acquisition, then using different types of sensors (microphone arrays, MEMS, etc.).

State-of-the-art

Robot audition and active perception of sound sources are still hard tasks. The common starting point of the existing algorithms is the knowledge of some psychoacoustic measurements. The main features are ITD (Interaural Time Difference) and ILD (Interaural Level Difference), which describe how humans localize a sound source.

ITD measures the difference in time of arrival of a sound wave between the two ears (its unit of measure is ms). It depends on the distance between the ears, the frequency content of the sound wave and the angle of incidence of the incoming sound. For geometric reasons, ITD gives useful measurements below the frequency threshold of 1500 Hz. In some systems, it can be useful to perform this analysis in the frequency domain instead of the time domain; the equivalent measure in that domain is the IPD (Interaural Phase Difference).
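
As an illustration of the ITD definition, the following is a minimal sketch (not the method of any specific system cited on this page) that estimates the ITD of a stereo recording as the lag that maximizes the cross-correlation between the two channels. The function name `estimate_itd`, the sampling rate `fs` and the search bound `max_itd` (1 ms here) are assumptions introduced only for this example.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=1e-3):
    """Estimate the ITD in seconds from two equal-length channels.

    A positive value means the right channel lags the left one.
    max_itd (assumed bound, in seconds) limits the searched lags.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)

    # Full cross-correlation: entry k corresponds to lag k - (n - 1).
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(n - 1), n)

    # Keep only physically plausible lags.
    max_lag = int(round(max_itd * fs))
    valid = np.abs(lags) <= max_lag

    best_lag = lags[valid][np.argmax(corr[valid])]
    return best_lag / fs
```

For example, with a noise signal sampled at 16 kHz whose right channel is delayed by 8 samples, the sketch should return 0.5 ms. In reverberant rooms a plain cross-correlation peak is often unreliable, so real systems typically use more robust variants.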

ILD measures the difference in sound pressure level (dB) between the two ears. Like ITD, ILD depends on the frequency content of the sound wave, on the angle of incidence and on the physical structure of the listener. For these reasons, the most useful ILD measurements are at high frequencies (many state-of-the-art systems use the same threshold as for ITD, 1500 Hz).
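
The ILD can be sketched in a similar way. The example below is a minimal illustration, assuming two equal-length channels, that compares the power of the left and right signals in the band above a cutoff frequency and returns the ratio in dB. The function name `estimate_ild` and the parameters `fs`, `f_min` and `eps` are hypothetical names chosen for this example; the 1500 Hz default follows the threshold mentioned above.

```python
import numpy as np

def estimate_ild(left, right, fs, f_min=1500.0, eps=1e-12):
    """Estimate the ILD in dB using only frequencies >= f_min.

    A positive value means the left channel is louder than the right.
    eps (assumed small constant) avoids division by zero on silence.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Frequency of each bin of the real FFT.
    freqs = np.fft.rfftfreq(len(left), d=1.0 / fs)
    band = freqs >= f_min

    # Power of each channel restricted to the high-frequency band.
    p_left = np.sum(np.abs(np.fft.rfft(left)[band]) ** 2)
    p_right = np.sum(np.abs(np.fft.rfft(right)[band]) ** 2)

    return 10.0 * np.log10((p_left + eps) / (p_right + eps))
```

If the right channel is the left channel attenuated by a factor of 2 in amplitude, the result should be close to 6 dB.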

Essential bibliography

Haykin, Simon, and Zhe Chen. "The cocktail party problem." Neural computation 17.9 (2005): 1875-1902 [1]

Deleforge, Antoine, and Radu Horaud. "The cocktail party robot: Sound source separation and localisation with an active binaural head." Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction. ACM, 2012. [2]

Deleforge, Antoine, Radu Horaud, Yoav Y. Schechner, and Laurent Girin. "Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression." IEEE Transactions on Audio, Speech and Language Processing 23.4 (2015): 718-731. [3]

Schwarz, Andreas, and Walter Kellermann. "Coherent-to-Diffuse Power Ratio Estimation for Dereverberation." Audio, Speech, and Language Processing, IEEE/ACM Transactions on 23.6 (2015): 1006-1018. [4]

Willert, Volker, et al. "A probabilistic model for binaural sound localization." Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 36.5 (2006): 982-994. [5]

Lu, Yan-Chen, and Martin Cooke. "Binaural estimation of sound source distance via the direct-to-reverberant energy ratio for static and moving sources." Audio, Speech, and Language Processing, IEEE Transactions on 18.7 (2010): 1793-1805. [6]

Deleforge, Antoine, and Radu Horaud. "Learning the direction of a sound source using head motions and spectral features." (2011): 29. [7]

Le Roux, Jonathan, and Emmanuel Vincent. "A categorization of robust speech processing datasets." (2014). [8]