Teaching computers to see and hear
Meet Dr. Tali Dekel
Algorithms that reconstruct a scene's 3D geometry from 2D images have traditionally come with a limitation: the subject must be observed from at least two different viewpoints, either by multiple cameras filming simultaneously or by a single moving camera filming a stationary subject. Interpreting the three-dimensionality of a scene when both the camera and the objects in it are moving has remained a fundamental stumbling block in the field of computer vision.
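To see why a second viewpoint matters, consider classical stereo triangulation, in which depth follows from the disparity between two views of the same static point. The Python sketch below uses purely hypothetical numbers to illustrate the idea, and why it breaks down once the point itself moves:

```python
# Minimal sketch of classical two-view (rectified stereo) depth recovery.
# All numbers are hypothetical; the point is that depth comes from the
# disparity between two views of the same STATIC scene point.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a static 3D point seen by two rectified cameras.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centers, in meters
    disparity_px : horizontal shift of the point between the two images
    """
    return focal_px * baseline_m / disparity_px

# Example: a point that shifts 20 px between views taken 0.5 m apart,
# with a 1000 px focal length, lies 25 m away.
print(depth_from_disparity(focal_px=1000.0, baseline_m=0.5, disparity_px=20.0))

# If the point itself moved between the two exposures, the disparity mixes
# camera motion with object motion, and this formula no longer holds;
# that is the stumbling block described above.
```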
Today, as a Senior Research Scientist at Google in Cambridge, Massachusetts, Dr. Tali Dekel has developed a solution to this challenge by combining fundamental knowledge in computer vision with deep learning. Deep learning, a subset of artificial intelligence in which machines learn directly from large amounts of data, is revolutionizing computer vision and graphics.
To “teach” a computer to predict the physical dimensions of a human from 2D images, Dr. Dekel discovered a surprising new source of data: thousands of YouTube videos of the “Mannequin Challenge,” a viral Internet trend in which people imitate mannequins by freezing in place while a moving camera films them. Because the subjects in these videos are stationary, the geometry of the entire scene, including the people in it, can be estimated accurately. Using a large collection of such videos as training examples, Dr. Dekel designed the first deep learning-based model that takes an input video shot with a moving camera and accurately interprets the geometry of any moving human in it. Future applications of this technology are numerous, from advanced computer-graphics effects to revealing obstructed objects in videos.
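The published model is far more elaborate, but the training recipe can be suggested in miniature: depth maps recovered by classical multi-view methods on the frozen scenes serve as pseudo ground truth for a network that predicts depth from a single frame. The sketch below is not Dr. Dekel's actual architecture; it assumes a toy network and a scale-invariant loss, and all data and dimensions are hypothetical.

```python
# Minimal sketch: train a network to predict per-pixel depth from a single
# frame, supervised by pseudo ground-truth depth computed with multi-view
# stereo on "frozen" Mannequin Challenge scenes.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Stand-in for a real encoder-decoder depth network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, rgb):
        # Predict log-depth so the scale-invariant loss below stays simple.
        return self.net(rgb)

def scale_invariant_loss(pred_log, gt_log):
    """Scale-invariant log-depth loss (after Eigen et al., 2014): a global
    scale error is not penalized, which suits depth recovered from video,
    where absolute scale is unknown."""
    d = pred_log - gt_log
    return (d ** 2).mean() - d.mean() ** 2

model = TinyDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Hypothetical batch: video frames plus the log-depth maps recovered by
# multi-view stereo (valid because the filmed people were frozen in place).
frames = torch.rand(4, 3, 64, 64)
gt_log_depth = torch.rand(4, 1, 64, 64)

opt.zero_grad()
loss = scale_invariant_loss(model(frames), gt_log_depth)
loss.backward()
opt.step()
```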
Looking to listen
Even in a noisy setting, humans are able to tune their attention to one particular voice while filtering out the surrounding, extraneous noise. Dr. Dekel took on the challenge of achieving this “cocktail party effect” computationally and developed a model for separating overlapping audio signals in videos.
The key idea was to use the visual signal available in an ordinary video to process its audio signal. Dr. Dekel designed a deep learning-based model that analyzes the facial movements of a person speaking (e.g., mouth movements) and associates them with the sound being produced. The model can then differentiate between the overlapping audio signals of multiple speakers in a video and produce a clean speech track for each person. This model has the potential to unlock latent properties of human voices and faces, and could be useful in applications such as biometric identification, surveillance, and improved hearing aids.
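In rough outline, and not as a description of the production system, such a model fuses two streams, per-frame face embeddings and the spectrogram of the mixed audio, and predicts a mask that isolates one speaker. The Python sketch below is a minimal stand-in; every layer size and tensor shape is a hypothetical placeholder.

```python
# Minimal audio-visual separation sketch: combine a visual stream (face
# embeddings over time) with the mixed audio spectrogram and predict a
# per-speaker mask that isolates that speaker's speech.
import torch
import torch.nn as nn

class AVSeparator(nn.Module):
    def __init__(self, n_freq=257, face_dim=128, hidden=256):
        super().__init__()
        # Audio stream: one linear layer per spectrogram frame (a stand-in
        # for the convolutional stack a real model would use).
        self.audio_fc = nn.Linear(n_freq, hidden)
        # Visual stream: compress per-frame face embeddings (e.g., mouth
        # region features) to the same width.
        self.face_fc = nn.Linear(face_dim, hidden)
        # Fused temporal model plus a per-frequency mask in [0, 1].
        self.rnn = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, face_emb):
        # mix_spec: (batch, time, freq) magnitude spectrogram of the mixture
        # face_emb: (batch, time, face_dim) embeddings of ONE speaker's face,
        #           assumed here to be time-aligned with the audio frames
        fused = torch.cat([self.audio_fc(mix_spec), self.face_fc(face_emb)], dim=-1)
        h, _ = self.rnn(fused)
        return self.mask(h) * mix_spec  # masked spectrogram = that speaker

model = AVSeparator()
mix = torch.rand(2, 100, 257)     # hypothetical two-clip batch
faces = torch.rand(2, 100, 128)   # face embeddings aligned to the audio
speaker_spec = model(mix, faces)  # (2, 100, 257): one speaker isolated
```

Running the model once per visible face, each time conditioned on a different speaker's embeddings, would yield one clean spectrogram per person in the video.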
Dr. Dekel earned her BSc and MSc in electrical engineering, both cum laude, from Tel Aviv University (in 2007 and 2009, respectively). After interning at the Disney Research lab at ETH Zürich (the Swiss Federal Institute of Technology), she returned to Tel Aviv University to complete her PhD in electrical engineering and computer vision in 2015. With the support of a Rothschild Postdoctoral Fellowship and the Israel National Postdoctoral Award Program for Advancing Women in Science, in which she was a Clore Fellow, Dr. Dekel worked in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) under Prof. William T. Freeman from 2014 until 2016. She is currently a Senior Research Scientist in Google's Machine Perception group in Cambridge, Massachusetts, and will join the faculty of the Department of Computer Science and Applied Mathematics at the Weizmann Institute in September 2020.
Dr. Dekel was awarded the Prof. Norman W. Rosenberg Memorial Prize at Tel Aviv University in 2013, and an Excellence Scholarship from the Tel Aviv University School of Electrical Engineering in 2009. She won a scholarship from RAD Data Communications, Ltd. in 2005. Dr. Dekel is married and the mother of two boys.
Dr. Dekel is supported by the Clore Israel Foundation.