Invited Talk
"Scene Understanding and Activity Recognition"
Dr François Brémond
PULSAR, INRIA, Sophia Antipolis, Nice, France
Scene understanding is the process of perceiving, analyzing and elaborating an interpretation of the 3D dynamic scene observed through a network of sensors. This process consists mainly of matching information from the sensors observing the scene with models. Thus, understanding a scene means both adding semantics to and extracting semantics from the sensor data describing it. The scene can contain a number of physical objects of various types (e.g. people, vehicles) interacting with each other or with their environment (e.g. equipment), which may be more or less structured. The scene can last a few instants (e.g. the fall of a person) or a few months (e.g. the depression of an elderly person), and can range from a laboratory slide observed through a microscope to an area larger than a city. Sensors are mostly cameras (e.g. omnidirectional, infrared), but may also include microphones and other sensors (e.g. optical cells, contact sensors, physiological sensors, radars, smoke detectors).
Scene understanding is influenced by cognitive vision, and it requires at least the melding of three areas: computer vision, cognition and software engineering. Scene understanding can achieve the four levels of generic computer vision functionality: detection, localization, recognition and understanding. However, scene understanding systems go beyond the detection of visual features such as corners, edges and moving regions: they extract information about the physical world that is meaningful to human operators. They also aim to achieve more robust, resilient and adaptable computer vision functionalities by endowing them with a cognitive faculty: the ability to learn, adapt, weigh alternative solutions, and develop new strategies for analysis and interpretation. The key characteristic of a scene understanding system is its capacity to perform robustly even in circumstances that were not foreseen when it was designed. Furthermore, a scene understanding system should be able to anticipate events and adapt its operation accordingly. Ideally, a scene understanding system should be able to adapt to novel variations of the current environment, generalize to new contexts and application domains, interpret the intent of underlying behaviors to predict future configurations of the environment, and communicate an understanding of the scene to other systems, including humans.
To make this approach concrete, my talk will address the challenges associated with the following main themes: perception for scene understanding (the perceptual world); maintenance of 3D coherency over time (the physical world); event recognition (the semantic world); and evaluation, control and learning (autonomous systems).
Biography
François Brémond is a researcher in the PULSAR team at INRIA Sophia Antipolis. He obtained his Master's degree from ENS Lyon in 1992. He has conducted research in video understanding since 1993, both at Sophia Antipolis and at USC (University of Southern California), Los Angeles. In 1997 he obtained his PhD in video understanding at INRIA, and then pursued his research as a postdoctoral researcher at USC on the interpretation of videos taken from UAVs (Unmanned Airborne Vehicles) in the DARPA project VSAM (Visual Surveillance and Activity Monitoring). In 2007 he obtained his HDR (Habilitation à Diriger des Recherches) from Nice University on scene understanding.
Dr Brémond designs and develops generic systems for dynamic scene interpretation. The targeted class of applications is the automatic interpretation of partially structured indoor and outdoor scenes observed in particular with monocular colour cameras. These systems detect and track mobile objects, which can be either humans or vehicles, and recognize their behaviours. He is particularly interested in bridging the gap between sensor information (pixel level) and behaviour recognition (semantic level).
François Brémond is the author or co-author of more than 60 scientific papers on video understanding published in international journals and conferences. He is a reviewer for several international journals (IJHCS, IEEE Transactions on Neural Networks, IEEE Systems, Man and Cybernetics, PAAJ, Eurasip JASP) and conferences (CVPR, ICVS, …).
He has also participated in six European projects (PASSWORDS, ADVISOR, AVITRACK, SERKET, CARETAKER, CoFriend), one DARPA project, seven industrial research contracts and several international collaborations (USA, Taiwan, UK, Belgium) in video understanding. For instance, he has successfully recognized a large variety of scenarios in different applications: 1) fighting, abandoned luggage, graffiti, fraud and crowd behavior in metro stations, on roads and onboard trains; 2) aircraft arrival, aircraft refueling and luggage loading/unloading on airport aprons; 3) bank attacks in bank agencies and office behavior for ambient intelligence; 4) access control in buildings; and 5) wasp monitoring for biological applications.