Publications

Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.


  • Multimodal Computational Attention for Scene Understanding and Robotics

    Boris Schauerte
    Springer, 2016; ISBN 978-3-319-33794-4.

    Springer Bib

    Abstract: This book presents state-of-the-art computational attention models that have been successfully tested in diverse application areas and can build the foundation for artificial systems to efficiently explore, analyze, and understand natural scenes. It gives a comprehensive overview of the most recent computational attention models for processing visual and acoustic input. It covers the biological background of visual and auditory attention, bottom-up and top-down attentional mechanisms, and various applications. In the first part, new approaches for bottom-up visual and acoustic saliency models are presented and applied to the task of audio-visual scene exploration by a robot. In the second part, the influence of top-down cues on attention modeling is investigated.
    Keywords: Computational Intelligence, Robotics and Automation, Artificial Intelligence, Image Processing and Computer Vision, Pattern Recognition, Machine Learning

  • Mobile Interactive Image Sonification for the Blind

    Torsten Woertwein, Boris Schauerte, Karin Mueller, Rainer Stiefelhagen
    International Conference on Computers Helping People with Special Needs (ICCHP), Linz, Austria, July, 2016.

    PDF Bib

    Abstract: Web-based, mobile sonification offers a highly flexible way to give blind users access to graphical information and to solve various everyday as well as job-related tasks. The combination of a touch screen, image processing, and sonification allows users to hear the content of every image region that they indicate with a finger on a tablet or mobile phone. In this paper, we build on and expand our previous work in this area and evaluate how six blind participants can perceive mathematical graphs, bar charts, and floor maps with our sonifications and tactile graphics.
    Keywords: Sonification; Web, Mobile; Blind, Visual Impairment
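    Code sketch: A minimal illustration of the core sonification idea, not the authors' implementation: the pixel under the user's finger is mapped to a short tone whose pitch encodes image content. The brightness-to-pitch mapping and all parameters are assumptions.

      import numpy as np

      def tone_for_pixel(image, x, y, sample_rate=44100, duration=0.1):
          # Normalized brightness of the touched pixel (grayscale image assumed)
          value = image[y, x] / 255.0
          # Map brightness to pitch on a log scale, 220 Hz to 1760 Hz (assumed range)
          freq = 220.0 * (8.0 ** value)
          t = np.arange(int(sample_rate * duration)) / sample_rate
          return np.sin(2.0 * np.pi * freq * t)  # audio samples ready for playback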

  • Interactive Web-based Image Sonification for the Blind

    Torsten Woertwein, Boris Schauerte, Karin Mueller, Rainer Stiefelhagen
    International Conference on Multimodal Interaction (ICMI), Seattle, Washington, USA, November, 2015.

    PDF Bib

    Abstract: In this demonstration, we show a web-based sonification platform that allows blind users to interactively experience various kinds of information using two now-widespread technologies: modern web browsers that implement high-level JavaScript APIs and touch-sensitive displays. This way, blind users can easily access information such as maps or graphs. Our current prototype provides various sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smartphones, and tablets.
    Keywords: Sonification; Web, Mobile; Blind, Visual Impairment

  • Multimodal Public Speaking Performance Assessment

    Torsten Woertwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, Stefan Scherer
    International Conference on Multimodal Interaction (ICMI), Seattle, Washington, USA, November, 2015.

    Bib

    Abstract: The ability to speak proficiently in public is essential for many professions and in everyday life. Public speaking skills are difficult to master and require extensive training. Recent developments in technology enable new approaches for public speaking training that allow users to practice in engaging and interactive environments. Here, we focus on the automatic assessment of nonverbal behavior and multimodal modeling of public speaking behavior. We automatically identify audiovisual nonverbal behaviors that are correlated to expert judges’ opinions of key performance aspects. These automatic assessments enable a virtual audience to provide feedback, which is essential for training, during a public speaking performance. We utilize multimodal ensemble tree learners to automatically approximate expert judges’ evaluations to provide post-hoc performance assessments to the speakers. Our automatic performance evaluation is highly correlated with the experts’ opinions with r = 0.745 for the overall performance assessments. We compare multimodal approaches with single modalities and find that the multimodal ensembles consistently outperform single modalities.
    Keywords: Multimodal Speaker Analysis, Face Analysis, Speech Analysis; Public Speaking Training, Virtual Human
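    Code sketch: A toy version of the multimodal ensemble-regression idea using gradient-boosted trees on synthetic stand-in features; the data, feature dimensions, and model choice are assumptions, not the paper's pipeline.

      import numpy as np
      from scipy.stats import pearsonr
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(0)
      audio = rng.normal(size=(200, 8))    # stand-in acoustic nonverbal features
      visual = rng.normal(size=(200, 12))  # stand-in visual nonverbal features
      score = audio[:, 0] + visual[:, 0] + rng.normal(scale=0.5, size=200)  # "expert rating"

      X = np.hstack([audio, visual])       # early multimodal fusion
      model = GradientBoostingRegressor().fit(X[:150], score[:150])
      r, _ = pearsonr(model.predict(X[150:]), score[150:])
      print(f"correlation with held-out ratings: r = {r:.3f}")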

  • A Web-based Platform for Interactive Image Sonification

    Boris Schauerte, Torsten Woertwein, Rainer Stiefelhagen
    Accessible Interaction for Visually Impaired People (AI4VIP), Stuttgart, Germany, September, 2015.

    PDF Bib

    Abstract: We present a web-based sonification platform that allows blind users to interactively experience a wide range of information using two now-widespread technologies: modern web browsers that implement high-level JavaScript APIs and touch-sensitive displays such as those of tablets. This way, blind users can easily access information such as maps or graphs. Our current prototype provides a variety of sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smartphones, and tablets.
    Keywords: Sonification; Web, Mobile; Blind, Visual Impairment

  • Color Decorrelation Helps Visual Saliency Detection

    Boris Schauerte, Torsten Woertwein, Rainer Stiefelhagen
    International Conference on Image Processing (ICIP), Quebec City, Canada, September, 2015.

    PDF Bib

    Abstract: We present how color decorrelation allows visual saliency models to achieve higher performance when predicting where people look in images. For this purpose, we decorrelate the color information of each image, which leads to an image-specific color space with decorrelated color components. This way, we are able to improve the performance of several well-known visual saliency algorithms such as, for example, Itti and Koch’s model and Hou and Zhang’s spectral residual saliency. We show the advantage of color decorrelation on three eye-tracking datasets (Kootstra, Toronto, and MIT) with respect to three evaluation measures (AUC, CC, and NSS).
    Keywords: Visual Saliency; Color Space, Decorrelation, PCA, ZCA; Human Perception, Sensory Coding, Efficient Coding; Attention; Human Gaze, Eye-Tracking; Bruce-Tsotsos (Toronto), Judd (MIT), and Kootstra-Schomacker data set
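    Code sketch: Image-specific color decorrelation via ZCA whitening of the pixel color distribution, as a minimal sketch of the general idea; the paper's exact pipeline may differ.

      import numpy as np

      def decorrelate_colors(image):
          # Treat each pixel as a 3-vector and whiten the color covariance
          pixels = image.reshape(-1, 3).astype(np.float64)
          pixels -= pixels.mean(axis=0)
          eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
          # ZCA: rotate into the eigenbasis, rescale, rotate back
          W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-8)) @ eigvecs.T
          return (pixels @ W.T).reshape(image.shape)  # decorrelated color components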

  • On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection

    Boris Schauerte, Rainer Stiefelhagen
    PLOS ONE (Public Library of Science), 2015.

    PDF Bib

    Abstract: It has become apparent that a Gaussian center bias can serve as an important prior for visual saliency detection, which has been demonstrated for predicting human eye fixations and salient object detection. Tseng et al. have shown that the photographer's tendency to place interesting objects in the center is a likely cause for the center bias of eye fixations. We investigate the influence of the photographer's center bias on salient object detection, extending our previous work. We show that the centroid locations of salient objects in photographs of Achanta and Liu's data set in fact correlate strongly with a Gaussian model. This is an important insight, because it provides an empirical motivation and justification for the integration of such a center bias in salient object detection algorithms and helps to understand why Gaussian models are so effective. To assess the influence of the center bias on salient object detection, we integrate an explicit Gaussian center bias model into two state-of-the-art salient object detection algorithms. This way, first, we quantify the influence of the Gaussian center bias on pixel- and segment-based salient object detection. Second, we improve the performance in terms of F1 score, Fβ score, area under the recall-precision curve, area under the receiver operating characteristic curve, and hit-rate on the well-known data set by Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we exemplarily demonstrate that implicit center biases are partially responsible for the outstanding performance of state-of-the-art algorithms. Last but not least, as a result of debiasing Cheng et al.'s algorithm, we introduce a non-biased salient object detection method, which is of interest for applications in which the image data is not likely to have a photographer's center bias (e.g., image data of surveillance cameras or autonomous robots).
    Keywords: Salient Object Detection, Spatial Distribution, Photographer Bias, Center Bias
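    Code sketch: How an explicit Gaussian center bias is typically combined with a saliency map; the sigma value and the multiplicative combination are illustrative assumptions.

      import numpy as np

      def gaussian_center_bias(height, width, sigma=0.25):
          # Pixel coordinates normalized to [-0.5, 0.5], centered on the image
          ys, xs = np.mgrid[0:height, 0:width]
          ys = ys / (height - 1) - 0.5
          xs = xs / (width - 1) - 0.5
          return np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2))

      # Example: bias an arbitrary saliency map multiplicatively
      saliency = np.random.rand(480, 640)
      biased = saliency * gaussian_center_bias(*saliency.shape)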

  • Small k-pyramids and the complexity of determining k

    Boris Schauerte, Carol T. Zamfirescu
    Journal of Discrete Algorithms (Elsevier), 2015.

    PDF Bib

    Abstract: Motivated by the computational complexity of determining whether a graph is hamiltonian, we study, from an algorithmic perspective, a class of polyhedra called k-pyramids, introduced in [Zamfirescu and Zamfirescu, 2011], and discuss related applications. We prove that determining whether a given graph is the 1-skeleton of a k-pyramid, and if so whether it is belted or not, can be done in polynomial time for k <= 3. The impact on hamiltonicity follows from the traceability of all 2-pyramids and non-belted 3-pyramids, and from the hamiltonicity of all non-belted 2-pyramids. The algorithm can also be used to determine the outcome for larger values of k, but the complexity increases exponentially with k. Lastly, we present applications of the algorithm, and improve the known bounds for the minimal cardinality of systems of bases called foundations in graph families with interesting properties concerning traceability and hamiltonicity.
    Keywords: Graph Theory; Algorithms, Computational Complexity

  • Way to Go! Detecting Open Areas Ahead of a Walking Person

    Boris Schauerte, Daniel Koester, Manuel Martinez, Rainer Stiefelhagen
    ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), Zurich, Switzerland, September, 2014.

    PDF Bib

    Abstract: We determine the region in front of a walking person that is not blocked by obstacles. This is an important task when trying to assist visually impaired people or navigate autonomous robots in urban environments. We use conditional random fields to learn how to interpret texture and depth information with respect to accessibility. We demonstrate the effectiveness of the proposed approach on a novel dataset, which consists of urban outdoor and indoor scenes that were recorded with a handheld stereo camera.
    Keywords: Conditional Random Fields, Machine Learning; Visually Impaired, Computer Vision, Obstacle Detection, Guidance, Navigation

  • Look at this! Learning to Guide Visual Saliency in Human-Robot Interaction

    Boris Schauerte, Rainer Stiefelhagen
    International Conference on Intelligent Robots and Systems (IROS), Chicago, IL, USA, September, 2014.

    PDF Bib

    Abstract: We learn to direct computational visual attention in multimodal (i.e., pointing gestures and spoken references) human-robot interaction. For this purpose, we train a conditional random field to integrate features that reflect low-level visual saliency, the likelihood of salient objects, the probability that a given pixel is pointed at, and - if available - spoken information about the target object's visual appearance. As such, this work integrates several of our ideas and approaches, ranging from multi-scale spectral saliency detection, spatially debiased salient object detection, and computational attention in human-robot interaction to learning robust color term models. We demonstrate that this machine learning driven integration outperforms the previously reported results on two datasets, one dataset without and one with spoken object references. In summary, for automatically detected pointing gestures and automatically extracted object references, our approach improves the rate at which the correct object is included in the initial focus of attention by 10.37% in the absence and 25.21% in the presence of spoken target object information.
    Keywords: Conditional Random Fields, Machine Learning; Visual Search, Object Detection; Shared Attention, Joint Attention; Multi-Modal Interaction, Gestures, Pointing, Language; Human-Robot Interaction

  • Manifold Alignment for Person Independent Appearance-based Gaze Estimation

    Timo Schneider, Boris Schauerte, Rainer Stiefelhagen
    International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, August, 2014.

    PDF Bib

    Abstract: We show that dually supervised manifold embedding can improve the performance of machine learning based person-independent and thus calibration-free gaze estimation. For this purpose, we perform a manifold embedding for each person in the training dataset and then learn a linear transformation that aligns the individual, person-dependent manifolds. We evaluate the effect of manifold alignment on the recently presented Columbia dataset, where we analyze the influence on 6 regression methods and 8 feature variants. Using manifold alignment, we are able to improve the person-independent gaze estimation performance by up to 31.2% compared to the best approach without manifold alignment.
    Keywords: Gaze Estimation, Manifold Alignment, Calibration-Free, Machine Learning
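    Code sketch: The alignment step as a least-squares linear map from one person's embedded samples to a reference embedding; the embedding itself and the sample correspondences are assumed given.

      import numpy as np

      def align_embeddings(X_person, X_reference):
          # Linear map A minimizing ||X_person @ A - X_reference||_F,
          # assuming row-wise corresponding samples in both embeddings
          A, *_ = np.linalg.lstsq(X_person, X_reference, rcond=None)
          return X_person @ A  # person's manifold aligned to the reference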

  • Cognitive Evaluation of Haptic and Audio Feedback in Short Range Navigation Tasks

    Manuel Martinez, Angela Constantinescu, Boris Schauerte, Daniel Koester, Rainer Stiefelhagen
    International Conference on Computers Helping People with Special Needs (ICCHP), Paris, France, July, 2014.

    PDF Bib

    Abstract: Assistive navigation systems for the blind commonly use speech to convey directions to their users. However, this is problematic for short range navigation systems that need to provide fine but diligent guidance in order to avoid obstacles. For this task, we have compared haptic and audio feedback systems under the NASA-TLX protocol to analyze the additional cognitive load that they place on users. Both systems are able to guide the users through a test obstacle course. However, for white cane users, auditory feedback results in a 22 times higher cognitive load than haptic feedback. This discrepancy in cognitive load was not found on blindfolded users, thus we argue against evaluating navigation systems solely with blindfolded users.
    Keywords: Computer Vision for the Blind, Assistive Technologies; Haptic Feedback, Sonification, Cognitive Load

  • Important Stuff, Everywhere! Activity Recognition with Salient Proto-Objects as Context

    Lukas Rybok, Boris Schauerte, Ziad Al-Halah, Rainer Stiefelhagen
    IEEE Winter Conference on Applications of Computer Vision (WACV), Steamboat Springs, CO, USA, March, 2014.

    PDF Bib

    Abstract: Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data and complicate the transfer of the action recognition framework to novel domains with different objects and object-action relationships. Motivated by recent advances in saliency detection, we propose to use proto-objects to detect object candidate regions in videos without any need for prior knowledge. Our experimental evaluation on three publicly available data sets shows that the integration of proto-objects and simple motion features substantially improves recognition performance, outperforming the state-of-the-art.
    Keywords: Activity Recognition, Action Recognition, Context, Salient Proto-Objects, Visual Saliency

  • How the Distribution of Salient Objects in Images Influences Salient Object Detection

    Boris Schauerte, Rainer Stiefelhagen
    International Conference on Image Processing (ICIP), Melbourne, Australia, September, 2013.

    PDF Bib

    Abstract: We investigate the spatial distribution of salient objects in images. First, we empirically show that the centroid locations of salient objects correlate strongly with a centered, half-Gaussian model. This is an important insight, because it provides a justification for the integration of such a center bias in salient object detection algorithms. Second, we assess the influence of the center bias on salient object detection. Therefore, we integrate an explicit center bias into Cheng’s state-of-the-art salient object detection algorithm. This way, first, we quantify the influence of the Gaussian center bias on salient object detection, second, improve the performance with respect to several established evaluation measures, and, third, derive a state-of-the-art unbiased salient object detection algorithm.
    Keywords: Salient Object Detection, Spatial Distribution, Photographer Bias, Center Bias

  • BAM! Depth-based Body Analysis in Critical Care

    Manuel Martinez, Boris Schauerte, Rainer Stiefelhagen
    International Conference on Computer Analysis of Images and Patterns (CAIP), York, UK, August, 2013.

    PDF Bib

    Abstract: We investigate computer vision methods to monitor Intensive Care Units (ICU) and assist in sedation delivery and accident prevention. We propose the use of a Bed Aligned Map (BAM) to analyze the patient’s body. We use a depth camera to localize the bed, estimate its surface, and divide it into 10 cm x 10 cm cells. Here, the BAM represents the average cell height over the mattress. This depth-based BAM is independent of illumination and bed positioning, improving the consistency between patients. This representation allows us to develop metrics to estimate bed occupancy, body localization, body agitation, and sleeping position. Experiments with 23 subjects show an accuracy in 4-level agitation tests of 88% and 73% in supine and fetal positions, respectively, while sleeping position was recognized with 100% accuracy in a 4-class test.
    Keywords: Depth Camera, Critical Care, Monitoring, Agitation, Sleeping Position
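    Code sketch: Computing a Bed Aligned Map as the average point height per 10 cm x 10 cm cell, assuming the depth points were already transformed into bed coordinates (bed localization is omitted).

      import numpy as np

      def bed_aligned_map(points, length=2.0, width=1.0, cell=0.10):
          # points: Nx3 array in bed coordinates (x along the bed, y across it,
          # z = height above the mattress surface); bed size is assumed
          bins = (int(length / cell), int(width / cell))
          extent = [[0.0, length], [0.0, width]]
          heights, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                         range=extent, weights=points[:, 2])
          counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                        range=extent)
          return heights / np.maximum(counts, 1)  # average height per cell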

  • Accessible Section Detection For Visual Guidance

    Daniel Koester, Boris Schauerte, Rainer Stiefelhagen
    IEEE/NSF Workshop on Multimodal and Alternative Perception for Visually Impaired People (MAP4VIP), San Jose, CA, USA, July, 2013.

    PDF Bib

    Abstract: We address the problem of determining the accessible section in front of a walking person. In our definition, the accessible section is the spatial region that is not blocked by obstacles. For this purpose, we use gradients to calculate surface normals on the depth map and subsequently determine the accessible section using these surface normals. We demonstrate the effectiveness of the proposed approach on a novel, challenging dataset. The dataset consists of urban outdoor and indoor scenes that were recorded with a handheld stereo camera.
    Keywords: Visually Impaired, Computer Vision, Obstacle Detection, Guidance, Navigation
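    Code sketch: Per-pixel surface normals from depth-map gradients, treating the depth map as a height field; the finite-difference scheme and the thresholding idea are illustrative assumptions.

      import numpy as np

      def surface_normals(depth):
          # Finite-difference gradients approximate the local surface slope
          dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
          normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(dz_dx)))
          return normals / np.linalg.norm(normals, axis=2, keepdims=True)

      # Accessible ground can then be found by thresholding the angle between
      # each normal and the expected ground-plane direction (assumed known).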

  • Wow! Bayesian Surprise for Salient Acoustic Event Detection

    Boris Schauerte, Rainer Stiefelhagen
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May, 2013.

    PDF Bib

    Abstract: We extend our previous work and present how Bayesian surprise can be applied to detect salient acoustic events. To this end, we use the Gamma distribution to model each frequency's spectrogram distribution. Then, we use the Kullback-Leibler divergence of the posterior and prior distribution to calculate how "unexpected" and thus surprising newly observed audio samples are. This way, we are able to efficiently detect arbitrary, unexpected and thus surprising acoustic events. Complementing our qualitative system evaluations for (humanoid) robots, we demonstrate the effectiveness and practical applicability of the approach on the CLEAR 2007 acoustic event detection data.
    Keywords: Acoustic Event Detection, Acoustic Saliency, Salient Event Detection
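    Code sketch: Bayesian surprise as the Kullback-Leibler divergence between posterior and prior Gamma distributions (shape/rate parameterization); updating the per-frequency parameters from new spectrogram samples is omitted.

      import numpy as np
      from scipy.special import digamma, gammaln

      def kl_gamma(a_post, b_post, a_prior, b_prior):
          # KL( Gamma(a_post, b_post) || Gamma(a_prior, b_prior) ), rate form
          return ((a_post - a_prior) * digamma(a_post)
                  - gammaln(a_post) + gammaln(a_prior)
                  + a_prior * (np.log(b_post) - np.log(b_prior))
                  + a_post * (b_prior - b_post) / b_post)

      # The surprise of a frequency band is kl_gamma evaluated with its
      # posterior and prior Gamma parameters after observing a new sample.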

  • Quaternion-based Spectral Saliency Detection for Eye Fixation Prediction

    Boris Schauerte, Rainer Stiefelhagen
    European Conference on Computer Vision (ECCV), Firenze, Italy, October, 2012.

    PDF Bib

    Abstract: In recent years, several authors have reported that spectral saliency detection methods provide state-of-the-art performance in predicting human gaze in images. We systematically integrate and evaluate quaternion DCT- and FFT-based spectral saliency detection, weighted quaternion color space components, and the use of multiple resolutions. Furthermore, we propose the use of the eigenaxes and eigenangles for spectral saliency models that are based on the quaternion Fourier transform. We demonstrate the outstanding performance on the Bruce-Tsotsos (Toronto), Judd (MIT), and Kootstra-Schomacker eye-tracking data sets.
    Keywords: Spectral Saliency, Quaternion; Multi-Scale, Color Space, Quaternion Component Weight, Quaternion Axis; Attention; Human Gaze, Eye-Tracking; Bruce-Tsotsos (Toronto), Judd (MIT), and Kootstra-Schomacker data set
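    Code sketch: The scalar (non-quaternion) spectral-whitening baseline that these models generalize: keep only the FFT phase, invert, square, and smooth. The smoothing width is an assumption.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def spectral_whitening_saliency(gray):
          f = np.fft.fft2(gray.astype(np.float64))
          whitened = f / (np.abs(f) + 1e-12)      # unit amplitude, phase only
          sal = np.abs(np.fft.ifft2(whitened))**2
          return gaussian_filter(sal, sigma=2.5)  # post-smoothing (sigma assumed)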

  • Multimodal Saliency-based Attention: A Lazy Robot’s Approach

    Benjamin Kuehn, Boris Schauerte, Kristian Kroschel, Rainer Stiefelhagen
    International Conference on Intelligent Robots and Systems (IROS), Algarve, Portugal, October, 2012.

    PDF Bib

    Abstract: We extend our work on an integrated object-based system for saliency-driven overt attention and knowledge-driven object analysis. We present how we can reduce the amount of necessary head movement during scene analysis while still focusing all salient proto-objects in an order that strongly favors proto-objects with a higher saliency. Furthermore, we integrated motion saliency and, as a consequence, adaptive predictive gaze control to allow for efficient gazing behavior on the ARMAR-III robot head. To evaluate our approach, we first collected a new data set that incorporates two robotic platforms, three scenarios, and different scene complexities. Second, we introduce measures for the effectiveness of active overt attention mechanisms in terms of saliency cumulation and required head motion. This way, we are able to objectively demonstrate the effectiveness of the proposed multicriterial focus of attention selection.
    Keywords: Exploration Path, Optimization, Active Perception, Saliency-based Overt Attention, Scene Exploration

  • Learning Robust Color Name Models from Web Images

    Boris Schauerte, Rainer Stiefelhagen
    International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, November, 2012.

    PDF Bib

    Abstract: We use images that have been collected using an Internet search engine to train color name models for color naming and recognition tasks. Considering color histogram bands as the words of an image and the color names as classes, we use supervised latent Dirichlet allocation to train our model. To pre-process the training data, we use state-of-the-art salient object detection and a Kullback-Leibler divergence based outlier detection. In summary, we achieve state-of-the-art performance on the eBay data set and improve the similarity between labels assigned by our model and human observers by approximately 14%.
    Keywords: Color Terms; Color Naming; Web-based Learning; Supervised Latent Dirichlet Allocation (SLDA); Human-Robot Interaction

  • A Modular Audio-Visual Scene Analysis and Attention System for Humanoid Robots

    Benjamin Kuehn, Boris Schauerte, Kristian Kroschel, Rainer Stiefelhagen
    International Symposium on Robotics (ISR), Taipei, Taiwan, August, 2012.

    PDF Bib

    Abstract: We present an audio-visual scene analysis system, which is implemented and evaluated on the ARMAR-III robot head. The modular design allows fast integration of new algorithms and adaptation to new hardware. Further benefits are automatic module dependency checks and determination of the execution order. The integrated world model manages and serves the acquired data for all modules in a consistent way. The system achieves state-of-the-art performance in localization, tracking, and classification of persons as well as in the exploration of whole scenes and unknown items. We use multimodal proto-objects to model and analyze salient stimuli in the environment of the robot to realize the robot's attention.
    Keywords: Scene Analysis, Hierarchical Entity-based Exploration, Audio-Visual Saliency-based Attention, World Model; Humanoid Robot

  • An Assistive Vision System for the Blind that Helps Find Lost Things

    Boris Schauerte, Manuel Martinez, Angela Constantinescu, Rainer Stiefelhagen
    International Conference on Computers Helping People with Special Needs (ICCHP), Linz, Austria, July, 2012.

    PDF Bib

    Abstract: We present a computer vision system that helps blind people find lost objects. To this end, we combine color- and SIFT-based object detection with sonification to guide the hand of the user towards potential target object locations. This way, we are able to guide the user’s attention and effectively reduce the space in the environment that needs to be explored. We verified the suitability of the proposed system in a user study.
    Keywords: Computer Vision for the Blind, Assistive Technologies; Object Recognition, Color Naming; Sonification; Interactive Object Search, Scene Exploration

  • Predicting Human Gaze using Quaternion DCT Image Signature Saliency and Face Detection

    Boris Schauerte, Rainer Stiefelhagen
    IEEE Workshop on the Applications of Computer Vision (WACV), Breckenridge, CO, USA, January, 2012.

    Best Student Paper Award

    PDF Bib

    Abstract: We combine and extend the previous work on DCT-based image signatures and face detection to determine the visual saliency. To this end, we transfer the scalar definition of image signatures to quaternion images and thus introduce a novel saliency method using quaternion type-II DCT image signatures. Furthermore, we use MCT-based face detection to model the important influence of faces on the visual saliency using rotated elliptical Gaussian weight functions and evaluate several integration schemes. In order to demonstrate the performance of the proposed methods, we evaluate our approach on the Bruce-Tsotsos (Toronto) and Cerf (FIFA) benchmark eye-tracking data sets. Additionally, we present evaluation results on the Bruce-Tsotsos data set of the most important spectral saliency approaches. We achieve state-of-the-art results in terms of the well-established area under curve (AUC) measure on the Bruce-Tsotsos data set and come close to the ideal AUC on the Cerf data set - with less than one millisecond to calculate the bottom-up QDCT saliency map.
    Keywords: Spectral Saliency, Quaternion, DCT Image Signatures; MCT Face Detection; Attention; Human Gaze, Eye-Tracking; Bruce-Tsotsos (Toronto) and Cerf (FIFA) data set
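    Code sketch: The scalar DCT image signature that the paper transfers to quaternion images: the saliency map is the smoothed, squared inverse DCT of the sign of the DCT spectrum. The smoothing width is an assumption.

      import numpy as np
      from scipy.fft import dctn, idctn
      from scipy.ndimage import gaussian_filter

      def dct_signature_saliency(gray):
          # Image signature: keep only the sign of each DCT coefficient
          signature = np.sign(dctn(gray.astype(np.float64), norm='ortho'))
          sal = idctn(signature, norm='ortho')**2
          return gaussian_filter(sal, sigma=2.5)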

  • SIFT-based Camera Localization using Reference Objects for Application in Multi-Camera Environments and Robotics

    Hanno Jaspers, Boris Schauerte, Gernot A. Fink
    International Conference on Pattern Recognition Applications and Methods (ICPRAM), Vilamoura, Algarve, Portugal, February, 2012.

    PDF Bib

    Abstract: The use of robots in non-industrial environments has attracted considerable research interest. However, these environments present several challenges due to their unstructured nature, e.g., self-localization and attaining a good perception of what is currently happening in the environment. In this contribution, we present a unified approach to improve the localization and perception of a robot in a new environment by using already installed cameras. Using our approach, we are able to localize arbitrary cameras in multi-camera environments while automatically extending the camera network in an online, unattended, real-time way. This way, all cameras can be used to improve the perception of the scene, and additional cameras can be added in real-time, e.g., to remove blind spots. To this end, we use the Scale-Invariant Feature Transform (SIFT) and at least one arbitrary known-size reference object to enable camera localization. We then apply non-linear optimization of the relative pose estimate and use it to iteratively calibrate the camera network as well as to localize arbitrary cameras, e.g., of mobile phones or robots, inside a multi-camera environment. We performed an evaluation on synthetic as well as real data to demonstrate the applicability of the proposed approach.
    Keywords: Camera Pose Estimation, Camera Calibration; Scale Ambiguity; SIFT; Multi-Camera Environment, Smart Room, Robot Localization
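    Code sketch: The pose-estimation core with OpenCV, given 2-D/3-D correspondences from SIFT matches against the known-size reference object; feature matching and the non-linear refinement are omitted.

      import cv2
      import numpy as np

      def locate_camera(object_points, image_points, K):
          # object_points: Nx3 coordinates on the known-size reference object,
          # image_points: Nx2 matched pixel locations, K: 3x3 intrinsic matrix
          ok, rvec, tvec = cv2.solvePnP(object_points.astype(np.float64),
                                        image_points.astype(np.float64), K, None)
          R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
          return R, tvec              # camera pose w.r.t. the reference object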

  • Multimodal Saliency-based Attention for Object-based Scene Analysis

    Boris Schauerte, Benjamin Kuehn, Kristian Kroschel, Rainer Stiefelhagen
    International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, USA, September, 2011.

    PDF Bib

    Abstract: Multimodal attention is a key requirement for humanoid robots in order to navigate in complex environments and act as social, cognitive human partners. To this end, robots have to incorporate attention mechanisms that focus the processing on the potentially most relevant stimuli while controlling the sensor orientation to improve the perception of these stimuli. In this paper, we present our implementation of audio-visual saliency-based attention that we integrated in a system for knowledge-driven audio-visual scene analysis and object-based world modeling. For this purpose, we introduce a novel isophote-based method for proto-object segmentation of saliency maps, a surprise-based auditory saliency definition, and a parametric 3-D model for multimodal saliency fusion. The applicability of the proposed system is demonstrated in a series of experiments.
    Keywords: Multimodal, Audio-Visual Attention; Auditory Surprise; Isophote-based Visual Proto-Objects; Parametric 3-D Saliency Model and Fusion; Object-based Inhibition of Return; Object-based Scene Exploration and Hierarchical Analysis

  • Web-based Learning of Naturalized Color Models for Human-Machine Interaction

    Boris Schauerte, Gernot A. Fink
    International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, Australia, December, 2010.

    PDF Bib

    Abstract: In recent years, natural verbal and non-verbal human-robot interaction has attracted an increasing interest. Therefore, models for robustly detecting and describing visual attributes of objects such as colors are of great importance. However, in order to learn robust models of visual attributes, large data sets are required. Based on the idea of overcoming the shortage of annotated training data by acquiring images from the Internet, we propose a method for robustly learning natural color models. Its novel aspects with respect to prior art are: firstly, a randomized HSL transformation that reflects the slight variations and noise of colors observed in real-world imaging sensors; secondly, a probabilistic ranking and selection of the training samples, which removes a considerable amount of outliers from the training data. These two techniques allow us to estimate robust color models that better resemble the variances seen in real-world images. The advantages of the proposed method over the current state-of-the-art technique, which uses the training data without proper transformation and selection, are confirmed in experimental evaluations. In combination, for models learnt with pLSA-bg and HSL, the proposed techniques reduce the amount of mislabeled objects by 19.87% on the well-known eBay data set.
    Keywords: Color Terms; Color Naming; Web-based Learning; Natural vs. Web-based Image Statistics; Domain Adaptation; Probabilistic HSL Model; Probabilistic Latent Semantic Analysis (pLSA); Human-Robot Interaction
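    Code sketch: A randomized HSL-style color perturbation using Python's colorsys (which implements HLS); the noise magnitudes are illustrative assumptions, not the paper's fitted values.

      import colorsys
      import random

      def randomized_hsl(r, g, b, sigma=0.03):
          # Perturb hue/lightness/saturation to mimic real sensor variation
          h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
          h = (h + random.gauss(0.0, sigma)) % 1.0
          l = min(max(l + random.gauss(0.0, sigma), 0.0), 1.0)
          s = min(max(s + random.gauss(0.0, sigma), 0.0), 1.0)
          return colorsys.hls_to_rgb(h, l, s)  # jittered training color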

  • Focusing Computational Visual Attention in Multi-Modal Human-Robot Interaction

    Boris Schauerte, Gernot A. Fink
    International Conference on Multimodal Interfaces (ICMI), Beijing, China, November, 2010.

    Doctoral Spotlight ACM/Google Travel Grant

    PDF Bib

    Abstract: Identifying verbally and non-verbally referred-to objects is an important aspect of human-robot interaction. Most importantly, it is essential to achieve a joint focus of attention and, thus, a natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence the visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically-motivated saliency model that forms the basis for visual search. We prove the feasibility of the proposed approach by presenting the results of an experimental evaluation.
    Keywords: Modulatable Neuron-based, Phase-based, Spectral Whitening Saliency; Attention; Visual Search; Objects; Color; Shared Attention, Joint Attention; Multi-Modal Interaction, Gestures, Pointing, Language; Deictic Interaction; Spoken Human-Robot Interaction

  • Saliency-based Identification and Recognition of Pointed-at Objects

    Boris Schauerte, Jan Richarz, Gernot A. Fink
    International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, October, 2010.

    PDF Bib

    Abstract: When persons interact, non-verbal cues are used to direct the attention of persons towards objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it allows coupling verbal descriptions with the visual appearance of objects, if the referred-to object is non-verbally indicated. In this contribution, we present a system that utilizes bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses the visual attention by steering a pan-tilt-zoom camera towards the object of interest and thus provides a suitable model-view for SIFT-based recognition and learning.
    Keywords: Spectral Residual Saliency, Spectral Whitening Saliency; Joint/Shared Attention, Pointing Gestures; Object Detection and Learning; Maximally Stable Extremal Regions (MSER); Scale-Invariant Feature Transform (SIFT); Active Pan-Tilt-Zoom Camera; Human-Robot Interaction

  • Multi-Modal and Multi-Camera Attention in Smart Environments

    Boris Schauerte, Jan Richarz, Thomas Ploetz, Christian Thurau, Gernot A. Fink
    International Conference on Multimodal Interfaces (ICMI), Cambridge, MA, USA, November, 2009.

    Outstanding Student Paper Award Finalist MERL/NSF Travel Grant

    PDF Bib

    Abstract: This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing the attention of a computer vision system, e.g., in smart environments or for robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well-grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way for combining multi-modal saliency information. Besides incorporating different modalities, we extend classical 2D saliency maps to multi-camera and multi-modal 3D saliency spaces. For experimental validation we realized the proposed system within a smart environment. The evaluation took place for a demanding setup under real-life conditions, including focus of attention selection for multiple subjects and concurrently active modalities.
    Keywords: Multi-Camera; 3-D Spatial Saliency; Multi-Modal Saliency; Attention; Active Multi-Camera Control; Volumetric Intersection, Minimal Reconstruction Error; View Selection, Viewpoint Selection; Multi-Modal Sensor Fusion; Fuzzy; Smart Room; Human-Machine Interaction
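    Code sketch: One classic fuzzy aggregation operator, ordered weighted averaging (OWA), applied to co-registered saliency maps; the operator choice and weights are illustrative, not the paper's exact fusion.

      import numpy as np

      def owa_fuse(saliency_maps, weights):
          # Sort values at each position in descending order, then weight them
          stacked = np.sort(np.stack(saliency_maps), axis=0)[::-1]
          w = np.asarray(weights, dtype=np.float64)
          w /= w.sum()
          return np.tensordot(w, stacked, axes=1)

      # Example: emphasize the strongest modality at each location
      # fused = owa_fuse([visual_map, audio_map], weights=[0.7, 0.3])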

  • A Multi-modal Attention System for Smart Environments

    Boris Schauerte, Thomas Ploetz, Gernot A. Fink
    International Conference on Computer Vision Systems (ICVS), Liege, Belgium, October, 2009.

    PDF Bib

    Abstract: Focusing attention on the most relevant information is a fundamental biological concept that allows humans to (re-)act rapidly and safely in complex and unfamiliar environments. This principle has successfully been adopted for technical systems where sensory stimuli need to be processed in an efficient and robust way. In this paper, a multi-modal attention system for smart environments is described that explicitly respects efficiency and robustness aspects already in its architecture. The system facilitates unconstrained human-machine interaction by integrating sensory information of different modalities.
    Keywords: Multi-Modal; Multi-Camera; 3-D Spatial Saliency; Attention; Active Multi-Camera Control; Volumetric Intersection; View Selection, Viewpoint Selection; Real-Time Performance; Distributed, Scalable System; Design; Smart Environment, Smart Room; Human-Machine Interaction

  • Multi-modale Aufmerksamkeitssteuerung in einer intelligenten Umgebung (Multi-modal attention control in an intelligent environment)

    Boris Schauerte
    Diplom (MSc), TU Dortmund University.

    Bib

    Abstract: Intelligent environments are supposed to simplify the everyday life of their users. To reach this goal, a multitude of sensors is necessary to create a model of the current scene. Processing the resulting sensor data stream in full can exceed the available processing capacity and prevent real-time processing of the sensor data. A possible solution to this problem is a fast pre-selection of potentially relevant sensor data and the restriction of complex calculations to the pre-selected data. That leaves the question of what is potentially relevant. ...

  • Regular graphs in which every pair of points is missed by some longest cycle

    Boris Schauerte, Carol T. Zamfirescu
    Annals of the University of Craiova (33), pp. 154-173, 2006.

    PDF Bib

    Abstract: In Petersen's well-known cubic graph every vertex is missed by some longest cycle. Thomassen produced a planar graph with this property. Grünbaum found a cubic graph, in which any two vertices are missed by some longest cycle. In this paper we present a cubic planar graph fulfilling this condition.

  • Root Treatment - The dangers of rootkits

    Boris Schauerte
    Linux Magazine (UK) (19), pp. 20-23, April, 2002.

    Bib

  • Feind im Dunkeln - Wie gefährlich sind die Cracker-Werkzeuge (Enemy in the Dark - How Dangerous Are the Cracker Tools)

    Boris Schauerte
    Linux Magazin (DE) (3), pp. 44-47, February, 2002.

    Bib

  • Verborgene Gefahren - Trojanische Pferde in den Kernel laden (Hidden Dangers - Loading Trojan Horses into the Kernel)

    Boris Schauerte
    Linux Magazin (DE) (11), pp. 60-63, October, 2001.

    Bib