WP3. Signal processing for interaction control

Description of work

T3.1 – Algorithms for gaze-based interaction with an eye-tracker (UNI KO-LD, SMI) [M8-M18, M24-M32]

The existing analysis mechanisms, such as dwell time and number of fixations (examples of what we have used before, and how we compared to other approaches, are described in (Walber et al., 2013)), will be evaluated for use in the MAMEM set of algorithms in order to derive low-level (similar to onMouseMove) and meso-level (similar to onMouseOver or ticking a box) interpretations of user activity. To this end, we will conduct initial studies with control subjects, employing pattern recognition and gaze-tracking algorithms to determine the correlation, precision, recall and accuracy of eye activity with respect to the intended low- and meso-level operations of the target software. At this first stage of algorithm implementation, the requirements arising from the target users’ disabilities will be used to fine-tune the parameters of the eye-tracking algorithms. More specifically, retaining high accuracy in the case of PD is a major challenge for the eye-tracking algorithms, which will have to capture the eye at high head velocities and ensure short recovery times to cope with eye and head tremor. As part of this task we will also consider how easily test subjects learn the control widgets. The algorithms developed in this task will be wrapped as widgets that will become part of the middleware and accompanying SDK developed in WP4. The exact set of low- and meso-level controls to be implemented will be determined in the initial requirements phase of this task. At a second stage, the feedback from the first-stage experiments conducted within WP6 will be incorporated to further improve the performance and implement the final version of the eye-tracking algorithms.
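To make the intended widget logic concrete, the following minimal sketch (in Python) shows how a meso-level “select” event could be derived from a raw gaze stream via dwell time. It is an illustration only: the function name and the dwell_ms and radius_px thresholds are assumptions, to be fixed during the requirements phase and the control-subject studies.

# Illustrative sketch (not the final MAMEM implementation): mapping a raw
# gaze stream to a meso-level "select" event via dwell time.
from dataclasses import dataclass

@dataclass
class GazeSample:
    t_ms: float   # timestamp in milliseconds
    x: float      # horizontal gaze coordinate in pixels
    y: float      # vertical gaze coordinate in pixels

def detect_dwell_select(samples, dwell_ms=800.0, radius_px=40.0):
    """Yield (t_ms, x, y) whenever the gaze stays within radius_px of an
    anchor point for at least dwell_ms (both thresholds are hypothetical)."""
    anchor, start = None, None
    for s in samples:
        if anchor is None:
            anchor, start = s, s.t_ms
            continue
        if (s.x - anchor.x) ** 2 + (s.y - anchor.y) ** 2 <= radius_px ** 2:
            if s.t_ms - start >= dwell_ms:
                yield (s.t_ms, anchor.x, anchor.y)   # meso-level "select"
                anchor, start = None, None           # reset after firing
        else:
            anchor, start = s, s.t_ms                # gaze moved on: new anchor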

T3.2 – Algorithms for mind-based interaction with an EEG recorder (CERTH, EBNeuro) [M8-M18, M24-M32]

The goal of this task will be to develop robust algorithms for translating raw EEG data into meaningful mental commands. Selective attention techniques (relying on the P300 and SSVEP) and motor imagery techniques (relying on ERD/ERS) will be evaluated to find out which of these methods best fits the MAMEM requirements. Additionally, ERP analysis will be conducted to explore the contribution of other potentials, e.g., the N400 component, to the translation of brain activity into mental commands. Additional EEG features in the time and frequency domains will also be employed as complementary elements to enhance the overall performance of the algorithms. MAMEM will use self-paced brain-computer communication techniques to discriminate between the intentional-control and non-control states (Scherer et al., 2008). For the development of a self-paced BCI it is essential to train the user to reliably induce distinctive brain patterns and the BCI to detect those patterns. The non-control state will be automatically detected through a “brain switch” triggered in the absence of motor-imagery-related activity. These techniques will provide natural communication channels and enforce synchronization of the BCI with time-dependent media. Above all, the real-time requirement and the hardware limitations (as specified in WP2) will set the guidelines for the extent of the feature extraction and classification techniques that will be employed.
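As a non-binding illustration of the frequency-domain side of this work, the sketch below scores candidate SSVEP stimulation frequencies by their spectral power and picks the dominant one. The stimulation frequencies, number of harmonics and channel handling are placeholder assumptions; the final MAMEM algorithms will combine such features with the P300, ERD/ERS and other ERP components discussed above.

# Illustrative SSVEP sketch: pick the flicker frequency whose power
# (summed over harmonics) dominates the occipital EEG spectrum.
import numpy as np
from scipy.signal import welch

def classify_ssvep(eeg, fs, stim_freqs=(8.0, 10.0, 12.0), n_harmonics=2):
    """eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    Returns the index of the attended stimulus (parameters are hypothetical)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], 2 * int(fs)))
    psd = psd.mean(axis=0)                           # average over channels
    scores = []
    for f0 in stim_freqs:
        score = 0.0
        for h in range(1, n_harmonics + 1):
            idx = np.argmin(np.abs(freqs - h * f0))  # nearest PSD bin
            score += psd[idx]
        scores.append(score)
    return int(np.argmax(scores))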

In order to obtain reliable results, this task will be carried out in two stages. In the first stage, an initial algorithm version will be implemented to explore the feasibility of extracting low- and meso-level mental commands (e.g., zoom, rotate, crop, right, left) from brain signals. Additionally, individual user characteristics and their effect on brain function will be studied extensively. For instance, it has been reported that age and gender affect the amplitude and latency of the P300 component (Gong et al., 2011). Regarding the MAMEM-specific target groups, i.e., the PD and the MD group, the effect of user medication on brain activity (e.g., in the PD case, Levodopa induces modulation of beta and theta oscillations (Alonso-Frech et al., 2006)) will be explored to calibrate the parameters of the EEG signal processing algorithms. During the second stage, the parameters of the algorithms will be fine-tuned according to the results of the first-stage experiments conducted with patients in WP6, towards enhancing the algorithms’ robustness.
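As a simple illustration of this kind of per-user calibration (a sketch under assumed inputs, not the project’s final procedure), a subject-specific decision threshold could be selected from labelled calibration epochs as follows; the scalar classifier score and the balanced-accuracy criterion are assumptions made for illustration.

# Illustrative per-user calibration sketch: choose the decision threshold
# that maximises balanced accuracy on labelled calibration epochs.
import numpy as np

def calibrate_threshold(scores, labels):
    """scores: 1-D array of classifier outputs for calibration epochs;
    labels: 1-D array of 0/1 ground truth (1 = target mental command).
    Returns the threshold with the best balanced accuracy."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    best_thr, best_bacc = scores.min(), 0.0
    for thr in np.unique(scores):
        pred = (scores >= thr).astype(int)
        tpr = (pred[labels == 1] == 1).mean() if (labels == 1).any() else 0.0
        tnr = (pred[labels == 0] == 0).mean() if (labels == 0).any() else 0.0
        bacc = 0.5 * (tpr + tnr)
        if bacc > best_bacc:
            best_thr, best_bacc = thr, bacc
    return best_thr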

T3.3 – Algorithms for stress detection via bio-measurements (CERTH, UNI KO-LD) [M8-M18, M24-M32]

Stress context will be used in two ways: a) as a triggering mechanism for relieving users through an assistive dialogue, or b) as a system evaluator. In the first case, assistive dialogues will be presented when the patient is recognized as being stressed while trying to perform an action. The questions in the dialogue will either have a “Yes-No” form, for example to undo the last action (e.g., “Undo last click?”), or the form of a multiple-choice dialogue for performing a difficult action from a set of predefined difficult actions (e.g., “Do you mean drag-n-drop or zoom-out?”). The form of the trigger will depend on the event-related potentials of the brain signal before or after an action is taken. In the second case, the system evaluator will be a service that evaluates the several prototypes of the project. By measuring stress levels, we will be able to obtain an unbiased measure of how well each prototype performs and which interaction paradigm is preferable.
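A minimal sketch of the first use case, assuming per-user HR and GSR baselines and purely illustrative thresholds, could look as follows; the actual stress model will additionally take into account the event-related potentials mentioned above.

# Illustrative stress-trigger sketch: flag stress when heart rate (HR) and
# galvanic skin response (GSR) deviate strongly from a per-user baseline,
# then offer a simple "Yes-No" assistive dialogue (thresholds are placeholders).
import numpy as np

def is_stressed(hr, gsr, hr_base, gsr_base, z_thr=2.0):
    """hr, gsr: recent measurement windows; *_base: (mean, std) baselines."""
    hr_z = (np.mean(hr) - hr_base[0]) / max(hr_base[1], 1e-6)
    gsr_z = (np.mean(gsr) - gsr_base[0]) / max(gsr_base[1], 1e-6)
    return hr_z > z_thr and gsr_z > z_thr

def maybe_assist(hr, gsr, hr_base, gsr_base, last_action):
    """Return an assistive-dialogue request when stress is detected, else None."""
    if is_stressed(hr, gsr, hr_base, gsr_base):
        return {"dialogue": "yes_no", "question": f"Undo last {last_action}?"}
    return None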

T3.4 – Novel paradigms for multi-modal interaction (UNI KO-LD, CERTH) [M12-M18, M28-M32]

In this task, MAMEM will develop semantic widgets of two kinds. First, we will work on fusion techniques that integrate the different sensory inputs in order to increase the robustness of standard human-computer controls at the meso level, such as the ones described in T3.1 and T3.2. These meso-level controls will be necessary for environments that need to be used as they are. For this purpose, we need to integrate and reconcile the different sensory inputs in order to discover the intended meaning. For instance, dwelling with one’s eyes on an item may not be sufficient to “click” on it, but additional EEG input may finalize this action. For software environments that are highly malleable, such interaction paradigms will much better support the ultimate objectives of the target end users. Additionally, for the integration of modalities MAMEM plans to utilize multimodal fusion techniques, such as the ones used by hybrid BCIs (Gürkök and Nijholt, 2012). The multimodal fusion will be executed on two main levels, i.e., the feature and the decision level. On the feature level, EEG data will be combined with bio-feedback data, e.g., HR and GSR, to infer the mental and emotional state of the user using naive Bayes and Fisher’s discriminant classifiers. On the decision level, the scores of the EEG, eye-tracker and bio-measurement classifiers will be weighted in proportion to their reliability and combined. Data from each modality will be modeled separately (using a model such as the Hidden Markov Model) and the resulting models will be fused using a probabilistic fusion model that is optimal according to the maximum entropy principle and a maximum mutual information criterion (Zeng et al., 2009).
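For the decision level in particular, the following minimal sketch shows reliability-weighted score fusion; the reliability values, command sets and modality names are placeholders, and the HMM-based probabilistic fusion described above would replace this simple weighting in the full system.

# Illustrative decision-level fusion sketch: combine per-modality classifier
# scores with weights proportional to each modality's estimated reliability
# (e.g., its cross-validated accuracy); all numbers below are placeholders.
import numpy as np

def fuse_decisions(scores_per_modality, reliabilities):
    """scores_per_modality: dict mapping modality name to an array of class
    scores (one entry per candidate command);
    reliabilities: dict mapping modality name to a reliability in [0, 1]."""
    w = np.array([reliabilities[m] for m in scores_per_modality], float)
    w = w / w.sum()                                   # normalise the weights
    fused = sum(wi * np.asarray(scores_per_modality[m], float)
                for wi, m in zip(w, scores_per_modality))
    return int(np.argmax(fused))                      # index of chosen command

# Example: EEG is trusted more than the eye tracker or the GSR/HR channel.
choice = fuse_decisions(
    {"eeg": [0.2, 0.7, 0.1], "gaze": [0.5, 0.4, 0.1], "bio": [0.3, 0.3, 0.4]},
    {"eeg": 0.9, "gaze": 0.7, "bio": 0.5})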

The second kind will incorporate controls at a higher level, such as select-n-out-of-m items, or predicting and suggesting the sequence of actions that are required to complete a task based on previous usage logs (Pickhardt et al., 2014). These interaction paradigms will be devised to be closer to the way humans plan their course of action than to the machine-like mode of operation that involves meaningless repetition. For this case, we will have to analyze the most typical operations performed in the context of multimedia management and authoring tasks, so as, on the one hand, to dismantle each operation into its constituent basic control steps (i.e., low-level and meso-level controls) and, on the other hand, to discover whether the same process can be articulated using more elaborate control patterns (e.g., high-level controls consisting of a sequence of elementary control steps). Finally, in this task, contextual sensory input coming from bio-measurements will be used to determine whether the user is in a normal use situation or whether he or she had further intentions that were either beyond controlling the software or that changed the characteristics of his or her software use.
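As an illustration of the usage-log-driven high-level controls (a sketch only; the action vocabulary and log format are assumptions), next-action suggestions could be produced with a simple bigram model over previously logged action sequences:

# Illustrative high-level control sketch: predict the next likely action from
# previous usage logs with a simple bigram (first-order Markov) model.
from collections import Counter, defaultdict

def train_bigrams(sessions):
    """sessions: list of action sequences, e.g. [["open", "crop", "save"], ...]."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest_next(counts, last_action, k=3):
    """Return up to k candidate next actions, most frequent first."""
    return [a for a, _ in counts[last_action].most_common(k)]

model = train_bigrams([["open", "zoom", "crop", "save"],
                       ["open", "crop", "save"]])
print(suggest_next(model, "open"))   # e.g. ['zoom', 'crop']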

The exact set of novel interaction paradigms will be specified during the first months of this task and will later be revised based on the results of the first-stage experiments. Some of the novel interaction paradigms that have already been identified (cf. Section 1.1.2) include gaze-based zooming and attention focus, concentration- and mind-based activity selection, lazy response to abrupt and flickering signal-commands, auto-complete suggestions based on previous usage logs, and select-n-out-of-m items.