Eye movements on natural videos: Predictive power of different low-level features
Presented at the European Conference on Visual Perception 2007
Eleonora Vig, Michael Dorr, Thomas Martinetz, and Erhardt Barth
We used eye movements recorded from 54 subjects, who viewed two
high-resolution videos of outdoor scenes, to define an empirical
saliency (ES) measure as the density of saccade landing points. We used
ES to label a dataset of local movie blocks (17x17x8 pixels, extracted
from the original videos) as "salient" or "non-salient" (1000 samples
per class). We then computed four different representations of these
blocks: Laplacian, colour opponency, motion, and spatio-temporal
curvature K. Next, we used two different classifiers (maximum
likelihood on the feature-vector length, and k-nearest-neighbour on the
full feature vectors) to classify the movie blocks into the two classes
for all representations.
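
To illustrate the pipeline, the following minimal sketch (in Python,
assuming NumPy, SciPy, and scikit-learn) shows how the empirical
saliency could be estimated, how one representation of a movie block
could be computed, and how the two classifiers could be set up. The
kernel bandwidth, the Gaussian model of the feature-vector length, and
k = 5 are illustrative assumptions, not parameters from the study.

```python
import numpy as np
from scipy.ndimage import laplace
from scipy.stats import norm as gaussian
from sklearn.neighbors import KernelDensity, KNeighborsClassifier

def empirical_saliency(landing_points, bandwidth=20.0):
    """Empirical saliency as the density of saccade landing points.

    landing_points: (N, 3) array of (x, y, t) saccade targets pooled
    over subjects. Returns a fitted density model; score_samples()
    gives the log-density at query locations. The bandwidth is an
    illustrative assumption.
    """
    return KernelDensity(bandwidth=bandwidth).fit(landing_points)

def laplacian_feature(block):
    """One possible representation: the Laplacian of a 17x17x8 movie
    block, flattened into a feature vector."""
    return laplace(block.astype(float)).ravel()

# Classifier 1: maximum likelihood on the feature-vector length.
# Here the length distribution of each class is modelled as a Gaussian
# (an assumption; the abstract does not name the density model).
def fit_length_ml(X, y):
    return {c: (np.linalg.norm(X[y == c], axis=1).mean(),
                np.linalg.norm(X[y == c], axis=1).std())
            for c in np.unique(y)}

def predict_length_ml(params, X):
    lengths = np.linalg.norm(X, axis=1)
    loglik = np.stack([gaussian.logpdf(lengths, mu, sd)
                       for mu, sd in params.values()])
    classes = np.array(list(params.keys()))
    return classes[loglik.argmax(axis=0)]

# Classifier 2: k-nearest-neighbour on the full feature vectors.
knn = KNeighborsClassifier(n_neighbors=5)
```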
The error
rates reflect the predictive power of the different representations.
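A hypothetical evaluation loop for such a comparison might look as
follows; the stand-in random features (which yield chance-level error,
about 50%) and the cross-validation setup are our assumptions, standing
in for the real representations and test protocol.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_dim = 17 * 17 * 8  # size of a flattened movie block

# Stand-in features for two representations, 1000 samples per class as
# above; real features would come from the labelled movie blocks.
features = {"laplacian": rng.normal(size=(2000, n_dim)),
            "K": rng.normal(size=(2000, n_dim))}
labels = np.repeat([0, 1], 1000)

for name, X in features.items():
    err = 1 - cross_val_score(KNeighborsClassifier(n_neighbors=5),
                              X, labels, cv=5).mean()
    print(f"{name}: error rate {err:.1%}")
```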
Under all conditions, K produced the lowest error rates. For the movie
with many moving objects, motion was second best, but it was the worst
predictor on the other movie. We conclude that simple low-level features
can predict saccade targets with an error rate of only 15%; a more
complex classifier reduces this to 9%, but the improvement does not
generalize to a different movie.