Abstract. Every second, millions of users stream content on diverse video players (e.g., Web, apps, social networks) and generate billions of interactions with online video, such as play, pause, and seek/scrub. This collective intelligence of video viewers might be leveraged into useful information for improved video navigation. For example, we can accurately detect and retrieve interesting scenes by analyzing users' aggregated replay interactions with the video player. Effective crowdsourcing of video interactions is grounded in previous work in multimedia, user modeling, and controlled user experiments. We describe these research issues for the case of user-based detection of video thumbnails that represent the semantics of the video. Moreover, we demonstrate the corresponding experimental environment, with a focus on educational and user-generated videos (e.g., how-to, lecture).
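To make the idea of aggregating replay interactions concrete, the following is a minimal sketch rather than the method used in this work. It assumes that each interaction is logged as a (seek start, seek target) pair in seconds, that a backward seek counts as a replay of the skipped-back segment, and that the most replayed second is a reasonable thumbnail candidate. The function names, the log format, and the smoothing window are all hypothetical.

```python
def replay_histogram(seek_events, duration):
    """Count, for each second of the video, how many backward seeks replayed it.

    seek_events: iterable of (start, target) times in seconds (assumed log format).
    duration: video length in whole seconds.
    """
    counts = [0] * duration
    for start, target in seek_events:
        if target < start:  # backward seek, treated here as a replay
            first = max(0, int(target))
            last = min(duration, int(start))
            for second in range(first, last):
                counts[second] += 1
    return counts


def smooth(counts, window=3):
    """Moving average over a centered window to damp single-user noise."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        segment = counts[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def top_scenes(scores, k=1):
    """Return the k seconds with the highest score (earliest second wins ties)."""
    ranked = sorted(range(len(scores)), key=lambda s: (-scores[s], s))
    return ranked[:k]


# Hypothetical interaction log pooled from several viewers of a 60-second video.
events = [(30, 20), (28, 22), (50, 45), (10, 40), (27, 21)]
histogram = replay_histogram(events, duration=60)
candidates = top_scenes(smooth(histogram), k=1)
```

In this sketch the forward seek (10, 40) is ignored, and the candidate list holds the second of the video at which a thumbnail frame could be extracted. A real system would likely also need to normalize by the number of viewers and handle repeated seeks from the same user.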