Efficiently unleash cross-media information?