We perceive an image all at once, as the millions of neural cells in the visual cortex reduce our sensation to objects and shapes we can then recognize by matching them to patterns in our memory.
It occurred to me the other day that this is also the purpose of short-term memory (STM), except that it collects sensations (like speech) that occur over a range of time rather than space. At any given moment, STM is sensed in the same way visual information is sensed, and its contents are recognized in the same way.