Perceptual grouping constrains inhibition in time-based visual selection

Zupan, Zorana; Watson, Derrick G.

doi:10.3758/s13414-019-01892-4

40 Years of Feature Integration: Special Issue in Memory of Anne Treisman
Open access
Published: 24 December 2019

Volume 82, pages 500–517, (2020)

Attention, Perception, & Psychophysics

Abstract

In time-based visual selection, task-irrelevant, old stimuli can be inhibited in order to allow the selective processing of new stimuli that appear at a later point in time (the preview benefit; Watson & Humphreys, 1997). The current study investigated if illusory and non-illusory perceptual groups influence the ability to inhibit old and prioritize new stimuli in time-based visual selection. Experiment 1 showed that with Kanizsa-type illusory stimuli, a preview benefit occurred only when displays contained a small number of items. Experiment 2 demonstrated that a set of Kanizsa-type illusory stimuli could be selectively searched amongst a set of non-illusory distractors with no additional preview benefit obtained by separating the two sets of stimuli in time. Experiment 3 showed that, similarly to Experiment 1, non-illusory perceptual groups also produced a preview benefit only for a small number of distractors. Experiment 4 demonstrated that local changes to perceptually grouped old items eliminated the preview benefit. The results indicate that the preview benefit is reduced in capacity when applied to complex stimuli that require perceptual grouping, regardless of whether the grouped elements elicit illusory contours. Further, inhibition is applied at the level of grouped objects, rather than to the individual elements making up those groups. The findings are discussed in terms of capacity limits in the inhibition of old distractor stimuli when they consist of perceptual groups, the attentional requirements of forming perceptual groups and the mechanisms and efficiency of time-based visual selection.

Introduction

Perceptual grouping enables humans to perceive discrete components as parts of a single object by establishing an interrelation of elements to form a particular shape. According to the seminal work of Gestalt psychologists in the early 20th century (e.g., Koffka, 1935), perceptual grouping is likely to occur within the early stages of the visual system and requires few, if any, attentional resources (see Kimchi & Peterson, 2008; Kimchi & Razpurker-Apfeld, 2004; Moore & Egeth, 1997; Shomstein, Kimchi, Hammer, & Behrmann, 2010). However, other work has suggested that attentional resources are required for the formation and perception of certain types of perceptual groups (e.g., Driver, Davis, Russell, Turatto, & Freeman, 2001; Trick & Enns, 1997; Li, Cave, & Wolfe, 2008). One implication of this is that the allocation of cognitive resources when grouping stimulus elements together might impair other processes that also require resources for their operation. In the current study, we examine for the first time whether the requirements for perceptual grouping of stimulus elements may compromise the ability to ignore such irrelevant stimuli and efficiently prioritize new stimuli in time-based visual selection (Watson & Humphreys, 1997). This is an important question given the large number of grouping cues that can exist in real-world scenes that may impact how visual search operates in temporal contexts.

Time-based visual selection refers to the ability to enhance visual search efficiency when distractor stimuli are temporally separated (Watson & Humphreys, 1997). The operation of this ability can be demonstrated using a visual search task (e.g., Treisman & Gelade, 1980) in which one set of distractors is presented before a second search set which contains the target. This condition is called the preview condition, reflecting the fact that an initial set of irrelevant distractors is previewed before new stimuli are added (see also Treisman, Kahneman & Burkell, 1983, who examined the influence of pre-existing distractors on the processing of a single new item). Search efficiency in the preview condition is typically compared with that in a full-element baseline (FEB) condition in which all stimuli are presented simultaneously and to a half-element baseline (HEB) which is equivalent to searching through only the newly arriving search set. If preview search efficiency is significantly better than FEB search efficiency, this means that the old items have been excluded and the new items have been prioritized. In addition, preview search efficiency can be similar to that in the HEB, indicating that all the old (previewed) items could be excluded (Watson & Humphreys, 1997, 1998; Theeuwes, Kramer, & Atchley, 1998; but see also Gibson & Jiang, 2001; Blagrove & Watson, 2010; Zupan, Blagrove, & Watson, 2018, for conditions in which partial preview benefits are found).

The mechanisms underlying the preview benefit have been a topic of some debate predominantly in the first decade since its report. Originally, according to the visual marking account, Watson and Humphreys (1997) proposed that old items are intentionally inhibited by the observer, in a flexible and goal-oriented fashion. Stimuli that are currently present within a scene can be encoded into an online, temporary representation. This representation is then used to coordinate inhibition towards those items which, in turn, provides a selection advantage for subsequently appearing ‘new’ stimuli (Watson, Humphreys & Olivers, 2003).

Stationary stimuli are inhibited via object locations while moving stimuli are inhibited via their features, with both said to rely on the generation and maintenance of a temporary memory representation and capacity-limited resources (Watson & Humphreys, 1997, 1998; Humphreys, Watson, & Jolicoeur, 2002; see also Andrews, Watson, Humphreys, & Braithwaite, 2011). In contrast, Donk and colleagues (e.g., Donk & Theeuwes, 2001, 2003; Donk, 2006; Donk & Verburg, 2004; Donk, 2017) have argued that the preview benefit is a result of automatic capture by luminance transients generated by the newly arriving items. Finally, Jiang, Chun, and Marks (2002a) have suggested that time-based visual selection emerges because of the temporal asynchrony between the old and new items, allowing attention to be applied to a single temporal group. Recently, Al-Aidroos, Emrich, Ferber, and Pratt (2012) have also shown that at small display sizes, time-based visual selection may be supported by, or indeed reliant on, visual working memory (VWM) processes.

The emerging view is that numerous mechanisms likely play some role in generating a preview benefit. For example, bottom-up accounts suggest that the preview benefit is due to automatic attentional capture by abrupt luminance transients. This is based in part on the finding that the preview benefit is abolished when stimuli are isoluminant with their background and do not therefore generate abrupt luminance signals (e.g., Donk & Theeuwes, 2001, 2003; but see also Braithwaite, Hulleman, Watson, & Humphreys, 2006; von Mühlenen, Watson, & Gunnell, 2013). In contrast, a role for a limited capacity inhibitory mechanism comes from: (1) findings evidencing inefficient search in preview conditions when performing a dual-task (Watson & Humphreys, 1997; Humphreys et al., 2002) and during the attentional blink (Olivers & Humphreys, 2002); (2) experiments in which detecting a probe-dot is more difficult if it falls at the location of an old item compared with falling at the location of a new item (Watson & Humphreys, 2000; Humphreys, Stalmann, & Olivers, 2004; Osugi, Kumada, & Kawahara, 2009; but see also Agter & Donk, 2005); (3) results showing that location or feature-based changes to old items can destroy the preview benefit (Watson & Humphreys, 1997; Zupan, Watson, & Blagrove, 2015) unless the semantic meaning of the objects is maintained (Osugi, Kumada, & Kawahara, 2010); (4) evidence for the carry-over of feature-based inhibition from old items to new items that share a common property (Braithwaite, Humphreys, & Hodsoll, 2003, 2004; Andrews et al., 2011; see also Donk, 2017); and (5) the finding of flexible modulation of time-based visual selection when it is inconsistent with the goal state (Watson & Humphreys, 2000) or when contextual factors, such as the presence of highly salient targets, do not require it (Zupan et al., 2015). The most likely position is that the active inhibition of old stimuli helps to amplify the signals associated with bottom-up mechanisms related to the appearance of new items.

Although it is a resource-limited mechanism, time-based visual selection has been shown in past RT-based studies to have the capacity to exclude at least 30 old items (Jiang, Chun, & Marks, 2002b), with no upper limit established yet. Furthermore, up to at least 15 new items can be given priority (Theeuwes et al., 1998). However, other work has uncovered limits with respect to some performance measures. For example, Emrich, Ruppel, Al-Aidroos, Pratt, and Ferber (2008) found that eye movements were prioritized only for approximately four new items after which they became just as likely to be made to both old and new items. This eye movement-based limit was apparent despite RTs indicating a standard, full preview benefit (see also Watson & Inglis, 2007). Watson and Kunar (2012) found that the capacity to prioritize and respond to all new items was about 6–7 items. Moreover, this depended on the color and shape homogeneity of the displays. Specifically, when all the old items were the same color or shape, the capacity for prioritizing multiple new items increased. However, note that this feature-based grouping benefit was observed with relatively simple stimuli not requiring perceptual grouping, with grouping applied at the level of stimuli within the display. It is not known how time-based visual selection is impacted when grouping is applied at the level of stimulus perception.

Aims of the present study

Our main aim was to examine the influence of perceptual grouping on the occurrence and efficiency of time-based visual selection. On the one hand, perceptual grouping might allow a greater number of old items to be ignored by allowing them to be grouped and suppressed as a single entity (see Duncan & Humphreys, 1989) rather than as many individual elements. In this case, we would expect time-based visual selection to operate similarly, or perhaps be more efficient than when the search displays are comprised of single elements that cannot be grouped. On the other hand, allocating resources to perceptual grouping processes (e.g., Trick & Enns, 1997; Li et al., 2008) might reduce the resources available for top-down inhibition which would result in a reduced or eliminated preview benefit (given that active, time-based inhibition of old items is a resource-demanding process; Watson & Humphreys, 1997).

To assess these possibilities, in Experiment 1 we examined search performance in time-based visual selection in conditions in which individual elements could be grouped to form illusory stimuli (i.e., Kanizsa-type illusory contours). Experiment 2 examined search performance in time-based visual selection when the previewed stimuli did not form illusory contours, but the newly added items did. In this situation, any reduction in preview search efficiency would be the result of illusory contour formation during the active search part of the task and not during the preview period. Experiment 3 assessed search performance in time-based visual selection for pacman elements grouped on the basis of spatial proximity that did not produce illusory contours in either the preview or search displays. Finally, Experiment 4 evaluated the extent to which illusory contour stimuli could be suppressed when changes to the individual elements of the perceptual group were made. Small local changes in the elements might be disruptive if the identity of the entire illusory object is vital for the inhibitory template, or might have no effect if inhibition is based on individual elements and is insensitive to more global properties (cf. Watson & Humphreys, 2002, 2005; Watson, Braithwaite, & Humphreys, 2008).

Experiment 1: Time-based visual selection with illusory stimuli

Kanizsa-type illusory contours are one of the best demonstrations of how the human visual system groups separate elements into coherent objects and induces a subjective experience of a solid shape from an incomplete, fragmented stimulation (Fahle & Koch, 1995). The main aim of Experiment 1 was to determine to what extent perceptual groups that induce a subjective experience, such as Kanizsa-type illusory contours, can be effectively inhibited in time-based visual selection. Following Li and colleagues (2008), we used a visual search task consisting of a vertical target and horizontal distractor Kanizsa-type rectangles. Similar to past time-based selection studies, there were three main conditions: a HEB, a FEB, and a preview condition. In the preview condition, half of the distractors were presented before the second set was added. The target was only ever present in the second set. Performance in this preview condition was compared with that from the associated HEB and FEB conditions.

Method

Participants

Participants were 18 undergraduates (17 female) from the University of Warwick who received course credit or payment for participating. Their ages ranged from 18 to 25 years (M = 20.17, SD = 2.18). All participants reported normal or corrected-to-normal visual acuity in this and the remaining experiments.

Stimuli and apparatus

Stimuli were presented on a 22” LCD panel at a resolution of 1680 × 1050 pixels. A custom written computer program generated the stimuli and recorded participants’ responses. The target was a vertical rectangle defined by Kanizsa-type illusory contours and the distractors were horizontal Kanizsa-type rectangles displayed against the white background of the computer monitor. Four black pacman shapes formed the Kanizsa-type rectangles that measured 25.2 × 37.8 mm (2.53° × 3.79° of visual angle). Each pacman had a diameter of 16.8 mm (1.69°; see Footnote 1). Search displays were generated by placing the stimuli randomly into the cells of an invisible 6 × 6 matrix, with an equal number of Kanizsa-type rectangles presented on the left and right side of the display. The final search displays of the preview and FEB conditions contained 4, 8, or 16 illusory items (i.e., the number of Kanizsa-type rectangles). An example of a preview search trial is illustrated in Fig. 1. The HEB contained 2, 4, or 8 items. The target, when present, never fell in the center two columns of the display (i.e., it only ever appeared in columns 1, 2, 5, or 6). This ensured that the location of the target was always unambiguously to the left or right of the display center. The monitor was positioned at eye level at a viewing distance of approximately 60 cm, although participants’ head movements were not constrained.
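
The placement constraints described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the custom program used in the study: the function name `make_display`, the 0-based cell coordinates, and the choice to pick the target side at random are all hypothetical, and the sketch covers target-present displays only.

```python
import random

ROWS, COLS = 6, 6
# Columns 1, 2, 5 and 6 in the text's 1-based numbering (0-based here).
OUTER_COLS = {0, 1, 4, 5}

def make_display(n_items, rng):
    """Place one target and n_items - 1 distractors in a 6 x 6 grid.

    Half of the items fall on each side of the display, and the target
    never occupies the two center columns. Returns the target side and a
    list of ((row, col), kind) pairs with the target first.
    """
    if n_items % 2 or not 2 <= n_items <= ROWS * COLS:
        raise ValueError("n_items must be an even number between 2 and 36")
    half = n_items // 2
    left = [(r, c) for r in range(ROWS) for c in range(COLS // 2)]
    right = [(r, c) for r in range(ROWS) for c in range(COLS // 2, COLS)]

    # Assumption: the target side is chosen at random on each trial.
    target_side = rng.choice(["left", "right"])
    own, other = (left, right) if target_side == "left" else (right, left)

    target_cell = rng.choice([cell for cell in own if cell[1] in OUTER_COLS])
    same_side = rng.sample([cell for cell in own if cell != target_cell], half - 1)
    other_side = rng.sample(other, half)

    items = [(target_cell, "target")]
    items += [(cell, "distractor") for cell in same_side + other_side]
    return target_side, items
```

Calling `make_display(16, random.Random())` would give one of the largest FEB displays; the HEB display sizes of 2, 4, and 8 items go through the same function.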

Design and procedure

There were three main conditions: a half-element baseline (HEB), a full-element baseline (FEB) and a preview condition. A trial in the FEB condition consisted of a blank screen for 500 ms, followed by a fixation cross for 750 ms, after which a search display of 4, 8, or 16 items was presented. Search displays remained visible until the participant indicated the location of the target by pressing the Z key if the target was on the left side of the display, or the M key if it was on the right side of the display, on a standard computer keyboard (see e.g., Blagrove & Watson, 2010, 2014; Zupan et al., 2015, for previous uses of this approach). The participant’s response triggered the next trial. Incorrect responses were indicated by displaying the word ‘incorrect’ as visual feedback. The HEB was essentially the same as the FEB, but consisted of display sizes of 2, 4, or 8 items. In the preview condition, half of the stimuli for a particular display size were presented for 1000 ms before the remaining half (which would contain the target) were added. In all conditions, the fixation cross remained visible throughout the trial other than during the blank pre-trial interval. Participants were told to try and ignore the distractors presented in the preview set, as the target would always appear in the second set of items.
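
As a compact summary of the trial structure just described, the sequence of events can be written out as data. The sketch below is illustrative only; the function name `trial_timeline`, the event labels, and the dictionary layout are assumptions introduced here, and a duration of `None` simply marks a display that stays on until the participant responds.

```python
# Response keys described in the text (space bar is used on catch trials).
RESPONSE_KEYS = {"z": "target left", "m": "target right", "space": "no target"}

def trial_timeline(condition, display_size):
    """Return the ordered events of one trial as a list of dictionaries.

    display_size is the size of the final search display (4, 8 or 16 for
    FEB and preview; 2, 4 or 8 for HEB).
    """
    if condition not in {"HEB", "FEB", "preview"}:
        raise ValueError("condition must be 'HEB', 'FEB' or 'preview'")
    events = [
        {"event": "blank", "duration_ms": 500, "n_items": 0},
        {"event": "fixation", "duration_ms": 750, "n_items": 0},
    ]
    if condition == "preview":
        # Half of the items (all distractors) appear 1000 ms early.
        events.append(
            {"event": "preview_set", "duration_ms": 1000, "n_items": display_size // 2}
        )
    # The full display remains visible until a response is made.
    events.append(
        {"event": "search_display", "duration_ms": None, "n_items": display_size}
    )
    return events
```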

Each condition contained 120 target trials. There were also 12 (10%) catch trials on which there was no target (the target was replaced by a distractor). Participants responded to these trials by pressing the space bar on the keyboard. The purpose of the catch trials was to ensure that participants did not search only half of the display, by concluding that the target was on the opposite side if not present on the display side they searched (see e.g., Al-Aidroos et al., 2012; Blagrove & Watson, 2010; for previous uses of this method). Trials within a block were presented in a random order and condition order was counterbalanced across participants. Directly before each block of experimental trials there was a practice block consisting of 12 trials.

Results

As in previous time-based visual selection studies, search efficiency (as measured by search slopes) in the preview condition was compared with that in the two baseline conditions. In the FEB and preview conditions, slopes were calculated using the actual display size. In the HEB condition, slopes were calculated using twice the true number of items. The search rate in the HEB then represents the time that would be needed to search through only the new items in the preview condition. Therefore, if search in the preview condition corresponds to that of the HEB, the old items have been fully excluded from search. However, if search rates in the preview condition match those of the FEB, the old items have not been ignored and were included in the search.
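
The slope logic above can be made concrete with a short sketch. It assumes that a search slope is the ordinary least-squares slope of mean RT against display size, which is the usual convention but is not spelled out in the text; the function names and the RT values in the usage note are invented for illustration and are not data from the study.

```python
def search_slope(display_sizes, mean_rts):
    """Least-squares slope of mean RT (ms) against display size (ms/item)."""
    n = len(display_sizes)
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(display_sizes, mean_rts))
    return sxy / sxx

def heb_slope(true_display_sizes, mean_rts):
    """HEB slope computed against twice the true number of items.

    Doubling the display sizes puts the HEB on the same x-axis as the FEB
    and preview conditions, so the three slopes are directly comparable.
    """
    doubled = [2 * n for n in true_display_sizes]
    return search_slope(doubled, mean_rts)
```

For example, with hypothetical mean RTs of 500, 600, and 800 ms at true HEB display sizes of 2, 4, and 8 items, `heb_slope` returns 25 ms/item, whereas the slope against the true display sizes would be 50 ms/item. A preview slope near the HEB value would then indicate that the old items were fully excluded, and a preview slope near the FEB value that they were not.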

Reaction times

Trials with RTs less than 200 ms or greater than 10 s were removed as outliers (0.01% of the data) and catch trials were also removed. Search slopes are presented in Fig. 2, and overall mean correct RTs as a function of display size are presented in Fig. 3. Initially, the data were analyzed using a 3 (Condition: HEB, FEB, Preview) × 3 (Display size: 4, 8, or 16 items) repeated-measures ANOVA. This revealed significant main effects of condition, F(2,34) = 42.92, MSE = 41023.37, p < .001, and display size, F(1.14,19.38) = 226.86, MSE = 47031.34, p < .001, and a significant Condition × Display Size interaction, F(3,50.99) = 10.48, MSE = 9494.61, p < .001. As shown in Fig. 2 and Fig. 3, preview search performance appeared to be similar to that of the HEB for small display sizes of four and eight items and closer to FEB search performance at display sizes of eight to 16 items.
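
The trimming rule at the start of this paragraph can be illustrated as follows. This is a minimal sketch, assuming each trial is stored as a dictionary; the field names `rt_ms`, `is_catch`, and `correct` are hypothetical and do not come from the original analysis.

```python
def clean_trials(trials, min_rt_ms=200, max_rt_ms=10_000):
    """Drop catch trials and trials with RTs below 200 ms or above 10 s."""
    return [
        t for t in trials
        if not t["is_catch"] and min_rt_ms <= t["rt_ms"] <= max_rt_ms
    ]

def mean_correct_rt(trials):
    """Mean RT over the retained trials that were answered correctly."""
    rts = [t["rt_ms"] for t in clean_trials(trials) if t["correct"]]
    return sum(rts) / len(rts) if rts else None
```

In practice this would be applied separately to each participant, condition, and display size before the cell means enter the ANOVA.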

Following previous work, we compared search efficiency in the preview condition with that in each of the two baselines in order to determine if a preview benefit had occurred. In addition, given the apparent difference between efficiency at large and small display sizes, we also assessed search efficiency at the smaller (4–8) and larger (8–16) display sizes individually (see Fig. 2 for search slopes).

HEB vs. Preview

Overall RTs were longer in the preview condition than in the HEB, F(1,17) = 9.59, MSE = 23785.68, p < .01, and increased with display size, F(1.04,17.69) = 200.98, MSE = 33157.56, p < .001. The Condition × Display Size interaction was also significant, F(1.25,21.25) = 10.92, MSE = 8203.22, p < .005, indicating that search was less efficient overall in the preview condition than in the HEB. Considering only small display sizes (4–8 items), RTs were shorter for a display size of four than eight, F(1,17) = 204.63, MSE = 3556.32, p < .001; however, neither the main effect of condition, F(1,17) = 3.82, MSE = 10136.62, p = .067, nor the Condition × Display Size interaction, F < 1, proved significant. Considering only large display sizes (8–16 items), RTs were longer for a display size of 16 than eight items, F(1,17) = 186.77, MSE = 16052.18, p < .001. Overall RTs were longer in the preview condition than in the HEB, F(1,17) = 9.47, MSE = 25057.28, p < .01, and RTs increased more from eight to 16 items in the preview condition than in the HEB, F(1,17) = 14.20, MSE = 5883.53, p < .005.

FEB vs. Preview

Overall RTs were shorter in the preview condition than in the FEB, F(1,17) = 38.29, MSE = 46404.27, p < .001, and increased with display size, F(1.21,20.48) = 179.25, MSE = 44849.85, p < .001; the Condition × Display Size interaction was also significant, F(2,34) = 5.09, MSE = 8742.74, p < .05. Considering small display sizes (4–8 items), RTs were faster overall in the preview condition than in the FEB, F(1,17) = 47.87, MSE = 23011.23, p < .001, and were faster for a display size of four than of eight items, F(1,17) = 106.04, MSE = 12367.99, p < .001. There was also a significant Condition × Display Size interaction, F(1,17) = 11.95, MSE = 7067.09, p < .005, indicating more efficient search in the preview condition. At the large display sizes (8–16 items), overall RTs were shorter in the preview condition than in the FEB, F(1,17) = 31.16, MSE = 50391.45, p < .001, and increased between eight and 16 items, F(1,17) = 185.20, MSE = 20186.19, p < .001. However, the Condition × Display Size interaction did not approach significance, F(1,17) = 1.04, MSE = 7304.89, p = .323.

Error rates

Overall error rates were low (2.75%) and as shown in Table 1, the general pattern of errors was consistent with the RT data. A two-way repeated measures ANOVA, with condition (HEB, FEB, preview) and display size as factors revealed that there were more errors in the preview and FEB conditions than in the HEB, F(2,34) = 12.28, MSE = 5.43, p < .001, the number of errors increased with display size, F(2,34) = 24.74, MSE = 11.39, p < .001, and there was a significant Condition × Display Size interaction, F(4,68) = 8.09, MSE = 7.42, p < .001.

Table 1 Mean percentage error rates for Experiments 1–4

Given that most errors were found in the preview condition but that different search patterns were found at small and large display sizes, we conducted an analysis for each display size range separately to identify any speed/accuracy trade-offs. For small display sizes of four and eight items, neither the main effect of condition, F(2,34) = 2.51, MSE = 1.87, p = .096, nor that of display size, F(1,17) = 3.83, MSE = 5.45, p = .067, nor the Condition × Display Size interaction, F < 1, reached significance. At large display sizes (8–16 items), there were significant main effects of condition, F(2,34) = 12.22, MSE = 8.09, p < .001, and display size, F(1,17) = 28.77, MSE = 11.02, p < .001, and a significant Condition × Display Size interaction, F(2,34) = 8.71, MSE = 10.13, p < .005; there was a greater number of errors in the preview condition at the largest display size followed by the FEB and HEB conditions. Given that there was no reliable difference in (RT-based) search efficiency between FEB and preview at large set sizes, the conclusions based on the RT data were not compromised by a speed–accuracy trade-off. Indeed, the error rates further suggest the lack of a preview benefit at the larger display sizes. More errors in the preview condition at large display sizes in comparison to the FEB may also be indicative of greater resource use in the preview condition, as a consequence of trying to suppress the previewed items as well as performing perceptual grouping.

The overall error rate on catch trials was low (2.62%), confirming that participants were searching over the whole display. Given the small number of catch trials, these data were not analyzed further.

Discussion

The search slopes in the FEB condition numerically replicate those of Li and colleagues (2008), providing a useful confirmation that illusory contour stimuli do not guide attention efficiently. However, of most interest, Experiment 1 found that search efficiency in time-based visual selection was reduced with a greater number of illusory contour distractor items, suggesting that there are capacity limitations when preview search is performed with illusory contour stimuli. Specifically, based on search slope measures, a preview benefit was present for relatively small display sizes, but absent at larger display sizes. This reduction contrasts with previous findings of preview search with simple stimuli (such as letters or simple shapes), in which a robust preview benefit has consistently been demonstrated (see Watson et al., 2003, for a review), spanning up to 30 old items (e.g., Jiang et al., 2002b).

These findings lend support to high-level accounts for both time-based visual selection (Watson & Humphreys, 1997) and the detection of illusory figures (e.g., Grabowecky & Treisman, 1989; Li et al., 2008). In terms of time-based visual selection accounts, a pure automatic onset capture account (Donk & Theeuwes, 2001, 2003) predicts that new items would attract attention irrespective of their complexity and display size. In contrast, reduced efficiency is consistent with a resource-limited visual marking account in which processes that consume attentional resources might leave fewer resources available for the generation of an inhibitory template and the coordination and application of inhibition (Watson & Humphreys, 1997; Humphreys et al., 2002). In terms of accounts relating to illusory figure detection, Li and colleagues (2008) suggested that the attentional costs of perceptual grouping and illusory object formation might be the underlying cause of the relatively slow search for illusory stimuli. Consistent with this possibility is the finding that the preview benefit was intact at small display sizes but absent at the largest. This pattern would be expected if perceptual grouping costs increase as the number of stimuli that have to be grouped increases.

Note that although the search slopes did not differ between the preview and FEB conditions at the larger display sizes, overall RTs were nonetheless shorter in the preview condition. However, such reductions in overall RTs do not necessarily reflect the exclusion of old distractors (which would produce a search slope difference). Instead, such overall differences could be the result of changes in alertness, the presence of a warning signal or arousal effects (see Watson & Humphreys, 1997). That is, the onset of the preview items might have a role in preparing and alerting subjects to the upcoming search display with a consequent overall reduction in their response initiation time.

It is also worth noting that, since items were placed at random, search efficiency might have been reduced at large display sizes in the FEB condition due to large set sizes having more proximal neighbors than small set sizes (e.g., Wolfe, Cave, & Franzel, 1989). However, given that the crucial issue here was a comparison between preview and FEB and that the final displays were the same, this would not account for the lack of difference between the two conditions. Whether or not the search rate in the FEB was more or less efficient, we would expect the preview search rate to be approximately half the rate observed in the FEB condition (e.g., Watson & Humphreys, 1997). Furthermore, even though density was not controlled, the search rates of ~ 50 ms/item in the FEB condition were similar to those of past research using Kanizsa-illusory rectangles in which density was controlled (Li et al., 2008).

To investigate whether the formation of illusory contours may have reduced preview search efficiency rather than perceptual grouping, in Experiment 2 we examined whether a preview benefit would emerge at large display sizes in conditions in which there was no opportunity to construct illusory objects during the preview period.

Experiment 2: Time-based visual selection with non-illusory perceptual groups

Experiment 1 demonstrated that only a small number of Kanizsa-type illusory contour distractors can be ignored in time-based visual selection. One possibility is that the perception of Kanizsa-type illusory contours consumes attentional resources (e.g., Li et al., 2008), thus limiting the capacity of the top-down inhibitory component of the preview benefit (Watson & Humphreys, 1997; Humphreys et al., 2002). If this is the case, eliminating illusory contours may free up those resources required for inhibition and thus enable a preview benefit to be obtained at the larger display sizes. The main aim of Experiment 2 was to test this possibility. This was achieved by setting the orientation of each stimulus pacman within the preview display to a randomly chosen angle. The items forming the second set of stimuli were the same as those of Experiment 1, consisting of Kanizsa-type illusory contour distractors and a Kanizsa-type illusory contour target.