Chapter 6 Selecting determinants

After identifying determinants and sub-determinants, the next step is to select the (sub-)determinants that are most relevant. The key reason is that resources are finite. This affects the quantity and quality of intervention content that can be developed, and also of the content that can be delivered. The latter is especially relevant when there are additional costs per participant (e.g., when an intervention is delivered face-to-face by a health professional). However, even when the additional costs per participant are low (e.g., in a digital intervention), there are still limits to the amount of intervention content that participants can be exposed to. Intervention content can be delivered in multiple sessions over a longer period of time. However, this might increase dropout (Rutherford et al. 2013), which also limits exposure to intervention content. So, the (sub-)determinants to be targeted in an intervention need to be selected before intervention content is developed. This selection should be based on their established relevance.

6.1 Establishing relevance

Due to a lack of clear guidelines for establishing the relevance of (sub-)determinants, a variety of analytical approaches are used. One example is dichotomizing (a determinant of) behavior and then comparing means of (sub-)determinants between the resulting groups. Another is conducting regression analyses in which (a determinant of) behavior is regressed on relevant (sub-)determinants. As explained later, these analytical approaches are problematic in the context of establishing the relevance of (sub-)determinants. Instead, two types of analyses need to be combined when establishing relevance: (1) assessing the univariate distribution of each (sub-)determinant and (2) assessing its associations with behavior and/or determinants of behavior.

Assessing the associations of (sub-)determinants with behavior and/or determinants is important. (Sub-)determinants that are not associated with behavior and/or more proximal determinants will often be the least likely candidates to intervene upon. The univariate distributions are important because bimodal distributions may indicate subgroups, and strongly skewed distributions have implications for how a (sub-)determinant should be targeted. For example, if a (sub-)determinant is positively associated with behavior but left-skewed, most population members already have the desired value, so it should merely be reinforced in an intervention. Conversely, a right-skewed, positively associated (sub-)determinant implies a need for change, as most population members do not have the desired value yet. This latter category of (sub-)determinants makes for more viable intervention targets, as there is more room for improvement.
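Whether an item is left- or right-skewed can also be checked numerically. The base-R sketch below assumes the simple moment-based skewness estimator; other estimators apply small-sample corrections. It uses made-up scores on a 1 to 7 scale. `sample_skewness` is a hypothetical helper, not a function from behaviorchange.

```r
# Hypothetical helper: moment-based sample skewness (assumed estimator,
# no small-sample correction).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

# Made-up scores of 100 participants on a 1-7 item.
left_skewed  <- c(rep(7, 40), rep(6, 30), rep(5, 15), rep(3, 10), rep(1, 5))
right_skewed <- 8 - left_skewed  # mirror image on the same 1-7 scale

sample_skewness(left_skewed)   # negative: most participants already score high
sample_skewness(right_skewed)  # positive: most participants do not score high yet
```

For a positively associated (sub-)determinant, a negative value points to an item that mainly needs reinforcing. A positive value points to one with room for change.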

Before describing an analytical approach (see 6.2) that combines these two types of analyses and uses confidence intervals and visualization to establish relevance, we first describe the problems with commonly used analytical approaches, such as dichotomization and regression analyses.

6.1.1 Problems with dichotomization

Associations can be assessed with correlation coefficients (e.g., when assuming interval-level data) or with an independent-samples t-test (e.g., with Cohen’s \(d\) as effect size for differences between groups). In the latter case, (sub-)determinants are compared between participants with and without a certain outcome (e.g., behavior, intention). Dichotomizing behavior, or a proximal determinant such as intention, leads to information loss and underestimation of variation (Altman and Royston 2006; DeCoster, Iselin, and Gallucci 2009; MacCallum et al. 2002). Dichotomizing an outcome and then comparing (sub-)determinants between participants is therefore not recommended.
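The information loss caused by dichotomization can be illustrated with a small simulation. This sketch assumes normally distributed scores, a true correlation of .5, and a median split of the outcome; all names and values are illustrative. Under these assumptions, the correlation with the dichotomized outcome is expected to shrink by a factor of about \(\sqrt{2/\pi} \approx .80\), to roughly .40.

```r
set.seed(42)
n   <- 10000
rho <- 0.5

# Simulated (sub-)determinant and a continuous outcome (e.g., intention)
determinant <- rnorm(n)
outcome     <- rho * determinant + sqrt(1 - rho^2) * rnorm(n)

# Median split of the outcome, e.g. 'intenders' versus 'non-intenders'
outcome_dich <- as.numeric(outcome > median(outcome))

r_continuous   <- cor(determinant, outcome)
r_dichotomized <- cor(determinant, outcome_dich)  # point-biserial correlation

round(c(continuous = r_continuous, dichotomized = r_dichotomized), 2)
```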

Another reason is that Cohen’s \(d\) point estimates, which are used when comparing groups (e.g., intenders and non-intenders), can vary substantially from sample to sample (Peters and Crutzen 2019). This renders them unfit for determinant selection on the basis of a single sample. The same is true, although to a lesser extent, for estimates of means and correlation coefficients (Moinester and Gottfried 2014). Because determinant selection requires comparing estimates, accurate parameter estimation is a requirement (see 6.2.1). For example, estimating a medium-sized Cohen’s \(d\) of .5 requires a sample of 1585. This assumes a desired 95% confidence interval margin of error (‘half-width’, also referred to as \(w\)) of .1 (Peters and Crutzen 2019). Estimating a medium-sized correlation of .3 with the same \(w\) requires a sample of 320. In other words, accurate estimation of correlation coefficients requires a much smaller sample than accurate estimation of Cohen’s \(d\). This is another reason, besides information loss and underestimation of variation, not to dichotomize outcomes.
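The sample sizes above can be approximated with a few lines of base R. This is a sketch under stated assumptions, not the procedure of Peters and Crutzen (2019):

- the interval for \(r\) uses the Fisher \(z\) transformation;
- the interval for Cohen’s \(d\) uses a large-sample normal approximation for two equal groups;
- \(w\) is taken as half the width of the interval.

The helper names `n_for_r` and `n_for_d` are hypothetical.

```r
# Smallest n for which the Fisher z based CI for r has half-width <= w.
n_for_r <- function(r, w, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)
  half_width <- function(n) {
    z  <- atanh(r)
    se <- 1 / sqrt(n - 3)
    (tanh(z + z_crit * se) - tanh(z - z_crit * se)) / 2
  }
  n <- 4
  while (half_width(n) > w) n <- n + 1
  n
}

# Smallest total N (two equal groups) for which the approximate CI for d
# has half-width <= w, using Var(d) ~ (4 + d^2 / 2) / N.
n_for_d <- function(d, w, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)
  ceiling((4 + d^2 / 2) * (z_crit / w)^2)
}

n_for_r(0.3, 0.1)  # 320
n_for_d(0.5, 0.1)  # 1585
```

With these inputs the approximations reproduce the figures cited above. For other values, exact methods (e.g., based on the noncentral \(t\) distribution for \(d\)) may give somewhat different numbers.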

The outcome of interest may, however, be truly dichotomous, for example, whether a participant is vaccinated against disease X. In that case, the analytical approach described later in this chapter can still be used, but it requires a large sample. So, the first question is whether the outcome of interest is really dichotomous. In other words, is there an underlying discontinuity, or is the outcome merely (conventionally) treated as dichotomous? For example, if the outcome of interest is physical activity, participants can be categorized as adhering or not adhering to guidelines on the recommended minutes of moderate to vigorous physical activity (MVPA) per day. However, there is no underlying discontinuity, so it is more sensible to treat minutes of MVPA per day as a continuous outcome. The same goes for smoking behavior. Smoking is commonly treated as a dichotomous outcome when determining success in smoking cessation trials (West et al. 2005). Yet there is no underlying discontinuity: it is merely a dichotomization of the number of cigarettes smoked in a given period.

So, only treat the outcome of interest as dichotomous if there is an underlying discontinuity. Otherwise, dichotomization leads to information loss and underestimation of variation. It also means a much larger sample is required to estimate the parameters needed for determinant selection accurately.

6.1.2 Problems with regression analyses

Regression analyses are useful to obtain a measure of the total explained variance in an outcome (e.g., \(R^2\)) based on all (sub-)determinants included in a model. This indicates the maximum effect that can be expected of an intervention that successfully changes all (sub-)determinants. However, in the context of selecting determinants, the regression coefficients provide little information on the relevance of a specific (sub-)determinant. This is because they are conditional upon the other predictors (e.g., other (sub-)determinants) in the model (Azen and Budescu 2003; Budescu 1993).

A convenient feature of regression analysis is that overlap between predictors in their explanation of the outcome is removed from the regression equation (quite literally). Squaring a correlation coefficient yields the proportion of explained variance. For example, suppose a determinant such as attitude and an outcome such as intention have a bivariate (i.e., zero-order) correlation of r = .32. They then each explain .1 (i.e., .32 × .32 ≈ .10) of each other’s variance in the sample. The 95% confidence interval for this explained variance, [0.03; 0.19], gives some idea of how far the explained variance in the population can be expected to deviate from the sample estimate. Another determinant, for example self-identity, has a correlation of r = .47 with intention, so this determinant explains .22 of the variance in intention.

However, attitude and self-identity also correlate with each other (r = .32). It is therefore likely that they also share explained variance in intention. In that case, simply adding together the proportion of intention’s variance they each explain (.1 + .22 = .32) would yield an overestimate of how much intention these determinants explain together.
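The overlap can be quantified from the three correlations given above. The worked example below uses the standard formula for the multiple \(R^2\) of two standardized predictors; the variable names are illustrative.

```r
# Correlations from the attitude and self-identity example
r_att_int <- 0.32  # attitude and intention
r_sid_int <- 0.47  # self-identity and intention
r_att_sid <- 0.32  # attitude and self-identity

# Simply adding the squared bivariate correlations ignores the overlap
naive_sum <- r_att_int^2 + r_sid_int^2

# Multiple R^2 for two standardized predictors, which removes the overlap
R2 <- (r_att_int^2 + r_sid_int^2 - 2 * r_att_int * r_sid_int * r_att_sid) /
  (1 - r_att_sid^2)

round(c(naive_sum = naive_sum, R2 = R2), 3)  # 0.323 versus 0.253
```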

This correction for overlap in explained variance is very useful, and enables better estimation of the variance explained by all predictors together. However, this overlap between predictors is highly problematic when interpreting the separate regression coefficients of psychological constructs used as predictors (e.g., overlapping (sub-)determinants of behavior; Azen and Budescu 2003; Budescu 1993).

Correlation between (sub-)determinants represents relevant information about human psychology. For example, two (sub-)determinants may, by definition, cover the same aspects of human psychology. Alternatively, they may be distinct but causally related. This can happen because they influence each other (directly or through one or more mediators), or because both are influenced by the same third variable. It is hard to distinguish empirically between (sub-)determinants that influence each other and (sub-)determinants that consist of each other (Peters and Crutzen 2017). The distinction is also irrelevant to the problem that surfaces in regression analyses.

In this case, removing the variance that represents this overlap means removing variance that corresponds to aspects of human psychology that fall within the definition of the (sub-)determinant. Suppose the shared variance is removed from a (sub-)determinant and only the variance not shared with other (sub-)determinants is considered. The resulting data series then no longer represents the (sub-)determinant as originally operationalised (and therefore, as defined), but an unknown alteration of it. Consequently, regression coefficients estimated after removing this shared explained variance no longer represent the association of each (sub-)determinant with the outcome. Instead, they represent the association of some unknown part of each (sub-)determinant with some unknown part of the criterion.

Another way to think about this uses the formulation often invoked when explaining regression analyses: the regression coefficient expresses the association of a predictor with the criterion, holding all other predictors constant. Two predictors may overlap in their definition; that is, the definitions of the constructs they represent may contain the same aspects of human psychology. In that case, ‘holding all other predictors constant’ means ‘neglecting a part of human psychology.’ The resulting situation is unrealistic and can never occur. Moreover, the neglected aspects of human psychology may in fact be important for predicting the behavior. Therefore, a predictor that represents an important (sub-)determinant of behavior may nonetheless have a small regression coefficient. This is because an important part of the human psychology covered by the (sub-)determinant’s definition was omitted from the coefficient.
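The correlations from the attitude and self-identity example show what ‘holding all other predictors constant’ does to the coefficient for attitude. The sketch below solves the standardized regression weights directly from the correlation matrix, \(\beta = R_{xx}^{-1} r_{xy}\); names are illustrative.

```r
# Correlation between the two predictors (attitude, self-identity)
Rxx <- matrix(c(1,    0.32,
                0.32, 1), nrow = 2)
# Bivariate correlations of each predictor with intention
rxy <- c(0.32, 0.47)

# Standardized regression weights: beta = Rxx^-1 rxy
beta <- solve(Rxx, rxy)

result <- data.frame(bivariate_r = rxy, beta = round(beta, 2),
                     row.names = c("attitude", "selfIdentity"))
result
```

Attitude’s weight drops to about .19, although its bivariate correlation with intention is .32. The numbers only illustrate the mechanism. They do not, by themselves, show which value better reflects relevance.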

Thus, because estimates from regression analyses are problematic when establishing determinant relevance, it is better to base such decisions on the bivariate correlations. More accurately, decisions should rest on the confidence intervals for these correlation coefficients, together with information about the (sub-)determinants’ distributions and means. We now illustrate an analytical approach for efficiently inspecting all this information simultaneously: Confidence Interval-Based Estimation of Relevance (CIBER).

6.2 CIBER: Confidence Interval-Based Estimation of Relevance

CIBER is based on visualization of confidence intervals for both the means of (sub-)determinants and their associations with behavior and/or more proximal determinants of behavior. Before describing how to apply CIBER (see 6.3), we first explain the importance of confidence intervals and the need for visualization in the context of determinant selection.

6.2.1 The importance of confidence intervals

When inspecting association and distribution estimates (e.g., correlations and means), the population values are always unknown. The only way to learn about a population is by taking a random sample and inspecting that sample. So, sampling provides a way to ‘look at’ the population without having access to the whole population. However, because sampling is random, it also introduces random variation. This means that whatever is observed in the sample may not reflect the population. In other words, the specific estimate obtained from any particular sample has little value on its own. It is also necessary to know how accurate the estimate is, that is, how much it can be expected to differ between samples.

This estimation of accuracy is based on the concept of the sampling distribution: the theoretical distribution of all potential values of a sample estimate, given its (unknown) population value and the sample size. Because the population value is always unknown (otherwise there would be no need to sample in the first place), the true sampling distribution is necessarily also unknown. However, for many parameters that can be estimated from a sample, the shape and spread of the sampling distribution are known. This means that the sampling distribution can be constructed for any hypothetical population value.

The best-known example is perhaps the sampling distribution of the mean. It is approximately normal (except for extremely small samples), with a standard deviation equal to the population standard deviation divided by the square root of the sample size. Knowing the shape and spread of the sampling distribution allows computation of the confidence interval. In infinite repetitions of the sampling procedure, this interval contains the population value in a given percentage of the samples. A wide confidence interval means that the point estimate is imprecise and can have a substantially different value in a new sample. A narrow confidence interval means that a substantially different value in a new sample is less likely. These properties make confidence intervals well suited for estimating population values from sample data.
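The sampling distribution of the mean, and the meaning of a 95% confidence interval, can be demonstrated by simulating many samples from a population whose values are known because we made them up. A minimal base-R sketch; the population mean, standard deviation, sample size, and number of repetitions are arbitrary assumptions.

```r
set.seed(1)
mu    <- 4      # population mean, known here only because we simulate it
sigma <- 1.5    # population standard deviation
n     <- 50     # sample size
reps  <- 10000  # number of repeated samples

t_crit <- qt(0.975, df = n - 1)
sample_means <- numeric(reps)
covered      <- logical(reps)

for (i in seq_len(reps)) {
  x <- rnorm(n, mean = mu, sd = sigma)
  sample_means[i] <- mean(x)
  half_width <- t_crit * sd(x) / sqrt(n)
  # Does this sample's 95% CI contain the population mean?
  covered[i] <- abs(mean(x) - mu) <= half_width
}

c(sd_of_means = sd(sample_means), expected = sigma / sqrt(n))  # both about 0.21
mean(covered)                                                  # about 0.95
```

The standard deviation of the simulated means approaches \(\sigma/\sqrt{n}\), and roughly 95% of the intervals contain the population mean. Any single interval, however, either does or does not.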

Therefore, whenever sample data are used to draw conclusions for selecting determinants (or for anything else, really), point estimates should not be used on their own. Also considering estimate accuracy, for example by computing confidence intervals, allows taking the inevitable sampling and error variation into account.

6.2.2 The need for visualization

Adding confidence intervals to association and distribution estimates (e.g., correlations and means) also makes determinant selection an almost inhuman task. For each (sub-)determinant, several estimates would have to be inspected:

- the univariate distribution and mean, with the lower and upper confidence interval bounds;
- the correlation coefficients with behavior, and perhaps with a proximal determinant of behavior such as intention, again with their lower and upper confidence interval bounds.

Even with only 10 (sub-)determinants and a single outcome, this would mean simultaneously evaluating 60 estimates. Therefore, CIBER is based on data visualization.

Data visualization has three advantages in the context of determinant selection. First, visualization enables mapping the data onto spatial dimensions, facilitating comparison, which is necessary when making selections. Second, visualization foregoes the seeming accuracy and objectivity afforded by numbers (Peters 2017). Given the relative width of most sampling distributions and the subsequent variation that occurs in estimates over samples (Moinester and Gottfried 2014; Peters and Crutzen 2019), caution in basing decisions on the exact computed numbers seems prudent. Third, visualization enables assessing confidence intervals for means in the context of the raw data.

When applying CIBER, confidence intervals are represented using the diamond shapes commonly used for the aggregated effect size in meta-analyses (Peters 2017). Unlike error bars with whiskers, diamonds do not draw attention to the confidence interval bounds. They efficiently represent both the point estimate and the confidence interval in one shape. They also allow both stroke and fill colors: the fill color can further facilitate interpretation, and the stroke color can identify, for example, which determinant a shape represents. Another advantage is that the exact values of the three estimates represented by a diamond (the point estimate and the lower and upper confidence bounds) are not easy to see. Although this might not seem like an advantage at first glance, this lack of clarity is consistent with the estimates’ imprecision (i.e., their variation from sample to sample; see 6.2.1). These diamond plots are used to visualize:

- the raw data;
- the point estimate and confidence interval for the mean of each (sub-)determinant;
- the point estimate and confidence interval for its association with the outcome (e.g., the correlation with behavior and/or with one or several more proximal determinants of behavior).

For each (sub-)determinant, the question used to assess it and its anchors can be shown as well.
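The geometry of such a diamond is simple. Its left and right tips sit at the confidence bounds, and its top and bottom vertices at the point estimate. The sketch below is a hypothetical base-R helper (`ci_diamond`), not how behaviorchange draws its diamonds. The `col` and `border` arguments of `polygon()` play the role of the fill and stroke colors.

```r
# Hypothetical helper: the four vertices of a diamond representing a point
# estimate and its confidence interval, centred at height y.
ci_diamond <- function(lower, estimate, upper, y = 0, height = 0.5) {
  stopifnot(lower <= estimate, estimate <= upper)
  data.frame(x = c(lower, estimate, upper, estimate),
             y = c(y, y + height / 2, y, y - height / 2))
}

# A made-up mean of 4.2 with confidence interval [3.9, 4.5]
d <- ci_diamond(3.9, 4.2, 4.5, y = 1)

# Draw it with base graphics (only in an interactive session)
if (interactive()) {
  plot(NA, xlim = c(1, 7), ylim = c(0, 2), xlab = "Score", ylab = "", yaxt = "n")
  polygon(d$x, d$y, col = "lightblue", border = "purple")  # fill and stroke
}
```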

In short, CIBER acknowledges that several metrics need to be combined and interpreted in order for data to become valuable information for selecting determinants.

6.3 Applying CIBER

CIBER is a function in the R package behaviorchange and is also included in the jamovi module Tools for Behavior Change Researchers and Professionals. In this section, we start by explaining how to apply CIBER in jamovi, which has an easy-to-use point-and-click interface. Subsequently, we explain how to apply it in RStudio, which uses scripts that allow working with more advanced settings.

One dataset is used throughout this chapter as an example. These data were collected as part of the Party Panel initiative, a semi-panel determinant study that maps the (sub-)determinants of different nightlife-related risk behaviors each year. The data used in this chapter focus on (sub-)determinants of protecting one’s ears when exposed to loud music in nightlife settings. Specifically, three behaviors were explored:

- carrying hearing protection;
- wearing hearing protection;
- buying hearing protection when one did not carry it but was exposed to loud music.

The study is described in more detail in the full report.

When using jamovi, the data can be downloaded and stored locally. Subsequently, click on the hamburger button in the top-left corner and then on Open > This PC to open the data in jamovi. This specific dataset is also supplied with the jamovi module itself. So, it can also be opened via Hamburger button > Open > Data Library > Tools for Behavior Change Researchers and Professionals (see the section on jamovi-supplied behaviorchange datasets).

When using RStudio, the data can be imported directly from GitLab using the syntax below.

dat <- read.csv(paste0("",

However, this specific dataset is also embedded within the R package behaviorchange and can be loaded with the syntax below.

dat <- behaviorchange::BBC_pp17.1

If you want to use your own data, then the syntax below can be used to open a dialog box and select the file containing the data.


6.3.1 Continuous outcome

The behavior of interest in this section is wearing hearing protection. The first step is to explore the associations between sub-determinants and their overarching determinant. In this case, that means how behavioral beliefs about wearing hearing protection are associated with attitude towards wearing hearing protection. The stem of the questions used to assess these behavioral beliefs was “If I am somewhere where the music is loud, and I wear ear plugs, then….” The table below lists the endings of these questions, their anchors, and the variable names used in the dataset.

Variable name | Question | Left anchor | Right anchor
epw_AttExpect_hearingDamage | …is the chance that my hearing gets damaged… | Very small | Very big
epw_AttExpect_highTone | …is the chance at ringing in the ears the next day… | Very small | Very big
epw_AttExpect_musicVolume | …I hear the music… | Very soft | Very loud
epw_AttExpect_musicFidelity | …I hear the music… | Exactly the same | Extremely disturbed
epw_AttExpect_loudConversation | …I have … trouble with people talking loud. | Not at all | Much more
epw_AttExpect_musicFocus | …I can focus … on the music. | Much worse | Much better
epw_AttExpect_musicEnjoy | …I enjoy the music… | Much less | Much more

When using jamovi, click on Behavior Change > Confidence-Interval Based Estimation of Relevance (CIBER) to create CIBER plots.

The variables regarding the behavioral beliefs (as specified in the table) need to be dragged to the box (Sub-) determinants. The variable ‘epw_attitude’ contains the direct measurement of attitude (in line with the Reasoned Action Approach) and needs to be dragged to the box Targets. The CIBER plot will now be automatically generated and can be seen on the right side of the screen.

In the screenshot you can see that the width of the confidence intervals for means and associations can be adjusted. The CIBER plot on the right is updated automatically. You can save the CIBER plot by right-clicking on it > Image > Save… The CIBER plot can be saved in different formats: PDF, PNG, SVG, and EPS. The syntax underlying the point-and-click actions is displayed above the CIBER plot and is also updated automatically. To switch the display of the syntax on or off, click on the kebab button (i.e., 3 dots) in the top-right corner and tick the box ‘Syntax mode.’ Editing the syntax, which allows working with more advanced settings, can only be done in RStudio.

When using RStudio, the syntax below generates the CIBER plot for the behavioral beliefs, using the variable ‘epw_attitude’ as outcome.


Figure 6.1: CIBER plot behavioral beliefs

The diamonds in the left-hand panel show the item means with 99.99% confidence intervals. The fill color of the diamonds indicates the item means. The redder the diamonds, the lower the item means; the greener the diamonds, the higher the item means (blue denotes means in the middle of the scale). The dots surrounding the diamonds show the item scores of all participants, with jitter added to prevent overplotting. The diamonds in the right-hand panel show the association strengths (i.e., correlation coefficients with 95% confidence intervals) between individual beliefs and the direct measure of attitude. The fill color of these diamonds indicates the strength and direction of the associations:

- the redder the diamonds, the stronger and more negative the associations;
- the greener the diamonds, the stronger and more positive the associations;
- the grayer the diamonds, the weaker the associations.

The confidence interval for the explained variance (\(R^2\)) of the outcome (in this case, the direct measurement of attitude) is depicted at the top of the figure. It is based on all included (sub-)determinants (in this case, the behavioral beliefs).
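The estimates summarized in such a plot can also be computed directly, for example to check a figure. The sketch below uses hypothetical base-R helpers (`ciber_estimates`, `ciber_r2`), not the behaviorchange implementation, and runs on simulated data. It makes three assumptions:

- \(t\)-based intervals for the means;
- Fisher \(z\) intervals for Pearson correlations (as computed by `cor.test()`);
- only the point estimate of \(R^2\), without the confidence interval shown in the plot.

```r
# Hypothetical helper: means with (default 99.99%) CIs and correlations with
# (default 95%) CIs for each determinant, as summarized in a CIBER plot.
ciber_estimates <- function(data, determinants, target,
                            conf_means = 0.9999, conf_assoc = 0.95) {
  rows <- lapply(determinants, function(v) {
    x <- data[[v]]
    mean_ci <- t.test(x, conf.level = conf_means)$conf.int
    # Pearson correlation with a Fisher z based confidence interval
    assoc <- cor.test(x, data[[target]], conf.level = conf_assoc)
    data.frame(determinant = v,
               mean = mean(x, na.rm = TRUE),
               mean_lo = mean_ci[1], mean_hi = mean_ci[2],
               r = unname(assoc$estimate),
               r_lo = assoc$conf.int[1], r_hi = assoc$conf.int[2])
  })
  do.call(rbind, rows)
}

# Hypothetical helper: R^2 of the target regressed on all determinants.
ciber_r2 <- function(data, determinants, target) {
  model <- lm(reformulate(determinants, response = target), data = data)
  summary(model)$r.squared
}

# Simulated 7-point beliefs: only belief_a is related to attitude.
set.seed(123)
n <- 300
sim <- data.frame(belief_a = sample(1:7, n, replace = TRUE),
                  belief_b = sample(1:7, n, replace = TRUE))
sim$attitude <- 0.4 * sim$belief_a + rnorm(n)

est <- ciber_estimates(sim, c("belief_a", "belief_b"), "attitude")
print(est, digits = 2)
ciber_r2(sim, c("belief_a", "belief_b"), "attitude")
```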

The CIBER plot shows that participants, on average, think that the chances of hearing damage and of ringing in the ears the next day are relatively low. However, these beliefs are not associated with the direct measurement of attitude. Participants’ expectations about ear plugs disturbing the music score in the middle of the scale, but are much more strongly associated with the direct measurement of attitude. In general, the variance in the direct measurement of attitude explained by these seven behavioral beliefs is rather limited. This indicates that other sub-determinants (i.e., other behavioral beliefs) might contribute to explaining variance in the direct measurement of attitude.

The syntax can be used to adjust the CIBER plot. For example, it can change the width of the confidence intervals, the anchors and questions, the title, the colors used within the CIBER plot, and the ordering of determinants. The command below opens the help file, which contains an overview of all the arguments that can be used within the CIBER function.

?behaviorchange::CIBER

Two examples of using such arguments are provided below.

  ## Argument that can be used to change width of confidence intervals
  conf.level = list(means = 0.9999, 
                    associations = 0.95)

  ## Argument that can be used to change title of CIBER plot                    
  titlePrefix = "Means and associations (r) with"

The syntax below shows which arguments can be used to save the CIBER plot in a separate file.


The syntax below generates the CIBER plot for the direct measurement of attitude, perceived norm, and perceived behavioral control (in line with the Reasoned Action Approach). It also includes the Self-Report Behavioral Automaticity Index (SRBAI; Gardner et al. 2012), and uses the variables ‘epw_behavior’ and ‘epw_intention’ as outcomes. The stroke color of the diamonds (i.e., the “line color”) can be used to differentiate associations between (sub-)determinants and different outcomes. In this example, the diamonds with a purple stroke show the associations with behavior and the diamonds with a yellow stroke show the associations with intention.


Figure 6.2: CIBER plot determinants

The first tenet of pragmatic nihilism (see 1.2) is that psychological variables are usefully considered as metaphors rather than as referring to entities that exist in the mind. Many theories’ variables overlap in definition. So various theories can likely perform equally well in predicting behavior in any given situation, as long as the operationalisations of their variables cover the relevant aspects of people’s psychology. Psychological variables can be considered possibly non-existent but certainly useful metaphors, equal to their operationalisation for all practical purposes. Distinguishing whether variables predict each other or contain each other then becomes both hard and less relevant to successfully predicting and changing behavior. If experiential attitude and instrumental attitude together form attitude, changing either of them should result in a (smaller) change in attitude. Conversely, if attitude changes, one or both of experiential and instrumental attitude also change, depending on which specific elements of attitude change. Acknowledging pragmatic nihilism justifies generating a CIBER plot that includes all (sub-)determinants, again using the variables ‘epw_behavior’ and ‘epw_intention’ as outcomes. In this case, these are the behavioral, normative, and control beliefs, as well as the separate aspects assessed in the SRBAI. To facilitate interpretation of the CIBER plot, the questions and anchors are also provided.


Figure 6.3: CIBER plot various beliefs

Variable name | Question | Left anchor | Right anchor
Injunctive norms | And what do you think that these people think if you wear ear plugs if you are somewhere where the music is loud? | |
epw_NrmInjunct_partner | My partner (girlfriend or boyfriend) | Very disapproving | Very approving
epw_NrmInjunct_bestFriends | My best friends | Very disapproving | Very approving
epw_NrmInjunct_otherFriends | My other friends | Very disapproving | Very approving
epw_NrmInjunct_partyPeople | Most people at the party | Very disapproving | Very approving
epw_NrmInjunct_parents | My parents/care takers | Very disapproving | Very approving
epw_NrmInjunct_siblings | My brothers/sisters | Very disapproving | Very approving
Perceived behavioral control | | |
epw_PBCBeliefs_recognize | I find it … to recognize whether the music is so loud that ear plugs are needed. | Very difficult | Very easy
epw_PBCBeliefs_remember | If I am somewhere and I have my ear plugs with me, I find it … to remember to wear them in time. | Very difficult | Very easy
epw_PBCBeliefs_fit | Ear plugs fit me … most of the time. | Very unpleasant | Very pleasant
epw_PBCBeliefs_fallOut | Ear plugs fall out… | Not fast from my ears | Very fast from my ears
epw_PBCBeliefs_intoxicated | If I have used alcohol or something else, the chance that I remember to wear ear plugs is… | Much smaller | Much bigger
Habit | If I go somewhere where the music is loud, then wearing ear plugs is something I do… | |
epw_Habit_automatic | …automatically | |
epw_Habit_withoutThinking | …without thinking | |
epw_Habit_beforeRealising | …before realising | |
epw_Habit_withoutRemembering | …without consciously remembering that I did it | |

6.3.2 Dichotomous outcome

The function ‘binaryCIBER’ is similar to ‘CIBER.’ Use of this function is only recommended if there is a real underlying discontinuity in the outcome of interest (see 6.1.1). In this case, the outcome of interest is whether participants possessed ear plugs (yes/no response). The syntax below generates a CIBER plot for some general beliefs concerning music loudness and ear plug prices, using the variable ‘epPossession’ as outcome. When using jamovi, make sure to tick ‘Targets are binary variables’ below the left-hand panel. To facilitate interpretation of the CIBER plot, the questions and anchors are also provided.

The diamonds and dots in the left-hand panel show the item means with confidence intervals and the item scores of all participants, respectively. Two colors (both stroke and fill) are used to distinguish participants who scored ‘no’ and ‘yes’ on the outcome of interest (in this case, whether they possessed ear plugs). The two values regarding \(R^2\) are Nagelkerke’s \(R^2\) and Cox-Snell’s \(R^2\), respectively. Cox-Snell’s \(R^2\) is based on the log likelihood of the model compared to the log likelihood of a baseline model. However, with categorical outcomes, its theoretical maximum value is less than 1. Nagelkerke’s \(R^2\) is an adjusted version of Cox-Snell’s \(R^2\) that rescales the statistic to cover the full range from 0 to 1 (Nagelkerke 1991).
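Both statistics can be computed from the log-likelihoods of a fitted logistic regression model (\(\ell_M\)) and an intercept-only baseline model (\(\ell_0\)). Cox-Snell’s \(R^2\) is \(1 - \exp\{\tfrac{2}{n}(\ell_0 - \ell_M)\}\), its maximum is \(1 - \exp(2\ell_0/n)\), and Nagelkerke’s \(R^2\) divides the former by the latter. The base-R sketch below assumes an ungrouped 0/1 outcome, so that the deviance equals \(-2\) times the log-likelihood. `pseudo_r2` and the simulated variables are hypothetical.

```r
# Hypothetical helper: Cox-Snell and Nagelkerke R^2 for a binomial glm.
pseudo_r2 <- function(model) {
  n <- nobs(model)
  # For an ungrouped 0/1 outcome, deviance = -2 * log-likelihood
  ll_model <- -model$deviance / 2
  ll_null  <- -model$null.deviance / 2  # intercept-only baseline model
  cox_snell     <- 1 - exp((2 / n) * (ll_null - ll_model))
  max_cox_snell <- 1 - exp((2 / n) * ll_null)  # below 1 for a binary outcome
  c(coxSnell = cox_snell, nagelkerke = cox_snell / max_cox_snell)
}

# Simulated data: possession (0/1) depends on one belief
set.seed(7)
n <- 500
sim <- data.frame(price_belief = rnorm(n))
sim$possession <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * sim$price_belief))

fit <- glm(possession ~ price_belief, family = binomial, data = sim)
pseudo_r2(fit)
```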

  categoryLabels = c('no', 'yes')

Figure 6.4: CIBER plot dichotomous outcome

Variable name | Question | Left anchor | Right anchor
epGeneralBeliefs_loudnessPreference | I appreciate it if music at parties and gigs is… | As soft as possible | As loud as possible
epGeneralBeliefs_loudnessGenre | Does it depend on the music genre how loud you want to hear music at parties and gigs? | Not at all | Very strongly
epGeneralBeliefs_loudnessTooMuch | I think that music at parties and gigs is… | Never too loud | Always too loud
epGeneralBeliefs_priceFoam | I think foam or rubber ear plugs are… | Very cheap | Very expensive
epGeneralBeliefs_priceSilicon | I think universal ear plugs are… | Very cheap | Very expensive
epGeneralBeliefs_priceCustom | I think tailored ear plugs are… | Very cheap | Very expensive