Analysing survey data is a core focus of iNZight: once you have figured out the survey design, analysing survey data is not so different from analysing an IID sample. iNZight now offers three ways of supplying survey design information, two of which make use of a (new) survey specification (based heavily on
For this demonstration, we will use the well-known apiclus1 dataset that comes with the 'survey' package. If you're following along with iNZight, you can load the data for methods 1 and 2 from File > Example data, choosing the 'survey' package and apiclus1 for the data:
The obvious place to specify the design is Dataset > Survey, which displays a set of inputs where you select which variable plays which role in the survey design. Here's what the apiclus1 specification should look like:
It might not look like much has happened, but iNZight will now use the survey design in all operations: plots, summaries, even regression modelling! Try it out: make a graph of
api00 and notice that you get a histogram in place of iNZight's usual dot plot. And if you've enabled the code panel, you'll see that the call to
inzplot() is using the
design argument (instead of
You may also obtain a numerical summary of
api00 by clicking on the Get Summary button. This will display information about the variable as well as the design specification of the data. The numeric summaries are estimates of the population values and so, unlike summaries of standard data, have associated standard errors displayed underneath (e.g., the median api00 of the population is estimated to be 652 with a standard error of 36.56).
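For readers who prefer to see this in R, the following is a minimal sketch of what the specification above amounts to, assuming the 'survey' package is installed. It builds the design directly with survey::svydesign() rather than through iNZight, so it illustrates the idea and is not necessarily the code iNZight generates; the object name `des` is illustrative.

```r
library(survey)

# apiclus1 ships with the 'survey' package
data(api, package = "survey")

# One-stage cluster sample: districts (dnum) are the clusters,
# pw holds the sampling weights, fpc the finite population correction
des <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Design-based estimates of the population mean and median of api00,
# each reported together with a standard error
svymean(~api00, des)
svyquantile(~api00, des, quantiles = 0.5)
```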
You may have noticed the 'Read from file' button in the survey specification window. This is an alternative method of setting the specification, which is particularly useful if you aren't familiar with the survey design (and it saves you from having to look it up each time and, likely, making a mistake!). This file uses
TOML format, which is machine- and human-readable, and specifies the variables using syntax closely resembling arguments to the
survey::svydesign() function. Here's what the file looks like for the
    # apiclus1.svydesign
    type = "survey"
    ids = "dnum"
    weights = "pw"
    fpc = "fpc"
This file should be given a .svydesign extension. After clicking the Read from file button, browse for the file and load it. Again, nothing will appear to happen (especially if you didn't remove the previous design first!), but in the background iNZight has parsed the file (using the helper function iNZightTools::import_survey; see the end of this vignette for R usage) and is now using the design.
In a typical scenario, researchers would provide a data file containing the survey data (a
.csv, perhaps), and with this they might also need to provide a
.svydesign specification file. To make importing survey data even easier, the survey specification accepts another argument: a file path. If you're following along, you'll first need to save the
apiclus1 data into the same folder as the
apiclus1.svydesign file from method 2, which can be done from File > Export data, clicking 'open' to browse for the save location, and specifying CSV for the file type. Now edit the specification file by adding an extra line:
    # apiclus1.svydesign
    data = "apiclus1.csv" # this should be whatever you called the file when you saved it
    type = "survey"
    ids = "dnum"
    weights = "pw"
    fpc = "fpc"
Now close iNZight and start it up again. This time, instead of loading the example data, or even the saved CSV file, we're just going to load the survey specification file by going to File > Import data and browsing for the
Just like that, you've loaded a complex survey data set with the design information already attached! Go ahead and make some graphs, produce summaries, perform some simple hypothesis tests (from the Get Inference button) and, if you're feeling adventurous, fit some regression models (Advanced > Model Fitting).
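The same workflow can be scripted. The sketch below assumes the 'survey' and 'iNZightTools' packages are installed; it writes the CSV and the specification into a temporary folder (the folder and object names are illustrative) and then imports both in one step with import_survey(), the helper mentioned above.

```r
# Work in a temporary folder so nothing in your project is overwritten
dir <- tempfile("svy-demo-")
dir.create(dir)

# Save the example data next to the specification file
data(api, package = "survey")
write.csv(apiclus1, file.path(dir, "apiclus1.csv"), row.names = FALSE)

# Write the specification; the data path is relative to this file
spec <- file.path(dir, "apiclus1.svydesign")
writeLines(c(
  'data = "apiclus1.csv"',
  'type = "survey"',
  'ids = "dnum"',
  'weights = "pw"',
  'fpc = "fpc"'
), spec)

# Import the data and the design together (skipped if iNZightTools is missing)
if (requireNamespace("iNZightTools", quietly = TRUE)) {
  svy <- iNZightTools::import_survey(spec)
  print(svy)
}
```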
iNZight knows how to handle stratified and/or clustered survey designs (i.e., what you'd load using
survey::svydesign()), and replicate designs (
survey::svrepdesign()), and can also handle post-stratification or calibration. Again, there are UI windows to do all of this, but since we're here to talk about the specification format, here's what the specifications look like.
    data = "path/to/data.csv" # either relative to the specification file, or absolute
    ids = 1                   # default 1; or "clus"; or "clus1 + clus2"
    strata = "stratvar"       # the stratification variable
    weights = "wts"           # the weighting variable
    fpc = "fpc"               # any finite population correction variable(s)
    nest = false              # or true
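As a rough guide to how these keys relate to survey::svydesign(), here is a minimal sketch in which each quoted variable name becomes a one-sided formula. Because the path and variable names above are placeholders, the sketch swaps in the stratified apistrat sample from the 'survey' package, where stype is the stratification variable; this is an illustration, not iNZight's internal code.

```r
library(survey)
data(api, package = "survey")

# Equivalent of a specification with
#   ids = 1, strata = "stype", weights = "pw", fpc = "fpc", nest = false
dstrat <- svydesign(
  ids     = ~1,      # no clustering
  strata  = ~stype,  # school type defines the strata
  weights = ~pw,
  fpc     = ~fpc,
  nest    = FALSE,
  data    = apistrat
)

# Design-based estimate of the population mean of api00
svymean(~api00, dstrat)
```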
Replicate designs use different arguments: most importantly, they specify the columns containing the replicate weights, along with some other details. The replicate weights can be given either as a vector of names or as a regular expression (which is preferred, as it is easier than typing out 20, 30, or 80 variable names!).
    data = "path/to/data.csv"
    weights = "wt0"          # the main weighting variable
    repweights = "^w[0-2]"   # this will match any variables starting with a `w`
                             # followed by a 0, 1, or a 2 and then anything
                             # (e.g., w01, w14, and w20, but not w30)
    reptype = "other"        # the method used to generate the replicate weights
                             # (e.g., "JK1", "BRR", etc)
    scale = 1                # scaling constant for variance
    rscales = 1              # either a single number if same for all replicate weights,
                             # or a vector for each replicate weight:
                             # rscales = [2, 0.45, ...]
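Before relying on a regular expression, it can help to check which columns it actually selects. This small base R sketch applies the same pattern to a made-up vector of column names (the names are hypothetical and only for illustration):

```r
# Hypothetical column names from a data set with replicate weights
vars <- c("id", "wt0", "w01", "w14", "w20", "w30", "pw")

# Same pattern as in the specification: a leading "w" followed by 0, 1 or 2
rep_cols <- grep("^w[0-2]", vars, value = TRUE)
print(rep_cols)
#> [1] "w01" "w14" "w20"
```

Note that the main weight wt0 is not selected, because the character after the leading `w` is a `t` rather than a digit from 0 to 2.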
If population information is available, post-stratification or calibration can be automatically included in the survey design by adding lines to the specification file. This uses some TOML-specific syntax, but should be fairly straightforward (it's also the reason we needed to use TOML and not something simpler). In this example, we calibrate using stype and sch.wide by passing in the raw population values (behind the scenes iNZight formats these correctly):
    # apical.svydesign
    data = "apiclus1.csv"
    type = "survey"
    ids = "dnum"
    weights = "pw"
    fpc = "fpc"

    [calibrate.stype]
    E = 4421
    H = 755
    M = 1018

    [calibrate."sch.wide"] # we must include quotes around variable names containing dots
    No = 1072
    Yes = 5122
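To give a sense of what "formatting these correctly" involves, here is a hedged sketch of turning the raw counts into the totals vector that survey::calibrate() expects: an intercept equal to the population size, plus one entry per level other than the first. The helper to_population_totals() is hypothetical, written for this illustration, and is not iNZight's own implementation.

```r
# Raw population counts, as written in the specification file
counts <- list(
  stype      = c(E = 4421, H = 755, M = 1018),
  `sch.wide` = c(No = 1072, Yes = 5122)
)

# Hypothetical helper: build the totals for a model formula with
# treatment contrasts, i.e. an intercept (the population size)
# followed by every level of each variable except the first.
to_population_totals <- function(counts) {
  totals <- c(`(Intercept)` = sum(counts[[1]]))
  for (v in names(counts)) {
    lv <- counts[[v]][-1]
    names(lv) <- paste0(v, names(lv))
    totals <- c(totals, lv)
  }
  totals
}

pop <- to_population_totals(counts)
print(pop)

if (requireNamespace("survey", quietly = TRUE)) {
  data(api, package = "survey")
  des <- survey::svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc,
                           data = apiclus1)
  # Formula terms appear in the same order as the entries of `pop`
  cal <- survey::calibrate(des, ~stype + sch.wide, population = pop)
  print(survey::svytotal(~stype, cal))
}
```

Both sets of counts sum to the same population size (6194), which is why a single intercept can serve for the two variables.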
The magic all happens in the 'iNZightTools' package, which has a handy 'import_survey' function.
    library(iNZightTools)
    (svy <- import_survey('apiclus1.svydesign'))
    #> Survey design specification:
    #>  * ids: dnum
    #>  * fpc: fpc
    #>  * weights: pw
    #>  * type: survey
    #>
    #> Design object:
    #> 1 - level Cluster Sampling design
    #> With (15) clusters.
    #> survey::svydesign(ids = ~dnum, fpc = ~fpc, weights = ~pw, data = data)

    summary(survey::svyglm(api00 ~ api99, design = svy$design))
    #>
    #> Call:
    #> svyglm(formula = api00 ~ api99, design = svy$design)
    #>
    #> Survey design:
    #> survey::svydesign(ids = ~dnum, fpc = ~fpc, weights = ~pw, data = data)
    #>
    #> Coefficients:
    #>             Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) 95.28483   14.98802   6.357 2.51e-05 ***
    #> api99        0.90429    0.02361  38.301 9.38e-15 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #>
    #> (Dispersion parameter for gaussian family taken to be 768.7169)
    #>
    #> Number of Fisher Scoring iterations: 2

    import_survey('apical.svydesign')
    #> Survey design specification:
    #>  * ids: dnum
    #>  * fpc: fpc
    #>  * weights: pw
    #>  * type: survey
    #>  * calibrate: sch.wide + stype
    #>
    #> Design object:
    #> 1 - level Cluster Sampling design
    #> With (15) clusters.
    #> survey::calibrate(design_obj, ~sch.wide + stype, c(`(Intercept)` = 6194,
    #>     sch.wideYes = 5122, stypeH = 755, stypeM = 1018))