Analysing survey data is a core focus of iNZight, as analysis of survey data is not so different from an IID sample---once you have figured out the survey design. iNZight now offers three ways of passing survey design information into iNZight, two of which make use of a (new) survey specification (based heavily on survey::svydesign()
).
For this demonstration, we will be using the well known apiclus1
dataset that comes with the 'survey' package. If you're following along with iNZight, you can load the data for methods 1 and 2 from File > Example data, choosing 'survey' package, and apiclus1
for the data:
The obvious place to specify the design is from the Dataset > Survey, which displays a set of inputs for users to select which variables are which in the survey design. Here's what the apiclus1
specification should look like:
It might not look like much has happened, but iNZight will now use the survey design in all operations---plots, summaries, even regression modelling! Try it out: make a graph of api00
and notice that in place of iNZight's usual dot plot, you get a histogram instead. And if you've enabled the code panel, you'll see that the call to inzplot()
is using the design
argument (instead of data
).
You may also obtain a numerical summary of api00
by clicking on the Get Summary button. This will display information about the variable as well as the design specification of the data. The numeric summaries are estimates of the population values, and so---unlike standard data---have associated standard errors displayed underneath (i.e., the median api00
of the population is etimated to be 652 with a standard error of 36.56).
You may have noticed the 'Read from file' button in the survey specification window. This is an alternative method of setting the specification, which is particularly useful if you aren't familiar with the survey design (and it saves you from having to look it up each time and, likely, making a mistake!). This file uses TOML
format, which is machine- and human-readable, and specifies the variables using syntax closely resembling arguments to the survey::svydesign()
function. Here's what the file looks like for the apiclus1
data:
# apiclus1.svydesign
type = "survey"
ids = "dnum"
weights = "pw"
fpc = "fpc"
This should be given a .svydesign
extension. After clicking the Read from file button, browse for the file and load it. Again, nothing will happen (especially if you didn't remove the previous design first!), but in the background iNZight has parsed the file (using the helper function iNZightTools::import_survey
, see end of Vignette for R usage) and is now using the design.
In a typical scenario, researchers would provide a data file containing the survey data (a .csv
, perhaps), and with this they might also need to provide a .svydesign
specification file. To make importing survey data even easier, the survey specification accepts another argument: a file path. If you're following along, you'll first need to save the apiclus1
data into the same folder as the apiclus1.svydesign
file from method 2, which can be done from File > Export data, clicking 'open' to browse for the save location, and specifying CSV for the file type. Now edit the specification file by adding an extra line:
# apiclus1.svydesign
data = "apiclus1.csv" # this should be whatever you called the file when you saved it
type = "survey"
ids = "dnum"
weights = "pw"
fpc = "fpc"
Now you can close iNZight and start it up again. Now, instead of loading the example data, or even the data, we're just going to load the survey specification file by going to File > Import data and browsing for the apiclus1.svydesign
file:
Now, just like that, you've loaded a complex survey data set with the design information already attached!! Go ahead and make some graphs, produce summaries, perform some simple hypothesis tests (from the Get Inference button) and, if you're feeling adventurous, fit some regression models (Advanced > Model Fitting).
iNZight knows how to handle stratified and/or clustered survey designs (i.e., what you'd load using survey::svydesign()
), and replicate designs (survey::svrepdesign()
), and can also handle post stratification or calibration. Again, there are UI windows to do all of this, but since we're here to talk about the specification format, here's what those look like.
data = "path/to/data.csv" # either relative to the specification file, or absolute
ids = 1 # default 1; or "clus"; or "clus1 + clus2"
strata = "stratvar" # the stratification variable
weights = "wts" # the weighting variable
fpc = "fpc" # any finite population correction variable(s)
nest = false # or true
Replicate designs use different arguments, and most importantly specify the columns containing the replicate weights, and some other details. The replicate weights can either be a vector of names, or a regular expression (which is preferred as it is easier that typing out 20, 30, or 80 variable names!).
data = "path/to/data.csv"
weights = "wt0" # the main weighting variable
repweights = "^w[0-2]" # this will match any variables starting with a `w`
# followed by a 0, 1, or a 2 and then anything
# (e.g., w01, w14,and w20, but not w30)
reptype = "other" # the method used to generate the replicate weights
# (e.g., "JK1", "BRR", etc)
scale = 1 # scaling constant for variance
rscales = 1 # either a single number if same for all replicate weights,
# or a vector for each replicate weight:
# rscales = [2, 0.45, ...]
If available, this information can be automatically included in the survey by passing additional lines to the specification file. This uses some TOML-specific syntax, but should be fairly straightforward (it's also the reason we needed to use TOML and not something simpler). In this example, we calibrate using stype
and sch.wide
by passing in the raw population values (behind the scenes iNZight formats these correctly):
# apical.svydesign
data = "apiclus1.csv"
type = "survey"
ids = "dnum"
weights = "pw"
fpc = "fpc"
[calibrate.stype]
E = 4421
H = 755
M = 1018
[calibrate."sch.wide"] # we must include quotes around variable names containing dots
No = 1072
Yes = 5122
The magic all happens in the 'iNZightTools' package, which has a handy 'import_survey' function.
library(iNZightTools)
(svy <- import_survey('apiclus1.svydesign'))
#> Survey design specification:
#> * ids: dnum
#> * fpc: fpc
#> * weights: pw
#> * type: survey
#>
#> Design object:
#> 1 - level Cluster Sampling design
#> With (15) clusters.
#> survey::svydesign(ids = ~dnum, fpc = ~fpc, weights = ~pw, data = data)
summary(survey::svyglm(api00 ~ api99, design = svy$design))
#>
#> Call:
#> svyglm(formula = api00 ~ api99, design = svy$design)
#>
#> Survey design:
#> survey::svydesign(ids = ~dnum, fpc = ~fpc, weights = ~pw, data = data)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 95.28483 14.98802 6.357 2.51e-05 ***
#> api99 0.90429 0.02361 38.301 9.38e-15 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 768.7169)
#>
#> Number of Fisher Scoring iterations: 2
import_survey('apical.svydesign')
#> Survey design specification:
#> * ids: dnum
#> * fpc: fpc
#> * weights: pw
#> * type: survey
#> * calibrate: sch.wide + stype
#>
#> Design object:
#> 1 - level Cluster Sampling design
#> With (15) clusters.
#> survey::calibrate(design_obj, ~sch.wide + stype, c(`(Intercept)` = 6194,
#> sch.wideYes = 5122, stypeH = 755, stypeM = 1018))