R/species_mix-class.R
species_mix.Rd
Fits a mixture-of-regressions to identify species archetype models (SAMs).
species_mix( archetype_formula = NULL, species_formula = stats::as.formula(~1), all_formula = NULL, data, nArchetypes = 3, family = "bernoulli", offset = NULL, weights = NULL, bb_weights = NULL, size = NULL, power = 1.6, control = list(), inits = NULL, titbits = TRUE )
archetype_formula | an object of class "formula" (or an object that can be coerced to that class). The response variable (left hand side of the formula) needs to be either 'occurrence', 'abundance', 'biomass' or 'quantity' data. The type of response data will help specify the type of error distribution to be used. The dependent variables (the right hind side) of this formula specifies the dependence of the species archetype probabilities on covariates. For all model the basic formula structure follows something like this: cbind(spp1,spp2,spp3)~1+temperature+rainfall |
---|---|
species_formula | an object of class "formula" (or an object that can be
coerced to that class). The right hand side of this formula specifies the
dependence of the species"'" data on covariates (typically different
covariates to |
all_formula | an object of class "formula", which is meant to represent a constant single set of covariates across all species and groups, typically you might use this an alternative to an offset, where there might be some bias in the data which is relatively constant and might arise as an artefact of how the data was collected. |
data | a matrix or data.frame which contains the 'species_data' matrix, a const and the covariates in the structure of spp1, spp2, spp3, const, temperature, rainfall. dims of matrix should be nsites*(nspecies+const+covariates). |
nArchetypes | The number of archetypes (mixing components/groups) to estimate from the data. |
family | The family of statistical family to use within the ecomix models. a choice between "bernoulli", "binomial", "poisson", "ippm", "negative.binomial", "tweedie" and "gaussian" families are possible and applicable to specific types of data. |
offset | a numeric vector of length nrow(data) (n sites) that is included into the model as an offset. It is included into the conditional part of the model where conditioning is performed on the SAM. |
weights | a numeric vector of length ncol(Y) (n species) that is used as weights in the log-likelihood calculations. If NULL (default) then all weights are assumed to be identically 1. Because we are estimating the log-likelihood over species (rather than sites), the weights should be a vector n species long. |
bb_weights | a numeric vector of n species long. This is used for undertaking a Bayesian Bootstrap. See 'vcov.species_mix' for more details. |
size | The size of the sample for a binomial model (defaults to 1). |
power | The power parameter for a Tweedie model. Default is 1.6, and this is assigned to all species |
control | a list of control parameters for optimisation and calculation. See details. |
inits | NULL a numeric vector that provides approximate starting values for species_mix coefficients. These are family specific, but at a minimum you will need pi (additive_logitic transformed), alpha (intercepts) and beta (mixing coefs). |
titbits | either a boolean or a vector of characters. If TRUE (default for species_mix(qv)), then some objects used in the estimation of the model"'"s parameters are returned in a list entitled "titbits" in the model object. Some functions, for example plot.species_mix(qv) and predict.species_mix(qv), will require some or all of these pieces of information. If titbits=FALSE (default for species_mix.multifit(qv)), then an empty list is returned. If a character vector, then just those objects are returned. Possible values are:"Y" for the outcome matrix, "X" for the model matrix for the SAM model, "offset" for the offset in the model, "site_spp_weights" for the model weights, "archetype_formula" for the formula for the SAMs, "species_formula" for the formula for the species-specific model, "control" for the control arguments used in model fitting, "family" for the conditional distribution of the species data. Care needs to be taken when using titbits=TRUE in species_mix.multifit(qv) calls as titbits is created for EACH OF THE MODEL FITS. If the data is large or if nstart is large, then setting titbits=TRUE may give users problems with memory. |
species_mix is used to fit mixtures of glms to multivariate species data. The function uses BFGS to optimize the mixture likelihood. There is the option to use ECM algorithm to get appropriate starting parameters. `species_mix` acts as a wrapper for species_mix.fit that allows for easier data input. The data frames are merged into the appropriate format for the use in species_mix.fit. Minima is found using vmmin (BFGS). Currently 'bernoulli', 'binomial', 'poisson', 'ippm' (inhomogeneous Poisson point process), 'negative.binomial', 'tweedie' and 'gaussian' distributions can be fitted using the species_mix function.
# \donttest{ library(ecomix) set.seed(42) sam_form <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:20), collapse = ','),")~x1+x2")) sp_form <- ~ 1 beta <- matrix(c(-2.9,-3.6,-0.9,1,.9,1.9),3,2,byrow=TRUE) dat <- data.frame(y=rep(1,100),x1=stats::runif(100,0,2.5), x2=stats::rnorm(100,0,2.5)) dat[,-1] <- scale(dat[,-1]) simulated_data <- species_mix.simulate(archetype_formula = sam_form,species_formula = sp_form, data = dat,beta=beta,family="bernoulli")#>fm1 <- species_mix(archetype_formula = sam_form,species_formula = sp_form, data = simulated_data, family = 'bernoulli', nArchetypes=3)#>#>#>#>#>#>#>#>#>#>#>#>#>#>#> initial value 819.837658 #> iter 10 value 819.706575 #> final value 819.703416 #> converged# }