Fits a mixture-of-regressions to identify species archetype models (SAMs).

species_mix(
  archetype_formula = NULL,
  species_formula = stats::as.formula(~1),
  all_formula = NULL,
  data,
  nArchetypes = 3,
  family = "bernoulli",
  offset = NULL,
  weights = NULL,
  bb_weights = NULL,
  size = NULL,
  power = 1.6,
  control = list(),
  inits = NULL,
  titbits = TRUE
)

Arguments

archetype_formula

an object of class "formula" (or an object that can be coerced to that class). The response variable (left hand side of the formula) needs to be either 'occurrence', 'abundance', 'biomass' or 'quantity' data. The type of response data will help specify the type of error distribution to be used. The dependent variables (the right hind side) of this formula specifies the dependence of the species archetype probabilities on covariates. For all model the basic formula structure follows something like this: cbind(spp1,spp2,spp3)~1+temperature+rainfall

species_formula

an object of class "formula" (or an object that can be coerced to that class). The right hand side of this formula specifies the dependence of the species"'" data on covariates (typically different covariates to archetype_formula to avoid confusing confounding). Current the formula is set at ~ 1 by default for species-specific intercepts for the archetype models. If you include a species specific formula which has more than an intercept you will be fitting a partial species archetype model which has species specific covariates and archetype specific covariates.

all_formula

an object of class "formula", which is meant to represent a constant single set of covariates across all species and groups, typically you might use this an alternative to an offset, where there might be some bias in the data which is relatively constant and might arise as an artefact of how the data was collected.

data

a matrix or data.frame which contains the 'species_data' matrix, a const and the covariates in the structure of spp1, spp2, spp3, const, temperature, rainfall. dims of matrix should be nsites*(nspecies+const+covariates).

nArchetypes

The number of archetypes (mixing components/groups) to estimate from the data.

family

The family of statistical family to use within the ecomix models. a choice between "bernoulli", "binomial", "poisson", "ippm", "negative.binomial", "tweedie" and "gaussian" families are possible and applicable to specific types of data.

offset

a numeric vector of length nrow(data) (n sites) that is included into the model as an offset. It is included into the conditional part of the model where conditioning is performed on the SAM.

weights

a numeric vector of length ncol(Y) (n species) that is used as weights in the log-likelihood calculations. If NULL (default) then all weights are assumed to be identically 1. Because we are estimating the log-likelihood over species (rather than sites), the weights should be a vector n species long.

bb_weights

a numeric vector of n species long. This is used for undertaking a Bayesian Bootstrap. See 'vcov.species_mix' for more details.

size

The size of the sample for a binomial model (defaults to 1).

power

The power parameter for a Tweedie model. Default is 1.6, and this is assigned to all species

control

a list of control parameters for optimisation and calculation. See details.

inits

NULL a numeric vector that provides approximate starting values for species_mix coefficients. These are family specific, but at a minimum you will need pi (additive_logitic transformed), alpha (intercepts) and beta (mixing coefs).

titbits

either a boolean or a vector of characters. If TRUE (default for species_mix(qv)), then some objects used in the estimation of the model"'"s parameters are returned in a list entitled "titbits" in the model object. Some functions, for example plot.species_mix(qv) and predict.species_mix(qv), will require some or all of these pieces of information. If titbits=FALSE (default for species_mix.multifit(qv)), then an empty list is returned. If a character vector, then just those objects are returned. Possible values are:"Y" for the outcome matrix, "X" for the model matrix for the SAM model, "offset" for the offset in the model, "site_spp_weights" for the model weights, "archetype_formula" for the formula for the SAMs, "species_formula" for the formula for the species-specific model, "control" for the control arguments used in model fitting, "family" for the conditional distribution of the species data. Care needs to be taken when using titbits=TRUE in species_mix.multifit(qv) calls as titbits is created for EACH OF THE MODEL FITS. If the data is large or if nstart is large, then setting titbits=TRUE may give users problems with memory.

Details

species_mix is used to fit mixtures of glms to multivariate species data. The function uses BFGS to optimize the mixture likelihood. There is the option to use ECM algorithm to get appropriate starting parameters. `species_mix` acts as a wrapper for species_mix.fit that allows for easier data input. The data frames are merged into the appropriate format for the use in species_mix.fit. Minima is found using vmmin (BFGS). Currently 'bernoulli', 'binomial', 'poisson', 'ippm' (inhomogeneous Poisson point process), 'negative.binomial', 'tweedie' and 'gaussian' distributions can be fitted using the species_mix function.

Examples

# \donttest{ library(ecomix) set.seed(42) sam_form <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:20), collapse = ','),")~x1+x2")) sp_form <- ~ 1 beta <- matrix(c(-2.9,-3.6,-0.9,1,.9,1.9),3,2,byrow=TRUE) dat <- data.frame(y=rep(1,100),x1=stats::runif(100,0,2.5), x2=stats::rnorm(100,0,2.5)) dat[,-1] <- scale(dat[,-1]) simulated_data <- species_mix.simulate(archetype_formula = sam_form,species_formula = sp_form, data = dat,beta=beta,family="bernoulli")
#> Random alpha from normal (-1,0.5) distribution
fm1 <- species_mix(archetype_formula = sam_form,species_formula = sp_form, data = simulated_data, family = 'bernoulli', nArchetypes=3)
#> SAM modelling
#> There are 3 archetypes to group the species into
#> There are 100 site observations for 20 species
#> The model for the archetype (grouping) is ~x1 + x2
#> The model for the species is ~1
#> You are implementing a bernoulli Species Archetype Model.
#> Using ECM algorithm to find starting values; using 1 refits
#> ECM restart 1 of 1
#> Initialising starting values
#> Initial groups parameter estimates by K-means clustering
#> Iteration: 1 | New loglik -950.729 | Ratio loglik 0
#> Iteration: 2 | New loglik -829.843 | Ratio loglik 0.872849
#> Iteration: 3 | New loglik -820.246 | Ratio loglik 0.988434
#> Iteration: 4 | New loglik -819.839 | Ratio loglik 0.999504
#> initial value 819.837658 #> iter 10 value 819.706575 #> final value 819.703416 #> converged
# }