This version of species mix is useful for fitting models which have complex likelihoods. The multiple starts will enable optimisation of the loglikelihood using multiple starts.

species_mix.multifit(
  archetype_formula = NULL,
  species_formula = stats::as.formula(~1),
  all_formula = NULL,
  data,
  nArchetypes = 3,
  family = "bernoulli",
  offset = NULL,
  weights = NULL,
  bb_weights = NULL,
  size = NULL,
  power = 1.6,
  control = list(ecm_prefit = FALSE),
  inits = NULL,
  titbits = FALSE,
  nstart = 10,
  mc.cores = 1
)

# S3 method for species_mix.multifit
print(x, ...)

Arguments

archetype_formula

an object of class "formula" (or an object that can be coerced to that class). The response variable (left hand side of the formula) needs to be either 'occurrence', 'abundance', 'biomass' or 'quantity' data. The type of reponse data will help specify the type of error distribution to be used. The dependent variables (the right hind side) of this formula specifies the dependence of the species archetype probabilities on covariates. For all model the basic formula structure follows something like this: cbind(spp1,spp2,spp3)~1+temperature+rainfall

species_formula

an object of class "formula" (or an object that can be coerced to that class). The right hand side of this formula specifies the dependence of the species"'" data on covariates (typically different covariates to archetype_formula to avoid confusing confounding). Current the formula is set at ~ 1 by default for species-specific intercepts for the archetype models. If you include a species specific formula which has more than an intercept you will be fitting a partial species archetype model which has species specific covariates and archetype specific covariates.

all_formula

an object of class "formula", which is meant to represent a constant single set of covariates across all species and groups, typically you might use this an alternative to an offset, where there might be some bias in the data which is relatively constant and might arise as an artefact of how the data was collected.

data

a matrix of dataframe which contains the 'species_data' matrix, a const and the covariates in the strucute of spp1, spp2, spp3, const, temperature, rainfall. dims of matirx should be nsites*(nspecies+const+covariates).

nArchetypes

The number of mixing components (groups) to fit.

family

The family of statistical distribution to use within the ecomix models. a choice between "bernoulli", "poisson", "ippm", "negative.binomial" and "gaussian" distributions are possible and applicable to specific types of data.

offset

a numeric vector of length nrow(data) (n sites) that is included into the model as an offset. It is included into the conditional part of the model where conditioning is performed on the SAM.

weights

a numeric vector of length ncol(Y) (n species) that is used as weights in the log-likelihood calculations. If NULL (default) then all weights are assumed to be identically 1. Because we are estimating the log-likelihood over species (rather than sites), the weights should be a vector n species long.

bb_weights

a numeric vector of n species long. This is used for undertaking a Bayesian Bootstrap. See 'vcov.species_mix' for more details.

size

The size of the sample for a binomial model (defaults to 1).

power

The power parameter for a Tweedie model. Default is 1.6, and this is assigned to all species

control

a list of control parameters for optimisation and calculation. See details.

inits

NULL a numeric vector that provides approximate starting values for species_mix coefficients. These are distribution specific, but at a minimum you will need pi (additive_logitic transformed), alpha (intercepts) and beta (mixing coefs).

titbits

either a boolean or a vector of characters. If TRUE (default for species_mix(qv)), then some objects used in the estimation of the model"'"s parameters are returned in a list entitled "titbits" in the model object. Some functions, for example plot.species_mix(qv) and predict.species_mix(qv), will require some or all of these pieces of information. If titbits=FALSE (default for species_mix.multifit(qv)), then an empty list is returned. If a character vector, then just those objects are returned. Possible values are:"Y" for the outcome matrix, "X" for the model matrix for the SAM model, "offset" for the offset in the model, "site_spp_weights" for the model weights, "archetype_formula" for the formula for the SAMs, "species_formula" for the formula for the species-specific model, "control" for the control arguments used in model fitting, "family" for the conditional distribution of the species data. Care needs to be taken when using titbits=TRUE in species_mix.multifit(qv) calls as titbits is created for EACH OF THE MODEL FITS. If the data is large or if nstart is large, then setting titbits=TRUE may give users problems with memory.

nstart

for species_mix.multifit only. The number of random starts to perform for re-fitting. Default is 10, which will need increasing for serious use.

mc.cores

for species_mix.multifit only. The number of cores to spread the re-fitting over.

x

A species mix multifit object

\dots

Ignored

Examples

# \donttest{ library(ecomix) set.seed(42) sam_form <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:20), collapse = ','),")~x1+x2")) sp_form <- ~ 1 beta <- matrix(c(-2.9,-3.6,-0.9,1,.9,1.9),3,2,byrow=TRUE) dat <- data.frame(y=rep(1,100),x1=stats::runif(100,0,2.5), x2=stats::rnorm(100,0,2.5)) dat[,-1] <- scale(dat[,-1]) simulated_data <- species_mix.simulate(archetype_formula = sam_form,species_formula = sp_form, data = dat,beta=beta,family="bernoulli")
#> Random alpha from normal (-1,0.5) distribution
fm1 <- species_mix(archetype_formula = sam_form,species_formula = sp_form, data = simulated_data, family = 'bernoulli', nArchetypes=3)
#> SAM modelling
#> There are 3 archetypes to group the species into
#> There are 100 site observations for 20 species
#> The model for the archetype (grouping) is ~x1 + x2
#> The model for the species is ~1
#> You are implementing a bernoulli Species Archetype Model.
#> Using ECM algorithm to find starting values; using 1 refits
#> ECM restart 1 of 1
#> Initialising starting values
#> Initial groups parameter estimates by K-means clustering
#> Iteration: 1 | New loglik -950.729 | Ratio loglik 0
#> Iteration: 2 | New loglik -829.843 | Ratio loglik 0.872849
#> Iteration: 3 | New loglik -820.246 | Ratio loglik 0.988434
#> Iteration: 4 | New loglik -819.839 | Ratio loglik 0.999504
#> initial value 819.837658 #> iter 10 value 819.706575 #> final value 819.703416 #> converged
fmods <- species_mix.multifit(archetype_formula = sam_form, species_formula = sp_form, data=simulated_data, family = 'bernoulli', nstart = 10, nArchetypes=3)
#> SAM modelling
#> Running species_mix.multifit using 10 starts to fit 3 Archetypes to the data.
#> There are 3 archetypes to group the species into
#> There are 100 site observations for 20 species
#> The model for the archetype (grouping) is ~x1 + x2
#> The model for the species is ~1
#> You are implementing a bernoulli Species Archetype Model.
#> | | | 0%. | |======== | 11%. | |================ | 22%. | |======================= | 33%. | |=============================== | 44%. | |======================================= | 56%. | |=============================================== | 67%. | |====================================================== | 78%. | |============================================================== | 89%. | |======================================================================| 100%.
# } #Print information about a species_mix model # \donttest{ print(fmods)
#> NULL
# }