Theory and Usage¶
Theory¶
Our goal is to find the optimal number of components in a mixture model. Assume a mixture of distributions is given as:

\[f(x) = \sum_{i=1}^{n} a_i\, g_i(x; \mathbf{\theta}_i)\]

where each \(g_i(x;\mathbf{\theta}_i)\) is a distribution function with weight \(a_i\) and parameter vector \(\mathbf{\theta}_i\). A mixture model data set can usually be fitted with an arbitrary number of components \(n\); to suppress overfitting, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and a modified AIC for small sample sizes (AICc) are used to evaluate the models and find the most probable number of components \(n\).
Usage¶
Generate mixture models for fitting¶
from utils import n_func_mix, n_func_maker
Make an n-function mixture from a common base¶
def n_func_maker(func: callable, n: int, known: list) -> callable:
r"""Make n-function mixture from a common base.
Arguments:
func: base function, the signature must start with `x`.
n: desired number of components.
known: a list of $n\times n_{\text{func args}}$ variables.
None is for fitting variables and values for fixed variables.
Returns:
mixture: callable.
"""
For example, suppose that a 2-component mixture is generated from the base function f(x, a, b, c) and that the a variable of the 2nd component is fixed to 2. Then
n_func_maker(f, 2, known=[None, None, None, 2, None, None])
generates a mixed function with signature x, a0, b0, c0, b1, c1.
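The behavior described above can be sketched with a minimal, illustrative implementation. This is an assumption about how `n_func_maker` behaves, not the library's actual code:

```python
import inspect

def n_func_maker(func, n, known):
    """Illustrative sketch: build an n-component mixture from `func`,
    keeping the slots of `known` that hold values fixed and exposing
    the `None` slots as free fitting parameters."""
    n_args = len(inspect.signature(func).parameters) - 1  # drop `x`
    def mixture(x, *free):
        free = list(free)
        total = 0.0
        for i in range(n):
            # fill each component's parameters from `known`,
            # pulling a free parameter for every `None` slot
            params = [known[i * n_args + j]
                      if known[i * n_args + j] is not None
                      else free.pop(0)
                      for j in range(n_args)]
            total = total + func(x, *params)
        return total
    return mixture
```

With a base f(x, a, b, c) and known=[None, None, None, 2, None, None], the returned mixture takes the five free parameters (a0, b0, c0, b1, c1), matching the example above.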
Make mixture of functions¶
def n_func_mix(funcs: list) -> callable:
r"""Mix a list of base functions into one callable.
For use with scipy.optimize.curve_fit.
Arguments:
funcs: a list of callables; the signatures of
all functions must begin with `x`.
Returns:
mixture: a function that mixes the n base functions.
"""
Fitting the generated models¶
from utils import FitLSQ
class FitLSQ():
def __init__(self, func: callable):
def set_bounds(self, bounds: list, known: list) -> self:
r"""Set bounds for target function.
Arguments:
bounds: 2d-list of lower and upper bounds (lb, ub) for the arguments
of the base function; use +/-np.inf for no bound.
known: known parts in the functions (see n_func_maker).
Returns:
self
"""
def set_p0(self, p0: list, known: list) -> self:
r"""Set initial values for fitting.
Arguments:
p0: tuple or list for initial parameters.
known: list for known components.
Returns:
self
"""
def fit(self, x: np.ndarray, y: np.ndarray, **kwargs) -> self:
r"""Fit the model.
Arguments:
x: np.ndarray of x values
y: np.ndarray of y values
Keyword Arguments:
any keyword arguments accepted by scipy.optimize.curve_fit
Returns:
self
"""
For example,
model.set_p0([0.1, 0.002, 3.7])
and
model.set_bounds([[0, -np.inf, 1], [1, np.inf, 2]])
are for a mixture consisting of 3-argument base functions, with initial guess (0.1, 0.002, 3.7) for the parameters and corresponding bounds (0, 1), (-inf, inf) and (1, 2).
Warning: set_p0 and set_bounds currently support only mixtures whose components share the same base function.
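One plausible reading of how the per-component p0 maps onto the mixture's free parameters, sketched here as an assumption for illustration (the documented internals may differ): the base-function guess is tiled over all components, and the slots fixed by known are dropped.

```python
def expand_p0(p0, known):
    # Tile the per-component initial guess across the mixture and
    # keep only the slots that `known` leaves free (None).
    n_base = len(p0)
    return [p0[i % n_base] for i, k in enumerate(known) if k is None]
```

The same expansion would apply to each row of the bounds list, which is why both methods require all components to share a base function.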
Evaluate models¶
from utils import Evaluation
class Evaluation():
def __init__(self, model: FitLSQ):
r"""Initialize with model.
Arguments:
model: a FitLSQ object
"""
def aic(self, x: np.ndarray) -> np.ndarray:
r"""Calculate AIC.
Aho, K.; Derryberry, D.; Peterson, T. (2014), "Model selection for
ecologists: the worldviews of AIC and BIC", Ecology, 95: 631–636,
doi:10.1890/13-1452.1.
AIC = 2k - 2\ln{\hat{\mathcal{L}}}, where \hat{\mathcal{L}} is the likelihood.
Arguments:
x: samples of shape (n_samples, n_features)
Returns:
aic: np.ndarray
"""
def bic(self, x: np.ndarray) -> np.ndarray:
r"""Calculate BIC.
Schwarz, Gideon E. (1978), "Estimating the dimension of a model",
Annals of Statistics, 6 (2): 461–464, doi:10.1214/aos/1176344136,
MR 0468014.
BIC = \ln{N}k - 2\ln{\hat{\mathcal{L}}}
Arguments:
x: samples of shape (n_samples, n_features)
Returns:
bic: np.ndarray
"""
def aicc(self, x: np.ndarray) -> np.ndarray:
r"""Calculate AICc.
deLeeuw, J. (1992), "Introduction to Akaike (1973) information theory
and an extension of the maximum likelihood principle" (PDF),
in Kotz, S.; Johnson, N.L., Breakthroughs in Statistics I, Springer,
pp. 599–609.
AICc = AIC + \frac{2k^2+2k}{N-k-1}
Arguments:
x: samples of shape (n_samples, n_features)
Returns:
aicc: np.ndarray
"""
@classmethod
def make_sample(cls, n, x, pdf):
r"""Make random sample taken from x.
Arguments:
n: int, sample size
x: np.ndarray of candidate points
pdf: np.ndarray, probability density evaluated at x
Returns:
sample
"""
Here x is a sample data set of shape (n_samples, n_features); such samples can be generated by Evaluation.make_sample from the fitting data x and y when the fitted object is a PDF.
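make_sample can be understood as weighted sampling: points are drawn from x with probability proportional to the pdf values. A sketch of that idea (not the library's actual implementation):

```python
import numpy as np

def make_sample(n, x, pdf):
    # Draw n points from x with probability proportional to pdf.
    p = np.asarray(pdf, dtype=float)
    p = p / p.sum()                  # normalize to a probability vector
    rng = np.random.default_rng()
    return rng.choice(np.asarray(x), size=n, p=p)
```

The resulting sample can then be passed to aic, bic, or aicc to score a fitted model.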