Usage in custom model

Usage in custom model#

Via Inheritance#

The most straightforward way to use formulaic-contrasts with a custom model is to use FormulaicContrasts as a base class or mixin class.

As an example, let’s wrap an Ordinary Least Squares (OLS) linear model into a custom class for the use with formulaic-contrasts. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input allows to fit the model to a continuous variable from the dataframe and perform a statistical test for a given contrast.

This can be achieved with the following class definition. The constructor, the contrast() and cond() methods are inherited from the FormulaicContrasts base class:

import formulaic_contrasts
import numpy as np
import statsmodels.api as sm
import pandas as pd


class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):
    def __init__(self, data: pd.DataFrame, design: str):
        super().__init__(data, design)

    def fit(self, variable: str):
        self.mod = sm.OLS(self.data[variable], self.design_matrix)
        self.mod = self.mod.fit()

    def t_test(self, contrast: np.ndarray):
        return self.mod.t_test(contrast)

Let’s apply our model to an example dataset. The toy data mimicks a 2-arm clinical trial (drugA vs. drugB) with Responders (responder) and Non-responders (non_responder) and a continuous biomarker that can protentially predict response of the different treatments.

df = formulaic_contrasts.datasets.treatment_response()
df
treatment response biomarker
0 drugA non_responder 6.595490
1 drugA non_responder 7.071509
2 drugA non_responder 8.537421
3 drugA non_responder 6.787991
4 drugA non_responder 10.109717
... ... ... ...
75 drugB responder 11.167627
76 drugB responder 9.493773
77 drugB responder 5.027817
78 drugB responder 9.800762
79 drugB responder 9.945963

80 rows × 3 columns

Let’s fit the model an perform the statistical test

model = StatsmodelsOLS(df, "~ treatment + response")
model.fit("biomarker")
model.t_test(
    model.contrast("response", baseline="non_responder", group_to_compare="responder")
)
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             1.9563      0.775      2.525      0.014       0.413       3.499
==============================================================================

Via Composition#

Alternatively, if you prefer to work without inheritance, you can use FormulaicContrast as an attribute. In this case, you need to define cond()/contrast() yourself, or provide a custom way to define contrasts, calling cond() internally. A minimal implementation could look like:

import pandas as pd


class StatsmodelsOLS:
    def __init__(self, data: pd.DataFrame, design: str) -> None:
        self.data = data
        self.contrast_builder = formulaic_contrasts.FormulaicContrasts(data, design)

    def fit(self, variable: str):
        self.mod = sm.OLS(self.data[variable], self.contrast_builder.design_matrix)
        self.mod = self.mod.fit()

    def cond(self, **kwargs):
        return self.contrast_builder.cond(**kwargs)

    def contrast(self, *args, **kwargs):
        return self.contrast_builder.contrast(*args, **kwargs)

    def t_test(self, contrast: np.ndarray):
        return self.mod.t_test(contrast)

The result is the same:

model = StatsmodelsOLS(df, "~ treatment + response")
model.fit("biomarker")
model.t_test(
    model.contrast("response", baseline="non_responder", group_to_compare="responder")
)
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             1.9563      0.775      2.525      0.014       0.413       3.499
==============================================================================

Manual usage#

Hint

Some helpful definitions for working with formulaic formulas (e.g. ~ 0 + C(donor):treatment + np.log1p(continuous)):

  • A term refers to an expression in the formula, separated by +, e.g. C(donor):treatment, or np.log1p(continuous).

  • A variable refers to a column of the data frame passed to formulaic, e.g. donor.

  • A factor is the specification of how a certain variable is represented in the design matrix, e.g. treatment coding with base level “A” and reduced rank.

You can use the lower-level interface get_factor_storage_and_materializer() to introspect formulaic models if the FormulaicContrasts class doesn’t fit your needs.

get_factor_storage_and_materializer will generate a custom materializer that, while generating the design matrix from the input data and formula, keeps track of certain metadata that are useful for defining contrasts.

factor_storage, variables_to_factors, materializer_class = (
    formulaic_contrasts.get_factor_storage_and_materializer()
)

If we apply the materializer to the example dataset from above…

design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(
    "~ treatment + response"
)

factor_storage will keep track of factors used in the formula, while variables_to_factors will keep track of variables used in the formula. The FactorMetadata objects contain information that is useful for building custom contrasts, such as the list of categories or the base() level of categorical variables.

from pprint import pprint

pprint(factor_storage)
defaultdict(<class 'list'>,
            {'response': [FactorMetadata(name='response',
                                         reduced_rank=True,
                                         custom_encoder=False,
                                         categories=('non_responder',
                                                     'responder'),
                                         kind=<Kind.CATEGORICAL: 'categorical'>,
                                         drop_field='non_responder',
                                         column_names=('non_responder',
                                                       'responder'),
                                         colname_format='{name}[{field}]')],
             'treatment': [FactorMetadata(name='treatment',
                                          reduced_rank=True,
                                          custom_encoder=False,
                                          categories=('drugA', 'drugB'),
                                          kind=<Kind.CATEGORICAL: 'categorical'>,
                                          drop_field='drugA',
                                          column_names=('drugA', 'drugB'),
                                          colname_format='{name}[{field}]')]})
variables_to_factors
defaultdict(set, {'treatment': {'treatment'}, 'response': {'response'}})

In a slightly more complex (and argueably useless) example, the result would look like this:

factor_storage, variables_to_factors, materializer_class = (
    formulaic_contrasts.get_factor_storage_and_materializer()
)
design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(
    "~ np.log(biomarker) + C(treatment, contr.treatment(base='drugB')) * C(response)"
)
pprint(factor_storage)
defaultdict(<class 'list'>,
            {'C(response)': [FactorMetadata(name='C(response)',
                                            reduced_rank=True,
                                            custom_encoder=True,
                                            categories=('non_responder',
                                                        'responder'),
                                            kind=<Kind.CATEGORICAL: 'categorical'>,
                                            drop_field=None,
                                            column_names=('responder',),
                                            colname_format='{name}[T.{field}]')],
             "C(treatment, contr.treatment(base='drugB'))": [FactorMetadata(name='C(treatment, '
                                                                                 "contr.treatment(base='drugB'))",
                                                                            reduced_rank=True,
                                                                            custom_encoder=True,
                                                                            categories=('drugA',
                                                                                        'drugB'),
                                                                            kind=<Kind.CATEGORICAL: 'categorical'>,
                                                                            drop_field=None,
                                                                            column_names=('drugA',),
                                                                            colname_format='{name}[T.{field}]')],
             'np.log(biomarker)': [FactorMetadata(name='np.log(biomarker)',
                                                  reduced_rank=False,
                                                  custom_encoder=False,
                                                  categories=(np.float64(-0.994572491384816),
                                                              np.float64(-0.32292867440331624),
                                                              np.float64(-0.241787230648226),
                                                              np.float64(-0.07184946402122966),
                                                              np.float64(0.5402788527682045),
                                                              np.float64(0.5588358549592866),
                                                              np.float64(0.5976024872603584),
                                                              np.float64(0.6985411914333322),
                                                              np.float64(0.8014619184523657),
                                                              np.float64(0.9353144690102372),
                                                              np.float64(0.9357602734122806),
                                                              np.float64(0.9508095585482432),
                                                              np.float64(1.0587960677139694),
                                                              np.float64(1.1984152459673467),
                                                              np.float64(1.250031921525277),
                                                              np.float64(1.288736956185549),
                                                              np.float64(1.3055160592797277),
                                                              np.float64(1.318699316319654),
                                                              np.float64(1.3691528193898392),
                                                              np.float64(1.4011909677280152),
                                                              np.float64(1.4696755716009422),
                                                              np.float64(1.4848614510372413),
                                                              np.float64(1.5042452056588211),
                                                              np.float64(1.5347612081374633),
                                                              np.float64(1.5586765895960253),
                                                              np.float64(1.5915854505280886),
                                                              np.float64(1.6133268972915464),
                                                              np.float64(1.6149859295681703),
                                                              np.float64(1.6427701026668886),
                                                              np.float64(1.6541909597093103),
                                                              np.float64(1.6706857016933832),
                                                              np.float64(1.6800363836451575),
                                                              np.float64(1.6874414702471792),
                                                              np.float64(1.7114697163161487),
                                                              np.float64(1.712291418836503),
                                                              np.float64(1.7302545130901745),
                                                              np.float64(1.744639763009614),
                                                              np.float64(1.7498086372286121),
                                                              np.float64(1.7842160982734327),
                                                              np.float64(1.7871408509738518),
                                                              np.float64(1.8077571345863106),
                                                              np.float64(1.8126986617691452),
                                                              np.float64(1.8332062854487814),
                                                              np.float64(1.8502652368191275),
                                                              np.float64(1.8819102495764126),
                                                              np.float64(1.8863860690197658),
                                                              np.float64(1.897341561353762),
                                                              np.float64(1.9117530958321944),
                                                              np.float64(1.9151549833413493),
                                                              np.float64(1.918061067186584),
                                                              np.float64(1.9188752897607209),
                                                              np.float64(1.919357972842738),
                                                              np.float64(1.9274811395788238),
                                                              np.float64(1.9290675404526416),
                                                              np.float64(1.9292525771899476),
                                                              np.float64(1.956073906066531),
                                                              np.float64(1.9839526394392006),
                                                              np.float64(1.9906039715512525),
                                                              np.float64(1.9942917406466534),
                                                              np.float64(2.027629774683785),
                                                              np.float64(2.0416152416364257),
                                                              np.float64(2.0646278053535907),
                                                              np.float64(2.1167677837306025),
                                                              np.float64(2.123298339614755),
                                                              np.float64(2.1407653917376552),
                                                              np.float64(2.1444589627935478),
                                                              np.float64(2.1600468598671436),
                                                              np.float64(2.19915225165721),
                                                              np.float64(2.250636108272607),
                                                              np.float64(2.254466878298736),
                                                              np.float64(2.2824601250615855),
                                                              np.float64(2.29716673886693),
                                                              np.float64(2.3134970542142574),
                                                              np.float64(2.413019186969858),
                                                              np.float64(2.4438604074304093),
                                                              np.float64(2.5154775572854935),
                                                              np.float64(2.516131763095562),
                                                              np.float64(2.770430770362043)),
                                                  kind=<Kind.NUMERICAL: 'numerical'>,
                                                  drop_field=None,
                                                  column_names=None,
                                                  colname_format='{name}[{field}]')]})
variables_to_factors
defaultdict(set,
            {'np.log': {'np.log(biomarker)'},
             'biomarker': {'np.log(biomarker)'},
             'treatment': {"C(treatment, contr.treatment(base='drugB'))"},
             'contr.treatment': {"C(treatment, contr.treatment(base='drugB'))"},
             'C': {'C(response)',
              "C(treatment, contr.treatment(base='drugB'))"},
             'response': {'C(response)'}})