Model Assumptions
The version of the PC algorithm implemented in causy is based on the original PC algorithm by Peter Spirtes and Clark Glymour. PC is a constraint-based causal discovery algorithm.
For a discussion of the fundamental model assumptions of causal inference, like the causal Markov condition and the faithfulness assumption, see, for example, this summary by Kenneth.
Here, we provide some simple guidelines for recognizing when the current default algorithm is definitely not applicable to your data set:
😱 Hidden common causes: The PC algorithm is not applicable when your data set might not cover all variables relevant to the model. Whenever an unobserved variable is a common cause of two observed variables, we speak of hidden common causes.
☺️ Solution: Use the FCI algorithm to discover hidden common causes and even some special cases of selection bias.
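To see why a hidden common cause is a problem, the following is a minimal simulation sketch, not causy code. It assumes a hypothetical variable `z` that causes both `x` and `y`, with made-up coefficients, and uses only the Python standard library. The helper names `pearson` and `partial_correlation` are illustrative.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Assumed toy model: z is a common cause of x and y; x and y do not
# influence each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [-1.5 * zi + rng.gauss(0, 1) for zi in z]

corr_observed = pearson(x, y)               # what you see if z is not recorded
corr_given_z = partial_correlation(x, y, z) # what you see if z is recorded

print(f"corr(x, y)     = {corr_observed:+.2f}")
print(f"corr(x, y | z) = {corr_given_z:+.2f}")
```

In this toy model, `x` and `y` are strongly correlated, and the dependence only vanishes once `z` is conditioned on. If `z` is missing from the data set, no conditioning set can separate `x` and `y`, so a PC-style search would keep an edge between them even though neither causes the other.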
😱 Non-linear relationships: The basic version of the PC algorithm assumes that the relationships between variables are linear. If you plot the data for pairs of variables and your data fulfills the linearity assumption, each scatter plot will show the points lying approximately along a line. You can automate this check by performing linear regressions and analyzing the error terms. If the relationships are non-linear, independence tests based on the linearity assumption, as used in our current default pipeline, are not applicable.
☺️ Solution: Use an independence test designed for non-linear relationships. causy's modular architecture allows you to exchange independence tests easily.
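The regression-based linearity check described above could be sketched as follows. This is one simple heuristic under stated assumptions, not part of causy: fit a straight line to a pair of variables, then test whether the residuals still correlate with the squared (centered) predictor. The function name `curvature_score` and the toy data are hypothetical, and the score only detects curvature-like departures from linearity.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def curvature_score(x, y):
    """Absolute correlation between linear-fit residuals and squared centered x.

    Values near 0 are consistent with a linear relationship; values near 1
    indicate that a straight line leaves systematic structure in the errors.
    """
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Ordinary least squares fit of y = slope * x + intercept.
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    squared = [(xi - mean_x) ** 2 for xi in x]
    return abs(pearson(residuals, squared))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = [rng.uniform(-2, 2) for _ in range(2000)]
y_linear = [1.5 * xi + rng.gauss(0, 0.5) for xi in x]    # linear toy data
y_quadratic = [xi ** 2 + rng.gauss(0, 0.5) for xi in x]  # non-linear toy data

score_linear = curvature_score(x, y_linear)
score_quadratic = curvature_score(x, y_quadratic)
print(f"linear pair:    {score_linear:.2f}")
print(f"quadratic pair: {score_quadratic:.2f}")
```

A high score for any pair of variables is a hint that linear independence tests may be misleading for your data. A low score does not prove linearity, since other forms of non-linearity can go undetected, so a look at the scatter plots remains worthwhile.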
😱 Time series data: The PC algorithm does not apply to time series data. Do not use the PC or FCI algorithm if your data has a time component.
☺️ Solution: First, check if your problem can be solved by standard time series (forecasting) methods. Example: Given data on how many people have visited my website daily over the last months, how will the number of visitors evolve over the following weeks? Then, check if what you actually want to do is causal impact estimation. Example: How did my marketing campaign impact the number of daily visitors to my website? Finally, if you are sure you are interested in the inherent causal relationships in your time series data, you can use causal discovery methods adapted to the time series setting, such as PCMCI and the other methods implemented in the tigramite package.
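If you are unsure whether the order of your rows carries information, a quick diagnostic is the lag-1 autocorrelation of each column. The following is a minimal sketch, assuming toy data and a hypothetical helper `lag1_autocorrelation`. It is not part of causy or tigramite and does not replace a proper time series analysis.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def lag1_autocorrelation(series):
    """Correlation between each value and its immediate predecessor."""
    return pearson(series[:-1], series[1:])

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000

# Independent samples: the row order carries no information.
independent = [rng.gauss(0, 1) for _ in range(n)]

# Assumed toy time series: each value depends on the previous one (AR(1)).
autoregressive = [0.0]
for _ in range(n - 1):
    autoregressive.append(0.8 * autoregressive[-1] + rng.gauss(0, 1))

independent_autocorr = lag1_autocorrelation(independent)
autoregressive_autocorr = lag1_autocorrelation(autoregressive)
print(f"independent samples: {independent_autocorr:+.2f}")
print(f"autoregressive data: {autoregressive_autocorr:+.2f}")
```

An autocorrelation clearly different from zero suggests that consecutive rows are not independent samples, which is the situation in which the PC and FCI algorithms should not be applied directly.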