Introduction to causal inference
All models are wrong; some are helpful
Causal inference methods can, under certain assumptions, identify causal relationships from observational data. They help you understand your data better, even when interventions such as A/B tests are impossible, too expensive, or unethical. These methods are sound if all their assumptions hold and the statistical conditional independence tests return correct results. In reality, however, the assumptions may be violated, and statistical tests are only asymptotically correct: on finite samples they can give wrong answers.
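To make the finite-sample caveat concrete, here is a minimal sketch of one common conditional independence test, the Fisher z-test on partial correlations. It assumes roughly Gaussian, linearly related data. The function names, the effect size of 0.2, and the sample sizes are illustrative choices, not taken from any particular package.

```python
import math
import numpy as np


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for H0: corr(x, y | z) = 0 (Fisher z-transform)."""
    data = np.column_stack([x, y] if z is None else [x, y, z])
    n = data.shape[0]
    k = data.shape[1] - 2  # size of the conditioning set
    # The partial correlation of x and y can be read off the inverse correlation matrix.
    precision = np.linalg.inv(np.corrcoef(data, rowvar=False))
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def miss_rate(n, trials=200, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which a real but weak dependence is missed."""
    rng = np.random.default_rng(seed)
    misses = 0
    for _ in range(trials):
        x = rng.normal(size=n)
        y = 0.2 * x + rng.normal(size=n)  # y truly depends on x (assumed effect size)
        if fisher_z_pvalue(x, y) > alpha:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    for n in (30, 2000):
        print(n, miss_rate(n))
```

In this simulated setup the dependence is always present, yet the test is expected to miss it far more often at 30 samples than at 2000. A causal discovery algorithm that relies on such a test would then wrongly drop the corresponding edge.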
How to use causal inference – and how not to
Examples of use cases for causal inference methods:
✅ Falsify a hypothesis. You have a theory that one variable causes another and want to check whether your data is consistent with it. For example: Do people with children cancel hotel bookings more frequently?
✅ Reduce your potential solution space. If you want to reduce a large number of possible causes to a few promising candidates, use causal discovery to identify them. Then, you can run controlled experiments with the promising candidates. You can even iterate this procedure by using the experimental results as prior knowledge for the next round of causal discovery. An example is narrowing down a large set of substances to the most promising candidates when developing a treatment or an adhesive.
✅ Carefully explore relations in your data. You want to understand the structure of your data beyond correlations. You know that the underlying undirected graph represents the results of conditional independence tests from your data. Also, you understand that the orientations of the edges indicate that some relationships may indeed be causal. However, you also know that small errors can lead to drastic changes, especially for the orientations.
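The link between conditional independence tests and the undirected graph can be sketched as follows: an edge between two variables is kept only if no tested conditioning set makes them look independent. This is a simplified brute-force illustration rather than the search strategy of any specific discovery algorithm. The simulated chain 0 -> 1 -> 2, the threshold `alpha`, and all function names are assumptions made for the example.

```python
import math
from itertools import combinations

import numpy as np


def partial_corr_pvalue(data, i, j, cond):
    """Fisher z-test p-value for H0: column i is independent of column j given cond."""
    sub = data[:, [i, j, *cond]]
    n = sub.shape[0]
    precision = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))


def skeleton(data, alpha=0.01, max_cond=1):
    """Keep edge (i, j) unless some conditioning set of size <= max_cond separates i and j."""
    d = data.shape[1]
    edges = set()
    for i, j in combinations(range(d), 2):
        others = [k for k in range(d) if k not in (i, j)]
        separated = any(
            partial_corr_pvalue(data, i, j, cond) > alpha
            for size in range(max_cond + 1)
            for cond in combinations(others, size)
        )
        if not separated:
            edges.add((i, j))
    return edges


# Simulated chain 0 -> 1 -> 2 (hypothetical data, linear with Gaussian noise).
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
data = np.column_stack([x0, x1, x2])

edges = skeleton(data)
print(sorted(edges))
```

Variables 0 and 2 are correlated here, but conditioning on variable 1 should usually make them look independent, so the edge between them is typically removed. A single wrong test result adds or removes an edge, which is one reason to treat the resulting graph with care.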
Some examples of bad practice:
❌ Replace experiments and expert knowledge. Sound assumptions and prior knowledge improve your results: the more information you have, the better. Experts and experiments are your friends.
❌ Base life-changing decisions on causal inference methods. Some packages advertise that you can base your policy decisions on causal inference methods. You may use causal inference methods to check whether changing a specific regulation would theoretically change the variable you are interested in (falsifying a hypothesis), or to find promising new policy ideas by identifying variables that have a significant impact (reducing your potential solution space). However, you must be aware that results from causal inference methods can be wrong for various reasons, e.g., an incorrect statistical test result, a violated assumption, or data that is too noisy. Do not base important decisions, or recommendations for important decisions, solely on an algorithm with such high uncertainty. Use it as a tool, not as truth.
❌ Use fancy methods on lousy data. One book applies causal discovery to data about the female orgasm from a late-1970s study. The data captures many variables about the women in the study, such as personality traits, while omitting information about their partners' behavior. The first problem is the choice of a causal inference algorithm that cannot handle hidden confounding, applied to data that lacks many variables. Selecting another algorithm could mitigate this. Nevertheless, any method applied to such a small data set lacking crucial variables will be flawed. Talk to experts in the field first; let them tell you whether there are methodologically sound, up-to-date quantitative studies, and use their qualitative insights.
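Why missing variables matter can be illustrated with a small simulation. In this sketch, a hypothetical unrecorded variable `h` drives both `x` and `y`, which have no causal link to each other. All names and coefficients are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
h = rng.normal(size=n)      # hidden confounder, not part of the recorded data
x = h + rng.normal(size=n)  # x depends only on h
y = h + rng.normal(size=n)  # y depends only on h, not on x


def residual(target, given):
    """Remove the linear effect of `given` from `target`."""
    slope = np.cov(target, given)[0, 1] / np.var(given, ddof=1)
    return target - slope * given


# What an analyst sees when h was never measured.
corr_observed = np.corrcoef(x, y)[0, 1]
# What the analyst would see if h had been measured and adjusted for.
corr_given_h = np.corrcoef(residual(x, h), residual(y, h))[0, 1]

print(round(corr_observed, 2), round(corr_given_h, 2))
```

Without `h`, the clear correlation between `x` and `y` invites a causal reading that the data-generating process does not support. An algorithm that assumes no hidden confounders has no way to detect this from the recorded columns alone.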