Data Generators
Basic usage
Simulated data is useful for testing and benchmarking causal discovery algorithms, and for educational purposes. The following code snippet shows how to generate a dataset from a linear model with Gaussian noise and a known causal structure.
model = IIDSampleGenerator(
    edges=[
        SampleEdge(NodeReference("X"), NodeReference("Z"), 5),
        SampleEdge(NodeReference("Y"), NodeReference("Z"), 6),
        SampleEdge(NodeReference("Z"), NodeReference("V"), -3),
        SampleEdge(NodeReference("Z"), NodeReference("W"), -4),
    ],
)
sample_size = 10000
test_data, graph = model.generate(sample_size)
SampleEdge is a dataclass that represents an edge in the graph. The first argument is the source node and the second argument is the target node. The third argument is the weight of the edge, which represents the strength of the linear causal relationship between the source and target nodes. The call to generate returns the test data together with the graph. You can use the test data as the input for your causal discovery algorithm.
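To make the meaning of the edge weights concrete, here is a minimal standard-library sketch of the same kind of linear model. It is not the implementation behind IIDSampleGenerator: the function names simulate_linear_gaussian and slope are hypothetical, and the choice of independent standard normal noise for every node is an assumption made for illustration.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def simulate_linear_gaussian(edges, sample_size, seed=0):
    """Sample a linear model: node = sum(weight * parent) + noise.

    Hypothetical helper for illustration only. `edges` is a list of
    (source, target, weight) tuples. Noise is assumed to be N(0, 1).
    """
    rng = random.Random(seed)
    # Map each node to its list of (parent, weight) pairs.
    parents = {}
    for source, target, weight in edges:
        parents.setdefault(source, [])
        parents.setdefault(target, []).append((source, weight))
    # Order the nodes so that parents are always computed before children.
    sorter = TopologicalSorter({n: [p for p, _ in ps] for n, ps in parents.items()})
    order = list(sorter.static_order())
    data = {node: [] for node in order}
    for _ in range(sample_size):
        for node in order:
            value = rng.gauss(0.0, 1.0)  # independent noise term (assumed)
            value += sum(w * data[p][-1] for p, w in parents[node])
            data[node].append(value)
    return data


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


# Same structure and weights as the example above.
edges = [("X", "Z", 5), ("Y", "Z", 6), ("Z", "V", -3), ("Z", "W", -4)]
data = simulate_linear_gaussian(edges, sample_size=10000)
print(slope(data["Z"], data["V"]))  # close to the edge weight -3
```

Regressing a child on its parent recovers approximately the edge weight, which is what it means for the weight to describe the linear causal relationship between the two nodes.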
Advanced usage
Documentation on how to create custom data generators with nonlinear relationships or different noise distributions will follow soon.