This is related to a previous issue. I would like to know how to use DoubleML with more than one treatment.
Suppose we have two separate treatments, each estimated with its own DoubleML model, using the same set of confounders, W, and the same variable for computing CATEs, X. Both W and X occur prior to treatment. The treatments and outcome are (for the sake of an example, using data from Multi-investment Attribution Software Company):
Treatment A (T1): Major Flag (y/n)
Treatment B (T2): Tech Support (y/n)
Outcome (Y): Revenue ($)
The data prep (mostly from notebook):
import pandas as pd
from econml.dml import LinearDML
file_url = "https://msalicedatapublic.z5.web.core.windows.net/datasets/ROI/multi_attribution_sample.csv"
multi_data = pd.read_csv(file_url)
# Define estimator inputs
T1 = multi_data["Major Flag"]
T2 = multi_data["Tech Support"]
Y = multi_data["Revenue"] # amount of product purchased, or outcome
X = multi_data[["Size"]] # heterogeneity feature
W = multi_data.drop(
columns=["Tech Support", "Major Flag", "Revenue", "Size"]
) # controls
The individual models and their ATEs are shown below.
For Major Flag (T1):
model = LinearDML(discrete_treatment=True)
# Specify final stage inference type and fit model
model.fit(Y=Y, T=T1, X=X, W=W)
print(f" ATE for major flag is: {model.ate(X)} with CI {model.ate_interval(X)}")
ATE for major flag is: 2364.232844526994 with CI (1930.2699916809565, 2798.1956973730316)
For Tech Support (T2):
model = LinearDML(discrete_treatment=True)
# Specify final stage inference type and fit model
model.fit(Y=Y, T=T2, X=X, W=W)
print(f" ATE for tech support is: {model.ate(X)} with CI {model.ate_interval(X)}")
ATE for tech support is: 7156.214315710862 with CI (6952.461324721255, 7359.96730670047)
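Note that W above drops both treatment columns, so each separate model leaves the other treatment uncontrolled. Below is a minimal synthetic sketch of one way this could inflate a single-treatment estimate when the two treatments are correlated. It uses plain least squares rather than DML, and the data-generating process, the coefficients (2.0 and 5.0), and the helper `ols_coefs` are all made up for illustration; it says nothing about what is actually happening in the sample data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (all numbers are made up):
# T2 is more likely when T1 = 1, and both treatments raise Y.
T1 = rng.binomial(1, 0.5, size=n)
T2 = rng.binomial(1, 0.2 + 0.6 * T1)
Y = 2.0 * T1 + 5.0 * T2 + rng.normal(size=n)


def ols_coefs(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack((np.ones(len(y)),) + columns)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# T2 alone: its coefficient also absorbs part of T1's effect.
t2_separate = ols_coefs(Y, T2)[1]
# T1 and T2 together: the T2 coefficient should sit near the simulated 5.0.
t2_joint = ols_coefs(Y, T1, T2)[2]

print(f"T2 effect, T1 omitted:  {t2_separate:.2f}")
print(f"T2 effect, T1 included: {t2_joint:.2f}")
```

If something similar holds in the real data, adding the other treatment to W (or modeling both treatments jointly) would be one way to check how much the Tech Support estimate moves.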
Questions:
Let's say that the ATE for receiving tech support (T2), when modeled separately, seems stronger than we'd expect, and we hypothesize that it has been overestimated because its true impact might depend on the company being a major corporation (T1).
1. Can DoubleML model the sequential effect of T1 and T2 to test our hypothesis that Major Flag precedes Tech Support, assuming we have timestamps showing whether T1 occurs before T2? What would the input look like?
2. I noticed that DoubleML can take multiple treatments, but it looks like this setup is for treatments that occur concurrently. Is this true? The concatenation approach (see "Single Outcome, Multiple Treatments") also doesn't seem to produce the right dimensions:
est = LinearDML()
est.fit(y, np.concatenate((T0, T1), axis=1), X=X, W=W)
3. How would the interpretation of the ATE change if DoubleML can handle multiple treatments? For example, instead of 'Having tech support increases product revenue, on average, by $7,156', what would it be? I am wondering if there are other nuances to understand too.
4. We actually do not know whether T1 and T2 should be modeled separately or together to test the question in 1. How do we know which hypothesis is 'correct'?
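My guess about the dimension problem, as a minimal sketch: the treatment columns pulled from the DataFrame are 1-D, and `np.concatenate(..., axis=1)` only works on 2-D inputs, so the two columns have to be stacked side by side first. The arrays below are synthetic stand-ins (not the real data), and the name `T_joint` is only illustrative.

```python
import numpy as np

# Synthetic stand-ins for the two binary treatment columns (hypothetical data).
rng = np.random.default_rng(0)
n = 8
T1 = rng.integers(0, 2, size=n)  # shape (n,), like multi_data["Major Flag"].to_numpy()
T2 = rng.integers(0, 2, size=n)  # shape (n,), like multi_data["Tech Support"].to_numpy()

# Concatenating 1-D arrays along axis=1 fails because they have no second axis.
# NumPy's AxisError is a ValueError subclass, so catching ValueError is enough.
try:
    np.concatenate((T1, T2), axis=1)
    concat_failed = False
except ValueError:
    concat_failed = True

# Stack the two treatments as columns of one (n, 2) array instead.
T_joint = np.column_stack((T1, T2))

# Equivalent: reshape each to (n, 1) first, then concatenate along axis=1.
T_joint_alt = np.concatenate((T1.reshape(-1, 1), T2.reshape(-1, 1)), axis=1)

print(concat_failed, T_joint.shape)
```

The stacked array would then be passed as the treatment argument, in the same pattern as the snippet above: `est.fit(Y, T_joint, X=X, W=W)`. Whether `discrete_treatment=True` can be kept with a two-column treatment like this is something I am not sure about.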
Thank you!