Interpreting treatment interactions with DoubleML #942

Open
chelsealee14 opened this issue Jan 7, 2025 · 0 comments

Hi,

This is related to a previous issue. I would like to know how to use DoubleML with more than one treatment.

Suppose we estimate two separate treatments with two different DoubleML models that share the same set of confounders, W, and the same variable for computing CATEs, X. Both W and X are measured prior to treatment. The treatments and outcome are (as an example, using data from the Multi-investment Attribution Software Company example):

  • Treatment A (T1): Major Flag (y/n)
  • Treatment B (T2): Tech Support (y/n)
  • Outcome (Y): Revenue ($)

The data prep (mostly from the notebook):

import pandas as pd
from econml.dml import LinearDML

file_url = "https://msalicedatapublic.z5.web.core.windows.net/datasets/ROI/multi_attribution_sample.csv"
multi_data = pd.read_csv(file_url)

# Define estimator inputs
T1 = multi_data["Major Flag"]
T2 = multi_data["Tech Support"]
Y = multi_data["Revenue"]  # amount of product purchased, or outcome
X = multi_data[["Size"]]  # heterogeneity feature
W = multi_data.drop(
    columns=["Tech Support", "Major Flag", "Revenue", "Size"]
)  # controls

The individual models and their ATEs are shown below.

For Major Flag (T1):

model = LinearDML(discrete_treatment=True)

# Specify final stage inference type and fit model
model.fit(Y=Y, T=T1, X=X, W=W)
print(f" ATE for major flag is: {model.ate(X)} with CI {model.ate_interval(X)}")

ATE for major flag is: 2364.232844526994 with CI (1930.2699916809565, 2798.1956973730316)

For Tech Support (T2):

model = LinearDML(discrete_treatment=True)

# Specify final stage inference type and fit model
model.fit(Y=Y, T=T2, X=X, W=W)
print(f" ATE for tech support is: {model.ate(X)} with CI {model.ate_interval(X)}")

ATE for tech support is: 7156.214315710862 with CI (6952.461324721255, 7359.96730670047)

Questions:

Suppose the ATE for receiving tech support (T2), when modeled separately, seems stronger than we'd expect. We hypothesize that it has been overestimated because its true impact might depend on the company being a major corporation (T1).
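To make the hypothesis concrete, here is a minimal sketch on synthetic data. It uses plain least squares with NumPy, not DoubleML, and the coefficients `b1`, `b2`, `b12` and all the data are made up for illustration; both treatments are randomized here, which the real data may not satisfy. It shows how a pooled estimate for T2 can sit between the effect of T2 when T1 = 0 and the effect when T1 = 1 if the two treatments interact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical, randomized binary treatments (not the real Major Flag / Tech Support columns)
t1 = rng.integers(0, 2, size=n)
t2 = rng.integers(0, 2, size=n)

# Made-up true coefficients: main effects plus an interaction term
b1, b2, b12 = 2000.0, 3000.0, 4000.0
y = b1 * t1 + b2 * t2 + b12 * t1 * t2 + rng.normal(0.0, 500.0, size=n)

# Least-squares fit with an explicit interaction column
design = np.column_stack([np.ones(n), t1, t2, t1 * t2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

effect_t2_when_t1_is_0 = coef[2]            # close to b2
effect_t2_when_t1_is_1 = coef[2] + coef[3]  # close to b2 + b12

# Pooled difference in means for T2, ignoring T1 entirely
pooled_t2 = y[t2 == 1].mean() - y[t2 == 0].mean()

print(f"T2 effect when T1=0: {effect_t2_when_t1_is_0:.0f}")
print(f"T2 effect when T1=1: {effect_t2_when_t1_is_1:.0f}")
print(f"Pooled T2 effect:    {pooled_t2:.0f}")
```

In this toy setup the pooled number is roughly an average over the T1 = 0 and T1 = 1 groups, so it is not wrong as an average, but it hides the dependence on T1 that the hypothesis is about.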

  1. Can DoubleML model the sequential effect of T1 and T2 to test our hypothesis that Major Flag precedes Tech Support, assuming we have timestamps showing whether T1 occurred before T2? What would the input look like?
  2. I noticed that DoubleML can take multiple treatments, but it looks like this setup is for treatments that occur concurrently. Is this true? The concatenation shown in the "Single Outcome, Multiple Treatments" example also doesn't seem to produce the right dimensions:
est = LinearDML()
est.fit(y, np.concatenate((T0, T1), axis=1), X=X, W=W)
  3. How would the interpretation of the ATE change if DoubleML handles multiple treatments? For example, instead of 'Having tech support increases product revenue, on average, by $7,156', what would the statement be? I'm also wondering whether there are other nuances to understand.
  4. We do not actually know whether T1 and T2 should be modeled separately or together to test the question in 1. How do we know which hypothesis is 'correct'?
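On the dimension problem with the concatenation snippet: a minimal sketch of what I think is happening, using small made-up NumPy arrays rather than the real columns. The assumption is that each treatment is 1-D with shape `(n,)`, as a single DataFrame column would be, so there is no axis 1 to concatenate along. Stacking the columns, or reshaping each to `(n, 1)` first, gives an `(n, 2)` treatment matrix.

```python
import numpy as np

# Hypothetical stand-ins for two binary treatment columns, each 1-D with shape (n,)
t1 = np.array([0, 1, 1, 0, 1])
t2 = np.array([1, 1, 0, 0, 1])

# Concatenating 1-D arrays along axis=1 fails because that axis does not exist
try:
    np.concatenate((t1, t2), axis=1)
    concat_failed = False
except ValueError as err:
    concat_failed = True
    print(f"concatenate failed: {err}")

# Option A: stack the 1-D arrays as columns
T_stacked = np.column_stack((t1, t2))

# Option B: reshape each array to (n, 1), then concatenate along axis=1
T_reshaped = np.concatenate((t1.reshape(-1, 1), t2.reshape(-1, 1)), axis=1)

print(T_stacked.shape)  # (5, 2)
print(np.array_equal(T_stacked, T_reshaped))  # True
```

This only addresses the array shapes. Whether `est.fit` then treats the two columns as concurrent treatments, and how that interacts with `discrete_treatment=True`, is still part of the question above.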

Thank you!
