Machine Learning for Uplift

Author

Duyen Tran

Models used in this project:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • Neural Network

This project will provide insights into the relative strengths and weaknesses of each modeling approach in the context of direct marketing, with a particular focus on maximizing the return on investment for marketing campaigns.

Process overview:

  • Executing a comprehensive analysis of predictive modeling techniques to refine customer targeting strategies. The techniques under comparison include logistic regression, neural networks, random forests, and XGBoost.

  • Implementing uplift modeling to evaluate the additional impact of targeting specific customers with marketing efforts.

  • Employing propensity scoring methods to estimate the probability of customer responses based on historical interaction data.

  • Main objective: Identify the modeling technique that most effectively pinpoints the top 30,000 customers from a pool of 120,000 who would yield the highest profit when targeted.

  • Uplift modeling is pivotal in understanding the causal effect of the marketing action, whereas the propensity score approach focuses on the predicted likelihood of customer behavior.

  • Post-identification of the top-performing model, an additional objective is to analyze and establish the most profitable segment size for targeting purposes, potentially adjusting the initial 30,000 target figure to optimize outreach and profitability.

The results from this study aim to guide marketing strategies, ensuring resource allocation is both efficient and effective.

The findings are expected to serve as a decision-making framework for marketing leaders in optimizing campaign strategies and improving customer engagement and retention.
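
All uplift models in this report follow a two-model ("T-learner") approach: fit one response model on customers who saw the ad and one on the control group, then score every customer by the difference in predicted conversion probabilities. A minimal sketch of the idea on synthetic placeholder data (X, y, and ad below are illustrative, not the project datasets; the actual estimation uses pyrsm):

Code
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                         # placeholder features
ad = rng.integers(0, 2, size=1000)                     # placeholder treatment flag
y = (rng.random(1000) < 0.1 + 0.05 * ad).astype(int)   # placeholder response

m_treat = LogisticRegression(max_iter=1000).fit(X[ad == 1], y[ad == 1])   # P(converted | x, treated)
m_ctrl = LogisticRegression(max_iter=1000).fit(X[ad == 0], y[ad == 0])    # P(converted | x, control)
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]   # score to rank customers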

Code
import pandas as pd
import numpy as np
import pyrsm as rsm
from sklearn.model_selection import GridSearchCV
# setup pyrsm for autoreload
%reload_ext autoreload
%autoreload 2
%aimport pyrsm
Code
## loading the organic data - this dataset must NOT be changed
cg_organic_control = pd.read_parquet("cg_organic_control.parquet").reset_index(drop=True)
cg_organic_control.head()
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon TimesLostSpaceship TimesKilled TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS
0 no 7 18 0 124 0 81 0 yes no 8 0 0 4 no yes 3 2101 no no
1 no 10 3 2 60 0 18 479 no no 10 7 0 0 yes no 7 1644 yes no
2 no 2 1 0 0 0 0 0 no no 0 0 0 2 no no 8 3197 yes yes
3 no 2 11 1 125 0 73 217 no no 0 0 0 0 yes no 6 913 no no
4 no 8 15 0 0 0 6 51 yes no 0 0 2 1 yes no 21 2009 yes no
Code
## loading the treatment data
cg_ad_treatment = pd.read_parquet("cg_ad_treatment.parquet").reset_index(drop=True)
cg_ad_treatment.head()
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon ... TimesKilled TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS rnd_30k
0 no 6 16 0 0 0 0 0 yes no ... 0 0 0 no no 11 1827 no no 0
1 no 2 8 0 0 0 5 4 no no ... 0 8 0 yes no 3 1889 no yes 1
2 no 6 1 0 0 0 0 0 no no ... 0 0 0 no yes 2 1948 yes no 0
3 yes 7 16 0 102 1 0 194 no no ... 0 0 0 yes yes 21 3409 yes yes 0
4 no 10 1 1 233 0 23 0 no no ... 0 5 0 no yes 4 2922 yes no 0

5 rows × 21 columns

Variable Description
converted Purchased the Zalon campaign (“yes” or “no”)
GameLevel Highest level of game achieved by the user
NumGameDays Number of days user played the game in last month (with or without network connection)
NumGameDays4Plus Number of days user played the game in last month with 4 or more total users (this implies using a network connection)
NumInGameMessagesSent Number of in-game messages sent to friends
NumFriends Number of friends to which the user is connected (necessary to crew together in multiplayer mode)
NumFriendRequestIgnored Number of friend requests this user has not replied to since game inception
NumSpaceHeroBadges Number of “Space Hero” badges, the highest distinction for gameplay in Space Pirates
AcquiredSpaceship Flag if the user owns a spaceship, i.e., does not have to crew on another user’s or NPC’s space ship (“no” or “yes”)
AcquiredIonWeapon Flag if the user owns the powerful “ion weapon” (“no” or “yes”)
TimesLostSpaceship The number of times the user destroyed his/her spaceship during gameplay. Spaceships need to be re-acquired if destroyed.
TimesKilled Number of times the user was killed during gameplay
TimesCaptain Number of times in last month that the user played in the role of a captain
TimesNavigator Number of times in last month that the user played in the role of a navigator
PurchasedCoinPackSmall Flag if the user purchased a small pack of Zathium in last month (“no” or “yes”)
PurchasedCoinPackLarge Flag if the user purchased a large pack of Zathium in last month (“no” or “yes”)
NumAdsClicked Number of in-app ads the user has clicked on
DaysUser Number of days since user established a user ID with Creative Gaming (for Space Pirates or previous games)
UserConsole Flag if the user plays Creative Gaming games on a console (“no” or “yes”)
UserHasOldOS Flag if the user has iOS version 10 or earlier (“no” or “yes”)
rnd_30k Dummy variable that randomly selects 30K customers (1) and the remaining 90K (0)
Code
# Load the ad random data
cg_ad_random = pd.read_parquet("cg_ad_random.parquet")
cg_ad_random
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon TimesLostSpaceship TimesKilled TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS
0 no 2 8 0 0 0 5 4 no no 0 0 8 0 yes no 3 1889 no yes
1 no 5 15 0 179 0 50 362 yes no 22 0 4 4 no no 2 1308 yes no
2 no 7 7 0 267 0 64 0 no no 5 0 0 0 no yes 1 3562 yes no
3 no 4 4 0 36 0 0 0 no no 0 0 0 0 no no 2 2922 yes no
4 no 8 17 0 222 10 63 20 yes no 10 0 9 6 yes no 4 2192 yes no
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29995 no 5 1 0 0 0 0 0 no no 0 0 0 0 yes yes 11 2374 no no
29996 no 9 12 0 78 0 59 1 yes no 16 0 0 5 yes no 2 1978 yes no
29997 no 9 19 1 271 0 71 95 yes no 14 0 0 3 no no 2 2831 yes yes
29998 no 10 23 0 76 6 20 107 no no 38 0 1 0 no no 9 3197 yes no
29999 no 6 8 0 115 0 13 0 yes no 5 0 0 4 no no 11 2343 yes no

30000 rows × 20 columns

Part I: Uplift Modeling Using Machine Learning

1. Prepare the data

Code
# a. Add "ad" to cg_ad_random and set its value to 1 for all rows
cg_ad_random["ad"] = 1

# b. Add "ad" to cg_organic_control and set its value to 0 for all rows
cg_organic_control["ad"] = 0

# c. Create a stacked dataset by combining cg_ad_random and cg_organic_control
cg_rct_stacked = pd.concat([cg_ad_random, cg_organic_control], axis=0)

cg_rct_stacked['converted_yes'] = rsm.ifelse(
    cg_rct_stacked.converted == "yes", 1, rsm.ifelse(cg_rct_stacked.converted == "no", 0, np.nan)
)
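# Note: assuming converted is strictly "yes"/"no", a plain-pandas equivalent of the
# rsm.ifelse call above would be:
#   cg_rct_stacked["converted_yes"] = (cg_rct_stacked.converted == "yes").astype(int)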

# d. Create a training variable
cg_rct_stacked['training'] = rsm.model.make_train(
    data=cg_rct_stacked, test_size=0.3, strat_var=['converted', 'ad'], random_state = 1234)

# Check the proportions of the training variable
cg_rct_stacked.training.value_counts(normalize=True)
training
1.0    0.7
0.0    0.3
Name: proportion, dtype: float64
Code
pd.crosstab(cg_rct_stacked.converted, [cg_rct_stacked.ad, cg_rct_stacked.training]).round(2)
ad 0 1
training 0.0 1.0 0.0 1.0
converted
yes 512 1194 1174 2739
no 8488 19806 7826 18261
Code
len(cg_rct_stacked.query('training == 0 & ad == 0'))
9000
Code
len(cg_rct_stacked.query('training == 0 & ad == 1'))
9000
Code
# e. Check if the proportion of the training variable is similar across the ad and control groups
pd.crosstab(
    cg_rct_stacked.converted, [cg_rct_stacked.ad, cg_rct_stacked.training], normalize="columns"
).round(3)
ad 0 1
training 0.0 1.0 0.0 1.0
converted
yes 0.057 0.057 0.13 0.13
no 0.943 0.943 0.87 0.87
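
The same check can be done numerically with the 0/1 response created earlier; a quick sketch:

Code
# Conversion rate per (ad, training) cell; within each ad group the train/test rates should match
cg_rct_stacked.groupby(["ad", "training"]).converted_yes.mean().round(3)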

Using Logistic Regression

2. Train an uplift model

Code
# Assign variables to evar
evar = [
        "GameLevel",
        "NumGameDays",
        "NumGameDays4Plus",
        "NumInGameMessagesSent",
        "NumFriends",
        "NumFriendRequestIgnored",
        "NumSpaceHeroBadges",
        "AcquiredSpaceship",
        "AcquiredIonWeapon",
        "TimesLostSpaceship",
        "TimesKilled",
        "TimesCaptain",
        "TimesNavigator",
        "PurchasedCoinPackSmall",
        "PurchasedCoinPackLarge",
        "NumAdsClicked",
        "DaysUser",
        "UserConsole",
        "UserHasOldOS"
    ]
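For reference, the same list can be built programmatically by excluding the response, treatment, and bookkeeping columns; a sketch (the assert simply confirms it matches the hand-written list):

Code
# Hypothetical programmatic equivalent of the hand-written evar list above
exclude = {"converted", "converted_yes", "ad", "training"}
evar_alt = [c for c in cg_rct_stacked.columns if c not in exclude]
assert sorted(evar_alt) == sorted(evar)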
Code
lr_treatment = rsm.model.logistic(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 1")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
)
lr_treatment.summary()
Logistic regression (GLM)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Null hyp.: There is no effect of x on converted
Alt. hyp.: There is an effect of x on converted

                                OR     OR%  coefficient  std.error  z.value p.value     
Intercept                    0.030  -97.0%        -3.52      0.122  -28.987  < .001  ***
AcquiredSpaceship[yes]       1.088    8.8%         0.08      0.049    1.732   0.083    .
AcquiredIonWeapon[yes]       0.917   -8.3%        -0.09      0.164   -0.533   0.594     
PurchasedCoinPackSmall[yes]  1.045    4.5%         0.04      0.046    0.960   0.337     
PurchasedCoinPackLarge[yes]  1.211   21.1%         0.19      0.049    3.930  < .001  ***
UserConsole[yes]             0.945   -5.5%        -0.06      0.058   -0.979   0.328     
UserHasOldOS[yes]            0.799  -20.1%        -0.22      0.081   -2.752   0.006   **
GameLevel                    1.059    5.9%         0.06      0.009    6.399  < .001  ***
NumGameDays                  1.015    1.5%         0.02      0.004    4.264  < .001  ***
NumGameDays4Plus             1.011    1.1%         0.01      0.006    1.674   0.094    .
NumInGameMessagesSent        1.000    0.0%         0.00      0.000    0.205   0.838     
NumFriends                   1.002    0.2%         0.00      0.000    9.255  < .001  ***
NumFriendRequestIgnored      1.000   -0.0%        -0.00      0.001   -0.484   0.628     
NumSpaceHeroBadges           1.028    2.8%         0.03      0.009    2.968   0.003   **
TimesLostSpaceship           0.993   -0.7%        -0.01      0.002   -2.964   0.003   **
TimesKilled                  1.001    0.1%         0.00      0.006    0.201   0.841     
TimesCaptain                 1.005    0.5%         0.01      0.002    2.054    0.04    *
TimesNavigator               1.001    0.1%         0.00      0.003    0.205   0.838     
NumAdsClicked                1.094    9.4%         0.09      0.003   33.156  < .001  ***
DaysUser                     1.000   -0.0%        -0.00      0.000   -0.469   0.639     

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared (McFadden): 0.096
Pseudo R-squared (McFadden adjusted): 0.094
Area under the ROC Curve (AUC): 0.712
Log-likelihood: -7346.776, AIC: 14733.552, BIC: 14892.598
Chi-squared: 1568.873, df(19), p.value < 0.001 
Nr obs: 21,000
Code
lr_control = rsm.model.logistic(
    data={'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 0")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar
)
lr_control.summary()
Logistic regression (GLM)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Null hyp.: There is no effect of x on converted
Alt. hyp.: There is an effect of x on converted

                                OR     OR%  coefficient  std.error  z.value p.value     
Intercept                    0.006  -99.4%        -5.18      0.193  -26.809  < .001  ***
AcquiredSpaceship[yes]       1.594   59.4%         0.47      0.072    6.472  < .001  ***
AcquiredIonWeapon[yes]       0.860  -14.0%        -0.15      0.267   -0.566   0.571     
PurchasedCoinPackSmall[yes]  1.029    2.9%         0.03      0.069    0.415   0.678     
PurchasedCoinPackLarge[yes]  1.338   33.8%         0.29      0.074    3.947  < .001  ***
UserConsole[yes]             1.148   14.8%         0.14      0.093    1.490   0.136     
UserHasOldOS[yes]            0.832  -16.8%        -0.18      0.124   -1.479   0.139     
GameLevel                    1.114   11.4%         0.11      0.014    7.527  < .001  ***
NumGameDays                  1.033    3.3%         0.03      0.005    5.954  < .001  ***
NumGameDays4Plus             1.047    4.7%         0.05      0.008    5.538  < .001  ***
NumInGameMessagesSent        1.001    0.1%         0.00      0.000    3.311  < .001  ***
NumFriends                   1.001    0.1%         0.00      0.000    4.809  < .001  ***
NumFriendRequestIgnored      0.989   -1.1%        -0.01      0.001   -8.415  < .001  ***
NumSpaceHeroBadges           1.523   52.3%         0.42      0.013   32.587  < .001  ***
TimesLostSpaceship           0.946   -5.4%        -0.06      0.006   -9.189  < .001  ***
TimesKilled                  1.006    0.6%         0.01      0.005    1.103    0.27     
TimesCaptain                 0.998   -0.2%        -0.00      0.003   -0.487   0.626     
TimesNavigator               0.989   -1.1%        -0.01      0.005   -2.365   0.018    *
NumAdsClicked                1.031    3.1%         0.03      0.004    8.114  < .001  ***
DaysUser                     1.000    0.0%         0.00      0.000    2.335    0.02    *

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared (McFadden): 0.202
Pseudo R-squared (McFadden adjusted): 0.198
Area under the ROC Curve (AUC): 0.831
Log-likelihood: -3656.883, AIC: 7353.766, BIC: 7512.812
Chi-squared: 1851.927, df(19), p.value < 0.001 
Nr obs: 21,000

Create predictions

Code
cg_rct_stacked["pred_treatment"] = lr_treatment.predict(cg_rct_stacked)["prediction"]
cg_rct_stacked["pred_control"] = lr_control.predict(cg_rct_stacked)["prediction"]
Code
# Keep a copy of the raw treatment/control predictions for later reference
pred_store = pd.DataFrame({
    "pred_treatment": cg_rct_stacked.pred_treatment,
    "pred_control": cg_rct_stacked.pred_control
})

cg_rct_stacked["uplift_score"] = (
    cg_rct_stacked.pred_treatment - cg_rct_stacked.pred_control
)

3. Calculate the Uplift and Incremental Uplift

Uplift Tab

Code
uplift_tab = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score", "ad", 1, qnt = 20
)
uplift_tab
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 uplift_score 1 0.05 197 450 70 634 147.315457 1.636838 0.327368
1 uplift_score 2 0.10 309 900 99 1182 233.619289 2.595770 0.195969
2 uplift_score 3 0.15 428 1350 125 1686 327.911032 3.643456 0.212857
3 uplift_score 4 0.20 528 1800 152 2175 402.206897 4.468966 0.167007
4 uplift_score 5 0.25 594 2250 166 2684 454.842027 5.053800 0.119162
5 uplift_score 6 0.30 642 2700 183 3150 485.142857 5.390476 0.070186
6 uplift_score 7 0.35 681 3150 195 3658 513.080372 5.700893 0.063045
7 uplift_score 8 0.40 719 3600 200 4127 544.539133 6.050435 0.073783
8 uplift_score 9 0.45 756 4050 210 4577 570.179594 6.335329 0.060000
9 uplift_score 10 0.50 791 4500 231 5076 586.212766 6.513475 0.035694
10 uplift_score 11 0.55 831 4950 249 5555 609.118812 6.767987 0.051311
11 uplift_score 12 0.60 859 5400 262 6031 624.412038 6.937912 0.034911
12 uplift_score 13 0.65 892 5850 275 6486 643.965772 7.155175 0.044762
13 uplift_score 14 0.70 937 6300 288 6938 675.483713 7.505375 0.071239
14 uplift_score 15 0.75 980 6750 298 7384 707.586674 7.862074 0.073134
15 uplift_score 16 0.80 1014 7200 312 7805 726.184497 8.068717 0.042301
16 uplift_score 17 0.85 1047 7650 319 8224 750.264835 8.336276 0.056627
17 uplift_score 18 0.90 1079 8100 360 8621 740.756177 8.230624 -0.032163
18 uplift_score 19 0.95 1134 8550 432 8831 715.746122 7.952735 -0.220635
19 uplift_score 20 1.00 1174 9000 512 9000 662.000000 7.355556 -0.384484
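
The key columns of this table can be reproduced by hand: incremental_resp scales the cumulative control responses up to the size of the cumulative treatment group, inc_uplift expresses that as a percentage of all 9,000 treated test customers, and uplift is the per-bin (not cumulative) difference in response rates. Checking the first row, where per-bin and cumulative coincide:

Code
# Reproduce row 1 of uplift_tab from its raw counts
T_resp, T_n, C_resp, C_n = 197, 450, 70, 634
incremental_resp = T_resp - C_resp * (T_n / C_n)  # ~147.3 treated responses beyond the scaled control
inc_uplift = 100 * incremental_resp / 9000        # ~1.64% of all treated test customers
uplift = T_resp / T_n - C_resp / C_n              # ~0.327 response-rate difference in this bin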

Gain Plot

Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score", "ad", 1, qnt = 20
)

  • The curve starts at 0% uplift when 0% of the population is targeted (as expected, because no one has been exposed to the campaign).

  • As the percentage of the targeted population increases, the incremental uplift also increases, suggesting that targeting more of the population is yielding positive results.

  • The curve rises sharply at first, indicating that initially targeting the most responsive segments of the population yields significant uplift.

  • After reaching a peak (around 85% of the population targeted, per the table above), the incremental uplift plateaus and then declines slightly, suggesting that beyond this point additional segments add little value or include non-responsive or negatively responding individuals.

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score", "ad", 1, qnt = 20
)

  • The first bins (the customers predicted to be most responsive) show the highest uplift, well above 20%.

  • The uplift decreases across subsequent bins, consistent with the expectation that the earliest bins contain the most responsive individuals.

  • There is a noticeable decline in uplift moving down the ranking, and uplift turns negative in the last bins, indicating that targeting these segments would produce worse outcomes than not targeting them at all.

  • Negative uplift in the later bins could indicate that the campaign has a counterproductive effect on these individuals, or that they would have been equally or more likely to take the desired action without the campaign intervention.

4. Use the incremental_resp to calculate the profits

Code
price = 14.99
cost = 1.5
Code
target_row = uplift_tab[uplift_tab['cum_prop'] <= 0.25].iloc[-1]
target_row
pred                uplift_score
bins                           5
cum_prop                    0.25
T_resp                       594
T_n                         2250
C_resp                       166
C_n                         2684
incremental_resp      454.842027
inc_uplift                5.0538
uplift                  0.119162
Name: 4, dtype: object
Code
# Define the function to calculate the profit
def prof_calc(data, price=14.99, cost=1.5):
    # Target the top 30,000 of the 120,000 customer pool (top 25%)
    target_prop = 30000 / 120000

    # Scale from the 9,000-customer treatment test set to the full 120,000-customer pool
    scale_factor = 120000 / 9000

    # Take the last uplift-table row at or below the cutoff; profit = revenue from
    # incremental responses minus the cost of contacting all treated customers in the slice
    target_row = data[data["cum_prop"] <= target_prop].iloc[-1]
    profit = (price * target_row["incremental_resp"] - cost * target_row["T_n"]) * scale_factor
    return profit
Code
uplift_profit_logit = prof_calc(uplift_tab, 14.99, 1.5)
uplift_profit_logit
45907.75976154993
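
As a sanity check, plugging the 25% row of the uplift table (cum_prop = 0.25) into the formula reproduces this figure:

Code
# revenue from incremental responses minus the cost of contacting the treated customers,
# scaled from the 9,000-customer treatment test set to the full 120,000-customer pool
(14.99 * 454.842027 - 1.5 * 2250) * (120000 / 9000)  # ≈ 45907.76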

5. Calculate the Uplift and Incremental Uplift for the Propensity Model

Code
propensity_tab = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment", "ad", 1, qnt = 20)
propensity_tab
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 pred_treatment 1 0.05 204 450 80 603 144.298507 1.603317 0.320663
1 pred_treatment 2 0.10 326 900 112 1131 236.875332 2.631948 0.210505
2 pred_treatment 3 0.15 430 1350 159 1605 296.261682 3.291796 0.131955
3 pred_treatment 4 0.20 525 1800 206 1994 339.042126 3.767135 0.090288
4 pred_treatment 5 0.25 615 2250 239 2344 385.584471 4.284272 0.105714
5 pred_treatment 6 0.30 672 2700 285 2807 397.863912 4.420710 0.027315
6 pred_treatment 7 0.35 726 3150 316 3162 411.199241 4.568880 0.032676
7 pred_treatment 8 0.40 775 3600 336 3603 439.279767 4.880886 0.063537
8 pred_treatment 9 0.45 813 4050 361 4044 451.464392 5.016271 0.027755
9 pred_treatment 10 0.50 838 4500 386 4527 454.302187 5.047802 0.003796
10 pred_treatment 11 0.55 885 4950 409 4991 479.359848 5.326221 0.054875
11 pred_treatment 12 0.60 916 5400 436 5479 486.286549 5.403184 0.013561
12 pred_treatment 13 0.65 951 5850 455 5880 498.321429 5.536905 0.030396
13 pred_treatment 14 0.70 995 6300 464 6332 533.344915 5.926055 0.077866
14 pred_treatment 15 0.75 1028 6750 475 6793 556.006772 6.177853 0.049472
15 pred_treatment 16 0.80 1065 7200 483 7256 585.727674 6.508085 0.064944
16 pred_treatment 17 0.85 1091 7650 495 7649 595.935286 6.621503 0.027243
17 pred_treatment 18 0.90 1120 8100 499 8093 620.568392 6.895204 0.055435
18 pred_treatment 19 0.95 1148 8550 508 8571 641.244662 7.124941 0.043394
19 pred_treatment 20 1.00 1174 9000 512 9000 662.000000 7.355556 0.048454
Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment", "ad", 1, qnt = 20)

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), 
    "converted", "yes", "pred_treatment", "ad", 1, qnt = 20)

Compare the Uplift Model and the Propensity Model

Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment", "uplift_score"],
    "ad",
    1, qnt = 20
)

  • Uplift Model Performance: The line for the uplift_score generally lies above the line for the pred_treatment, indicating that the uplift model consistently provides a higher incremental uplift across the different percentages of the population targeted.

  • Diminishing Returns: Both lines show a trend of diminishing returns as more of the population is targeted, with the incremental uplift peaking and then plateauing or slightly decreasing, suggesting an optimal targeting point before 100%.

  • Comparison: The propensity model appears to perform better than random targeting (which would be a straight line from the origin), but the uplift model is more effective in achieving incremental gains. This suggests that while the propensity model can identify likely responders, the uplift model is better at identifying those for whom the treatment would make a difference in their behavior.

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment", "uplift_score"],
    "ad",
    1, qnt = 20
)

  • Uplift Distribution: Both sets of bars show a decrease in uplift as we move through the population segments, which is expected as the most responsive individuals are often targeted first.

  • Model Comparison: In some segments, the uplift_score bars are higher than the pred_treatment bars, reinforcing the idea that the uplift model is more effective in certain segments.

  • Negative Uplift: Towards the later segments, both models show negative uplift, but the uplift_score model tends to have less severe negative values. The uplift model places customers with high incrementality in the earlier bins; the propensity model's incrementality is lower because it targets both Persuadables and Sure Things, whereas the uplift model targets only the former. This suggests that the uplift model is better at minimizing the risk of targeting individuals who would respond negatively to the treatment.

That said, the propensity model still performs well here because the customers with the highest propensity also tend to have the highest uplift in this data:

Code
cm = rsm.correlation(
    {"cg_rct_stacked": cg_rct_stacked.loc[cg_rct_stacked.training == 0, "pred_treatment": "uplift_score"]})
cm.summary()
Correlation
Data     : cg_rct_stacked
Method   : pearson
Cutoff   : 0
Variables: pred_treatment, pred_control, uplift_score
Null hyp.: variables x and y are not correlated
Alt. hyp.: variables x and y are correlated

Correlation matrix:
             pred_treatment pred_control
pred_control           0.28             
uplift_score           0.55        -0.65

p.values:
             pred_treatment pred_control
pred_control            0.0             
uplift_score            0.0          0.0

The positive correlation between pred_treatment and uplift_score is in line with what we would expect, as a higher predicted treatment response should correspond with a higher uplift score. The negative correlation between pred_control and uplift_score suggests that individuals who are likely to respond without any intervention (as predicted by the control model) are properly being identified as not contributing to uplift, which is a desirable feature of a good uplift model.

6. Use the incremental_resp to calculate the profits for the Propensity Model

Code
propensity_profit_logit = prof_calc(propensity_tab, 14.99, 1.5)
propensity_profit_logit
32065.482935153585
Code
# Difference in profits from using uplift model and propensity model
difference_logit = uplift_profit_logit - propensity_profit_logit
difference_logit
13842.276826396348

Using a Neural Network

2. Train an uplift model

Code
clf_treatment = rsm.model.mlp(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 1")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    hidden_layer_sizes = (1, ),
    alpha = 0.1
)
clf_treatment.summary()
Multi-layer Perceptron (NN)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
Hidden_layer_sizes   : (1,)
Activation function  : tanh
Solver               : lbfgs
Alpha                : 0.1
Batch size           : auto
Learning rate        : 0.001
Maximum iterations   : 10000
random_state         : 1234
AUC                  : 0.712

Raw data             :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges AcquiredSpaceship AcquiredIonWeapon  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge  NumAdsClicked  DaysUser UserConsole UserHasOldOS
         5           15                 0                    179         362                       50                   0               yes                no                  22            0             4               4                     no                     no              2      1308         yes           no
         4            4                 0                     36           0                        0                   0                no                no                   0            0             0               0                     no                     no              2      2922         yes           no
         8           17                 0                    222          20                       63                  10               yes                no                  10            0             9               6                    yes                     no              4      2192         yes           no
        10           18                 2                      0          56                        6                   2                no                no                   1            0             0               0                     no                    yes             13      2313         yes           no
        10           20                 5                     36           0                       16                   0                no                no                   0            0             0               0                     no                    yes              9      1766         yes           no

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
 -0.480555     0.361225         -0.411200               0.968856    3.186178                 0.577047           -0.371464            1.269286    -0.081624      0.347884        0.390873      -0.988082 -1.999414                   True                  False                       False                       False             True             False
 -0.842392    -1.185248         -0.411200              -0.360274   -0.525098                -0.876726           -0.371464           -0.301827    -0.081624     -0.205861       -0.201936      -0.988082  0.439831                  False                  False                       False                       False             True             False
  0.604958     0.642403         -0.411200               1.368525   -0.320055                 0.955027            4.091205            0.412315    -0.081624      1.040065        0.687277      -0.695079 -0.663421                   True                  False                        True                       False             True             False
  1.328633     0.782991          0.170373              -0.694880    0.049022                -0.702273            0.521070           -0.230413    -0.081624     -0.205861       -0.201936       0.623433 -0.480554                  False                  False                       False                        True             True             False
  1.328633     1.064168          1.042734              -0.360274   -0.525098                -0.411519           -0.371464           -0.301827    -0.081624     -0.205861       -0.201936       0.037428 -1.307237                  False                  False                       False                        True             True             False

Model Tuning

Code
hls = [(1,), (2,), (3,), (3, 3), (4, 2), (5, 5), (5,), (10,), (5,10), (10,5)]
alpha = [0.0001, 0.001, 0.01, 0.1, 1]


param_grid = {"hidden_layer_sizes": hls, "alpha": alpha}
scoring = {"AUC": "roc_auc"}

clf_cv_treatment = GridSearchCV(
    clf_treatment.fitted, param_grid, scoring=scoring, cv=5, n_jobs=4, refit="AUC", verbose=5
)
Code
clf_cv_treatment.fit(clf_treatment.data_onehot, clf_treatment.data.converted_yes)
Fitting 5 folds for each of 50 candidates, totalling 250 fits
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/neural_network/_multilayer_perceptron.py:546: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
GridSearchCV(cv=5,
             estimator=MLPClassifier(activation='tanh', alpha=0.1,
                                     hidden_layer_sizes=(1,), max_iter=10000,
                                     random_state=1234, solver='lbfgs'),
             n_jobs=4,
             param_grid={'alpha': [0.0001, 0.001, 0.01, 0.1, 1],
                         'hidden_layer_sizes': [(1,), (2,), (3,), (3, 3),
                                                (4, 2), (5, 5), (5,), (10,),
                                                (5, 10), (10, 5)]},
             refit='AUC', scoring={'AUC': 'roc_auc'}, verbose=5)
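The lbfgs convergence warnings above are benign for this exercise; if preferred, they can be suppressed via scikit-learn's warning class:

Code
import warnings
from sklearn.exceptions import ConvergenceWarning

# Optionally silence the benign lbfgs convergence warnings during the grid search
warnings.filterwarnings("ignore", category=ConvergenceWarning)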
Code
clf_cv_treatment.best_params_
{'alpha': 0.0001, 'hidden_layer_sizes': (4, 2)}
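Beyond the single best combination, the full grid can be inspected through GridSearchCV's cv_results_ attribute; the column names below follow from the "AUC" scoring key defined above:

Code
# Top five parameter combinations by mean cross-validated AUC
(
    pd.DataFrame(clf_cv_treatment.cv_results_)[
        ["param_alpha", "param_hidden_layer_sizes", "mean_test_AUC", "std_test_AUC"]
    ]
    .sort_values("mean_test_AUC", ascending=False)
    .head()
)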
Code
clf_treatment = rsm.model.mlp(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 1")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    **clf_cv_treatment.best_params_
)
clf_treatment.summary()
Multi-layer Perceptron (NN)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
Hidden_layer_sizes   : (4, 2)
Activation function  : tanh
Solver               : lbfgs
Alpha                : 0.0001
Batch size           : auto
Learning rate        : 0.001
Maximum iterations   : 10000
random_state         : 1234
AUC                  : 0.792

Raw data             :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges AcquiredSpaceship AcquiredIonWeapon  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge  NumAdsClicked  DaysUser UserConsole UserHasOldOS
         5           15                 0                    179         362                       50                   0               yes                no                  22            0             4               4                     no                     no              2      1308         yes           no
         4            4                 0                     36           0                        0                   0                no                no                   0            0             0               0                     no                     no              2      2922         yes           no
         8           17                 0                    222          20                       63                  10               yes                no                  10            0             9               6                    yes                     no              4      2192         yes           no
        10           18                 2                      0          56                        6                   2                no                no                   1            0             0               0                     no                    yes             13      2313         yes           no
        10           20                 5                     36           0                       16                   0                no                no                   0            0             0               0                     no                    yes              9      1766         yes           no

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
 -0.480555     0.361225         -0.411200               0.968856    3.186178                 0.577047           -0.371464            1.269286    -0.081624      0.347884        0.390873      -0.988082 -1.999414                   True                  False                       False                       False             True             False
 -0.842392    -1.185248         -0.411200              -0.360274   -0.525098                -0.876726           -0.371464           -0.301827    -0.081624     -0.205861       -0.201936      -0.988082  0.439831                  False                  False                       False                       False             True             False
  0.604958     0.642403         -0.411200               1.368525   -0.320055                 0.955027            4.091205            0.412315    -0.081624      1.040065        0.687277      -0.695079 -0.663421                   True                  False                        True                       False             True             False
  1.328633     0.782991          0.170373              -0.694880    0.049022                -0.702273            0.521070           -0.230413    -0.081624     -0.205861       -0.201936       0.623433 -0.480554                  False                  False                       False                        True             True             False
  1.328633     1.064168          1.042734              -0.360274   -0.525098                -0.411519           -0.371464           -0.301827    -0.081624     -0.205861       -0.201936       0.037428 -1.307237                  False                  False                       False                        True             True             False
Code
clf_control = rsm.model.mlp(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 0")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    hidden_layer_sizes = (1, ),
    alpha = 0.0001
)
clf_control.summary()
Multi-layer Perceptron (NN)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
Hidden_layer_sizes   : (1,)
Activation function  : tanh
Solver               : lbfgs
Alpha                : 0.0001
Batch size           : auto
Learning rate        : 0.001
Maximum iterations   : 10000
random_state         : 1234
AUC                  : 0.841

Raw data             :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges AcquiredSpaceship AcquiredIonWeapon  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge  NumAdsClicked  DaysUser UserConsole UserHasOldOS
         7           18                 0                    124           0                       81                   0               yes                no                   8            0             0               4                     no                    yes              3      2101          no           no
        10            3                 2                     60         479                       18                   0                no                no                  10            7             0               0                    yes                     no              7      1644         yes           no
         2            1                 0                      0           0                        0                   0                no                no                   0            0             0               2                     no                     no              8      3197         yes          yes
         8           15                 0                      0          51                        6                   0               yes                no                   0            0             2               1                    yes                     no             21      2009         yes           no
        10           18                 0                      0           0                        0                   0                no                no                   0            0             0               0                    yes                     no              6      3288         yes           no

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
  0.283862     0.806405         -0.399092               0.485069   -0.513099                 1.509762            -0.29461            0.343880    -0.075167     -0.180445        0.397524      -0.868677 -0.787337                   True                  False                       False                        True            False             False
  1.373047    -1.292688          0.204178              -0.119399    4.411754                -0.342325            -0.29461            0.527329     1.452288     -0.180445       -0.209504      -0.333738 -1.475407                  False                  False                        True                       False             True             False
 -1.531446    -1.572567         -0.399092              -0.686088   -0.513099                -0.871493            -0.29461           -0.389917    -0.075167     -0.180445        0.094010      -0.200003  0.862827                  False                  False                       False                       False             True              True
  0.646924     0.386586         -0.399092              -0.686088    0.011259                -0.695104            -0.29461           -0.389917    -0.075167      0.052946       -0.057747       1.538548 -0.925854                   True                  False                        True                       False             True             False
  1.373047     0.806405         -0.399092              -0.686088   -0.513099                -0.871493            -0.29461           -0.389917    -0.075167     -0.180445       -0.209504      -0.467473  0.999839                  False                  False                        True                       False             True             False
Code
# Model tuning
clf_cv_control = GridSearchCV(
    clf_control.fitted, param_grid, scoring=scoring, cv=5, n_jobs=4, refit="AUC", verbose=5
)
clf_cv_control.fit(clf_control.data_onehot, clf_control.data.converted_yes)
Fitting 5 folds for each of 50 candidates, totalling 250 fits
GridSearchCV(cv=5,
             estimator=MLPClassifier(activation='tanh', hidden_layer_sizes=(1,),
                                     max_iter=10000, random_state=1234,
                                     solver='lbfgs'),
             n_jobs=4,
             param_grid={'alpha': [0.0001, 0.001, 0.01, 0.1, 1],
                         'hidden_layer_sizes': [(1,), (2,), (3,), (3, 3),
                                                (4, 2), (5, 5), (5,), (10,),
                                                (5, 10), (10, 5)]},
             refit='AUC', scoring={'AUC': 'roc_auc'}, verbose=5)
Code
clf_cv_control.best_params_
{'alpha': 1, 'hidden_layer_sizes': (3, 3)}
Code
clf_control = rsm.model.mlp(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 0")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    **clf_cv_control.best_params_
)
clf_control.summary()
Multi-layer Perceptron (NN)
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
Hidden_layer_sizes   : (3, 3)
Activation function  : tanh
Solver               : lbfgs
Alpha                : 1
Batch size           : auto
Learning rate        : 0.001
Maximum iterations   : 10000
random_state         : 1234
AUC                  : 0.861

Raw data             :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges AcquiredSpaceship AcquiredIonWeapon  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge  NumAdsClicked  DaysUser UserConsole UserHasOldOS
         7           18                 0                    124           0                       81                   0               yes                no                   8            0             0               4                     no                    yes              3      2101          no           no
        10            3                 2                     60         479                       18                   0                no                no                  10            7             0               0                    yes                     no              7      1644         yes           no
         2            1                 0                      0           0                        0                   0                no                no                   0            0             0               2                     no                     no              8      3197         yes          yes
         8           15                 0                      0          51                        6                   0               yes                no                   0            0             2               1                    yes                     no             21      2009         yes           no
        10           18                 0                      0           0                        0                   0                no                no                   0            0             0               0                    yes                     no              6      3288         yes           no

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
  0.283862     0.806405         -0.399092               0.485069   -0.513099                 1.509762            -0.29461            0.343880    -0.075167     -0.180445        0.397524      -0.868677 -0.787337                   True                  False                       False                        True            False             False
  1.373047    -1.292688          0.204178              -0.119399    4.411754                -0.342325            -0.29461            0.527329     1.452288     -0.180445       -0.209504      -0.333738 -1.475407                  False                  False                        True                       False             True             False
 -1.531446    -1.572567         -0.399092              -0.686088   -0.513099                -0.871493            -0.29461           -0.389917    -0.075167     -0.180445        0.094010      -0.200003  0.862827                  False                  False                       False                       False             True              True
  0.646924     0.386586         -0.399092              -0.686088    0.011259                -0.695104            -0.29461           -0.389917    -0.075167      0.052946       -0.057747       1.538548 -0.925854                   True                  False                        True                       False             True             False
  1.373047     0.806405         -0.399092              -0.686088   -0.513099                -0.871493            -0.29461           -0.389917    -0.075167     -0.180445       -0.209504      -0.467473  0.999839                  False                  False                        True                       False             True             False
Code
cg_rct_stacked["pred_treatment_nn"] = clf_treatment.predict(cg_rct_stacked)["prediction"]
cg_rct_stacked["pred_control_nn"] = clf_control.predict(cg_rct_stacked)["prediction"]

3. Calculate the Uplift and Incremental Uplift

Code
cg_rct_stacked["uplift_score_nn"] = (
    cg_rct_stacked.pred_treatment_nn - cg_rct_stacked.pred_control_nn
)
Code
uplift_tab_nn = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_nn", "ad", 1, qnt = 20
)
uplift_tab_nn
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 uplift_score_nn 1 0.05 198 450 71 597 144.482412 1.605360 0.321072
1 uplift_score_nn 2 0.10 354 900 113 1138 264.632689 2.940363 0.269033
2 uplift_score_nn 3 0.15 476 1350 138 1650 363.090909 4.034343 0.222283
3 uplift_score_nn 4 0.20 588 1800 174 2164 443.268022 4.925200 0.178850
4 uplift_score_nn 5 0.25 673 2250 204 2661 500.508455 5.561205 0.128527
5 uplift_score_nn 6 0.30 754 2700 233 3172 555.670870 6.174121 0.123249
6 uplift_score_nn 7 0.35 811 3150 250 3696 597.931818 6.643687 0.094224
7 uplift_score_nn 8 0.40 854 3600 259 4186 631.257525 7.013973 0.077188
8 uplift_score_nn 9 0.45 893 4050 271 4696 659.279813 7.325331 0.063137
9 uplift_score_nn 10 0.50 931 4500 282 5210 687.429942 7.638110 0.063044
10 uplift_score_nn 11 0.55 969 4950 294 5668 712.242766 7.913809 0.058244
11 uplift_score_nn 12 0.60 987 5400 299 6090 721.876847 8.020854 0.028152
12 uplift_score_nn 13 0.65 1003 5850 304 6539 731.031809 8.122576 0.024420
13 uplift_score_nn 14 0.70 1025 6300 306 6926 746.657522 8.296195 0.043721
14 uplift_score_nn 15 0.75 1037 6750 311 7387 752.818329 8.364648 0.015821
15 uplift_score_nn 16 0.80 1052 7200 319 7799 757.500705 8.416675 0.013916
16 uplift_score_nn 17 0.85 1063 7650 326 8249 760.672445 8.451916 0.008889
17 uplift_score_nn 18 0.90 1087 8100 353 8631 755.717414 8.396860 -0.017347
18 uplift_score_nn 19 0.95 1138 8550 431 8846 721.421886 8.015799 -0.249457
19 uplift_score_nn 20 1.00 1174 9000 512 9000 662.000000 7.355556 -0.445974
Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_nn", "ad", 1, qnt = 20
)

  • The curve starts at the origin and increases as more of the population is targeted, reaching a peak before it starts to plateau. This indicates that the campaign has diminishing returns; after a certain point, targeting additional people results in smaller incremental gains.
Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_nn", "ad", 1, qnt = 20
)

This pattern indicates that the first segments are highly responsive to the campaign, while the later segments may have been negatively influenced by the campaign or would have been better off not being targeted at all.

4. Use the incremental_resp to calculate the profits for the Uplift Model

Code
uplift_profit_nn = prof_calc(uplift_tab_nn, 14.99, 1.5)
uplift_profit_nn
55034.9566328448

5. Calculate the Uplift and Incremental Uplift for the Propensity Model

Code
prop_tab_nn = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment_nn", "ad", 1, qnt = 20
)
prop_tab_nn
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 pred_treatment_nn 1 0.05 201 450 74 586 144.174061 1.601934 0.320387
1 pred_treatment_nn 2 0.10 351 900 140 1076 233.899628 2.598885 0.198639
2 pred_treatment_nn 3 0.15 475 1350 178 1561 321.060218 3.567336 0.197205
3 pred_treatment_nn 4 0.20 589 1800 220 2044 395.262231 4.391803 0.166377
4 pred_treatment_nn 5 0.25 686 2250 257 2497 454.422107 5.049135 0.133878
5 pred_treatment_nn 6 0.30 765 2700 293 2962 497.916948 5.532411 0.098136
6 pred_treatment_nn 7 0.35 833 3150 346 3352 507.850835 5.642787 0.015214
7 pred_treatment_nn 8 0.40 896 3600 396 3698 510.494321 5.672159 -0.004509
8 pred_treatment_nn 9 0.45 941 4050 444 4013 492.906305 5.476737 -0.052381
9 pred_treatment_nn 10 0.50 982 4500 477 4398 493.937244 5.488192 0.005397
10 pred_treatment_nn 11 0.55 1023 4950 486 4870 529.016427 5.877960 0.072043
11 pred_treatment_nn 12 0.60 1053 5400 491 5383 560.449378 6.227215 0.056920
12 pred_treatment_nn 13 0.65 1083 5850 496 5908 591.869330 6.576326 0.057143
13 pred_treatment_nn 14 0.70 1102 6300 503 6352 603.117758 6.701308 0.026456
14 pred_treatment_nn 15 0.75 1116 6750 505 6835 617.280176 6.858669 0.026970
15 pred_treatment_nn 16 0.80 1131 7200 506 7272 630.009901 7.000110 0.031045
16 pred_treatment_nn 17 0.85 1143 7650 507 7657 636.463497 7.071817 0.024069
17 pred_treatment_nn 18 0.90 1161 8100 509 8123 653.441216 7.260458 0.035708
18 pred_treatment_nn 19 0.95 1170 8550 510 8541 659.462592 7.327362 0.017608
19 pred_treatment_nn 20 1.00 1174 9000 512 9000 662.000000 7.355556 0.004532

Compare the Uplift Model and the Propensity Model

Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_nn", "uplift_score_nn"],
    "ad",
    1, qnt = 20
)

  • The uplift_score_nn line generally lies above the pred_treatment_nn line, indicating that the uplift model predicts a higher incremental uplift across the different segments of the targeted population.

  • Both lines show a rise in incremental uplift with an increase in the targeted population, reaching a peak, and then beginning to plateau, suggesting a point of diminishing returns.

  • The uplift model’s curve suggests that targeting based on its scores leads to higher incremental gains than the propensity model, which predicts the likelihood of response to the treatment without accounting for the control group’s behavior.

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_nn", "uplift_score_nn"],
    "ad",
    1, qnt = 20
)

  • The uplift decreases from the first to the last segment, which suggests that the initial segments are the most responsive to the targeting. The treatment model shows positive uplift in the early segments but drops off more sharply than the uplift model in later segments, indicating that the treatment model might be less effective at distinguishing between those who will respond due to the treatment and those who would have responded anyway.

  • The uplift model has a more gradual decline in uplift across segments and less negative uplift in the lower segments, which could imply that it is more effective at targeting the right individuals.

  • The negative values in later segments for both models suggest that certain individuals are either not influenced by or negatively influenced by the treatment. These could be individuals who would have purchased or responded anyway, making the ad an unnecessary expense for this group, or a group for whom the treatment had an adverse effect.

6. Use the incremental_resp to calculate the profits for the Propensity Model

Code
propensity_profit_nn = prof_calc(prop_tab_nn, 14.99, 1.5)
propensity_profit_nn
45823.831691362975
Code
# Difference in profits between the uplift model and the propensity model
difference_nn = uplift_profit_nn - propensity_profit_nn
difference_nn
9211.124941481823
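
Collecting the profit estimates so far makes the comparison explicit; a small summary table (labels only, values as computed above):

Code
# Estimated profit at the 30K-of-120K targeting level, by model and scoring approach
pd.DataFrame(
    {
        "uplift model": [uplift_profit_logit, uplift_profit_nn],
        "propensity model": [propensity_profit_logit, propensity_profit_nn],
        "difference": [difference_logit, difference_nn],
    },
    index=["logistic regression", "neural network"],
).round(2)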

Using a Random Forest Model

2. Train an uplift model

Code
rf_treatment = rsm.model.rforest(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 1")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
)
rf_treatment.summary()
Random Forest
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
OOB                  : True
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
max_features         : sqrt (4)
n_estimators         : 100
min_samples_leaf     : 1
random_state         : 1234
AUC                  : 0.761

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
         5           15                 0                    179         362                       50                   0                  22            0             4               4              2      1308                   True                  False                       False                       False             True             False
         4            4                 0                     36           0                        0                   0                   0            0             0               0              2      2922                  False                  False                       False                       False             True             False
         8           17                 0                    222          20                       63                  10                  10            0             9               6              4      2192                   True                  False                        True                       False             True             False
        10           18                 2                      0          56                        6                   2                   1            0             0               0             13      2313                  False                  False                       False                        True             True             False
        10           20                 5                     36           0                       16                   0                   0            0             0               0              9      1766                  False                  False                       False                        True             True             False
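
rf_treatment is the treatment-side half of a two-model (T-learner) uplift setup: a second forest is fit on the control rows, and the uplift score is the difference between the two predicted conversion probabilities. A minimal sketch of that idea in plain scikit-learn (pyrsm wraps RandomForestClassifier, as the summary above shows; the variable names here are illustrative):

Code
from sklearn.ensemble import RandomForestClassifier

## one-hot encode the features and binarize the response, mirroring pyrsm
train = cg_rct_stacked.query("training == 1")
X = pd.get_dummies(train[evar], drop_first=True)
y = (train["converted"] == "yes").astype(int)
ad = train["ad"] == 1

## fit separate forests on the treatment and control halves of the RCT data
rf_t = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1234).fit(X[ad], y[ad])
rf_c = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1234).fit(X[~ad], y[~ad])

## uplift score: P(convert | treated) minus P(convert | untreated)
uplift_score_rf = rf_t.predict_proba(X)[:, 1] - rf_c.predict_proba(X)[:, 1]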

Model Tuning

Code
## NOTE: this grid triggers the 120 failed fits reported below; recent
## scikit-learn has removed 'auto' and requires float values in (0.0, 1.0],
## so 'auto', 2.0, 3.0, and 4.0 are invalid (use the integers 2, 3, 4 instead)
max_features = [None, 'auto', 'sqrt', 'log2', 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 4.0]
n_estimators = [10, 50, 100, 200, 500, 1000]

param_grid = {"max_features": max_features, "n_estimators": n_estimators}
scoring = {"AUC": "roc_auc"}

rf_cv_treatment = GridSearchCV(rf_treatment.fitted, param_grid, scoring=scoring, cv=5, n_jobs=4, refit="AUC", verbose=5)
Code
## run the grid search on the one-hot encoded treatment training data
rf_cv_treatment.fit(rf_treatment.data_onehot, rf_treatment.data.converted_yes)
Fitting 5 folds for each of 66 candidates, totalling 330 fits
[CV 3/5] END max_features=None, n_estimators=10; AUC: (test=0.719) total time=   0.5s
[CV 1/5] END max_features=None, n_estimators=50; AUC: (test=0.749) total time=   3.0s
[CV 1/5] END max_features=None, n_estimators=100; AUC: (test=0.757) total time=   6.6s
[CV 5/5] END max_features=None, n_estimators=100; AUC: (test=0.752) total time=   6.5s
[CV 4/5] END max_features=None, n_estimators=200; AUC: (test=0.766) total time=  14.8s
[CV 3/5] END max_features=None, n_estimators=500; AUC: (test=0.765) total time=  31.9s
[CV 2/5] END max_features=None, n_estimators=1000; AUC: (test=0.763) total time= 1.1min
[CV 1/5] END max_features=auto, n_estimators=10; AUC: (test=nan) total time=   0.0s
... (all 30 max_features=auto fits return AUC=nan; see the FitFailedWarning below)
[CV 1/5] END max_features=sqrt, n_estimators=10; AUC: (test=0.721) total time=   0.2s
[CV 2/5] END max_features=sqrt, n_estimators=10; AUC: (test=0.714) total time=   0.2s
[CV 3/5] END max_features=sqrt, n_estimators=10; AUC: (test=0.711) total time=   0.2s
[CV 4/5] END max_features=sqrt, n_estimators=10; AUC: (test=0.697) total time=   0.2s
[CV 5/5] END max_features=sqrt, n_estimators=10; AUC: (test=0.703) total time=   0.2s
[CV 1/5] END max_features=sqrt, n_estimators=50; AUC: (test=0.770) total time=   0.9s
[CV 2/5] END max_features=sqrt, n_estimators=50; AUC: (test=0.755) total time=   0.9s
[CV 3/5] END max_features=sqrt, n_estimators=50; AUC: (test=0.772) total time=   0.9s
[CV 4/5] END max_features=sqrt, n_estimators=50; AUC: (test=0.757) total time=   0.9s
[CV 5/5] END max_features=sqrt, n_estimators=50; AUC: (test=0.750) total time=   0.9s
[CV 3/5] END max_features=sqrt, n_estimators=100; AUC: (test=0.780) total time=   1.7s
[CV 4/5] END max_features=sqrt, n_estimators=100; AUC: (test=0.769) total time=   1.7s
[CV 2/5] END max_features=sqrt, n_estimators=200; AUC: (test=0.774) total time=   3.4s
[CV 3/5] END max_features=sqrt, n_estimators=200; AUC: (test=0.782) total time=   3.3s
[CV 1/5] END max_features=sqrt, n_estimators=500; AUC: (test=0.780) total time=   8.5s
[CV 4/5] END max_features=sqrt, n_estimators=500; AUC: (test=0.778) total time=   8.6s
[CV 2/5] END max_features=sqrt, n_estimators=1000; AUC: (test=0.776) total time=  18.1s
[CV 5/5] END max_features=sqrt, n_estimators=1000; AUC: (test=0.766) total time=  17.5s
[CV 1/5] END max_features=log2, n_estimators=500; AUC: (test=0.780) total time=   9.4s
[CV 5/5] END max_features=log2, n_estimators=500; AUC: (test=0.765) total time=   8.9s
[CV 4/5] END max_features=log2, n_estimators=1000; AUC: (test=0.779) total time=  17.3s
[CV 2/5] END max_features=0.25, n_estimators=200; AUC: (test=0.774) total time=   3.8s
[CV 4/5] END max_features=None, n_estimators=10; AUC: (test=0.712) total time=   0.5s
[CV 3/5] END max_features=None, n_estimators=50; AUC: (test=0.751) total time=   3.0s
[CV 5/5] END max_features=None, n_estimators=50; AUC: (test=0.741) total time=   3.1s
[CV 4/5] END max_features=None, n_estimators=100; AUC: (test=0.765) total time=   7.2s
[CV 3/5] END max_features=None, n_estimators=200; AUC: (test=0.760) total time=  14.0s
[CV 2/5] END max_features=None, n_estimators=500; AUC: (test=0.761) total time=  32.1s
[CV 1/5] END max_features=None, n_estimators=1000; AUC: (test=0.763) total time= 1.1min
[CV 5/5] END max_features=None, n_estimators=1000; AUC: (test=0.751) total time=  60.0s
[CV 2/5] END max_features=log2, n_estimators=50; AUC: (test=0.755) total time=   0.9s
[CV 4/5] END max_features=log2, n_estimators=50; AUC: (test=0.757) total time=   0.9s
[CV 2/5] END max_features=log2, n_estimators=100; AUC: (test=0.766) total time=   1.7s
[CV 4/5] END max_features=log2, n_estimators=100; AUC: (test=0.769) total time=   1.8s
[CV 1/5] END max_features=log2, n_estimators=200; AUC: (test=0.779) total time=   3.8s
[CV 3/5] END max_features=log2, n_estimators=200; AUC: (test=0.782) total time=   3.7s
[CV 2/5] END max_features=log2, n_estimators=500; AUC: (test=0.776) total time=   9.1s
[CV 1/5] END max_features=log2, n_estimators=1000; AUC: (test=0.780) total time=  17.0s
[CV 5/5] END max_features=log2, n_estimators=1000; AUC: (test=0.766) total time=  17.0s
[CV 2/5] END max_features=None, n_estimators=10; AUC: (test=0.702) total time=   0.5s
[CV 5/5] END max_features=None, n_estimators=10; AUC: (test=0.684) total time=   0.5s
[CV 4/5] END max_features=None, n_estimators=50; AUC: (test=0.761) total time=   3.0s
[CV 3/5] END max_features=None, n_estimators=100; AUC: (test=0.756) total time=   6.7s
[CV 2/5] END max_features=None, n_estimators=200; AUC: (test=0.759) total time=  13.4s
[CV 1/5] END max_features=None, n_estimators=500; AUC: (test=0.763) total time=  33.7s
[CV 5/5] END max_features=None, n_estimators=500; AUC: (test=0.750) total time=  31.6s
[CV 4/5] END max_features=None, n_estimators=1000; AUC: (test=0.767) total time= 1.0min
[CV 3/5] END max_features=sqrt, n_estimators=500; AUC: (test=0.782) total time=   8.5s
[CV 1/5] END max_features=sqrt, n_estimators=1000; AUC: (test=0.780) total time=  17.7s
[CV 4/5] END max_features=sqrt, n_estimators=1000; AUC: (test=0.779) total time=  17.4s
[CV 4/5] END max_features=log2, n_estimators=200; AUC: (test=0.774) total time=   3.9s
[CV 3/5] END max_features=log2, n_estimators=500; AUC: (test=0.782) total time=   9.1s
[CV 2/5] END max_features=log2, n_estimators=1000; AUC: (test=0.776) total time=  17.4s
[CV 2/5] END max_features=0.25, n_estimators=10; AUC: (test=0.714) total time=   0.2s
[CV 4/5] END max_features=0.25, n_estimators=10; AUC: (test=0.697) total time=   0.2s
[CV 1/5] END max_features=0.25, n_estimators=50; AUC: (test=0.770) total time=   0.9s
[CV 3/5] END max_features=0.25, n_estimators=50; AUC: (test=0.772) total time=   0.9s
[CV 5/5] END max_features=0.25, n_estimators=50; AUC: (test=0.750) total time=   0.9s
[CV 2/5] END max_features=0.25, n_estimators=100; AUC: (test=0.766) total time=   1.7s
[CV 4/5] END max_features=0.25, n_estimators=100; AUC: (test=0.769) total time=   1.9s
[CV 1/5] END max_features=0.25, n_estimators=200; AUC: (test=0.779) total time=   3.8s
[CV 4/5] END max_features=0.25, n_estimators=200; AUC: (test=0.774) total time=   3.4s
[CV 2/5] END max_features=0.25, n_estimators=500; AUC: (test=0.776) total time=   8.1s
[CV 1/5] END max_features=0.25, n_estimators=1000; AUC: (test=0.780) total time=  20.1s
[CV 5/5] END max_features=0.25, n_estimators=1000; AUC: (test=0.766) total time=  18.4s
[CV 5/5] END max_features=0.5, n_estimators=200; AUC: (test=0.759) total time=   6.3s
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/ensemble/_forest.py:615: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable OOB estimates.
  warn(
... (the same OOB warning is repeated for each low-n_estimators fit)
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning:

A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/model_selection/_validation.py:547: FitFailedWarning:


120 fits failed out of a total of 330.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
30 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/model_selection/_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/base.py", line 1467, in wrapper
    estimator._validate_params()
  File "/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'max_features' parameter of RandomForestClassifier must be an int in the range [1, inf), a float in the range (0.0, 1.0], a str among {'log2', 'sqrt'} or None. Got 'auto' instead.

--------------------------------------------------------------------------------
The remaining 90 fits (30 per value) failed with the same traceback, ending in:
sklearn.utils._param_validation.InvalidParameterError: ... Got 2.0 instead.
sklearn.utils._param_validation.InvalidParameterError: ... Got 3.0 instead.
sklearn.utils._param_validation.InvalidParameterError: ... Got 4.0 instead.

/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/model_selection/_search.py:1051: UserWarning:

One or more of the test scores are non-finite: [0.7054542  0.75040485 0.75711448 0.75888311 0.76132513 0.76196086
        nan        nan        nan        nan        nan        nan
 0.70889806 0.76061223 0.76902557 0.77397572 0.77621517 0.7770484
 0.70889806 0.76061223 0.76902557 0.77397572 0.77621517 0.7770484
 0.70889806 0.76061223 0.76902557 0.77397572 0.77621517 0.7770484
 0.7220481  0.75976207 0.76435102 0.76833309 0.77083125 0.7711869
 0.71471084 0.75111362 0.75801595 0.76259915 0.76559531 0.76627665
 0.7054542  0.75040485 0.75711448 0.75888311 0.76132513 0.76196086
        nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan]
GridSearchCV(cv=5,
             estimator=RandomForestClassifier(oob_score=True,
                                              random_state=1234),
             n_jobs=4,
             param_grid={'max_features': [None, 'auto', 'sqrt', 'log2', 0.25,
                                          0.5, 0.75, 1.0, 2.0, 3.0, 4.0],
                         'n_estimators': [10, 50, 100, 200, 500, 1000]},
             refit='AUC', scoring={'AUC': 'roc_auc'}, verbose=5)
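Note: the nan test scores flagged above all come from max_features values that current versions of scikit-learn reject: 'auto' has been removed, and float values must lie in (0.0, 1.0], so 2.0, 3.0, and 4.0 fail parameter validation. A trimmed grid along the following lines would avoid the 120 failed fits (a suggestion only; the searches above were not re-run with it, and param_grid_fixed is a hypothetical name):

Code
# param_grid without the max_features values scikit-learn no longer accepts
param_grid_fixed = {
    "max_features": [None, "sqrt", "log2", 0.25, 0.5, 0.75, 1.0],
    "n_estimators": [10, 50, 100, 200, 500, 1000],
}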
Code
rf_cv_treatment.best_params_
{'max_features': 'sqrt', 'n_estimators': 1000}
Code
rf_treatment = rsm.model.rforest(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 1")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    **rf_cv_treatment.best_params_
)
rf_treatment.summary()
Random Forest
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
OOB                  : True
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
max_features         : sqrt (4)
n_estimators         : 1000
min_samples_leaf     : 1
random_state         : 1234
AUC                  : 0.775

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
         5           15                 0                    179         362                       50                   0                  22            0             4               4              2      1308                   True                  False                       False                       False             True             False
         4            4                 0                     36           0                        0                   0                   0            0             0               0              2      2922                  False                  False                       False                       False             True             False
         8           17                 0                    222          20                       63                  10                  10            0             9               6              4      2192                   True                  False                        True                       False             True             False
        10           18                 2                      0          56                        6                   2                   1            0             0               0             13      2313                  False                  False                       False                        True             True             False
        10           20                 5                     36           0                       16                   0                   0            0             0               0              9      1766                  False                  False                       False                        True             True             False
Code
rf_control = rsm.model.rforest(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 0")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
)
rf_control.summary()
Random Forest
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
OOB                  : True
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
max_features         : sqrt (4)
n_estimators         : 100
min_samples_leaf     : 1
random_state         : 1234
AUC                  : 0.851

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
         7           18                 0                    124           0                       81                   0                   8            0             0               4              3      2101                   True                  False                       False                        True            False             False
        10            3                 2                     60         479                       18                   0                  10            7             0               0              7      1644                  False                  False                        True                       False             True             False
         2            1                 0                      0           0                        0                   0                   0            0             0               2              8      3197                  False                  False                       False                       False             True              True
         8           15                 0                      0          51                        6                   0                   0            0             2               1             21      2009                   True                  False                        True                       False             True             False
        10           18                 0                      0           0                        0                   0                   0            0             0               0              6      3288                  False                  False                        True                       False             True             False

Model Tuning

Code
rf_cv_control = GridSearchCV(
    rf_control.fitted, param_grid,
    scoring=scoring, cv=5, n_jobs=4, refit="AUC", verbose=5
)
rf_cv_control.fit(rf_control.data_onehot, rf_control.data.converted_yes)
Fitting 5 folds for each of 66 candidates, totalling 330 fits
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/ensemble/_forest.py:615: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable OOB estimates.
  warn(
[The same warning is emitted repeatedly across CV fits; duplicates omitted.]
/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/model_selection/_validation.py:547: FitFailedWarning:

120 fits failed out of a total of 330.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

[As in the treatment-model search above, all 120 failures share the same InvalidParameterError traceback: max_features must be an int in [1, inf), a float in (0.0, 1.0], 'sqrt', 'log2', or None, but the grid also contains 'auto', 2.0, 3.0, and 4.0 (30 failed fits each).]

/Users/duyentran/Library/Python/3.11/lib/python/site-packages/sklearn/model_selection/_search.py:1051: UserWarning:

One or more of the test scores are non-finite: [0.79399629 0.84770861 0.85644669 0.85794913 0.86166784 0.86328154
        nan        nan        nan        nan        nan        nan
 0.79976758 0.86098559 0.86567103 0.87050655 0.87364622 0.8743026
 0.79976758 0.86098559 0.86567103 0.87050655 0.87364622 0.8743026
 0.79976758 0.86098559 0.86567103 0.87050655 0.87364622 0.8743026
 0.80333525 0.85486483 0.86299353 0.8676931  0.86846026 0.86937865
 0.79893787 0.84862995 0.85982764 0.86258705 0.86599233 0.8667254
 0.79399629 0.84770861 0.85644669 0.85794913 0.86166784 0.86328154
        nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan]
GridSearchCV(cv=5,
             estimator=RandomForestClassifier(oob_score=True,
                                              random_state=1234),
             n_jobs=4,
             param_grid={'max_features': [None, 'auto', 'sqrt', 'log2', 0.25,
                                          0.5, 0.75, 1.0, 2.0, 3.0, 4.0],
                         'n_estimators': [10, 50, 100, 200, 500, 1000]},
             refit='AUC', scoring={'AUC': 'roc_auc'}, verbose=5)
Code
rf_cv_control.best_params_
{'max_features': 'sqrt', 'n_estimators': 1000}
Code
rf_control = rsm.model.rforest(
    data = {'cg_rct_stacked': cg_rct_stacked.query("training == 1 & ad == 0")},
    rvar = 'converted',
    lev = 'yes',
    evar = evar,
    **rf_cv_control.best_params_
)
rf_control.summary()
Random Forest
Data                 : cg_rct_stacked
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumInGameMessagesSent, NumFriends, NumFriendRequestIgnored, NumSpaceHeroBadges, AcquiredSpaceship, AcquiredIonWeapon, TimesLostSpaceship, TimesKilled, TimesCaptain, TimesNavigator, PurchasedCoinPackSmall, PurchasedCoinPackLarge, NumAdsClicked, DaysUser, UserConsole, UserHasOldOS
OOB                  : True
Model type           : classification
Nr. of features      : (19, 19)
Nr. of observations  : 21,000
max_features         : sqrt (4)
n_estimators         : 1000
min_samples_leaf     : 1
random_state         : 1234
AUC                  : 0.873

Estimation data      :
 GameLevel  NumGameDays  NumGameDays4Plus  NumInGameMessagesSent  NumFriends  NumFriendRequestIgnored  NumSpaceHeroBadges  TimesLostSpaceship  TimesKilled  TimesCaptain  TimesNavigator  NumAdsClicked  DaysUser  AcquiredSpaceship_yes  AcquiredIonWeapon_yes  PurchasedCoinPackSmall_yes  PurchasedCoinPackLarge_yes  UserConsole_yes  UserHasOldOS_yes
         7           18                 0                    124           0                       81                   0                   8            0             0               4              3      2101                   True                  False                       False                        True            False             False
        10            3                 2                     60         479                       18                   0                  10            7             0               0              7      1644                  False                  False                        True                       False             True             False
         2            1                 0                      0           0                        0                   0                   0            0             0               2              8      3197                  False                  False                       False                       False             True              True
         8           15                 0                      0          51                        6                   0                   0            0             2               1             21      2009                   True                  False                        True                       False             True             False
        10           18                 0                      0           0                        0                   0                   0            0             0               0              6      3288                  False                  False                        True                       False             True             False
Code
# Predictions
cg_rct_stacked["pred_treatment_rf"] = rf_treatment.predict(cg_rct_stacked)["prediction"]
cg_rct_stacked["pred_control_rf"] = rf_control.predict(cg_rct_stacked)["prediction"]
Code
cg_rct_stacked["uplift_score_rf"] = (
    cg_rct_stacked.pred_treatment_rf - cg_rct_stacked.pred_control_rf
)

3. Calculate the Uplift and Incremental Uplift

Code
uplift_tab_rf = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_rf", "ad", 1, qnt = 20
)
uplift_tab_rf
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 uplift_score_rf 1 0.05 210 449 59 579 164.246978 1.824966 0.365806
1 uplift_score_rf 2 0.10 357 898 96 1113 279.544474 3.106050 0.258106
2 uplift_score_rf 3 0.15 478 1347 126 1686 377.334520 4.192606 0.217132
3 uplift_score_rf 4 0.20 576 1795 156 2199 448.660300 4.985114 0.160270
4 uplift_score_rf 5 0.25 666 2237 182 2707 515.599557 5.728884 0.152439
5 uplift_score_rf 6 0.30 732 2700 203 3201 560.772259 6.230803 0.100038
6 uplift_score_rf 7 0.35 777 3146 229 3708 582.708198 6.474536 0.049615
7 uplift_score_rf 8 0.40 824 3590 239 4186 619.028667 6.878096 0.084935
8 uplift_score_rf 9 0.45 855 4040 252 4704 638.571429 7.095238 0.043792
9 uplift_score_rf 10 0.50 888 4474 258 5157 664.169866 7.379665 0.062792
10 uplift_score_rf 11 0.55 916 4930 270 5632 679.654119 7.551712 0.036140
11 uplift_score_rf 12 0.60 942 5376 279 6138 697.636364 7.751515 0.040509
12 uplift_score_rf 13 0.65 968 5818 288 6550 712.185649 7.913174 0.036979
13 uplift_score_rf 14 0.70 989 6269 290 7006 729.506709 8.105630 0.042177
14 uplift_score_rf 15 0.75 1009 6750 292 7499 746.164955 8.290722 0.037523
15 uplift_score_rf 16 0.80 1017 7184 299 7898 745.030387 8.278115 0.000889
16 uplift_score_rf 17 0.85 1031 7644 303 8322 752.685652 8.363174 0.021001
17 uplift_score_rf 18 0.90 1087 8098 353 8662 756.984530 8.410939 -0.023711
18 uplift_score_rf 19 0.95 1139 8548 432 8842 721.364171 8.015157 -0.323333
19 uplift_score_rf 20 1.00 1174 9000 512 9000 662.000000 7.355556 -0.428895
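As a sanity check on how these columns are constructed, the first bin can be reproduced by hand from the counts shown above. The formulas below match the table's values exactly, so they appear to be what rsm.uplift_tab computes; treat this as an inferred reading rather than the library's documented definition.

Code
# First bin of uplift_tab_rf: 449 treated with 210 responders,
# 579 controls with 59 responders
T_resp, T_n, C_resp, C_n = 210, 449, 59, 579

# Incremental responders: treated responders minus the control response
# scaled to the size of the treatment group
incremental_resp = T_resp - C_resp * T_n / C_n   # 164.247

# Incremental uplift: cumulative incremental responders as a percentage
# of all 9,000 treated customers in the test set
inc_uplift = 100 * incremental_resp / 9000       # 1.825

# Per-bin uplift: difference in response rates between treatment and control
uplift = T_resp / T_n - C_resp / C_n             # 0.366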
Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_rf", "ad", 1, qnt = 20
)

  • The curve starts at 0% uplift when 0% of the population is targeted, which is expected as no one has yet been exposed to the campaign.

  • As the percentage of the targeted population increases, the incremental uplift also increases, indicating that targeting more of the population initially produces a higher incremental gain.

  • The curve shows a steep initial growth in uplift as the targeting begins, suggesting that the early segments of the population targeted are highly responsive to the campaign.

  • After a certain point, the rate of increase in incremental uplift starts to diminish: the curve begins to flatten, indicating that the additional gain from targeting more of the population is shrinking.

  • The curve reaches a peak and then plateaus, indicating an optimal targeting point beyond which the incremental benefits do not increase significantly. This is typically where a marketer would stop targeting additional customers to maximize efficiency and return on investment; the peak can be located directly from the uplift table, as shown below.
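The peak of the curve can also be read directly off uplift_tab_rf instead of eyeballed from the plot:

Code
# Bin where cumulative incremental uplift peaks: bin 18 (cum_prop = 0.90,
# inc_uplift = 8.41 in the table above); targeting beyond this point loses
# incremental responders even before contact costs are considered
peak = uplift_tab_rf.loc[uplift_tab_rf["inc_uplift"].idxmax()]
peak[["bins", "cum_prop", "inc_uplift"]]

Note that the profit-maximizing targeting depth will generally be shallower than this uplift peak once the per-contact cost is taken into account, which is what the profit calculation below addresses.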

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_rf", "ad", 1, qnt = 20
)

  • The first few bars show a positive uplift, with the first bar indicating an uplift of roughly 37% (0.366 in the table above). This suggests that the first segment of the population (the top 5%) responded very well to the intervention.

  • Moving right along the x-axis, the uplift decreases. This is expected: the individuals most likely to respond are targeted first, so later segments include progressively less responsive individuals.

  • Eventually, the uplift drops to 0% and then turns negative. The negative bars at the end suggest that targeting those segments may have been counterproductive, either because the intervention had an adverse effect or because it was an unnecessary expense for individuals who would have converted without any intervention.

  • The most negative bar, towards the right end of the chart, indicates a significant negative impact on that segment. This might represent a group that was either deterred by the campaign or for which the cost of targeting outweighed the benefits.

4. Using incremental_resp to Calculate the Profits for the Uplift Model

Code
uplift_profit_rf = prof_calc(uplift_tab_rf, 14.99, 1.5)
uplift_profit_rf
58311.16473340722
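prof_calc is defined earlier in this notebook. As a rough, hypothetical reconstruction of the idea (not the actual function), profit for a targeted segment combines the $14.99 margin per incremental conversion with the $1.50 cost per contact, scaling the test-set incremental response rate up to a 30,000-customer target from the 120,000-customer pool:

Code
# Hypothetical sketch of a profit calculation from an uplift table --
# NOT the prof_calc used above, which is defined earlier in the notebook
def prof_calc_sketch(tab, margin=14.99, cost=1.50, target_n=30_000, pool_n=120_000):
    # row whose cumulative proportion is closest to the targeting depth
    row = tab.loc[(tab["cum_prop"] - target_n / pool_n).abs().idxmin()]
    # incremental response rate among treated customers down to that depth
    inc_rate = row["incremental_resp"] / row["T_n"]
    # margin on the extra conversions minus the cost of contacting everyone
    return margin * inc_rate * target_n - cost * target_n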

5. Calculate the Uplift and Incremental Uplift for the Propensity Model

Code
prop_tab_rf = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment_rf", "ad", 1, qnt = 20
)

Compare the Uplift Model and the Propensity Model

Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_rf", "uplift_score_rf"],
    "ad",
    1, qnt = 20
)

  • Uplift Model: The line representing uplift_score_rf is consistently above the pred_treatment_rf line, suggesting that the uplift model is more effective than the propensity model alone at identifying individuals whose behavior is actually changed by the treatment. Using the uplift model, the campaign would therefore yield a higher incremental gain across the targeted population segments.

  • Propensity Model: The pred_treatment_rf line shows that the propensity model does capture some uplift, but less than the uplift model. While the propensity model identifies individuals likely to respond under treatment, it does not distinguish customers who convert only when treated from those who would convert anyway, so it is less effective at maximizing incremental uplift.

  • Diminishing Returns: Both models show an increase in incremental uplift as a larger percentage of the population is targeted, but the rate of increase slows down, and both lines begin to plateau. This indicates diminishing returns; beyond a certain point, targeting additional segments of the population yields progressively smaller increases in uplift.

  • Optimal Targeting Point: The point at which the curves start to plateau suggests the optimal targeting point for the campaign. Beyond this point, the additional cost of targeting more individuals may not be justified by the incremental gains; the table-level comparison sketched below makes these differences concrete.
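To attach numbers to these observations, the two uplift tables computed above can be lined up side by side (a small convenience sketch; both tables come from rsm.uplift_tab with qnt = 20, so the rows align):

Code
# Cumulative incremental uplift of the uplift score vs. the propensity
# score, by targeting depth
comparison = pd.DataFrame({
    "cum_prop": uplift_tab_rf["cum_prop"].values,
    "uplift_model": uplift_tab_rf["inc_uplift"].values,
    "propensity_model": prop_tab_rf["inc_uplift"].values,
})
comparison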

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_rf", "uplift_score_rf"],
    "ad",
    1, qnt = 20
)

  • Initial Segments: The first few segments, representing the most responsive parts of the population, show a significant positive uplift for both models. This suggests that both models are effective at identifying the individuals most likely to be influenced by the campaign.

  • Decreasing Uplift: Moving to the right, as a larger share of the population is targeted, the uplift for both models decreases. This is typical in targeted marketing, since the most responsive individuals are usually targeted first.

  • Negative Uplift: Towards the end segments, both models show negative uplift, which can indicate that targeting these individuals may have a counterproductive effect. It could mean that the campaign is reaching individuals who either would have made a purchase without the campaign or who may be turned off by the campaign.

  • Comparison Between Models: In most segments, the uplift_score_rf bars are higher than the pred_treatment_rf bars, suggesting that the uplift model is more effective at identifying which segments of the population will provide a higher incremental uplift when targeted.

6. Using incremental_resp to Calculate the Profits for the Propensity Model

Code
propensity_profit_rf = prof_calc(prop_tab_rf, 14.99, 1.5)
propensity_profit_rf
48633.38627925748
Code
# Difference in profits from using uplift model and propensity model
difference_rf = uplift_profit_rf - propensity_profit_rf
difference_rf
9677.778454149739

Using the XGBoost Model

Code
import warnings 
warnings.filterwarnings("ignore") 
import xgboost as xgb
# Create X_train, X_test, y_train, y_test for treatment group
X_train_treatment = cg_rct_stacked.loc[(cg_rct_stacked["training"] == 1) & (cg_rct_stacked["ad"] == 1), evar]
y_train_treatment = cg_rct_stacked.query("training == 1 & ad == 1").converted_yes

X_test_treatment = cg_rct_stacked.loc[(cg_rct_stacked["training"] == 0) & (cg_rct_stacked["ad"] == 1), evar]
y_test_treatment = cg_rct_stacked.query("training == 0 & ad == 1").converted_yes
Code
import warnings 
warnings.filterwarnings("ignore") 

# Setup model
xgbc_treatment = xgb.XGBClassifier(use_label_encoder=False, eval_metric="logloss", enable_categorical=True)
xgbc_treatment.fit(X_train_treatment, y_train_treatment)

# Set up and fit GridSearchCV
param_grid = {
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 2, 3, 4, 5, 6]
}

xgbc_cv_treatment = GridSearchCV(xgbc_treatment, param_grid, scoring='roc_auc', cv=5, n_jobs=4, verbose=5)
xgbc_cv_treatment.fit(X_train_treatment, y_train_treatment)

# Retrieve the best parameters and retrain the model
best_params_treatment = xgbc_cv_treatment.best_params_
xgbc_treatment = xgb.XGBClassifier(**best_params_treatment, use_label_encoder=False, eval_metric="logloss", enable_categorical=True)
xgbc_treatment.fit(X_train_treatment, y_train_treatment)
Fitting 5 folds for each of 192 candidates, totalling 960 fits
[Several hundred verbose per-fold progress lines omitted here. GridSearchCV with verbose=5 prints one "[CV i/5] END ... total time=..." line per fit, reporting the test AUC/score for each hyperparameter combination; in the original run this output is also interleaved with delayed worker lines from the random forest and neural network grid searches elsewhere in the notebook.]
[CV 4/5] END learning_rate=0.01, max_depth=4, min_child_weight=2;, score=0.779 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=4, min_child_weight=3;, score=0.768 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=4, min_child_weight=4;, score=0.774 total time=   0.1s
[CV 5/5] END learning_rate=0.01, max_depth=4, min_child_weight=4;, score=0.757 total time=   0.1s
[CV 4/5] END learning_rate=0.01, max_depth=4, min_child_weight=5;, score=0.779 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=4, min_child_weight=6;, score=0.768 total time=   0.1s
[CV 2/5] END learning_rate=0.01, max_depth=5, min_child_weight=1;, score=0.777 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=5, min_child_weight=2;, score=0.774 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=5, min_child_weight=3;, score=0.774 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=5, min_child_weight=3;, score=0.758 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=5, min_child_weight=4;, score=0.780 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=5, min_child_weight=5;, score=0.780 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=5, min_child_weight=6;, score=0.772 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=6, min_child_weight=1;, score=0.773 total time=   0.3s
[CV 2/5] END learning_rate=0.01, max_depth=6, min_child_weight=2;, score=0.776 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=6, min_child_weight=3;, score=0.774 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=6, min_child_weight=3;, score=0.763 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=6, min_child_weight=4;, score=0.780 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=6, min_child_weight=5;, score=0.777 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=6, min_child_weight=6;, score=0.776 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=7, min_child_weight=1;, score=0.777 total time=   0.3s
[CV 5/5] END learning_rate=0.01, max_depth=7, min_child_weight=1;, score=0.767 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=7, min_child_weight=2;, score=0.782 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=7, min_child_weight=3;, score=0.780 total time=   0.3s
[CV 2/5] END learning_rate=0.01, max_depth=7, min_child_weight=4;, score=0.776 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=7, min_child_weight=5;, score=0.777 total time=   0.3s
[CV 5/5] END learning_rate=0.01, max_depth=7, min_child_weight=5;, score=0.765 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=7, min_child_weight=6;, score=0.779 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=8, min_child_weight=1;, score=0.780 total time=   0.4s
[CV 3/5] END learning_rate=0.01, max_depth=8, min_child_weight=2;, score=0.773 total time=   0.3s
[CV 2/5] END learning_rate=0.01, max_depth=8, min_child_weight=3;, score=0.774 total time=   0.4s
[CV 1/5] END learning_rate=0.01, max_depth=8, min_child_weight=4;, score=0.780 total time=   0.4s
[CV 5/5] END learning_rate=0.01, max_depth=8, min_child_weight=4;, score=0.764 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=8, min_child_weight=5;, score=0.778 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=8, min_child_weight=6;, score=0.779 total time=   0.3s
[CV 3/5] END learning_rate=0.01, max_depth=9, min_child_weight=1;, score=0.773 total time=   0.5s
[CV 2/5] END learning_rate=0.01, max_depth=9, min_child_weight=2;, score=0.772 total time=   0.4s
[CV 5/5] END learning_rate=0.01, max_depth=9, min_child_weight=2;, score=0.764 total time=   0.4s
[CV 4/5] END learning_rate=0.01, max_depth=9, min_child_weight=3;, score=0.779 total time=   0.4s
[CV 4/5] END learning_rate=0.01, max_depth=9, min_child_weight=4;, score=0.780 total time=   0.4s
[CV 3/5] END learning_rate=0.01, max_depth=9, min_child_weight=5;, score=0.775 total time=   0.4s
[CV 2/5] END learning_rate=0.01, max_depth=9, min_child_weight=6;, score=0.774 total time=   0.5s
[CV 1/5] END learning_rate=0.01, max_depth=10, min_child_weight=1;, score=0.772 total time=   0.5s
[CV 5/5] END learning_rate=0.01, max_depth=10, min_child_weight=1;, score=0.759 total time=   0.5s
[CV 4/5] END learning_rate=0.01, max_depth=10, min_child_weight=2;, score=0.776 total time=   0.5s
[CV 3/5] END learning_rate=0.01, max_depth=10, min_child_weight=3;, score=0.776 total time=   0.4s
[CV 2/5] END learning_rate=0.01, max_depth=10, min_child_weight=4;, score=0.772 total time=   0.5s
[CV 1/5] END learning_rate=0.01, max_depth=10, min_child_weight=5;, score=0.781 total time=   0.4s
[CV 5/5] END learning_rate=0.01, max_depth=10, min_child_weight=5;, score=0.761 total time=   0.4s
[CV 4/5] END learning_rate=0.01, max_depth=10, min_child_weight=6;, score=0.780 total time=   0.4s
[CV 5/5] END learning_rate=0.05, max_depth=3, min_child_weight=1;, score=0.768 total time=   0.1s
[CV 4/5] END learning_rate=0.05, max_depth=3, min_child_weight=2;, score=0.780 total time=   0.1s
[CV 3/5] END learning_rate=0.05, max_depth=3, min_child_weight=3;, score=0.779 total time=   0.2s
[CV 2/5] END learning_rate=0.05, max_depth=3, min_child_weight=4;, score=0.783 total time=   0.2s
[CV 1/5] END learning_rate=0.05, max_depth=3, min_child_weight=5;, score=0.775 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=3, min_child_weight=5;, score=0.767 total time=   0.2s
[CV 4/5] END learning_rate=0.05, max_depth=3, min_child_weight=6;, score=0.781 total time=   0.2s
[CV 3/5] END learning_rate=0.05, max_depth=4, min_child_weight=1;, score=0.780 total time=   0.2s
[CV 2/5] END learning_rate=0.05, max_depth=4, min_child_weight=2;, score=0.785 total time=   0.1s
[CV 1/5] END learning_rate=0.05, max_depth=4, min_child_weight=3;, score=0.778 total time=   0.1s
[CV 5/5] END learning_rate=0.05, max_depth=4, min_child_weight=3;, score=0.768 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=4, min_child_weight=4;, score=0.769 total time=   0.1s
[CV 4/5] END learning_rate=0.05, max_depth=4, min_child_weight=5;, score=0.786 total time=   0.1s
[CV 1/5] END learning_rate=0.05, max_depth=4, min_child_weight=6;, score=0.779 total time=   0.1s
[CV 5/5] END learning_rate=0.05, max_depth=4, min_child_weight=6;, score=0.768 total time=   0.1s
[CV 4/5] END learning_rate=0.05, max_depth=5, min_child_weight=1;, score=0.788 total time=   0.2s
[CV 2/5] END learning_rate=0.05, max_depth=5, min_child_weight=2;, score=0.782 total time=   0.2s
[CV 1/5] END learning_rate=0.05, max_depth=5, min_child_weight=3;, score=0.780 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=5, min_child_weight=3;, score=0.769 total time=   0.2s
[CV 4/5] END learning_rate=0.05, max_depth=5, min_child_weight=4;, score=0.789 total time=   0.2s
[CV 3/5] END learning_rate=0.05, max_depth=5, min_child_weight=5;, score=0.782 total time=   0.2s
[CV 1/5] END learning_rate=0.05, max_depth=5, min_child_weight=6;, score=0.782 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=5, min_child_weight=6;, score=0.769 total time=   0.2s
[CV 4/5] END learning_rate=0.05, max_depth=6, min_child_weight=1;, score=0.789 total time=   0.2s
[CV 3/5] END learning_rate=0.05, max_depth=6, min_child_weight=2;, score=0.782 total time=   0.2s
[CV 2/5] END learning_rate=0.05, max_depth=6, min_child_weight=3;, score=0.779 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=6, min_child_weight=3;, score=0.769 total time=   0.2s
[CV 4/5] END learning_rate=0.05, max_depth=6, min_child_weight=4;, score=0.788 total time=   0.2s
[CV 3/5] END learning_rate=0.05, max_depth=6, min_child_weight=5;, score=0.786 total time=   0.2s
[CV 2/5] END learning_rate=0.05, max_depth=6, min_child_weight=6;, score=0.779 total time=   0.2s
[CV 1/5] END learning_rate=0.05, max_depth=7, min_child_weight=1;, score=0.780 total time=   0.3s
[CV 5/5] END learning_rate=0.05, max_depth=7, min_child_weight=1;, score=0.771 total time=   0.3s
[CV 4/5] END learning_rate=0.05, max_depth=7, min_child_weight=2;, score=0.788 total time=   0.4s
[CV 3/5] END learning_rate=0.05, max_depth=7, min_child_weight=3;, score=0.781 total time=   0.3s
[CV 3/5] END learning_rate=0.05, max_depth=7, min_child_weight=4;, score=0.782 total time=   0.3s
[CV 2/5] END learning_rate=0.05, max_depth=7, min_child_weight=5;, score=0.776 total time=   0.2s
[CV 5/5] END learning_rate=0.05, max_depth=7, min_child_weight=5;, score=0.773 total time=   0.3s
[CV 4/5] END learning_rate=0.05, max_depth=7, min_child_weight=6;, score=0.788 total time=   0.2s
[CV 3/5] END learning_rate=0.05, max_depth=8, min_child_weight=1;, score=0.775 total time=   0.3s
[CV 2/5] END learning_rate=0.05, max_depth=8, min_child_weight=2;, score=0.772 total time=   0.3s
[CV 1/5] END learning_rate=0.05, max_depth=8, min_child_weight=3;, score=0.782 total time=   0.3s
[CV 5/5] END learning_rate=0.05, max_depth=8, min_child_weight=3;, score=0.769 total time=   0.3s
[CV 4/5] END learning_rate=0.05, max_depth=8, min_child_weight=4;, score=0.787 total time=   0.3s
[CV 3/5] END learning_rate=0.05, max_depth=8, min_child_weight=5;, score=0.780 total time=   0.3s
[CV 2/5] END learning_rate=0.05, max_depth=8, min_child_weight=6;, score=0.775 total time=   0.2s
[CV 1/5] END learning_rate=0.05, max_depth=9, min_child_weight=1;, score=0.778 total time=   0.3s
[CV 5/5] END learning_rate=0.05, max_depth=9, min_child_weight=1;, score=0.768 total time=   0.5s
[CV 4/5] END learning_rate=0.05, max_depth=9, min_child_weight=2;, score=0.789 total time=   0.4s
[CV 3/5] END learning_rate=0.05, max_depth=9, min_child_weight=3;, score=0.774 total time=   0.4s
[CV 2/5] END learning_rate=0.05, max_depth=9, min_child_weight=4;, score=0.774 total time=   0.3s
[CV 1/5] END learning_rate=0.05, max_depth=9, min_child_weight=5;, score=0.783 total time=   0.4s
[CV 5/5] END learning_rate=0.05, max_depth=9, min_child_weight=5;, score=0.769 total time=   0.3s
[CV 5/5] END learning_rate=0.05, max_depth=9, min_child_weight=6;, score=0.766 total time=   0.4s
[CV 3/5] END learning_rate=0.05, max_depth=10, min_child_weight=1;, score=0.769 total time=   0.5s
[CV 3/5] END learning_rate=0.05, max_depth=10, min_child_weight=2;, score=0.770 total time=   0.4s
[CV 2/5] END learning_rate=0.05, max_depth=10, min_child_weight=3;, score=0.772 total time=   0.4s
[CV 1/5] END learning_rate=0.05, max_depth=10, min_child_weight=4;, score=0.782 total time=   0.3s
[CV 5/5] END learning_rate=0.05, max_depth=10, min_child_weight=4;, score=0.767 total time=   0.3s
[CV 4/5] END learning_rate=0.05, max_depth=10, min_child_weight=5;, score=0.785 total time=   0.3s
[CV 3/5] END learning_rate=0.05, max_depth=10, min_child_weight=6;, score=0.779 total time=   0.3s
[CV 2/5] END learning_rate=0.1, max_depth=3, min_child_weight=1;, score=0.784 total time=   0.1s
[CV 1/5] END learning_rate=0.1, max_depth=3, min_child_weight=2;, score=0.775 total time=   0.1s
[CV 5/5] END learning_rate=0.1, max_depth=3, min_child_weight=2;, score=0.769 total time=   0.1s
[CV 4/5] END learning_rate=0.1, max_depth=3, min_child_weight=3;, score=0.784 total time=   0.1s
[CV 3/5] END learning_rate=0.1, max_depth=3, min_child_weight=4;, score=0.778 total time=   0.1s
[CV 2/5] END learning_rate=0.1, max_depth=3, min_child_weight=5;, score=0.786 total time=   0.1s
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=True, eval_metric='logloss',
              feature_types=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=0.05, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=6,
              max_leaves=None, min_child_weight=4, missing=nan,
              monotone_constraints=None, multi_strategy=None, n_estimators=None,
              n_jobs=None, num_parallel_tree=None, random_state=None, ...)

For the control group

Code
# Create X_train, X_test, y_train, y_test for control group
X_train_control = cg_rct_stacked.loc[(cg_rct_stacked["training"] == 1) & (cg_rct_stacked["ad"] == 0), evar]
y_train_control = cg_rct_stacked.query("training == 1 & ad == 0").converted_yes

X_test_control = cg_rct_stacked.loc[(cg_rct_stacked["training"] == 0) & (cg_rct_stacked["ad"] == 0), evar]
y_test_control = cg_rct_stacked.query("training == 0 & ad == 0").converted_yes
Code
import warnings 
warnings.filterwarnings("ignore") 
# Use the same param_grid as the treatment group
xgbc_control = xgb.XGBClassifier(use_label_encoder=False, eval_metric="logloss", enable_categorical=True)

# Set up and fit GridSearchCV
xgbc_cv_control = GridSearchCV(xgbc_control, param_grid, scoring='roc_auc', cv=5, n_jobs=4, verbose=5)
xgbc_cv_control.fit(X_train_control, y_train_control)

# Retrieve the best parameters and retrain the model
best_params_control = xgbc_cv_control.best_params_
xgbc_control = xgb.XGBClassifier(**best_params_control, use_label_encoder=False, eval_metric="logloss", enable_categorical=True)
xgbc_control.fit(X_train_control, y_train_control)
Fitting 5 folds for each of 192 candidates, totalling 960 fits
[CV 1/5] END learning_rate=0.1, max_depth=3, min_child_weight=6;, score=0.777 total time=   0.1s
[CV 4/5] END learning_rate=0.1, max_depth=3, min_child_weight=6;, score=0.785 total time=   0.1s
[CV 3/5] END learning_rate=0.1, max_depth=4, min_child_weight=1;, score=0.782 total time=   0.1s
[CV 2/5] END learning_rate=0.1, max_depth=4, min_child_weight=2;, score=0.780 total time=   0.1s
[CV 1/5] END learning_rate=0.1, max_depth=4, min_child_weight=3;, score=0.780 total time=   0.1s
[CV 5/5] END learning_rate=0.1, max_depth=4, min_child_weight=3;, score=0.769 total time=   0.2s
[CV 4/5] END learning_rate=0.1, max_depth=4, min_child_weight=4;, score=0.783 total time=   0.2s
[CV 3/5] END learning_rate=0.1, max_depth=4, min_child_weight=5;, score=0.783 total time=   0.1s
[CV 2/5] END learning_rate=0.1, max_depth=4, min_child_weight=6;, score=0.782 total time=   0.1s
[CV 1/5] END learning_rate=0.1, max_depth=5, min_child_weight=1;, score=0.779 total time=   0.2s
[CV 5/5] END learning_rate=0.1, max_depth=5, min_child_weight=1;, score=0.767 total time=   0.2s
[CV 4/5] END learning_rate=0.1, max_depth=5, min_child_weight=2;, score=0.788 total time=   0.2s
[CV 3/5] END learning_rate=0.1, max_depth=5, min_child_weight=3;, score=0.779 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=5, min_child_weight=4;, score=0.777 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=5, min_child_weight=5;, score=0.779 total time=   0.2s
[CV 1/5] END learning_rate=0.1, max_depth=5, min_child_weight=6;, score=0.780 total time=   0.2s
[CV 5/5] END learning_rate=0.1, max_depth=5, min_child_weight=6;, score=0.771 total time=   0.2s
[CV 4/5] END learning_rate=0.1, max_depth=6, min_child_weight=1;, score=0.787 total time=   0.2s
[CV 3/5] END learning_rate=0.1, max_depth=6, min_child_weight=2;, score=0.781 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=6, min_child_weight=3;, score=0.775 total time=   0.2s
[CV 1/5] END learning_rate=0.1, max_depth=6, min_child_weight=4;, score=0.785 total time=   0.2s
[CV 5/5] END learning_rate=0.1, max_depth=6, min_child_weight=4;, score=0.767 total time=   0.2s
[CV 4/5] END learning_rate=0.1, max_depth=6, min_child_weight=5;, score=0.786 total time=   0.2s
[CV 3/5] END learning_rate=0.1, max_depth=6, min_child_weight=6;, score=0.784 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=7, min_child_weight=1;, score=0.769 total time=   0.3s
[CV 1/5] END learning_rate=0.1, max_depth=7, min_child_weight=2;, score=0.782 total time=   0.3s
[CV 5/5] END learning_rate=0.1, max_depth=7, min_child_weight=2;, score=0.769 total time=   0.3s
[CV 4/5] END learning_rate=0.1, max_depth=7, min_child_weight=3;, score=0.783 total time=   0.2s
[CV 3/5] END learning_rate=0.1, max_depth=7, min_child_weight=4;, score=0.778 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=7, min_child_weight=5;, score=0.773 total time=   0.2s
[CV 2/5] END learning_rate=0.1, max_depth=7, min_child_weight=6;, score=0.770 total time=   0.2s
[CV 1/5] END learning_rate=0.1, max_depth=8, min_child_weight=1;, score=0.776 total time=   0.3s
[CV 5/5] END learning_rate=0.1, max_depth=8, min_child_weight=1;, score=0.765 total time=   0.3s
[CV 4/5] END learning_rate=0.1, max_depth=8, min_child_weight=2;, score=0.781 total time=   0.6s
[CV 3/5] END learning_rate=0.1, max_depth=8, min_child_weight=3;, score=0.776 total time=   0.7s
[CV 2/5] END learning_rate=0.1, max_depth=8, min_child_weight=4;, score=0.770 total time=   0.4s
[CV 1/5] END learning_rate=0.1, max_depth=8, min_child_weight=5;, score=0.781 total time=   0.4s
[CV 2/5] END learning_rate=0.1, max_depth=8, min_child_weight=6;, score=0.768 total time=   0.3s
[CV 1/5] END learning_rate=0.1, max_depth=9, min_child_weight=1;, score=0.770 total time=   0.4s
[CV 1/5] END learning_rate=0.1, max_depth=9, min_child_weight=2;, score=0.777 total time=   0.3s
[CV 5/5] END learning_rate=0.1, max_depth=9, min_child_weight=2;, score=0.763 total time=   0.4s
[CV 5/5] END learning_rate=0.1, max_depth=9, min_child_weight=3;, score=0.767 total time=   0.5s
[CV 5/5] END learning_rate=0.1, max_depth=9, min_child_weight=4;, score=0.761 total time=   0.3s
[CV 4/5] END learning_rate=0.1, max_depth=9, min_child_weight=5;, score=0.780 total time=   0.4s
[CV 3/5] END learning_rate=0.1, max_depth=9, min_child_weight=6;, score=0.776 total time=   0.3s
[CV 2/5] END learning_rate=0.1, max_depth=10, min_child_weight=1;, score=0.758 total time=   0.5s
[CV 1/5] END learning_rate=0.1, max_depth=10, min_child_weight=2;, score=0.769 total time=   0.4s
[CV 4/5] END learning_rate=0.1, max_depth=10, min_child_weight=2;, score=0.778 total time=   0.4s
[CV 3/5] END learning_rate=0.1, max_depth=10, min_child_weight=3;, score=0.765 total time=   0.4s
[CV 2/5] END learning_rate=0.1, max_depth=10, min_child_weight=4;, score=0.765 total time=   0.4s
[CV 3/5] END learning_rate=0.1, max_depth=10, min_child_weight=5;, score=0.766 total time=   0.4s
[CV 2/5] END learning_rate=0.1, max_depth=10, min_child_weight=6;, score=0.762 total time=   0.4s
[CV 1/5] END learning_rate=0.2, max_depth=3, min_child_weight=1;, score=0.776 total time=   0.1s
[CV 3/5] END learning_rate=0.2, max_depth=3, min_child_weight=1;, score=0.780 total time=   0.1s
[CV 5/5] END learning_rate=0.2, max_depth=3, min_child_weight=1;, score=0.768 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=3, min_child_weight=3;, score=0.774 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=3, min_child_weight=4;, score=0.777 total time=   0.1s
[CV 4/5] END learning_rate=0.2, max_depth=3, min_child_weight=4;, score=0.783 total time=   0.2s
[CV 5/5] END learning_rate=0.2, max_depth=3, min_child_weight=5;, score=0.771 total time=   0.2s
[CV 4/5] END learning_rate=0.2, max_depth=3, min_child_weight=6;, score=0.783 total time=   0.2s
[CV 4/5] END learning_rate=0.2, max_depth=4, min_child_weight=1;, score=0.783 total time=   0.2s
[CV 3/5] END learning_rate=0.2, max_depth=4, min_child_weight=2;, score=0.776 total time=   0.3s
[CV 2/5] END learning_rate=0.2, max_depth=4, min_child_weight=3;, score=0.772 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=4, min_child_weight=4;, score=0.775 total time=   0.2s
[CV 5/5] END learning_rate=0.2, max_depth=4, min_child_weight=4;, score=0.765 total time=   0.1s
[CV 4/5] END learning_rate=0.2, max_depth=4, min_child_weight=5;, score=0.782 total time=   0.1s
[CV 3/5] END learning_rate=0.2, max_depth=4, min_child_weight=6;, score=0.781 total time=   0.1s
[CV 2/5] END learning_rate=0.2, max_depth=5, min_child_weight=1;, score=0.764 total time=   0.2s
[CV 2/5] END learning_rate=0.2, max_depth=5, min_child_weight=2;, score=0.767 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=5, min_child_weight=3;, score=0.773 total time=   0.2s
[CV 5/5] END learning_rate=0.2, max_depth=5, min_child_weight=3;, score=0.760 total time=   0.2s
[CV 4/5] END learning_rate=0.2, max_depth=5, min_child_weight=4;, score=0.777 total time=   0.2s
[CV 3/5] END learning_rate=0.2, max_depth=5, min_child_weight=5;, score=0.774 total time=   0.2s
[CV 2/5] END learning_rate=0.2, max_depth=5, min_child_weight=6;, score=0.773 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=6, min_child_weight=1;, score=0.768 total time=   0.3s
[CV 5/5] END learning_rate=0.2, max_depth=6, min_child_weight=1;, score=0.760 total time=   0.2s
[CV 5/5] END learning_rate=0.2, max_depth=6, min_child_weight=2;, score=0.756 total time=   0.2s
[CV 4/5] END learning_rate=0.2, max_depth=6, min_child_weight=3;, score=0.777 total time=   0.2s
[CV 3/5] END learning_rate=0.2, max_depth=6, min_child_weight=4;, score=0.769 total time=   0.2s
[CV 2/5] END learning_rate=0.2, max_depth=6, min_child_weight=5;, score=0.765 total time=   0.2s
[CV 1/5] END learning_rate=0.2, max_depth=6, min_child_weight=6;, score=0.774 total time=   0.3s
[CV 5/5] END learning_rate=0.2, max_depth=6, min_child_weight=6;, score=0.761 total time=   0.2s
[CV 4/5] END learning_rate=0.2, max_depth=7, min_child_weight=1;, score=0.771 total time=   0.3s
[CV 3/5] END learning_rate=0.2, max_depth=7, min_child_weight=2;, score=0.758 total time=   0.3s
[CV 2/5] END learning_rate=0.2, max_depth=7, min_child_weight=3;, score=0.761 total time=   0.4s
[CV 1/5] END learning_rate=0.2, max_depth=7, min_child_weight=4;, score=0.775 total time=   0.5s
[CV 1/5] END learning_rate=0.2, max_depth=7, min_child_weight=5;, score=0.767 total time=   0.3s
[CV 5/5] END learning_rate=0.2, max_depth=7, min_child_weight=5;, score=0.759 total time=   0.3s
[CV 4/5] END learning_rate=0.2, max_depth=7, min_child_weight=6;, score=0.772 total time=   0.3s
[CV 3/5] END learning_rate=0.2, max_depth=8, min_child_weight=1;, score=0.751 total time=   0.4s
[CV 2/5] END learning_rate=0.2, max_depth=8, min_child_weight=2;, score=0.753 total time=   0.5s
[CV 1/5] END learning_rate=0.2, max_depth=8, min_child_weight=3;, score=0.763 total time=   0.4s
[CV 1/5] END learning_rate=0.2, max_depth=8, min_child_weight=4;, score=0.763 total time=   0.3s
[CV 5/5] END learning_rate=0.2, max_depth=8, min_child_weight=4;, score=0.755 total time=   0.4s
[CV 4/5] END learning_rate=0.2, max_depth=8, min_child_weight=5;, score=0.775 total time=   0.4s
[CV 3/5] END learning_rate=0.2, max_depth=8, min_child_weight=6;, score=0.763 total time=   0.4s
[CV 2/5] END learning_rate=0.2, max_depth=9, min_child_weight=1;, score=0.752 total time=   0.9s
[CV 1/5] END learning_rate=0.2, max_depth=9, min_child_weight=2;, score=0.760 total time=   0.9s
[CV 5/5] END learning_rate=0.2, max_depth=9, min_child_weight=2;, score=0.756 total time=   0.8s
[CV 4/5] END learning_rate=0.2, max_depth=9, min_child_weight=3;, score=0.765 total time=   0.4s
[CV 3/5] END learning_rate=0.2, max_depth=9, min_child_weight=4;, score=0.760 total time=   0.3s
[CV 3/5] END learning_rate=0.2, max_depth=9, min_child_weight=5;, score=0.753 total time=   0.6s
[CV 2/5] END learning_rate=0.2, max_depth=9, min_child_weight=6;, score=0.743 total time=   0.7s
[CV 1/5] END learning_rate=0.2, max_depth=10, min_child_weight=1;, score=0.759 total time=   0.7s
[CV 5/5] END learning_rate=0.2, max_depth=10, min_child_weight=1;, score=0.747 total time=   0.5s
[CV 5/5] END learning_rate=0.2, max_depth=10, min_child_weight=2;, score=0.754 total time=   0.5s
[CV 4/5] END learning_rate=0.2, max_depth=10, min_child_weight=3;, score=0.759 total time=   0.5s
[CV 3/5] END learning_rate=0.2, max_depth=10, min_child_weight=4;, score=0.757 total time=   0.4s
[CV 2/5] END learning_rate=0.2, max_depth=10, min_child_weight=5;, score=0.750 total time=   0.4s
[CV 1/5] END learning_rate=0.2, max_depth=10, min_child_weight=6;, score=0.760 total time=   0.4s
[CV 5/5] END learning_rate=0.2, max_depth=10, min_child_weight=6;, score=0.748 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=3, min_child_weight=1;, score=0.861 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=3, min_child_weight=2;, score=0.860 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=3, min_child_weight=3;, score=0.860 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=3, min_child_weight=3;, score=0.849 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=3, min_child_weight=5;, score=0.861 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=3, min_child_weight=5;, score=0.850 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=4, min_child_weight=1;, score=0.870 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=4, min_child_weight=1;, score=0.863 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=4, min_child_weight=2;, score=0.859 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=4, min_child_weight=2;, score=0.878 total time=   0.1s
[CV 2/5] END learning_rate=0.01, max_depth=4, min_child_weight=4;, score=0.863 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=4, min_child_weight=4;, score=0.870 total time=   0.1s
[CV 5/5] END learning_rate=0.01, max_depth=4, min_child_weight=5;, score=0.877 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=4, min_child_weight=6;, score=0.870 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=5, min_child_weight=1;, score=0.880 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=5, min_child_weight=1;, score=0.867 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=5, min_child_weight=3;, score=0.876 total time=   0.3s
[CV 2/5] END learning_rate=0.01, max_depth=5, min_child_weight=3;, score=0.868 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=5, min_child_weight=4;, score=0.868 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=5, min_child_weight=4;, score=0.882 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=5, min_child_weight=6;, score=0.869 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=5, min_child_weight=6;, score=0.881 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=6, min_child_weight=1;, score=0.883 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=6, min_child_weight=2;, score=0.879 total time=   0.2s
[CV 3/5] END learning_rate=0.01, max_depth=6, min_child_weight=3;, score=0.877 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=6, min_child_weight=3;, score=0.869 total time=   0.2s
[CV 1/5] END learning_rate=0.01, max_depth=6, min_child_weight=5;, score=0.878 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=6, min_child_weight=5;, score=0.870 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=6, min_child_weight=6;, score=0.870 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=6, min_child_weight=6;, score=0.884 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=7, min_child_weight=1;, score=0.885 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=7, min_child_weight=2;, score=0.878 total time=   0.3s
[CV 3/5] END learning_rate=0.01, max_depth=7, min_child_weight=3;, score=0.879 total time=   0.2s
[CV 4/5] END learning_rate=0.01, max_depth=7, min_child_weight=3;, score=0.866 total time=   0.3s
[CV 1/5] END learning_rate=0.01, max_depth=7, min_child_weight=5;, score=0.878 total time=   0.2s
[CV 2/5] END learning_rate=0.01, max_depth=7, min_child_weight=5;, score=0.871 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=7, min_child_weight=6;, score=0.870 total time=   0.2s
[CV 5/5] END learning_rate=0.01, max_depth=7, min_child_weight=6;, score=0.885 total time=   0.3s
[CV 2/5] END learning_rate=0.01, max_depth=8, min_child_weight=2;, score=0.873 total time=   0.3s
[CV 3/5] END learning_rate=0.01, max_depth=8, min_child_weight=2;, score=0.878 total time=   0.3s
[CV 5/5] END learning_rate=0.01, max_depth=8, min_child_weight=3;, score=0.883 total time=   0.4s
[CV 1/5] END learning_rate=0.01, max_depth=8, min_child_weight=4;, score=0.877 total time=   0.3s
[CV 3/5] END learning_rate=0.01, max_depth=8, min_child_weight=5;, score=0.880 total time=   0.3s
[CV 4/5] END learning_rate=0.01, max_depth=8, min_child_weight=5;, score=0.868 total time=   0.4s
[CV 1/5] END learning_rate=0.01, max_depth=9, min_child_weight=1;, score=0.868 total time=   0.5s
[CV 2/5] END learning_rate=0.01, max_depth=9, min_child_weight=1;, score=0.868 total time=   0.5s
[CV 4/5] END learning_rate=0.01, max_depth=9, min_child_weight=2;, score=0.869 total time=   0.5s
[CV 5/5] END learning_rate=0.01, max_depth=9, min_child_weight=2;, score=0.879 total time=   0.5s
[CV 2/5] END learning_rate=0.01, max_depth=9, min_child_weight=4;, score=0.873 total time=   0.5s
[CV 3/5] END learning_rate=0.01, max_depth=9, min_child_weight=4;, score=0.880 total time=   0.5s
[CV 5/5] END learning_rate=0.01, max_depth=9, min_child_weight=5;, score=0.883 total time=   0.5s
[CV 1/5] END learning_rate=0.01, max_depth=9, min_child_weight=6;, score=0.877 total time=   0.5s
[CV 3/5] END learning_rate=0.01, max_depth=10, min_child_weight=1;, score=0.878 total time=   0.7s
[CV 4/5] END learning_rate=0.01, max_depth=10, min_child_weight=1;, score=0.868 total time=   0.6s
[CV 1/5] END learning_rate=0.01, max_depth=10, min_child_weight=3;, score=0.870 total time=   0.5s
[CV 2/5] END learning_rate=0.01, max_depth=10, min_child_weight=3;, score=0.874 total time=   0.4s
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=True, eval_metric='logloss',
              feature_types=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=0.1, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=3,
              max_leaves=None, min_child_weight=5, missing=nan,
              monotone_constraints=None, multi_strategy=None, n_estimators=None,
              n_jobs=None, num_parallel_tree=None, random_state=None, ...)
Code
X_full = cg_rct_stacked[evar]
cg_rct_stacked["pred_treatment_xgb"] = xgbc_treatment.predict_proba(X_full)[:, 1]
cg_rct_stacked["pred_control_xgb"] = xgbc_control.predict_proba(X_full)[:, 1]

3. Calculate the Uplift and Incremental Uplift

Code
cg_rct_stacked["uplift_score_xgb"] = (
    cg_rct_stacked.pred_treatment_xgb - cg_rct_stacked.pred_control_xgb
)
cg_rct_stacked['uplift_score_xgb']
0        0.042117
1        0.043018
2        0.004007
3        0.034363
4       -0.059559
           ...   
29995    0.004342
29996   -0.001307
29997    0.225445
29998    0.159661
29999    0.121746
Name: uplift_score_xgb, Length: 60000, dtype: float32
Code
cg_rct_stacked['uplift_score_xgb'].describe()
count    60000.000000
mean         0.065285
std          0.159799
min         -0.842148
25%          0.018705
50%          0.053788
75%          0.125166
max          0.792246
Name: uplift_score_xgb, dtype: float64
Code
uplift_tab_xgb = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_xgb", "ad", 1, qnt = 20
)
uplift_tab_xgb
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 uplift_score_xgb 1 0.05 217 450 81 625 158.680000 1.763111 0.352622
1 uplift_score_xgb 2 0.10 357 900 110 1118 268.449016 2.982767 0.252288
2 uplift_score_xgb 3 0.15 475 1350 136 1639 362.980476 4.033116 0.212318
3 uplift_score_xgb 4 0.20 597 1800 167 2159 457.768874 5.086321 0.211496
4 uplift_score_xgb 5 0.25 681 2250 199 2674 513.554226 5.706158 0.124531
5 uplift_score_xgb 6 0.30 747 2700 209 3214 571.424393 6.349160 0.128148
6 uplift_score_xgb 7 0.35 800 3150 222 3723 612.167607 6.801862 0.092238
7 uplift_score_xgb 8 0.40 850 3600 238 4254 648.589563 7.206551 0.080979
8 uplift_score_xgb 9 0.45 895 4050 252 4777 681.351266 7.570570 0.073231
9 uplift_score_xgb 10 0.50 939 4500 267 5252 710.230008 7.891445 0.066199
10 uplift_score_xgb 11 0.55 962 4950 279 5710 720.134851 8.001498 0.024910
11 uplift_score_xgb 12 0.60 978 5400 283 6161 729.955851 8.110621 0.026686
12 uplift_score_xgb 13 0.65 1002 5850 288 6571 745.600670 8.284452 0.041138
13 uplift_score_xgb 14 0.70 1011 6300 290 6982 749.327127 8.325857 0.015134
14 uplift_score_xgb 15 0.75 1026 6750 296 7451 757.848074 8.420534 0.020540
15 uplift_score_xgb 16 0.80 1037 7200 301 7884 762.114155 8.467935 0.012897
16 uplift_score_xgb 17 0.85 1049 7650 307 8307 766.280607 8.514229 0.012482
17 uplift_score_xgb 18 0.90 1097 8100 350 8656 769.481516 8.549795 -0.016543
18 uplift_score_xgb 19 0.95 1141 8550 435 8851 720.793244 8.008814 -0.338120
19 uplift_score_xgb 20 1.00 1174 9000 512 9000 662.000000 7.355556 -0.443445
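For reference, the incremental_resp column is the cumulative number of treatment responses in excess of what the control group implies at the same targeting depth, i.e. T_resp - C_resp * (T_n / C_n), and inc_uplift expresses that figure as a percentage of the 9,000 treated test customers. The arithmetic below checks the first bin against the table above (a minimal sketch consistent with these numbers; the exact internals of rsm.uplift_tab may differ):

Code
# Check incremental_resp and inc_uplift for bin 1 of uplift_tab_xgb
T_resp, T_n, C_resp, C_n = 217, 450, 81, 625
incremental_resp = T_resp - C_resp * (T_n / C_n)  # 217 - 81 * 0.72 = 158.68
inc_uplift = 100 * incremental_resp / 9000        # 1.763 (% of treated customers)
print(incremental_resp, inc_uplift)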
Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_xgb", "ad", 1, qnt = 20
)

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "uplift_score_xgb", "ad", 1, qnt = 20
)

4. Use the incremental_resp to calculate the profits

Code
uplift_profit_xgb = prof_calc(uplift_tab_xgb, 14.99, 1.5)
uplift_profit_xgb
57642.371278982806
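prof_calc itself is defined earlier in the notebook. Consistent with the value above, it evaluates the uplift table at the 25% targeting level (the top 30,000 of the 120,000 customers) and scales the 9,000 treated test customers up to the full base; a minimal sketch of that logic, under those assumptions:

Code
# Hypothetical re-implementation of the profit logic for illustration;
# prof_calc itself is defined earlier in this notebook
def prof_calc_sketch(tab, price, cost, target_prop=0.25):
    # Cumulative row at the target proportion (top 30,000 of 120,000 = 25%)
    row = tab[tab["cum_prop"] <= target_prop].iloc[-1]
    # Incremental revenue minus the cost of contacting the treated customers,
    # scaled from the 9,000 treated test customers to the 120,000 base
    return (price * row["incremental_resp"] - cost * row["T_n"]) * 120000 / 9000

Under these assumptions, prof_calc_sketch(uplift_tab_xgb, 14.99, 1.5) reproduces the 57,642.37 figure shown above.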

5. Calculate the Uplift and Incremental Uplift for the Propensity Model

Code
propensity_tab_xgb = rsm.uplift_tab(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment_xgb", "ad", 1, qnt = 20)
propensity_tab_xgb
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 pred_treatment_xgb 1 0.05 216 450 82 587 153.137990 1.701533 0.340307
1 pred_treatment_xgb 2 0.10 358 900 131 1067 247.503280 2.750036 0.213472
2 pred_treatment_xgb 3 0.15 505 1350 182 1522 343.567674 3.817419 0.214579
3 pred_treatment_xgb 4 0.20 602 1800 228 2002 397.004995 4.411167 0.119722
4 pred_treatment_xgb 5 0.25 700 2250 277 2463 446.954933 4.966166 0.111487
5 pred_treatment_xgb 6 0.30 782 2700 332 2893 472.148635 5.246096 0.054315
6 pred_treatment_xgb 7 0.35 844 3150 362 3312 499.706522 5.552295 0.066179
7 pred_treatment_xgb 8 0.40 899 3600 398 3733 515.180016 5.724222 0.036712
8 pred_treatment_xgb 9 0.45 954 4050 427 4162 538.490630 5.983229 0.054623
9 pred_treatment_xgb 10 0.50 993 4500 452 4614 552.167750 6.135197 0.031357
10 pred_treatment_xgb 11 0.55 1035 4950 473 5008 567.478035 6.305312 0.040034
11 pred_treatment_xgb 12 0.60 1067 5400 489 5468 584.081200 6.489791 0.036329
12 pred_treatment_xgb 13 0.65 1096 5850 500 5929 602.662169 6.696246 0.040583
13 pred_treatment_xgb 14 0.70 1109 6300 504 6407 613.417044 6.815745 0.020521
14 pred_treatment_xgb 15 0.75 1122 6750 506 6828 621.780316 6.908670 0.024138
15 pred_treatment_xgb 16 0.80 1136 7200 506 7212 630.841930 7.009355 0.031111
16 pred_treatment_xgb 17 0.85 1145 7650 507 7646 637.734763 7.085942 0.017696
17 pred_treatment_xgb 18 0.90 1158 8100 509 8103 649.188449 7.213205 0.024513
18 pred_treatment_xgb 19 0.95 1167 8550 510 8561 657.655297 7.307281 0.017817
19 pred_treatment_xgb 20 1.00 1174 9000 512 9000 662.000000 7.355556 0.011000
Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"), "converted", "yes", "pred_treatment_xgb", "ad", 1, qnt = 20)

  • The curve starts at 0% uplift when 0% of the population is targeted, which makes sense as no one has yet been exposed to the campaign.

  • As the percentage of the targeted population increases, the incremental uplift also increases. This indicates that targeting more of the population is initially resulting in higher incremental gains.

  • The curve shows a steep rise in incremental uplift at the beginning, suggesting that the early segments of the population targeted are highly responsive to the campaign.

  • The rate of increase in incremental uplift begins to slow down as the curve extends towards the right, which suggests diminishing returns; the incremental gains from targeting additional portions of the population decrease.

  • The curve eventually starts to plateau, indicating that there is a point at which the incremental benefits do not significantly increase with additional targeting. This plateau represents the point of maximum efficiency in targeting efforts.

The graph shows the pattern expected of a well-targeted marketing campaign: the most responsive individuals are targeted first, producing high uplift early on, while incremental uplift declines as progressively less responsive individuals are added.

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"), 
    "converted", "yes", "pred_treatment_xgb", "ad", 1, qnt = 20)

  • High Responsiveness in Early Segments: The first few bars are the tallest, indicating the highest uplift. This suggests that the first segment of the population (likely the top 5% or 10%) is the most responsive to the campaign.

  • Diminishing Returns: As we move right across the x-axis to include larger percentages of the population, the size of the uplift decreases. This pattern is typical in targeted marketing, where you see the greatest response rate increases in the first few population segments.

  • Consistent Drop: The consistent drop-off in uplift as the population segments progress suggests that the most responsive individuals are targeted first, followed by progressively less responsive segments.

  • Positive Uplift Across Segments: All the bars are above the 0% line, which indicates that each segment experienced a positive uplift due to the campaign. This means that even the least responsive segments targeted still had a positive response, although it was not as strong as the initial segments.

Comparing the Uplift and Propensity Models

Code
fig = rsm.inc_uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_xgb", "uplift_score_xgb"],
    "ad",
    1, qnt = 20
)

  • Uplift Model Performance: The uplift_score_xgb curve is consistently above the pred_treatment_xgb curve. This indicates that the uplift model is predicting a higher incremental uplift across the different segments of the targeted population compared to the treatment model.

  • Propensity Model: The pred_treatment_xgb curve shows that the propensity model also predicts an increase in incremental uplift with more of the population targeted. However, the uplift is not as high as the one predicted by the uplift model, suggesting that the propensity model might be useful, but not optimal.

  • Diminishing Returns: Both models show an initial rapid increase in incremental uplift, which slows and eventually plateaus as the percentage of the population targeted increases. This is indicative of diminishing returns; beyond a certain point, targeting additional people results in smaller increases in incremental uplift.

  • Optimal Targeting Point: The curves suggest that there is an optimal point for targeting, after which the incremental benefits of targeting additional individuals begin to decrease. Identifying this point can help optimize marketing efforts and budget allocation.

In short, the uplift_score_xgb model appears to be more effective for targeting the right individuals, likely because it accounts for the incremental effect of the treatment, identifying those who are influenced by the campaign as opposed to those who would respond without it.

Code
fig = rsm.uplift_plot(
    cg_rct_stacked.query("training == 0"),
    "converted",
    "yes",
    ["pred_treatment_xgb", "uplift_score_xgb"],
    "ad",
    1, qnt = 20
)

  • High Responsiveness in Early Segments: The initial segments show a higher uplift for both models. This indicates that these segments of the population are more responsive to the marketing campaign.

  • Decreasing Uplift: As the segments progress from left to right (targeting a larger share of the population), the uplift decreases. This trend is typical in targeted marketing, where the most responsive individuals are usually targeted first.

  • Negative Uplift in Later Segments: For the segments on the right, the uplift becomes negative. This could mean that targeting these individuals might be counterproductive, potentially leading to a negative reaction to the marketing campaign or an unnecessary cost for individuals who would have made a purchase without any intervention.

  • Comparison Between Models: In most segments, the uplift_score_xgb bars appear to have a higher uplift than the pred_treatment_xgb bars, suggesting that the uplift model is more effective at identifying which segments of the population will provide a higher incremental uplift when targeted.

6. Use the incremental_resp to calculate the profits for the Propensity Model

Code
propensity_profit_xgb = prof_calc(propensity_tab_xgb, 14.99, 1.5)
propensity_profit_xgb
44331.39261063743
Code
# Difference in profits from using uplift model and propensity model
difference_xgb = uplift_profit_xgb - propensity_profit_xgb
difference_xgb
13310.978668345379

Results

Code
mod_perf = pd.DataFrame({
    "model": ["Logistic", "Neural Network", "Random Forest", "XGBoost"],
    "incremental_profit_uplift": [uplift_profit_logit, uplift_profit_nn, uplift_profit_rf, uplift_profit_xgb],
    "incremental_profit_propensity": [propensity_profit_logit, propensity_profit_nn, propensity_profit_rf, propensity_profit_xgb],
    "difference": [difference_logit, difference_nn, difference_rf, difference_xgb]
})
mod_perf.sort_values("difference", ascending=False)
model incremental_profit_uplift incremental_profit_propensity difference
0 Logistic 45907.759762 32065.482935 13842.276826
3 XGBoost 57642.371279 44331.392611 13310.978668
2 Random Forest 58311.164733 48633.386279 9677.778454
1 Neural Network 55034.956633 45823.831691 9211.124941
Code
incremental_profit_dct = {
    "Logistic": uplift_profit_logit,
    "Neural Network": uplift_profit_nn,
    "Random Forest": uplift_profit_rf,
    "XGBoost": uplift_profit_xgb
}


import seaborn as sns
import matplotlib.pyplot as plt

# Convert dictionary to DataFrame
df = pd.DataFrame(list(incremental_profit_dct.items()), columns=['Model', 'IncrementalProfit'])
plt.figure(figsize=(12, 5))  # Adjust the width and height to your preference
# Plot
sns.set(style="white")
ax = sns.barplot(x="Model", y="IncrementalProfit", data=df, palette="magma")

# Annotations
for index, row in df.iterrows():
    ax.text(index, row.IncrementalProfit, f'${row.IncrementalProfit:.2f}', ha='center')

# Set labels and title
ax.set_xlabel("Model Type", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_ylabel("Incremental Profit", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_title("Incremental Profit by Model", fontdict={'family': 'serif', 'color': 'black', 'size': 15})

plt.xticks(rotation=45)  # Rotate x labels for better readability
plt.show()

Code
difference_dct = {
    "Logistic": difference_logit,
    "Neural Network": difference_nn,
    "Random Forest": difference_rf,
    "XGBoost": difference_xgb
}

# Convert dictionary to DataFrame
df = pd.DataFrame(list(difference_dct.items()), columns=['Model', 'IncrementalProfit'])
plt.figure(figsize=(12, 5))  # Adjust the width and height to your preference
# Plot
sns.set(style="white")
ax = sns.barplot(x="Model", y="IncrementalProfit", data=df, palette="magma")

# Annotations
for index, row in df.iterrows():
    ax.text(index, row.IncrementalProfit, f'${row.IncrementalProfit:.2f}', ha='center')

# Set labels and title
ax.set_xlabel("Model Type", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_ylabel("Profit Difference (Uplift - Propensity)", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_title("Difference by Model", fontdict={'family': 'serif', 'color': 'black', 'size': 15})

plt.xticks(rotation=45)  # Rotate x labels for better readability
plt.show()

Part II

1. What formula would you use to select customers to target using a propensity model if your goal is to maximize expected profits?

To select customers to target using a propensity-to-buy model when the goal is to maximize expected profits, we want a rule that incorporates both the propensity to buy and the expected profit from each customer if they do purchase. This prioritizes customers not just by their likelihood to buy, but also by the value they bring. A propensity model predicts the likelihood that a customer purchases, without distinguishing whether the purchase was caused by the marketing action. The rule is to target customer \(i\) whenever the expected profit is positive:

\[ \text{Customers to target} = (\text{price} \times \text{pred}_i - \text{cost}) > 0 \]

Equivalently, target customer \(i\) when \(\text{pred}_i > \text{cost}/\text{price}\); with a price of \$14.99 and a cost of \$1.50, the break-even propensity is \(1.5/14.99 \approx 0.100\).

Code
price = 14.99
cost = 1.50
Code
def round_to_nearest_5(x):
    # Round a proportion to the nearest 5% (0.05), e.g. 0.4786 -> 0.50
    return round(x * 20) / 20
Code
# Run the function across all models and their propensity predictions
models = ['logit', 'nn', 'rf', 'xgb']
predictions = ['pred_treatment', 'pred_treatment_nn', 'pred_treatment_rf', 'pred_treatment_xgb']

# Define optimization function for profit
def propensity_prof_opt(data, price=14.99, cost=1.5):
    total_customers = 120000
    result = []
    for model, prediction in zip(models, predictions):
        # Expected profit per customer: price * P(purchase) - targeting cost
        data['EP'] = data[prediction] * price - cost
        # Target only customers with positive expected profit
        data['target'] = data['EP'] > 0
        perc = np.nanmean(data['target'])
        target_customers = perc * total_customers
        rounded_perc = round_to_nearest_5(perc)
        result.append([model, perc, rounded_perc, target_customers])
    return pd.DataFrame(
        result, columns=['model', 'perc_customer', 'rounded_perc', 'target_customers']
    ).sort_values('perc_customer', ascending=False)

# Store the result under a new name to avoid shadowing the function
propensity_opt = propensity_prof_opt(cg_rct_stacked.query("training == 1 & ad == 1"))
propensity_opt
model perc_customer rounded_perc target_customers
3 xgb 0.485571 0.50 58268.571429
0 logit 0.478524 0.50 57422.857143
1 nn 0.439286 0.45 52714.285714
2 rf 0.223524 0.20 26822.857143

Therefore, the number of customers to target under each tuned propensity model to maximize expected profits is:

  • Logistic Regression Model: 57,422 customers

  • Neural Network Model: 52,714 customers

  • Random Forest Model: 26,822 customers

  • XGBoost Model: 58,268 customers

Uplift Expected Profit

If we want to target customers using an uplift model, we select the customers with the highest predicted uplift scores, where uplift is the incremental impact of the treatment (e.g., receiving an ad) on the customer's probability of making a purchase.

The expected incremental profit is the uplift score multiplied by the profit per conversion, minus the cost of targeting the customer. The targeting rule is:

\[ \text{Customers to target} = (\text{Uplift Score} \times \text{price} - \text{cost}) > 0 \]

The uplift score is the probability that the customer purchases with the treatment minus the probability that they purchase without it:

\[ \text{Uplift Score} = P(\text{Outcome} \mid \text{Treatment}) - P(\text{Outcome} \mid \text{Control}) \]
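Since targeting a customer is profitable only when the uplift score exceeds cost / price = 1.5 / 14.99 ≈ 0.100, a quick sanity check (a minimal sketch using the price, cost, and uplift_score_xgb objects created above) should roughly reproduce the 0.283 share reported for XGBoost below:

Code
# Break-even uplift: target only if uplift * price > cost,
# i.e. uplift > cost / price = 1.5 / 14.99 ≈ 0.100
threshold = cost / price
(cg_rct_stacked.query("training == 1 & ad == 1")["uplift_score_xgb"] > threshold).mean()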

Code
# Optimizing the uplift model profit
# Run the function across all models and their uplift scores
models = ['logit', 'nn', 'rf', 'xgb']
uplift_scores = ['uplift_score', 'uplift_score_nn', 'uplift_score_rf', 'uplift_score_xgb']

# Define optimization function for profit
def uplift_prof_opt(data, price=14.99, cost=1.5):
    total_customers = 120000
    result = []
    for model, uplift_score in zip(models, uplift_scores):
        # Expected incremental profit per customer: uplift * price - targeting cost
        data['EIP'] = (data[uplift_score] * price) - cost
        # Target only customers with positive expected incremental profit
        data['target'] = data['EIP'] > 0
        perc = np.nanmean(data['target'])
        target_customers = perc * total_customers
        rounded_perc = round_to_nearest_5(perc)
        result.append([model, perc, rounded_perc, target_customers])
    return pd.DataFrame(
        result, columns=['model', 'perc_customer', 'rounded_perc', 'target_customers']
    ).sort_values('perc_customer', ascending=False)

# Store the result under a new name to avoid shadowing the function
uplift_opt = uplift_prof_opt(cg_rct_stacked.query("training == 1 & ad == 1"))
uplift_opt
model perc_customer rounded_perc target_customers
1 nn 0.295000 0.30 35400.000000
3 xgb 0.283048 0.30 33965.714286
0 logit 0.221905 0.20 26628.571429
2 rf 0.138810 0.15 16657.142857

Therefore, the number of customers to target under each tuned uplift model to maximize expected profits is:

  • Logistic Regression Model: 26,628 customers

  • Neural Network Model: 35,400 customers

  • Random Forest Model: 16,657 customers

  • XGBoost Model: 33,965 customers

3. Uplift and Propensity Expected Profit

In questions 1 and 2, we calculated the optimal targeting proportion for the propensity and uplift models, rounded to the nearest 5%. We now use these percentages to determine the number of customers to target out of the 120K customer base and to calculate the expected incremental profits.

To calculate the incremental profits we follow three general steps:

1. Determine the incremental response. The incremental_resp column gives the cumulative additional responses attributed to the ad treatment up to each bin of customers.

2. Calculate the incremental profit at the target proportion: multiply the incremental response by the profit per response and subtract the cost of contacting the treated customers. Because the uplift table columns are cumulative, the row at the target proportion already aggregates all bins up to that point.

3. Scale the result from the 9,000 treated test customers to the full 120,000-customer base.

Code
# Define the function to calculate the expected incremental profit at a
# given targeting proportion
def prof_calc_opt(data, total_customers, percent_target, price=14.99, cost=1.5):
    # Scale from the 9,000 treated test customers to the full customer base
    scale_factor = total_customers / 9000

    # Last (cumulative) row at or below the target proportion; assumes the
    # uplift table is sorted by cum_prop
    target_row = data[data['cum_prop'] <= percent_target].iloc[-1]

    # Incremental revenue minus the cost of contacting the treated customers
    profit = (price * target_row['incremental_resp'] - cost * target_row['T_n']) * scale_factor
    return profit

Logistic Model
Code
# Propensity Model
propensity_profits_logit = prof_calc_opt(propensity_tab, 120000, 0.5, 14.99, 1.5)
propensity_profits_logit
799.8637508283598
Code
# Uplift Model
uplift_profits_logit = prof_calc_opt(uplift_tab, 120000, 0.2, 14.99, 1.5)
uplift_profits_logit
44387.751724137925
Code
logit_mod = pd.DataFrame({
    "model": ["Uplift", "Propensity"],
    "incremental_profits": [uplift_profits_logit, propensity_profits_logit]
})
logit_mod.sort_values("incremental_profits", ascending=False)
model incremental_profits
0 Uplift 44387.751724
1 Propensity 799.863751

Neural Network

Code
# Propensity Model
propensity_profits_nn = prof_calc_opt(prop_tab_nn, 120000, 0.45, 14.99, 1.5)
propensity_profits_nn
17515.540061466905
Code
# Uplift Model
uplift_profits_nn = prof_calc_opt(uplift_tab_nn, 120000, 0.3, 14.99, 1.5)
uplift_profits_nn
57060.084573350156
Code
nn_mod = pd.DataFrame({
    "model": ["Uplift", "Propensity"],
    "incremental_profits": [uplift_profits_nn, propensity_profits_nn]
})
nn_mod.sort_values("incremental_profits", ascending=False)
model incremental_profits
0 Uplift 57060.084573
1 Propensity 17515.540061

Random Forest

Code
# Propensity Model
propensity_profits_rf = prof_calc_opt(prop_tab_rf, 120000, 0.25, 14.99, 1.5)
propensity_profits_rf
48633.38627925748
Code
# Uplift Model
uplift_profits_rf = prof_calc_opt(uplift_tab_rf, 120000, 0.15, 14.99, 1.5)
uplift_profits_rf
48476.592645314355
Code
rf_mod = pd.DataFrame({
    "model": ["Uplift", "Propensity"],
    "incremental_profits": [uplift_profits_rf, propensity_profits_rf]
})
rf_mod.sort_values("incremental_profits", ascending=False)
model incremental_profits
1 Propensity 48633.386279
0 Uplift 48476.592645

XGBoost Model

Code
# Propensity Model
propensity_profits_xgb = prof_calc_opt(propensity_tab_xgb, 120000, 0.5, 14.99, 1.5)
propensity_profits_xgb
20359.9276983095
Code
# Uplift Model
uplift_profits_xgb = prof_calc_opt(uplift_tab_xgb, 120000, 0.3, 14.99, 1.5)
uplift_profits_xgb
60208.6887367766
Code
xgb_mod = pd.DataFrame({
    "model": ["Uplift", "Propensity"],
    "incremental_profits": [uplift_profits_xgb, propensity_profits_xgb]
})
xgb_mod.sort_values("incremental_profits", ascending=False)
model incremental_profits
0 Uplift 60208.688737
1 Propensity 20359.927698
Code
uplift_profit_dct = {
    "Logit_Uplift": uplift_profits_logit,
    "NN_Uplift": uplift_profits_nn, 
    "RF_Uplift": uplift_profits_rf,
    "XGB_Uplift": uplift_profits_xgb}
Code
propensity_profit_dct = {
    "Logit_Propensity": propensity_profits_logit,
    "NN_Propensity": propensity_profits_nn,
    "RF_Propensity": propensity_profits_rf,
    "XGB_Propensity": propensity_profits_xgb}
Code
# Convert dictionary to DataFrame
df = pd.DataFrame(list(uplift_profit_dct.items()), columns=['Model', 'IncrementalProfit'])
plt.figure(figsize=(12, 5))  # Adjust the width and height to your preference
# Plot
sns.set(style="white")
ax = sns.barplot(x="Model", y="IncrementalProfit", data=df, palette="magma")

# Annotations
for index, row in df.iterrows():
    ax.text(index, row.IncrementalProfit, f'${row.IncrementalProfit:.2f}', ha='center')

# Set labels and title
ax.set_xlabel("Model Type", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_ylabel("Incremental Profit", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_title("Uplift Incremental Profit by Model", fontdict={'family': 'serif', 'color': 'black', 'size': 15})

plt.xticks(rotation=45)  # Rotate x labels for better readability
plt.show()

Code
# Convert dictionary to DataFrame
df = pd.DataFrame(list(propensity_profit_dct.items()), columns=['Model', 'IncrementalProfit'])
plt.figure(figsize=(12, 5))  # Adjust the width and height to your preference
# Plot
sns.set(style="white")
ax = sns.barplot(x="Model", y="IncrementalProfit", data=df, palette="magma")

# Annotations
for index, row in df.iterrows():
    ax.text(index, row.IncrementalProfit, f'${row.IncrementalProfit:.2f}', ha='center')

# Set labels and title
ax.set_xlabel("Model Type", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_ylabel("Incremental Profit", fontdict={'family': 'serif', 'color': 'black', 'size': 12})
ax.set_title("Propensity Incremental Profit by Model", fontdict={'family': 'serif', 'color': 'black', 'size': 15})

plt.xticks(rotation=45)  # Rotate x labels for better readability
plt.show()

In this situation, where the goal is to maximize the total number of conversions, the propensity model proves more effective. It targets a broader audience, including customers who are already likely to convert without any intervention, whereas the uplift model targets only those who are likely to convert because of the intervention.

The propensity model is also suited to broad targeting and segmentation, helping to focus resources on high-probability events. The uplift model is suited to optimizing marketing campaigns and other interventions by targeting the individuals whose behavior can actually be changed by the intervention, thus maximizing ROI and reducing waste. As a result, the propensity model captures a higher total profit, whereas the uplift model captures a higher ROI.

In essence, while both models enhance decision-making in marketing and other fields, they serve different strategic purposes: propensity models predict actions based on existing propensities, while uplift models identify the additional impact of a particular action or intervention.