Conjoint

Author

Duyen Tran

Published

May 23, 2024

This assignment uses the MNL model to analyze (1) yogurt purchase data made by consumers at a retail location, and (2) conjoint data about consumer preferences for minivans.

1. Estimating Yogurt Preferences

Likelihood for the Multi-nomial Logit (MNL) Model

Suppose we have \(i=1,\ldots,n\) consumers who each select exactly one product \(j\) from a set of \(J\) products. The outcome variable is the identity of the product chosen \(y_i \in \{1, \ldots, J\}\) or equivalently a vector of \(J-1\) zeros and \(1\) one, where the \(1\) indicates the selected product. For example, if the third product was chosen out of 4 products, then either \(y=3\) or \(y=(0,0,1,0)\) depending on how we want to represent it. Suppose also that we have a vector of data on each product \(x_j\) (e.g., size, price, etc.).

We model the consumer’s decision as the selection of the product that provides the most utility, and we’ll specify the utility function as a linear function of the product characteristics:

\[ U_{ij} = x_j'\beta + \epsilon_{ij} \]

where \(\epsilon_{ij}\) is an i.i.d. extreme value error term.

The choice of the i.i.d. extreme value error term leads to a closed-form expression for the probability that consumer \(i\) chooses product \(j\):

\[ \mathbb{P}_i(j) = \frac{e^{x_j'\beta}}{\sum_{k=1}^Je^{x_k'\beta}} \]

For example, if there are 4 products, the probability that consumer \(i\) chooses product 3 is:

\[ \mathbb{P}_i(3) = \frac{e^{x_3'\beta}}{e^{x_1'\beta} + e^{x_2'\beta} + e^{x_3'\beta} + e^{x_4'\beta}} \]
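The probability formula above is a softmax over the products' deterministic utilities. As a minimal sketch, the snippet below evaluates it for four products; the attribute matrix `X_demo` and coefficients `beta_demo` are made-up values for illustration only, not estimates from this assignment.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Return P(j) for each of the J products, given a (J, K) attribute matrix X."""
    v = X @ beta          # deterministic utilities x_j' beta
    v = v - v.max()       # shift for numerical stability; the probabilities are unchanged
    exp_v = np.exp(v)
    return exp_v / exp_v.sum()

# Hypothetical attributes for 4 products: [featured, price]
X_demo = np.array([[0, 0.10],
                   [0, 0.08],
                   [1, 0.06],
                   [0, 0.08]])
beta_demo = np.array([0.5, -30.0])   # made-up coefficients

probs_demo = mnl_probabilities(X_demo, beta_demo)
print(probs_demo, probs_demo.sum())
```

Products 2 and 4 have identical attributes in this toy example, so they receive identical probabilities, and the four probabilities sum to one.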

A clever way to write the individual likelihood function for consumer \(i\) is the product of the \(J\) probabilities, each raised to the power of an indicator variable (\(\delta_{ij}\)) that indicates the chosen product:

\[ L_i(\beta) = \prod_{j=1}^J \mathbb{P}_i(j)^{\delta_{ij}} = \mathbb{P}_i(1)^{\delta_{i1}} \times \ldots \times \mathbb{P}_i(J)^{\delta_{iJ}}\]

Notice that if the consumer selected product \(j=3\), then \(\delta_{i3}=1\) while \(\delta_{i1}=\delta_{i2}=\delta_{i4}=0\) and the likelihood is:

\[ L_i(\beta) = \mathbb{P}_i(1)^0 \times \mathbb{P}_i(2)^0 \times \mathbb{P}_i(3)^1 \times \mathbb{P}_i(4)^0 = \mathbb{P}_i(3) = \frac{e^{x_3'\beta}}{\sum_{k=1}^Je^{x_k'\beta}} \]

The joint likelihood (across all consumers) is the product of the \(n\) individual likelihoods:

\[ L_n(\beta) = \prod_{i=1}^n L_i(\beta) = \prod_{i=1}^n \prod_{j=1}^J \mathbb{P}_i(j)^{\delta_{ij}} \]

And the joint log-likelihood function is:

\[ \ell_n(\beta) = \sum_{i=1}^n \sum_{j=1}^J \delta_{ij} \log(\mathbb{P}_i(j)) \]
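To see how the indicator \(\delta_{ij}\) picks out the chosen product, here is a small sketch with made-up probabilities for three consumers (the matrices `P` and `delta` are illustrative assumptions). Multiplying by the indicator and summing gives the same number as looking up the chosen product's probability directly.

```python
import numpy as np

# Hypothetical choice probabilities for n = 3 consumers and J = 4 products (rows sum to 1)
P = np.array([[0.10, 0.20, 0.30, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.70, 0.10, 0.10, 0.10]])

# delta[i, j] = 1 if consumer i chose product j
delta = np.array([[0, 0, 1, 0],
                  [0, 1, 0, 0],
                  [1, 0, 0, 0]])

# Double sum over i and j of delta_ij * log P_i(j)
loglik = np.sum(delta * np.log(P))

# Equivalent: pick out the probability of the chosen product for each consumer
chosen = delta.argmax(axis=1)
loglik_direct = np.sum(np.log(P[np.arange(len(P)), chosen]))

print(loglik, loglik_direct)
```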

Yogurt Dataset

We will use the yogurt_data dataset, which provides anonymized consumer identifiers (id), a vector indicating the chosen product (y1:y4), a vector indicating if any products were “featured” in the store as a form of advertising (f1:f4), and the products’ prices (p1:p4). For example, consumer 1 purchased yogurt 4 at a price of 0.079/oz and none of the yogurts were featured/advertised at the time of consumer 1’s purchase. Consumers 2 through 7 each bought yogurt 2, etc.

Data Overview

Variable Description
id anonymized consumer identifiers.
y1, y2, y3, y4 a vector indicating the chosen product.
f1, f2, f3, f4 a vector indicating if any products were “featured” in the store as a form of advertising
p1, p2, p3, p4 the products’ prices
Code
import pandas as pd

yogurt_data = pd.read_csv('yogurt_data.csv')
yogurt_data.head()
id y1 y2 y3 y4 f1 f2 f3 f4 p1 p2 p3 p4
0 1 0 0 0 1 0 0 0 0 0.108 0.081 0.061 0.079
1 2 0 1 0 0 0 0 0 0 0.108 0.098 0.064 0.075
2 3 0 1 0 0 0 0 0 0 0.108 0.098 0.061 0.086
3 4 0 1 0 0 0 0 0 0 0.108 0.098 0.061 0.086
4 5 0 1 0 0 0 0 0 0 0.125 0.098 0.049 0.079
Code
yogurt_data.describe(include='all')
id y1 y2 y3 y4 f1 f2 f3 f4 p1 p2 p3 p4
count 2430.0000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000 2430.000000
mean 1215.5000 0.341975 0.401235 0.029218 0.227572 0.055556 0.039506 0.037449 0.037449 0.106248 0.081532 0.053622 0.079507
std 701.6249 0.474469 0.490249 0.168452 0.419351 0.229109 0.194836 0.189897 0.189897 0.020587 0.011047 0.008054 0.007714
min 1.0000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.012000 0.000000 0.025000 0.004000
25% 608.2500 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.103000 0.081000 0.050000 0.079000
50% 1215.5000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.108000 0.086000 0.054000 0.079000
75% 1822.7500 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.115000 0.086000 0.061000 0.086000
max 2430.0000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.193000 0.111000 0.086000 0.104000

Statistics Summary:

  • There are 2,430 records.

  • Binary fields (y1 to y4, f1 to f4): the means of y1 through y4 are the observed choice shares (about 34%, 40%, 3%, and 23%), and the means of f1 through f4 show that each yogurt was featured on roughly 4% to 6% of purchase occasions.

  • Price fields (p1, p2, p3, p4): average prices range from about $0.054/oz (yogurt 3) to about $0.106/oz (yogurt 1). Note that p1 has a minimum of -0.012 and p2 a minimum of 0, which are implausible as prices and may be data-entry errors.

Let the vector of product features include brand dummy variables for yogurts 1-3 (we’ll omit a dummy for product 4 to avoid multi-collinearity), a dummy variable to indicate if a yogurt was featured, and a continuous variable for the yogurts’ prices:

\[ x_j' = \left[ \mathbb{1}(\text{Yogurt 1}), \mathbb{1}(\text{Yogurt 2}), \mathbb{1}(\text{Yogurt 3}), X_f, X_p \right] \]

The “hard part” of the MNL likelihood function is organizing the data, as we need to keep track of 3 dimensions (consumer \(i\), covariate \(k\), and product \(j\)) instead of the typical 2 dimensions for cross-sectional regression models (consumer \(i\) and covariate \(k\)).

What we would like to do is reorganize the data from a “wide” shape with \(n\) rows and multiple columns for each covariate, to a “long” shape with \(n \times J\) rows and a single column for each covariate. As part of this re-organization, we’ll add binary variables to indicate the first 3 products; the variables for featured and price are included in the dataset and simply need to be “pivoted” or “melted” from wide to long.

Reshape and prep the data
Code
# Melt the data into a long format
long_data = pd.melt(yogurt_data, id_vars=['id'], 
                    value_vars=['y1', 'y2', 'y3', 'y4', 'f1', 'f2', 'f3', 'f4', 'p1', 'p2', 'p3', 'p4'],
                    var_name='product_feature', value_name='value')

# Extract product and feature types from the 'product_feature' column
long_data['product'] = long_data['product_feature'].str.extract(r'(\d)').astype(int)
long_data['feature'] = long_data['product_feature'].str.extract('([a-z]+)')

# Pivot the table to get one row per consumer per product
reshaped_yogurt = long_data.pivot_table(index=['id', 'product'], columns='feature', values='value', aggfunc='first').reset_index()

# Add the binary indicators for the first three yogurts
for j in range(1, 4):
    reshaped_yogurt[f'Yogurt{j}'] = (reshaped_yogurt['product'] == j).astype(int)

# Ensure the resulting DataFrame is correctly structured
reshaped_yogurt
feature id product f p y Yogurt1 Yogurt2 Yogurt3
0 1 1 0.0 0.108 0.0 1 0 0
1 1 2 0.0 0.081 0.0 0 1 0
2 1 3 0.0 0.061 0.0 0 0 1
3 1 4 0.0 0.079 1.0 0 0 0
4 2 1 0.0 0.108 0.0 1 0 0
... ... ... ... ... ... ... ... ...
9715 2429 4 0.0 0.086 1.0 0 0 0
9716 2430 1 0.0 0.108 0.0 1 0 0
9717 2430 2 0.0 0.086 0.0 0 1 0
9718 2430 3 0.0 0.043 0.0 0 0 1
9719 2430 4 0.0 0.079 1.0 0 0 0

9720 rows × 8 columns
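As an alternative sketch of the same reshape, pandas also provides `pd.wide_to_long`, which handles column names of the form stub plus numeric suffix in one call. The two-row frame `wide` below is made up to mirror the layout of yogurt_data; it is not the real dataset, and this is not the code used for the results in this write-up.

```python
import pandas as pd

# Two made-up rows with the same column layout as yogurt_data
wide = pd.DataFrame({
    'id': [1, 2],
    'y1': [0, 0], 'y2': [0, 1], 'y3': [0, 0], 'y4': [1, 0],
    'f1': [0, 0], 'f2': [0, 0], 'f3': [0, 0], 'f4': [0, 0],
    'p1': [0.108, 0.108], 'p2': [0.081, 0.098],
    'p3': [0.061, 0.064], 'p4': [0.079, 0.075],
})

# Stubs y, f, p become columns; the numeric suffix becomes the 'product' index level
long_alt = (
    pd.wide_to_long(wide, stubnames=['y', 'f', 'p'], i='id', j='product')
      .reset_index()
      .sort_values(['id', 'product'])
      .reset_index(drop=True)
)

# Brand dummies for the first three yogurts (yogurt 4 is the baseline)
for j in range(1, 4):
    long_alt[f'Yogurt{j}'] = (long_alt['product'].astype(int) == j).astype(int)

print(long_alt)
```

Sorting by id and then product matters here, because the likelihood code relies on each consumer's four rows being consecutive.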

Estimation

Code up the log-likelihood function.
Code
import numpy as np

def log_likelihood(beta, X, choices):
    # Utility calculation
    utility = X.dot(beta)
    # Exponentiated utilities
    exp_util = np.exp(utility)
    # Sum of exponentiated utilities across choices
    sum_exp_util = np.sum(exp_util.reshape(-1, 4), axis=1)
    # Compute choice probabilities
    probabilities = exp_util / np.repeat(sum_exp_util, 4)
    # Log of probabilities of chosen alternatives
    log_likelihood = np.log(probabilities) * choices

    return np.sum(log_likelihood)
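The function above exponentiates utilities directly, which can overflow if the optimizer tries large coefficient values. One possible safeguard, sketched here as an assumption rather than as the method used for the reported estimates, is to compute log-probabilities with `scipy.special.logsumexp`. The names `log_likelihood_stable`, `X_chk`, `choices_chk`, and `beta_chk` are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood_stable(beta, X, choices, n_alts=4):
    """Log-likelihood for long-format data with n_alts consecutive rows per consumer."""
    utility = (X @ beta).reshape(-1, n_alts)
    # log P_i(j) = v_ij - log(sum_k exp(v_ik)), without forming exp(v) explicitly
    log_probs = utility - logsumexp(utility, axis=1, keepdims=True)
    return np.sum(log_probs * choices.reshape(-1, n_alts))

# Small made-up check: 2 consumers, 4 alternatives, 2 covariates
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(8, 2))
choices_chk = np.array([0, 0, 1, 0, 1, 0, 0, 0], dtype=float)
beta_chk = np.array([0.3, -1.2])

ll_stable = log_likelihood_stable(beta_chk, X_chk, choices_chk)

# Naive version, mirroring the function above, for comparison
exp_u = np.exp(X_chk @ beta_chk)
ll_naive = np.sum(np.log(exp_u / np.repeat(exp_u.reshape(-1, 4).sum(axis=1), 4)) * choices_chk)

print(ll_stable, ll_naive)
```

For moderate coefficients the two versions agree; the stable version should also stay finite when the utilities are very large in magnitude.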

Use minimize() from scipy.optimize in Python to find the MLEs for the 5 parameters (\(\beta_1, \beta_2, \beta_3, \beta_f, \beta_p\)).

Code
from scipy.optimize import minimize

def negative_log_likelihood(beta, X, choices):
    # Utility calculation
    utility = X.dot(beta)
    # Exponentiated utilities
    exp_util = np.exp(utility)
    # Sum of exponentiated utilities across choices
    sum_exp_util = np.sum(exp_util.reshape(-1, 4), axis=1)
    # Compute choice probabilities
    probabilities = exp_util / np.repeat(sum_exp_util, 4)
    # Log of probabilities of chosen alternatives
    log_likelihood = np.log(probabilities) * choices

    return -np.sum(log_likelihood)

# Prepare the input matrix X and the choice vector
n_products = 4  # There are 4 products
features = ['Yogurt1', 'Yogurt2', 'Yogurt3', 'f', 'p']
X = reshaped_yogurt[features].values
choices = reshaped_yogurt['y'].values

# Define initial guesses for the parameters
initial_beta = np.zeros(len(features))

# Minimize the negative log-likelihood (BFGS is SciPy's default for unconstrained problems)
result = minimize(negative_log_likelihood, initial_beta, args=(X, choices))
Coef = result.x


Coef_table = pd.DataFrame({
    'Variables': features,
    'Coefficient': Coef
})

Coef_table
Variables Coefficient
0 Yogurt1 1.387751
1 Yogurt2 0.643505
2 Yogurt3 -3.086113
3 f 0.487415
4 p -37.057828
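The table above reports point estimates only. If rough standard errors are wanted, one option is to take the square root of the diagonal of the inverse Hessian that BFGS returns in `hess_inv`. The sketch below demonstrates this on simulated data, not the yogurt data; the names `nll`, `X_sim`, `beta_true`, and `se_approx` are hypothetical, and the BFGS inverse Hessian is only an approximation, so these values should be treated as indicative.

```python
import numpy as np
from scipy.optimize import minimize

def nll(beta, X, choices, n_alts):
    """Negative MNL log-likelihood for long-format data."""
    v = (X @ beta).reshape(-1, n_alts)
    v = v - v.max(axis=1, keepdims=True)
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -np.sum(log_p * choices.reshape(-1, n_alts))

# Simulated data: 500 consumers, 3 alternatives, 2 covariates
rng = np.random.default_rng(42)
n, J, K = 500, 3, 2
X_sim = rng.normal(size=(n * J, K))
beta_true = np.array([1.0, -0.5])
v_sim = (X_sim @ beta_true).reshape(n, J)
p_sim = np.exp(v_sim) / np.exp(v_sim).sum(axis=1, keepdims=True)
picks = np.array([rng.choice(J, p=row) for row in p_sim])
choices_sim = np.zeros((n, J))
choices_sim[np.arange(n), picks] = 1
choices_sim = choices_sim.ravel()

fit = minimize(nll, np.zeros(K), args=(X_sim, choices_sim, J), method='BFGS')

# Approximate standard errors from the BFGS inverse-Hessian estimate
se_approx = np.sqrt(np.diag(fit.hess_inv))
print(fit.x, se_approx)
```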
Code
# unique_choices = reshaped_yogurt['product'].nunique()


# X = reshaped_yogurt[features]
# X = sm.add_constant(reshaped_yogurt[features])
# choices = reshaped_yogurt['y']



# mnl_model = sm.MNLogit(choices, X).fit()
# mnl_summary = mnl_model.summary()
# mnl_results_table = mnl_summary.tables[1]

# # To display or print out the table
# print(mnl_results_table)

Discussion

The estimated parameters for the three yogurt product intercepts are:

\(\beta_1\) = 1.39

\(\beta_2\) = 0.64

\(\beta_3\) = -3.09

These coefficients represent the intrinsic utilities (or preferences) of the three yogurt products relative to Yogurt 4, the omitted baseline whose intercept is normalized to zero, holding price and featured status constant. Here’s how to interpret these intercepts in the context of consumer preferences:

\(\beta_1\) (Yogurt 1): The positive and largest value indicates that Yogurt 1 has the highest intrinsic utility and is the most preferred product at equal price and featured status.

\(\beta_2\) (Yogurt 2): This is also positive but lower than \(\beta_1\), indicating that Yogurt 2 is less preferred than Yogurt 1 but still preferred to the baseline, Yogurt 4.

\(\beta_3\) (Yogurt 3): The negative value indicates that Yogurt 3 has lower intrinsic utility than the baseline Yogurt 4, making it the least preferred of all four products.

Given these interpretations, the implied ranking at equal price and featured status is Yogurt 1, then Yogurt 2, then Yogurt 4, with Yogurt 3 last. These intrinsic preferences could be driven by factors not explicitly modeled but captured by the intercepts, such as brand affinity, flavor preferences, or other unobserved attributes associated with each product.

Use the estimated price coefficient as a dollar-per-util conversion factor. Use this conversion factor to calculate the dollar benefit between the most-preferred yogurt (the one with the highest intercept) and the least preferred yogurt (the one with the lowest intercept). This is a per-unit monetary measure of brand value.

Code
# Extracted beta values for Yogurt 1 and Yogurt 3 and the price coefficient
beta_1 = 1.39
beta_3 = -3.09
beta_p = -37.06  # The negative price coefficient

# Calculate utility difference
utility_difference = beta_1 - beta_3

# Convert utility difference to dollar benefit using the price coefficient
dollar_benefit = utility_difference / abs(beta_p)

print("Per-unit monetary measure of brand value is ", round(dollar_benefit, 4))
Per-unit monetary measure of brand value is  0.1209

The per-unit monetary measure of brand value between the most-preferred yogurt (Yogurt 1) and the least-preferred yogurt (Yogurt 3) is approximately $0.12 per ounce, since prices are recorded per ounce. In other words, the model implies that consumers would be willing to pay about 12 cents more per ounce for Yogurt 1 than for Yogurt 3, based solely on the difference in brand intercepts. For context, that is larger than the average price of Yogurt 1 (about $0.106/oz).
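The same conversion can be applied to every brand at once. The sketch below copies the unrounded estimates from the coefficient table and expresses each intercept in dollars per ounce relative to the baseline (Yogurt 4, whose intercept is zero by construction). The variable names are hypothetical.

```python
# Unrounded estimates copied from the coefficient table above
intercepts = {'Yogurt 1': 1.387751, 'Yogurt 2': 0.643505,
              'Yogurt 3': -3.086113, 'Yogurt 4': 0.0}
beta_price = -37.057828

# Dollar value per ounce of each brand relative to the baseline (Yogurt 4)
brand_value = {name: b / abs(beta_price) for name, b in intercepts.items()}

# Gap between the most and least preferred brands
spread = max(brand_value.values()) - min(brand_value.values())

for name, value in brand_value.items():
    print(f"{name}: {value:+.4f} $/oz vs. Yogurt 4")
print("Spread:", round(spread, 4))
```

Using the unrounded coefficients changes the result only slightly compared with the two-decimal values used above.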

One benefit of the MNL model is that we can simulate counterfactuals (e.g., what if the price of yogurt 1 was $0.10/oz instead of $0.08/oz).

Calculate the market shares in the market at the time the data were collected. Then, increase the price of yogurt 1 by $0.10 and use your fitted model to predict p(y|x) for each consumer and each product (this should be a matrix of \(N \times 4\) estimated choice probabilities). Take the column averages to get the new, expected market shares that result from the $0.10 price increase to yogurt 1. Do the yogurt 1 market shares decrease?

Code
def calculate_probabilities(beta, X):
    utility = X.dot(beta)
    exp_util = np.exp(utility)
    sum_exp_util = np.sum(exp_util.reshape(-1, n_products), axis=1)
    probabilities = exp_util / np.repeat(sum_exp_util, n_products)
    return probabilities.reshape(-1, n_products)

estimated_beta = result.x
# Calculate the initial choice probabilities for all products and all consumers
initial_probabilities = calculate_probabilities(estimated_beta, X)

# Calculate the current market shares by taking the mean of probabilities across all consumers for each product
current_market_shares = np.mean(initial_probabilities, axis=0)

# Display the current market shares
current_market_shares_df = pd.DataFrame({
    'Product': ['Yogurt 1', 'Yogurt 2', 'Yogurt 3', 'Yogurt 4'],
    'Current Market Share': current_market_shares
})

current_market_shares_df
Product Current Market Share
0 Yogurt 1 0.341975
1 Yogurt 2 0.401235
2 Yogurt 3 0.029218
3 Yogurt 4 0.227572

The current market shares for the four yogurt products are approximately:

Yogurt 1: 34.2%

Yogurt 2: 40.1%

Yogurt 3: 2.9%

Yogurt 4: 22.8%

Next, let’s increase the price of Yogurt 1 by $0.10 and then use the fitted model to predict the new choice probabilities. We’ll see how the market shares change, particularly for Yogurt 1, as a result of this price increase.

Code
# Increase the price of Yogurt 1 by $0.10
# First, create a new X matrix with the updated price for Yogurt 1
X_new_prices = X.copy()
price_increase = 0.10
X_new_prices[X_new_prices[:, 0] == 1, 4] += price_increase  # Only increase the price (column 4) in the rows for Yogurt 1

# Calculate the new choice probabilities with the increased price of Yogurt 1
new_probabilities = calculate_probabilities(estimated_beta, X_new_prices)

# Calculate the new market shares by taking the mean of new probabilities across all consumers for each product
new_market_shares = np.mean(new_probabilities, axis=0)

# Display the new market shares
new_market_shares_df = pd.DataFrame({
    'Product': ['Yogurt 1', 'Yogurt 2', 'Yogurt 3', 'Yogurt 4'],
    'New Market Share': new_market_shares
})

new_market_shares_df
Product New Market Share
0 Yogurt 1 0.021118
1 Yogurt 2 0.591145
2 Yogurt 3 0.044040
3 Yogurt 4 0.343697

The new market shares for the four yogurt products after increasing the price of Yogurt 1 by $0.10 are approximately:

Yogurt 1: 2.1%

Yogurt 2: 59.1%

Yogurt 3: 4.4%

Yogurt 4: 34.4%

Yogurt 1’s market share dramatically decreases from 34.2% to 2.1% due to the price increase.

Yogurt 2’s market share significantly increases, absorbing most of the share lost by Yogurt 1.

Yogurt 3 and Yogurt 4 also gain share. The size of the shift reflects how large the change is: a $0.10 increase roughly doubles Yogurt 1’s average price of about $0.106/oz, and with a price coefficient of about -37 it lowers Yogurt 1’s utility by about 3.7 utils. Under the Multinomial Logit model, relative utilities directly determine choice probabilities, so consumers switch to the now relatively cheaper alternatives.
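A quick way to see where Yogurt 1's lost share goes is to compare the reported shares before and after the price change. Under the MNL model each consumer's probabilities for the other products scale up by a common factor (the independence of irrelevant alternatives property), so the aggregate growth ratios should be similar, though not identical, because prices differ across consumers. The sketch below uses only the share values printed above.

```python
import numpy as np

# Shares copied from the two tables above (Yogurt 1 to Yogurt 4)
current = np.array([0.341975, 0.401235, 0.029218, 0.227572])
new = np.array([0.021118, 0.591145, 0.044040, 0.343697])

# Change in share, in percentage points
share_change = new - current

# Growth factor for the three competing yogurts
growth_ratio = new[1:] / current[1:]

print("Change in share:", share_change.round(4))
print("Growth ratio for Yogurts 2-4:", growth_ratio.round(3))
```

The three ratios come out close to one another, which is consistent with roughly proportional substitution toward the remaining products.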

2. Estimating Minivan Preferences

Data

Load and describe the data

Code
minivan = pd.read_csv("rintro-chapter13conjoint.csv")
minivan
resp.id ques alt carpool seat cargo eng price choice
0 1 1 1 yes 6 2ft gas 35 0
1 1 1 2 yes 8 3ft hyb 30 0
2 1 1 3 yes 6 3ft gas 30 1
3 1 2 1 yes 6 2ft gas 30 0
4 1 2 2 yes 7 3ft gas 35 1
... ... ... ... ... ... ... ... ... ...
8995 200 14 2 no 7 3ft gas 35 1
8996 200 14 3 no 7 3ft hyb 35 0
8997 200 15 1 no 7 2ft gas 35 0
8998 200 15 2 no 8 3ft elec 40 0
8999 200 15 3 no 6 3ft gas 35 1

9000 rows × 9 columns

Variable Description
resp.id Identifier for the respondent.
ques Question or choice task number.
alt Alternative within each choice task.
carpool, seat, cargo, eng Attributes of each alternative, such as carpool availability, seating capacity, cargo space, and engine type.
price Price associated with each alternative.
choice Whether the alternative was chosen (1) or not (0).
Code
# Number of respondents
num_respondents = minivan['resp.id'].nunique()

# Number of choice tasks completed by each respondent
tasks_per_respondent = minivan.groupby('resp.id')['ques'].nunique()

# Number of alternatives per choice task
# Assuming the structure is consistent across the dataset
alternatives_per_task = minivan.groupby(['resp.id', 'ques'])['alt'].nunique().max()

print("Number of Respondents:", num_respondents)
print(tasks_per_respondent.describe())
print("Number of Alternatives per Choice Task:",alternatives_per_task)
Number of Respondents: 200
count    200.0
mean      15.0
std        0.0
min       15.0
25%       15.0
50%       15.0
75%       15.0
max       15.0
Name: ques, dtype: float64
Number of Alternatives per Choice Task: 3

Here’s a summary of the conjoint survey data:

  • Number of Respondents: There are 200 respondents who participated in the survey.

  • Number of Choice Tasks per Respondent: Each respondent completed 15 choice tasks. This number is consistent across all respondents.

  • Number of Alternatives per Choice Task: Each choice task presented 3 alternatives.

The attributes (levels) were number of seats (6,7,8), cargo space (2ft, 3ft), engine type (gas, hybrid, electric), and price (in thousands of dollars).

Model

Estimate an MNL model, omitting the following levels to avoid multicollinearity: 6 seats, 2ft cargo, and gas engine. Show the table of coefficients and standard errors.

Code
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Create dummy variables for the categorical attributes, excluding base levels
minivan['seat_7'] = (minivan['seat'] == 7).astype(int)
minivan['seat_8'] = (minivan['seat'] == 8).astype(int)
minivan['cargo_3ft'] = (minivan['cargo'] == '3ft').astype(int)
minivan['eng_hybrid'] = (minivan['eng'] == 'hyb').astype(int)
minivan['eng_electric'] = (minivan['eng'] == 'elec').astype(int)
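# Coerce all columns to numeric; the original text columns (carpool, cargo, eng) become NaN but are not used below
minivan = minivan.apply(pd.to_numeric, errors='coerce')

X = minivan[['price', 'seat_7', 'seat_8', 'cargo_3ft', 'eng_hybrid', 'eng_electric']]
X = sm.add_constant(X)  # Add intercept
y = minivan['choice']  # Binary 0/1 outcome, so MNLogit here is equivalent to a binary logit on the stacked alternatives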

mnl_model = sm.MNLogit(y, X).fit()
mnl_summary = mnl_model.summary()
mnl_results_table = mnl_summary.tables[1]

# To display or print out the table
print(mnl_results_table)
Optimization terminated successfully.
         Current function value: 0.558663
         Iterations 6
================================================================================
    choice=1       coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            5.5322      0.224     24.677      0.000       5.093       5.972
price           -0.1591      0.006    -25.616      0.000      -0.171      -0.147
seat_7          -0.5248      0.060     -8.800      0.000      -0.642      -0.408
seat_8          -0.2931      0.059     -5.009      0.000      -0.408      -0.178
cargo_3ft        0.4385      0.049      9.004      0.000       0.343       0.534
eng_hybrid      -0.7605      0.057    -13.361      0.000      -0.872      -0.649
eng_electric    -1.4347      0.062    -23.217      0.000      -1.556      -1.314
================================================================================
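Because `choice` is a 0/1 column, the fit above treats every alternative as a separate binary observation. A different formulation, sketched here as an assumption and not as what the code above estimates, is the conditional logit, in which probabilities are normalized within each choice task (each resp.id and ques pair). The utilities in column `v` below are made-up numbers standing in for \(x'\beta\).

```python
import numpy as np
import pandas as pd

# Made-up utilities for two choice tasks of one respondent (3 alternatives each)
tasks = pd.DataFrame({
    'resp.id': [1, 1, 1, 1, 1, 1],
    'ques':    [1, 1, 1, 2, 2, 2],
    'alt':     [1, 2, 3, 1, 2, 3],
    'v':       [0.2, -0.5, 0.9, 0.0, 0.4, -1.0],   # hypothetical x'beta values
    'choice':  [0, 0, 1, 0, 1, 0],
})

keys = ['resp.id', 'ques']

# Softmax within each (respondent, question) group
exp_v = np.exp(tasks['v'] - tasks.groupby(keys)['v'].transform('max'))
tasks['prob'] = exp_v / exp_v.groupby([tasks[k] for k in keys]).transform('sum')

# Conditional-logit log-likelihood: log probability of the chosen alternative in each task
cond_loglik = np.sum(tasks['choice'] * np.log(tasks['prob']))

print(tasks)
print("Log-likelihood:", cond_loglik)
```

In this formulation a common intercept cancels within each task, so only attribute differences between the alternatives are identified.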

Results

Interpretation:

Price: The negative coefficient (-0.1591) means that each additional thousand dollars in price lowers the log odds of an alternative being chosen by about 0.16, the expected negative relationship between price and choice probability.

Seat 7: Having 7 seats, compared to the baseline of 6 seats, is associated with lower odds of the car being chosen (coefficient -0.5248).

Seat 8: Having 8 seats is also less preferred than 6 seats (coefficient -0.2931), though the penalty is smaller than for 7 seats.

Cargo 3ft: The positive coefficient (0.4385) indicates that 3ft of cargo space increases the odds of a car being chosen relative to the 2ft baseline. Consumers prefer more cargo space, all else being equal.

Engine Hybrid and Electric: Both hybrid and electric engines are less preferred compared to a traditional gas engine, with electric being the least preferred among the options.
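Log-odds coefficients can be easier to read as odds ratios, obtained by exponentiating each coefficient. The sketch below applies this to the values in the results table; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Coefficients copied from the results table above (intercept omitted)
coef = pd.Series({
    'price': -0.1591,
    'seat_7': -0.5248,
    'seat_8': -0.2931,
    'cargo_3ft': 0.4385,
    'eng_hybrid': -0.7605,
    'eng_electric': -1.4347,
})

# exp(coef) is the multiplicative change in the odds of being chosen
odds_ratios = np.exp(coef)
print(odds_ratios.round(3))
```

A value below 1 means the level lowers the odds of being chosen relative to its baseline, and a value above 1 means it raises them. For price, the ratio applies per additional thousand dollars.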

Use the price coefficient as a dollar-per-util conversion factor. We could find the dollar value of 3ft of cargo space as compared to 2ft of cargo space:

Code
# Coefficients from the model results
cargo_coeff = 0.4385
price_coeff = -0.1591

# Calculate the dollar value (in thousands) of having 3ft of cargo space compared to 2ft
dollar_value_cargo = (cargo_coeff / price_coeff) * (-1)
print("The dollar value (in $ thousands) of having 3ft cargo space compared to 2ft:", round(dollar_value_cargo, 3))
The dollar value (in $ thousands) of having 3ft cargo space compared to 2ft: 2.756

The dollar value of having 3 feet of cargo space compared to 2 feet, based on the model, is approximately $2,756. This amount represents the additional value that respondents place on having an extra foot of cargo space in their vehicle choice.

Assume the market consists of the following 6 minivans. Predict the market shares of each minivan in the market.

Minivan Seats Cargo Engine Price
A 7 2 Hyb 30
B 6 2 Gas 30
C 8 2 Gas 30
D 7 3 Gas 40
E 6 2 Elec 40
F 7 2 Hyb 35
Code
# Coefficients from the MNL model
coef_const = 5.5322
coef_price = -0.1591
coef_seat_7 = -0.5248
coef_seat_8 = -0.2931
coef_cargo_3ft = 0.4385
coef_eng_elec = -1.4347
coef_eng_hyb = -0.7605

# Define the attributes of each minivan
minivans = [
    {"seats": 7, "cargo": 2, "engine": "Hyb", "price": 30},
    {"seats": 6, "cargo": 2, "engine": "Gas", "price": 30},
    {"seats": 8, "cargo": 2, "engine": "Gas", "price": 30},
    {"seats": 7, "cargo": 3, "engine": "Gas", "price": 40},
    {"seats": 6, "cargo": 2, "engine": "Elec", "price": 40},
    {"seats": 7, "cargo": 2, "engine": "Hyb", "price": 35}
]

# Function to calculate utility
def calculate_utility(minivan):
    utility = coef_const
    utility += coef_price * minivan["price"]
    if minivan["seats"] == 7:
        utility += coef_seat_7
    elif minivan["seats"] == 8:
        utility += coef_seat_8
    if minivan["cargo"] == 3:
        utility += coef_cargo_3ft
    if minivan["engine"] == "Elec":
        utility += coef_eng_elec
    elif minivan["engine"] == "Hyb":
        utility += coef_eng_hyb
    return utility

# Calculate utilities for each minivan
utilities = [calculate_utility(minivan) for minivan in minivans]

# Calculate the market shares using the softmax function
exp_utilities = np.exp(utilities)
market_shares = exp_utilities / np.sum(exp_utilities)

# Create a DataFrame for the results
minivan_names = ['A', 'B', 'C', 'D', 'E', 'F']
market_shares_df = pd.DataFrame({
    'Minivan': minivan_names,
    'Market Share': market_shares
})

market_shares_df
Minivan Market Share
0 A 0.116071
1 B 0.419684
2 C 0.313062
3 D 0.078430
4 E 0.020365
5 F 0.052389
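The same shares can be computed in vectorized form from a small design matrix. The sketch below is one possible restructuring, with hypothetical names (`market`, `design`, `shares_vec`), using the coefficients from the results table. It leaves out the intercept on purpose: a constant added to every utility cancels in the softmax, so the shares should match the table above.

```python
import numpy as np
import pandas as pd

# Coefficients copied from the results table (intercept omitted because it cancels)
coefs = pd.Series({'price': -0.1591, 'seat_7': -0.5248, 'seat_8': -0.2931,
                   'cargo_3ft': 0.4385, 'eng_hybrid': -0.7605, 'eng_electric': -1.4347})

# The six minivans from the market definition above
market = pd.DataFrame({
    'seats':  [7, 6, 8, 7, 6, 7],
    'cargo':  [2, 2, 2, 3, 2, 2],
    'engine': ['Hyb', 'Gas', 'Gas', 'Gas', 'Elec', 'Hyb'],
    'price':  [30, 30, 30, 40, 40, 35],
}, index=list('ABCDEF'))

# Dummy-code the attributes with the same baselines as the model
design = pd.DataFrame({
    'price': market['price'],
    'seat_7': (market['seats'] == 7).astype(int),
    'seat_8': (market['seats'] == 8).astype(int),
    'cargo_3ft': (market['cargo'] == 3).astype(int),
    'eng_hybrid': (market['engine'] == 'Hyb').astype(int),
    'eng_electric': (market['engine'] == 'Elec').astype(int),
})

utilities_vec = design[coefs.index].to_numpy(dtype=float) @ coefs.to_numpy()
exp_u = np.exp(utilities_vec - utilities_vec.max())
shares_vec = pd.Series(exp_u / exp_u.sum(), index=market.index)
print(shares_vec.round(4))
```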

High Market Share

Minivan B (41.97%): 6 seats, 2ft cargo, gas engine, $30,000. This minivan has the highest market share. It combines the lowest price with the baseline level of every attribute (6 seats, 2ft cargo, gas engine), and 6 seats and a gas engine are the most preferred levels of their attributes in the fitted model.

Minivan C (31.31%): 8 seats, 2ft cargo, gas engine, $30,000. This minivan also has a large market share. It differs from Minivan B only in having 8 seats, and because the 8-seat coefficient is negative (-0.2931), the extra seats lower its utility relative to B rather than raising it.

Moderate Market Share

Minivan A (11.61%): 7 seats, 2ft cargo, hybrid engine, $30,000. Both the hybrid engine and the 7-seat configuration carry negative coefficients, which reduces its share compared to the gas-powered options at the same price.

Low Market Share

Minivan F (5.24%): 7 seats, 2ft cargo, hybrid engine, $35,000. This is Minivan A at a price $5,000 higher, which cuts its share by more than half and shows how sensitive consumers are to price.

Minivan D (7.84%): 7 seats, 3ft cargo, gas engine, $40,000. Despite offering more cargo space (3ft), the higher price detracts from its attractiveness: the extra cargo space is valued at roughly $2,756, far less than the $10,000 price premium over the $30,000 minivans.

Minivan E (2.04%): 6 seats, 2ft cargo, electric engine, $40,000. This minivan has the lowest market share because it combines the least preferred engine type (electric) with the highest price.

Key Takeaways:

Price Sensitivity: Consumers show a strong preference for lower-priced options. The three minivans priced at $30,000 together account for about 85% of the predicted market, indicating high price sensitivity.

Engine Type Preferences: Gas engines are favored over hybrid and electric engines, reflecting either cost concerns or possibly a lack of perceived additional value from alternative engine types at higher prices.

Feature Trade-offs: Extra cargo space is valued, but not enough to offset a large price increase, and additional seats beyond 6 actually lower utility in this model. Consumers appear to balance their preferences for features against their willingness to pay more.