## Introduction

Few people have drawn as much media attention for their tweets, quotes, and statements about
different companies as Elon Musk, the entrepreneur and businessman behind the likes of Tesla and SpaceX.
He is the richest man in the world and was recently named Time magazine's
2021 Person of the Year. There is no doubt that he is one of the most
influential people in the world, if not the most influential.

There is reason to believe that comments made by influential people can affect the companies they mention in different ways.
In some cases these comments might even have a direct impact on the stock prices of these businesses.

### Tweets that impact

Recently, in November, Elon Musk created a poll on Twitter asking his followers whether or not he should sell a portion (10%) of his shares in Tesla in order to pay tax, as a response to a "billionaires tax". Musk said that he would abide by the poll no matter the outcome. Shortly after this tweet was posted, the Tesla stock price fell by 4.9%. However, this is not the only time that Elon's statements have directly affected a financial asset.

### Elon talks, crypto follows

Another example came in May, when Musk tweeted that Tesla would stop accepting Bitcoin as a method of payment. Shortly after, the Bitcoin price dropped by 15%. A coincidence? Although Quotebank does not contain tweets, it is these types of statements that we will take a closer look at throughout this data story, to find out whether they impact the stock price. We will look at how Elon Musk's quotes impact the financial markets as well as the popularity of different companies. Is there an Elon Effect?

### Research questions

**How does Musk impact the financial markets?**

Do Musk's quotes impact the stock price?

Are his negative quotes more harmful than his positive ones are beneficial?

Does company size play a role in how much the stock price changes?

**Do Musk's quotes affect the popularity of companies?**

**Has his impact changed in any way?**

Do his quotes have a bigger impact now than they had in prior years?

## What is Quotebank?

In order for you to follow along in this data story, we first want to let you know where our data comes from and how we extracted the relevant parts. So let us introduce you to Quotebank. Quotebank is an open corpus of more than 178 million quotations attributed to the speakers who uttered them. The quotes are extracted from 162 million English news articles that were published between 2008 and 2020. In this project we focus on the data from 2015 to the beginning of 2020. In this particular data story we are interested in the quotes that were attributed to Elon Musk, so we extracted these quotes for our analysis. Let us briefly go through how we did this.

## Meaningful data from Quotebank

We were presented with a number of zipped files containing all of the quotes in Quotebank from 2015 to the beginning of 2020. After downloading them, we split the data for each year into multiple chunks in order to save time when loading it. We then extracted all the quotes where Elon Musk was set as the speaker and combined the results from the different years into one single file: "all-Elon Musk-quotes.csv.bz2". From these quotes we removed those where the probability of Elon being the speaker was below a certain threshold. Finally, we used spaCy, a natural language processing library, to extract the quotes in which Elon Musk mentioned an organization.
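The filtering step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each record is one JSON line with a `speaker` field and a `probas` list of `[name, probability-as-string]` pairs, and the threshold of 0.5 and the placeholder quotations are made up for the example.

```python
import json


def keep_confident_quotes(lines, speaker, threshold):
    """Yield records attributed to `speaker` with probability >= threshold."""
    for line in lines:
        record = json.loads(line)
        if record.get("speaker") != speaker:
            continue
        # Assumed layout: probas = [[name, "0.91"], [other_name, "0.09"], ...]
        probability = max(
            (float(p) for name, p in record.get("probas", []) if name == speaker),
            default=0.0,
        )
        if probability >= threshold:
            yield record


# Placeholder records (not real Quotebank data).
sample = [
    json.dumps({"quotation": "placeholder A", "speaker": "Elon Musk",
                "probas": [["Elon Musk", "0.91"], ["None", "0.09"]]}),
    json.dumps({"quotation": "placeholder B", "speaker": "Elon Musk",
                "probas": [["Elon Musk", "0.42"], ["None", "0.58"]]}),
    json.dumps({"quotation": "placeholder C", "speaker": "Someone Else",
                "probas": [["Someone Else", "0.99"]]}),
]

# 0.5 is a hypothetical threshold; the text does not state the one used.
kept = list(keep_confident_quotes(sample, "Elon Musk", threshold=0.5))
print([record["quotation"] for record in kept])
```

For a compressed file, the same function could be fed from `bz2.open(path, "rt")`, which yields text lines.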

## The data in numbers

Here are some of the numbers from the Quotebank dataset.

Number of Elon quotes

Number of company quotes

Publicly traded companies mentioned > 5 times

## Some of Elon's quotes

## The publicly listed companies most mentioned by Elon

**TESLA**

**APPLE**

**TWITTER**

**FORD**

**PAYPAL**

## Some more data about Elon's quotes

Quotebank contains a lot of quotes by Elon Musk, and it is not very surprising that Tesla is the company he talks about the most, as he is its CEO. The figure above on the left shows the distribution of Elon quotes from 2015 to 2020. As we can see from the figures, there were especially many Elon quotes in 2018. This was also, not surprisingly, the year in which Elon Musk mentioned Tesla the most, as you can see in the table on the right. It is this data that we will use in our further analysis of Elon's impact on the financial markets.

## Sentiment analysis

Sentiment analysis is a text analysis method that detects polarity (e.g. a positive or negative opinion) within a text. We performed a sentiment analysis on Elon's quotes using VADER, a model that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. The analysis returns a sentiment score between -1 and 1, where -1 is very negative, 0 is neutral and 1 is very positive. We used VADER to score each of Elon's quotes as positive, neutral, or negative, which can then be used to study the effect his comments had on the stock price and the popularity of the mentioned company.
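As a small sketch of how a score can be turned into a label: the function below maps a compound score to positive, neutral, or negative. The ±0.05 cut-offs are the ones commonly recommended for VADER's compound score; the text does not say which cut-offs were actually used here, so treat them as an assumption.

```python
def label_sentiment(compound, neutral_band=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= neutral_band:
        return "positive"
    if compound <= -neutral_band:
        return "negative"
    return "neutral"


# With the vaderSentiment package installed, the score would come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
for score in (-0.7003, 0.0, 0.6):
    print(score, label_sentiment(score))
```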

This particular quote had a sentiment score of: **-0.7003**.
This is expected as he mentions Apple negatively in this particular quote.
So now let us take a closer look at how Apple's popularity changed after this quote.
The black vertical line displays the date of the Elon quote. From the figure below,
which uses Google Trends, it looks as though Elon's quote did not affect how often
people searched for Apple, but we will perform further analysis to test this hypothesis. We have included
the graph for iPhone as a comparison to the Apple graph to make sure the Apple trend is related to
the technology company and not the fruit. As you can see, there is only a minimal increase.
This might suggest that Elon does not have that big of an impact on
Apple, and we believe that this may generalize to other large companies as well.

In this example, Elon did not seem to affect the popularity of Apple when he mentioned the company: it increased a little, but not significantly. Now that we have looked at the popularity, let's take a look at Apple's stock price. We expect it to drop, as Elon talks negatively about Apple. The figure below shows the development of the Apple stock, and once again the black vertical line displays the date of the Elon quote. The stock price does drop, but not significantly. We can also see that the price increased quite a lot the day before, so it is not strange that it fell a little the day after: when a stock rises a lot in one day, it often dips slightly the next day as some investors sell to secure their gains. The drop in Apple's price is therefore most likely not caused by Elon. This might tell us that Elon does not have that big of an impact on the stock price of large-cap companies.

## Regression based on sentiment analysis

Here we wanted to see whether Musk's quotes have a significant effect on a company's stock price and popularity. To do so, we plotted the percentage change in stock price, or the percentage change in popularity, against the sentiment score. As a first observation, in all cases the data is dispersed and sparse. Doing a regression on this type of data is difficult because it is not clear which model to use. We decided to model the change in stock price (or popularity) with a linear model with only one feature: the sentiment score of each quote. When fitting this model, the R-squared is very low, which means that the sentiment score explains almost none of the variation in stock price (or popularity). We therefore cannot say that Elon's quotes have an effect. This is not unexpected since we only used one feature, which is probably not enough given how many important variables influence stock prices (or popularity).
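To make the role of R-squared concrete, here is a minimal one-feature least-squares fit written with the standard library only. The numbers are made up for illustration and are not taken from our data; the second example is constructed so that the sentiment score explains none of the variation.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by least squares; return slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up example 1: the change follows the score exactly, so R^2 is 1.
perfect = fit_line([-1.0, 0.0, 1.0], [1.0, 3.0, 5.0])

# Made-up example 2: the change is unrelated to the score, so R^2 is 0.
unrelated = fit_line([-1.0, 1.0, -1.0, 1.0], [1.0, 1.0, -1.0, -1.0])

print(perfect, unrelated)
```

An R-squared close to 0, as in the second example, is the situation described above: the fitted line is no better than predicting the average change.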

## Observational study

## Matching

One way to find out whether or not Elon has an impact on the stock price of different companies is to perform an
observational study. In an observational study the researchers, in this case us, observe the effect of a risk factor
simply by using raw data, without performing a randomized experiment.

The idea is to see if Elon's quotes have an effect on the stock price. To find out, we observe
the change in stock price of the company that Elon has mentioned, and compare it to the change in stock price
of other similar companies that were not mentioned by Elon. By doing this for multiple companies and at different dates
we can see whether there is an actual Elon Effect or not. The more comparisons we make, the more confident we can be
in the result.

We used yfinance, a Python library that makes it easy to
get information from Yahoo Finance, for the financial information in the observational study.
In our case the quoted companies represent the “treated” population of the observational study.
To find a control population, we first select companies that we think are similar to the quoted company
and that are not quoted on the same date (and are quoted little in general). To perform the matching,
we introduce covariates that can be observed for each day:

- Close: the stock price of the company at the end of the day.
- Volume: how many shares were traded during the day.
- MarketCap: the value of the company at the end of the day.
- Popularity: represented by the number of search requests made to Google for the company during the day.
- Money Volume: the product of the close and the volume.

These values are obtained through Yahoo Finance and Google. We also add columns that do not depend on time:

- Elon: takes the value 1 or 0 to indicate whether Elon Musk talked about the company.
- Compare: for the control companies, the name of the company we want to compare it with; for the treated ones, 'None'.
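The construction of the Money Volume column and the standardization can be sketched like this. It is an illustration under assumptions: the rows are made-up numbers, the function names are hypothetical, and the population standard deviation is used because the text does not say which variant was applied.

```python
from statistics import mean, pstdev


def add_money_volume(rows):
    """Return copies of the rows with MoneyVolume = Close * Volume."""
    return [{**row, "MoneyVolume": row["Close"] * row["Volume"]} for row in rows]


def standardize(rows, columns):
    """Return copies of the rows with each listed column converted to z-scores."""
    result = [dict(row) for row in rows]
    for column in columns:
        values = [row[column] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in result:
            # A constant column carries no information; map it to 0.
            row[column] = (row[column] - mu) / sigma if sigma else 0.0
    return result


# Made-up rows for three hypothetical companies on one day.
raw = [
    {"Name": "A", "Close": 10.0, "Volume": 100.0},
    {"Name": "B", "Close": 20.0, "Volume": 100.0},
    {"Name": "C", "Close": 30.0, "Volume": 100.0},
]
with_money = add_money_volume(raw)
scaled = standardize(with_money, ["Close", "Volume", "MoneyVolume"])
print(scaled)
```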

**The following table represents the dataframe obtained and standardized:**

Name | Date | Close | Volume | MarketCap | Popularity | Elon | Money Volume | Propensity score |
---|---|---|---|---|---|---|---|---|
Apple | 2015-02-05 | -0.077488 | 3.363161 | -0.020694 | 0.065745 | 1 | 0.013481 | 9.999954e-01 |
Microsoft | 2015-02-05 | -0.072314 | 0.083692 | -0.034855 | -0.147734 | 0 | -0.059075 | 5.085224e-01 |
Dell | 2015-02-05 | -0.084971 | -0.719630 | -0.058121 | -0.527252 | 0 | -0.086382 | 9.863380e-02 |
IBM | 2015-02-05 | -0.036986 | -0.646309 | -0.050319 | -0.788171 | 0 | -0.078658 | 1.205988e-07 |
Samsung | 2015-02-05 | 0.041447 | -0.538745 | 0.074025 | 1.488939 | 0 | -0.041529 | 3.341405e-08 |

Using these covariates, a propensity score is computed for each company on each given day.
We can then compute the matching between the control and the treated companies.
To do so, a bipartite graph is constructed with edges between the control and treated companies
that we wish to compare, and weights equal to the similarity of the two companies. We solve the
maximum weight bipartite matching problem to obtain the pairs.
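For small groups, the matching can be illustrated with a brute-force search over all assignments. This is a sketch, not the graph-based solver used in the project: the similarity is assumed to be one minus the absolute difference in propensity score, and the function name is hypothetical. The propensity scores are the ones from the table above.

```python
from itertools import permutations


def best_matching(treated, control):
    """Pair each treated company with a distinct control, maximizing total similarity.

    `treated` and `control` map company names to propensity scores.
    Similarity is assumed to be 1 - |score difference|.
    """
    treated_names = list(treated)
    control_names = list(control)
    if len(treated_names) > len(control_names):
        raise ValueError("need at least as many controls as treated companies")
    best_pairs, best_total = None, float("-inf")
    for candidate in permutations(control_names, len(treated_names)):
        total = sum(
            1.0 - abs(treated[t] - control[c])
            for t, c in zip(treated_names, candidate)
        )
        if total > best_total:
            best_pairs, best_total = dict(zip(treated_names, candidate)), total
    return best_pairs


treated = {"Apple": 9.999954e-01}
control = {
    "Microsoft": 5.085224e-01,
    "Dell": 9.863380e-02,
    "IBM": 1.205988e-07,
    "Samsung": 3.341405e-08,
}
pairs = best_matching(treated, control)
print(pairs)
```

Brute force only works for a handful of companies per group; a dedicated matching algorithm is the better choice for larger graphs.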

Here you can see the representation of the graph:

Once the matching is obtained, we can move to the next step, which
consists of comparing the change in stock price of each treated company with that of its control.

## Tests on observational study

For the observational study we had to be careful about a couple more things.
For example, the financial data was sometimes listed in currencies other than USD.
The most prominent case is Samsung, listed in Korean won, where the numbers are about 1000x larger than in USD. We therefore had to
convert all monetary features of these companies to USD.

This was done manually for the few companies in question, using an estimated average of the
exchange rate over the time period of our data.

We then recomputed the propensity score for this part and obtained the following coefficients:

**Intercept** -1.0469

**Money volume** 8.3449

**MarketCap** -2.1203

**Popularity** 0.2211
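To show how such coefficients turn into a score, the sketch below assumes the propensity model is a logistic regression on the standardized covariates. The text does not name the model, so this is an assumption, as are the function and dictionary names; only the four coefficient values come from the list above.

```python
import math

# Coefficients as reported above.
COEFFICIENTS = {
    "intercept": -1.0469,
    "money_volume": 8.3449,
    "market_cap": -2.1203,
    "popularity": 0.2211,
}


def propensity_score(money_volume, market_cap, popularity, coef=COEFFICIENTS):
    """Estimated probability of being 'treated' under an assumed logistic model."""
    logit = (
        coef["intercept"]
        + coef["money_volume"] * money_volume
        + coef["market_cap"] * market_cap
        + coef["popularity"] * popularity
    )
    return 1.0 / (1.0 + math.exp(-logit))


# A company with average (zero) standardized covariates.
baseline = propensity_score(0.0, 0.0, 0.0)
print(round(baseline, 4))
```

Under this reading, a higher money volume raises the score sharply, while a higher market cap lowers it.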

Based on the resulting propensity scores we did the matching for each local group of stocks.

The next part was modeling the stock price change. We loaded all the stock data for the companies
in our model and looked at the 5 consecutive days after a quote for the analysed company and its
matched company. Sometimes a quote falls on a weekend, or the following span includes days
without stock history; in that case our model automatically uses only the days that have stock history.
We then use the stock prices to calculate the daily change (previous day price/next day price).
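One possible reading of this step is sketched below: take the 5 calendar days after the quote, keep only those with a closing price, and compute the ratio of each price to the following one. The function name and the prices are made up, and the exact window handling in the project may differ.

```python
from datetime import date, timedelta


def daily_changes_after(prices, quote_date, n_days=5):
    """Daily change ratios (previous close / next close) in the window after a quote.

    `prices` maps dates to closing prices; days without an entry
    (weekends, holidays) are skipped.
    """
    window = [quote_date + timedelta(days=offset) for offset in range(1, n_days + 1)]
    closes = [prices[day] for day in window if day in prices]
    return [previous / following for previous, following in zip(closes, closes[1:])]


# Made-up closing prices; 2015-02-07 is a Saturday, so the first two days
# of the window have no stock history.
prices = {
    date(2015, 2, 9): 100.0,
    date(2015, 2, 10): 110.0,
    date(2015, 2, 11): 99.0,
}
changes = daily_changes_after(prices, date(2015, 2, 7))
print(changes)
```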

By comparing the distributions of daily changes for a treated stock and its matched control, we can test whether they are similar.
We used a Mann-Whitney U test for this. The null hypothesis is that the two distributions are the same.
Here you can see the daily change for the control and 'treated' companies, where the x axis shows the dates in
chronological order but not to scale, as well as the results of the Mann-Whitney U tests:

### Apple

**P-value (IBM)** 0.7818969961187895

**P-value (Samsung)** 0.07267412867689323

**P-value (Lenovo)** 0.20719648581932448

**P-value (Microsoft)** 0.9309874465595671

### Ford

**P-value (Renault)** 0.037136256414410235

**P-value (General Motors)** 0.5285921958576796

There are two empty figures because Toyota and Stellantis were not matched with Ford. This might be due to their large market cap and low trading volume.

### PayPal

**P-value (Western Union)** 0.26275304114766074

**P-value (Euronet)** 0.48073111045562256

**P-value (American Express)** 0.9009714934164412

**P-value (Visa)** 0.534965034965035

### Tesla

**P-value (Daimler)** 0.22820421433941152

**P-value (BMW)** 0.01735656673757902

**P-value (Volkswagen)** 0.46236044154701805

**P-value (General Motors)** 0.907735912135226

**P-value (Google)** 0.48484848484848486

**P-value (Facebook)** 0.5468909935184665

**P-value (Snapchat)** 0.19477620553450214

**P-value (Pinterest)** 0.1681531290926782

There seem to be two tests with a significant result. The P-value in the Ford test with Renault is under 0.05,
which means that we can reject the hypothesis that they come from the same distribution. The same applies to Tesla and BMW.
However, it is hard to conclude that Elon's quotes have a significant impact on the 'treated' companies.
Most of the hypotheses were not rejected, and with 18 tests at the 0.05 level we would expect roughly one rejection by chance alone.
It can also be the case that different companies' stocks simply move in different ways.
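For readers unfamiliar with the test, here is a minimal standard-library sketch of a two-sided Mann-Whitney U test. It uses the normal approximation without tie or continuity correction, so its P-values will not exactly match those of a statistics library such as SciPy's `scipy.stats.mannwhitneyu`, especially for small samples. The input values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p-value) using the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


# Made-up daily changes: fully separated samples versus identical samples.
u_separated, p_separated = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
u_same, p_same = mann_whitney_u([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(u_separated, p_separated, u_same, p_same)
```

A small P-value means the two sets of daily changes are unlikely to come from the same distribution; a P-value near 1 means the test sees no difference.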

## Impact regression

The aim of this part is to perform a linear regression on the change in stock price. A linear regression was
fitted to describe the change in stock price on a given day based on the market cap, the popularity, the money volume, whether or not
Elon Musk has talked about the company, and especially the sentiment score of the quote.
We want to investigate the degree to which the sentiment score predicts the change in stock price.

To do so, we continue to use the data frame of the observational study, with some modifications.
First, we add a column with the sentiment score of each quote, and for the companies not quoted we set the
score to 0. We then noticed that there are often several quotes on the same day: some are different quotes and
some are duplicates of the first one. In this case, we keep all the distinct quotes that occur on the same day.
Therefore, on the same day, there can be several rows for the same company with different sentiment
scores.

We also add a column that describes the change in stock price: we take the mean of the change over
the three days following the day of the quote.

Now that all the data is gathered, the linear regression can be performed:

Where x1, x2, x3, x4, x5 represent respectively the MarketCap, Popularity, Money Volume, sentiment score of
the quote, and whether Elon talked about the company. The method used here is Ordinary Least Squares (OLS).
As you can see, the estimated coefficients of the sentiment score and the Elon variable are very
low. The coefficients represent the rate of change of the stock price with respect to each variable.
This means that Elon's quotes still do not appear to have an impact on the change in stock price, even with the expanded model.
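Written out, and assuming an intercept term and an additive error (neither is stated explicitly in the text), the model described above would take the following form. The symbols $\Delta p$ for the mean change in stock price over the three days following the quote, $\beta_i$ for the coefficients, and $\varepsilon$ for the error are our notation for this sketch.

```latex
\Delta p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon
```

OLS chooses the $\beta_i$ that minimize the sum of squared errors over all rows, so a coefficient $\beta_4$ or $\beta_5$ close to zero means that the sentiment score or the Elon indicator barely changes the predicted $\Delta p$.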

## Conclusion

In light of the above, we did not see that Elon had a significant impact on the stock prices. We conducted two analyses: an observational study to check whether Elon's quotes have an impact on stock prices, and a linear regression to quantify the impact of the quotes on the change in stock price.

As stated in our research questions, we wanted to test whether his impact (if he had any) was different for big and small companies. However, there are not enough quotes about smaller companies to analyse his impact on them. Before this study, one might have expected his impact on larger companies to be insignificant, except maybe for Tesla, where he has an important role and therefore probably a direct impact. Interestingly enough, Tesla is also the company that had the most significant result in the Mann-Whitney U test. One should also take into consideration that Elon Musk usually uses Twitter when he voices an opinion about different companies. Unfortunately, as mentioned at the beginning, Quotebank does not include tweets.

However, the study above suggests that, given the covariates that we defined, Elon's quotes do not have an impact on the stock price, regardless of the sentiment score of the quote.

## Future work

There were things that we were not able to do, due to time limits as well as other reasons, such as some limitations of Quotebank. But there are still a lot of interesting things that we could have looked at. Here are some of the things we think would be interesting to explore in the future.

For the observational study, we could have added more covariates, or used a larger set of control companies, to obtain a better matching. As noted, the propensity scores of the matched companies are not very similar, so one could also improve the propensity score model to obtain more accurate scores. A lot of data is also missing or unavailable: we tried, for example, to match Tesla with Rivian, but like a lot of other companies it was either private or had only very recent stock prices.

We were not able to get historical prices for cryptocurrencies, which was very unfortunate for the data story.
Therefore we were not able to look at Elon's effect on cryptocurrencies.
We tried to use CoinMarketCap's API,
a popular API for cryptocurrency data, but unfortunately we needed a paid subscription in order to obtain the historical data.
The error message stated this: *"Your API Key subscription plan doesn't support this endpoint."*

As we briefly mentioned earlier, we were unfortunately not able to examine whether Musk had a bigger impact on smaller companies than on big, so-called large-cap companies, because Quotebank did not contain enough quotes about smaller companies.

We also wanted to examine whether Elon's impact changed over the years. However, we did not have time to look into that.