How big is social media in finance and what are some of the unique challenges?

SMA scans data looking for mentions of companies, tickers, names, and products: anything that tells us a social media comment is describing a security. We store, filter, and publish metrics on these social media conversations. We filter for the intentions of professional investors; in the end, only about 10 percent of the social media conversations make it into our metrics.

When I discuss social media and finance, one question always comes up: “How big is social media and how is it used in finance?”

There are over 500 million Tweets per day and over 250 million active Twitter accounts per month. The StockTwits community has over nine million page views per month. As you can see from the chart below, the growth in social media discussions spans all asset classes.

Chart: Growth in Social Media

The challenge is to filter the noise and return a clean, statistically significant signal representing the intentions of professional investors. To meet this objective you need to identify that the Tweet is about a security, eliminate people trying to manipulate sentiment, identify professional investors, and calculate the sentiment correctly.

Identifying commodities and currencies is especially challenging: “This is gold” or “This is oil” mentions gold and oil respectively, but they certainly aren’t talking about Gold Bullion or Crude Oil. The Cashtag has helped in identifying that a Tweet is about a security. Typically, professional investors will identify a security with “$”: $AAPL tells the Twitter community this Tweet is about the security Apple, and $NG_F references Natural Gas. It is important that the security model recognizes different identifiers for the same security. This is common in the currency market, where currencies are frequently identified with slang. When calculating sentiment on commodities, it is important to recognize that a specific contract, the front month, and cash may all be referenced by the same identifier. An expansive topic model is usually applied to this challenge.
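As a rough illustration of Cashtag matching, here is a minimal Python sketch. It is not SMA's security model: the regular expression, the alias table, and the function name extract_securities are all assumptions made for this example, and the $CABLE entry is a hypothetical slang identifier used only to show several identifiers resolving to one security.

```python
import re

# Assumed Cashtag pattern: "$", a letter, then letters/digits/underscores,
# with an optional ".suffix". A leading digit (e.g. "$100") is not a Cashtag.
CASHTAG_RE = re.compile(
    r"(?<![A-Za-z0-9_$])\$([A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z0-9]+)?)"
)

# Hypothetical alias table: several identifiers can map to one security.
ALIASES = {
    "AAPL": "Apple",
    "NG_F": "Natural Gas",
    "CL_F": "Crude Oil",
    "GBPUSD": "GBP/USD",
    "CABLE": "GBP/USD",  # slang identifier, assumed for illustration
}


def extract_securities(text):
    """Return the known securities referenced by Cashtags, in order seen."""
    found = []
    for match in CASHTAG_RE.finditer(text):
        symbol = match.group(1).upper()
        name = ALIASES.get(symbol)
        if name is not None and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    print(extract_securities("Going long $AAPL and $NG_F."))
    print(extract_securities("This is gold"))
```

A plain mention such as “This is gold” yields no securities here because it carries no Cashtag. A real topic model would also have to handle untagged references.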

Social media is like every other means of communication: there are spammers, scammers, and con artists. Our definition of professional investor is not limited to analysts at major firms; identifying all active traders is important. SMA uses advanced algorithms to qualify professional investors. How long the account has been active, how often it Tweets about securities, and how often its Tweets are retweeted are just some of the metrics used by our algorithms to identify influential professional investors and eliminate scammers. As an example, people who Tweet sporadically and then send many Tweets in succession are more likely to be attempting manipulation.
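To make the account metrics concrete, here is a minimal Python sketch of a qualification check. It is an assumption-laden illustration, not SMA's algorithm: the Account fields, the function names, and every threshold (180 days, 50 Tweets, 20 retweets, a 7-day quiet gap followed by 5 Tweets within an hour) are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    created: datetime
    security_tweet_times: list  # datetimes of Tweets about securities
    retweets_received: int


def is_bursty(times, quiet_gap=timedelta(days=7),
              window=timedelta(hours=1), burst_size=5):
    """True if a long silence is followed by burst_size Tweets within window."""
    times = sorted(times)
    for i in range(1, len(times) - burst_size + 1):
        silent_before = times[i] - times[i - 1] >= quiet_gap
        rapid_after = times[i + burst_size - 1] - times[i] <= window
        if silent_before and rapid_after:
            return True
    return False


def qualifies(account, now, min_age_days=180,
              min_security_tweets=50, min_retweets=20):
    """Apply the (hypothetical) thresholds to one account."""
    age_days = (now - account.created).days
    return (age_days >= min_age_days
            and len(account.security_tweet_times) >= min_security_tweets
            and account.retweets_received >= min_retweets
            and not is_bursty(account.security_tweet_times))


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    steady = Account(
        created=datetime(2022, 1, 1),
        security_tweet_times=[now - timedelta(days=d) for d in range(60)],
        retweets_received=100,
    )
    print(qualifies(steady, now))
```

The burst check only looks for the sporadic-then-rapid pattern described above. A production system would presumably weigh many more signals.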

Once professional investors are identified, the key is to score and aggregate their forward-looking Tweets. A Tweet saying they bought CL_F two days ago doesn’t help predict CL_F today. Using a natural language processor, you can identify the Tweets that are forward-looking: “going long”, “buying puts”, “selling puts”, “raising price target”, “raising rating”, “hitting support”, “broke through resistance” are all representations of trader expectations of price movements. We have found that after filtering you should end up with about 10% of the original data.
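A simple phrase match gives a feel for the forward-looking filter. The Python sketch below is a stand-in for a real natural language processor: the forward-looking phrases are the ones listed above, while the past-tense markers and the function name is_forward_looking are assumptions added for illustration.

```python
import re

# Phrases taken from the text above.
FORWARD_PHRASES = (
    "going long",
    "buying puts",
    "selling puts",
    "raising price target",
    "raising rating",
    "hitting support",
    "broke through resistance",
)

# Assumed markers of a report about a past trade (illustrative only).
PAST_MARKERS = re.compile(
    r"\b(bought|sold|ago|yesterday|last week)\b", re.IGNORECASE
)


def is_forward_looking(tweet):
    """Keep Tweets that state an expectation; drop reports of past trades."""
    if PAST_MARKERS.search(tweet):
        return False
    text = tweet.lower()
    return any(phrase in text for phrase in FORWARD_PHRASES)


if __name__ == "__main__":
    tweets = [
        "Going long $CL_F into the close",
        "Bought $CL_F two days ago",
        "$AAPL hitting support here",
    ]
    print([t for t in tweets if is_forward_looking(t)])
```

Substring matching like this misses paraphrases and negations, which is part of why an NLP approach is used in practice.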

Sentiment calculation itself has evolved significantly in recent years. The most common process uses a natural language processor (NLP) to break the text into components (tokenization), then uses a dictionary to assign scores to words and phrases (bi-grams and tri-grams). The scores are aggregated per Tweet and across a period of time. There are three types of scoring. Two Bucket Rough Grain scoring classifies Tweets as positive or negative. This method overweights neutral Tweets and underweights extreme Tweets. Three Bucket Rough Grain scoring adds a bucket for neutral Tweets but still underweights extreme positive or negative Tweets. The most accurate method is Fine Grain scoring, which assigns a score to each Tweet and does not have the over- or underweighting problem. These three methodologies can generate very different aggregate sentiment values.
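The difference between the three scoring methods is easier to see with numbers. The Python sketch below tokenizes each Tweet, scores words, bi-grams, and tri-grams against a tiny dictionary, and aggregates the same Tweets three ways. The dictionary entries, their scores, the neutral band of 0.1, and the choice to put a zero score in the positive bucket under two-bucket scoring are all assumptions for illustration, not SMA's dictionary or method.

```python
import re

# Hypothetical scoring dictionary covering words, bi-grams, and tri-grams.
LEXICON = {
    "bullish": 0.5,
    "bearish": -0.5,
    "weak": -0.5,
    "going long": 0.75,
    "buying puts": -0.75,
    "broke through resistance": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fine_grain(text):
    """Sum dictionary scores over words, bi-grams, and tri-grams."""
    tokens = tokenize(text)
    grams = tokens + ngrams(tokens, 2) + ngrams(tokens, 3)
    return sum(LEXICON.get(gram, 0.0) for gram in grams)


def two_bucket(score):
    # Neutral Tweets are forced into a bucket (here: positive, by assumption).
    return 1.0 if score >= 0 else -1.0


def three_bucket(score, band=0.1):
    if score > band:
        return 1.0
    if score < -band:
        return -1.0
    return 0.0


def aggregate(tweets, bucket=lambda score: score):
    """Average the (optionally bucketed) per-Tweet scores."""
    scores = [bucket(fine_grain(t)) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    tweets = [
        "Going long $AAPL, very bullish",
        "$AAPL looks weak",
        "$AAPL earnings call is at 5pm",
    ]
    print("fine grain  :", aggregate(tweets))
    print("two bucket  :", aggregate(tweets, two_bucket))
    print("three bucket:", aggregate(tweets, three_bucket))
```

With these three Tweets the aggregates differ because bucketing caps the strongly positive Tweet at 1 and, in the two-bucket case, counts the neutral Tweet as fully positive.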


Joe Gits


What can I do with Social Media Sentiment?

Often one of the largest hurdles with new content in the marketplace is understanding how to derive value from it. I have seen this issue occur across many types of financial firms. We can show plainly that there is alpha resident in our content, but that is not enough. The process of mining value from new content is typically more complex, and the answer changes depending on how the data is utilized.

At SMA we see the sentiment in our S-Factors as an additive item within a decision model that can enhance the performance or risk profile of a system. The hurdle for providers of this new content is to clearly illustrate multiple areas of value and applications for sentiment. Doing so will reduce the inherent risk of devoting resources to incorporating a new data set in algorithmic decisions and spur wider adoption.

Recently, we have seen promise in utilizing our S-Factors to predict the likelihood of volume increases in a specific security. This makes sense, since an increase in positive or negative sentiment for a specific security will likely result in focus and activity in that specific name. Notable changes in these metrics/factors should be a precursor to increased activity in that security.

Our initial findings proved positive. When we look at our history of S-Factors and identify conditions where our SV-Score is greater than 2, or S-Delta is above 2 or below -2, we can identify significant changes in the underlying security's volume. Under these S-Factor conditions, we found that increased volume occurred 78% of the time during the event conditions and for two time periods after the event. The volume increase averaged 2.5 times more than the average of that same time period for that security.
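The event condition itself is simple to express. The Python sketch below flags observations where SV-Score is greater than 2 or S-Delta is outside the range -2 to 2, then reports how often volume exceeded its typical level and by how much. The Observation fields, the function names, and the sample numbers are hypothetical and are not drawn from our study; the sketch also looks only at the event period, not the two periods after it.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    sv_score: float
    s_delta: float
    volume: float          # traded volume in this time period
    typical_volume: float  # average volume for the same time period


def is_event(obs, threshold=2.0):
    """SV-Score above the threshold, or S-Delta beyond +/- the threshold."""
    return obs.sv_score > threshold or abs(obs.s_delta) > threshold


def event_stats(observations):
    """Return (share of events with above-typical volume, mean volume ratio)."""
    ratios = [o.volume / o.typical_volume
              for o in observations if is_event(o)]
    if not ratios:
        return 0.0, 0.0
    hit_rate = sum(r > 1.0 for r in ratios) / len(ratios)
    return hit_rate, sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        Observation(2.5, 0.0, 300.0, 100.0),
        Observation(0.5, -2.5, 100.0, 100.0),
        Observation(0.1, 0.3, 500.0, 100.0),  # not an event
    ]
    print(event_stats(sample))
```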

This illustrates my point that sentiment can provide value in multiple areas. In this case, the fund manager or buy-side trader will find this a simple and effective tool to increase the capacity of a trading system or reduce transaction costs through managing their execution algorithms. A sell-side trader can set alerts to indicate securities active in social media or to advise clients on opportunities to move inventory.

Alpha generation in short-term momentum models, option volatility prediction, risk management alerts, and short-term portfolio selection are some of the initial uses of sentiment. It may be that one use case is ample reason to invest the time to integrate sentiment into your decision model, but stopping there would be shortsighted. There is more to be mined, and new data is typically where the gold is found.

If interested in more detail on our volume study please download our research note found at .


Thank You


Kevin Close

SMA –  Social Market Analytics

Managing Director