Twitter + Sentiment Analysis

(Written January 2021)

I wanted to try a fun project where I could learn about Natural Language Processing (NLP) and Sentiment Analysis to extract insights from unstructured data in social media.

More than ever, people use social media to say what they think, and sentiment analysis is a way to decipher the feeling behind those posts. Are they leaving a bad review of a restaurant? Retweeting or liking something about their favourite sports hero? Merely stating something factual that has happened?

This area has many interesting applications, including measuring viewers' sentiment on movies, gauging product reputation, assessing the impact of politicians' tweets, and predicting stock market moves.

The stretch goal (a.k.a. part 2) will be to incorporate cloud tools such as serverless compute (AWS Lambda) and a voice chatbot for the conversational interface (AWS Lex).

Steps:

1. Request Twitter Developer access for API

2. Tweepy as the Python library for calling the Twitter API to retrieve tweets

3. Mapping Twitter user locations with the ArcGIS library for Python - later discarded, as Twitter user location is free-form text that requires data cleansing

4. VADER as Python library for sentiment analysis on tweets

5. Present results using Matplotlib, WordCloud, Squarify, Seaborn


Attempt #1 : Toronto Police crime data

My first attempt was for a course project at York University, where our team built unsupervised and supervised Machine Learning models for crime prediction in Toronto (more to report in an upcoming post).

The idea was that citizens' tweets about real or potential crime, collected from Twitter, could be translated into a variable for the model's prediction.

Unfortunately, I was not able to find sufficient data on crimes being reported by citizens. The tweets from Toronto Police were useful but were structured in a way that did not require sentiment analysis.

Attempt #2 : Korean dramas

I then turned my attention to Korean dramas, a (not so) secret indulgence during these COVID times.

I pick new shows based on recommendations made by family and friends. Sometimes the reviews conflict, and I am too lazy to check whatever the Korean equivalent of Rotten Tomatoes is.

I retrieved thousands of tweets and ran the VADER sentiment analyzer on them. For each tweet, VADER outputs neg (negative), neu (neutral), and pos (positive) scores, which are the proportions of the text that fall into each category, plus a compound score normalized to range from -1 (most negative) to 1 (most positive).
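
To turn compound scores into the positive / neutral / negative percentages shown in the charts, each score needs a label. Here is a minimal sketch of that step using only the standard library. The 0.05 cut-off is the convention suggested in the VADER documentation and is an assumption here, as are the function names and the sample scores. With the vaderSentiment package installed, the real scores would come from SentimentIntensityAnalyzer().polarity_scores(text)["compound"].

```python
from collections import Counter

# Conventional VADER cut-off; an assumption, not necessarily what this project used.
THRESHOLD = 0.05


def label_compound(compound, threshold=THRESHOLD):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(compounds):
    """Return the percentage of scores that fall under each label."""
    counts = Counter(label_compound(c) for c in compounds)
    total = sum(counts.values())
    labels = ("positive", "neutral", "negative")
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100.0 * counts[label] / total for label in labels}


# Hypothetical compound scores standing in for real analyzer output.
sample = [0.64, 0.0, -0.42, 0.05, 0.91, -0.03, 0.36, -0.77]
print(sentiment_shares(sample))
```

The shares are what a pie chart would plot, while the raw compound scores are what a histogram would plot.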

WordCloud on the show Sweet Home depicting keyword importance in the tweets:
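
A word cloud sizes each word roughly by how often it appears. As a rough illustration of that counting step, here is a sketch using only the standard library. The stop-word list, the function name, and the sample tweets are all made up for illustration, and the WordCloud library does its own tokenizing and stop-word filtering, so its results will differ.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "is", "are", "was", "so", "and", "to", "of", "in", "on", "it", "rt"}


def keyword_counts(tweets, top_n=5):
    """Count the most frequent words across tweets, ignoring URLs, mentions and stop words."""
    counts = Counter()
    for tweet in tweets:
        # Drop links and @mentions before splitting into words.
        text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(top_n)


# Hypothetical tweets, not real data.
sample_tweets = [
    "Sweet Home is so scary and so good",
    "RT @fan: the monsters in Sweet Home are scary https://example.com/x",
    "Sweet Home finale was good",
]
print(keyword_counts(sample_tweets, top_n=3))
```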

Python dataframe storing tweets extracted from Twitter API via Tweepy:

The drama Sweet Home has more positive polarity than Crash Landing on You.

Attempt #2B : Stock predictions

My code can be applied to all sorts of search criteria, not just Korean dramas.

Coming from Capital Markets technology, I know there are sophisticated algorithms that initiate trade orders in reaction to news, such as the outcome of a company's quarterly report, and to sentiment drawn from analysts' reports and social media.

I checked Yahoo! Finance for recently reported earnings and found one on January 14, 2021 for Taiwan Semiconductor (disclosure: I own some of this stock in my portfolio).

Their Q4 (fourth quarter) results, reported that day, beat the expected Earnings Per Share by $0.02. When a company outperforms what "the street" expects, the stock price generally goes up as a result.


Here's a look at sentiment before and after their earnings were reported on January 14.

The positive sentiment went up from 38% to 46%, but given that expected earnings were only marginally beaten, the increase could also be due to other factors. For example, this stock has risen sharply, by 140%, since COVID was declared a pandemic in March 2020.
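
A before/after comparison like this comes down to splitting scored tweets at the event date and computing the positive share on each side. Here is a minimal sketch under some assumptions: a tweet counts as positive when its compound score is at least 0.05, tweets from the event day itself are counted as "after", and the function names and sample data are hypothetical and do not reproduce the 38% and 46% figures.

```python
from datetime import date


def positive_share(compounds, threshold=0.05):
    """Percentage of compound scores at or above the threshold, or None if empty."""
    if not compounds:
        return None
    positives = sum(1 for c in compounds if c >= threshold)
    return 100.0 * positives / len(compounds)


def split_by_event(scored_tweets, event_day):
    """Return (positive share before event_day, positive share on or after it)."""
    before = [c for day, c in scored_tweets if day < event_day]
    after = [c for day, c in scored_tweets if day >= event_day]
    return positive_share(before), positive_share(after)


# Hypothetical (tweet date, compound score) pairs.
sample = [
    (date(2021, 1, 12), 0.40), (date(2021, 1, 12), -0.20),
    (date(2021, 1, 13), 0.00), (date(2021, 1, 13), -0.55),
    (date(2021, 1, 14), 0.70), (date(2021, 1, 15), 0.10),
    (date(2021, 1, 15), -0.30), (date(2021, 1, 16), 0.00),
]
before, after = split_by_event(sample, date(2021, 1, 14))
print(f"positive before: {before:.0f}%, after: {after:.0f}%")
```

A shift in the positive share says nothing about its cause, so a comparison like this is best read alongside the price history and other news from the same days.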

Attempt #2C : Pororo vs Peppa Pig

My kid loves the Korean children's character called Pororo. His cousin is a huge fan of Peppa Pig.

Which one wins in the Twitter world?


It's really close when you look at percentages on a pie chart, but the histogram demonstrates that Pororo has more positive polarity.
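
A pie chart and a histogram can disagree because the pie only counts how many tweets fall under each label, while the histogram shows how strong the scores are. The sketch below illustrates this with two made-up sets of compound scores that have the same share of positive tweets but very different distributions. The function name, bin count, and sample data are assumptions for illustration, not the real Pororo or Peppa Pig results.

```python
def histogram(compounds, bins=4):
    """Count scores in equal-width bins spanning [-1, 1]."""
    width = 2.0 / bins
    counts = [0] * bins
    for c in compounds:
        index = int((c + 1.0) / width)
        # Clamp so that a score of exactly 1.0 (or anything out of range) lands in an end bin.
        index = max(0, min(index, bins - 1))
        counts[index] += 1
    return counts


# Both hypothetical sets have three of six scores at or above 0.05,
# so a pie chart of labels would look the same for each.
show_a = [0.10, 0.20, 0.15, -0.10, -0.20, 0.00]
show_b = [0.80, 0.90, 0.75, -0.10, -0.20, 0.00]

# Bins are [-1, -0.5), [-0.5, 0), [0, 0.5), [0.5, 1].
print(histogram(show_a))
print(histogram(show_b))
```

In this made-up data the second set piles up in the top bin, which is the kind of difference a histogram reveals and a pie chart hides.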


I really enjoyed this exercise, and it's amazing to see the variety of sentiment analyzers that have been built, which makes it very easy to code something up quickly.

I ended up using VADER, which is attuned to sentiments expressed in social media. In the future I would like to try Stanford CoreNLP, whose sentiment model was trained on movie reviews, where a reviewer might discuss both positive and negative aspects in the same sentence.

I'd also like to incorporate cloud tools such as AWS Lambda for serverless compute and AWS Lex for a voice interface.


I would like to acknowledge and thank the following for their influence and inspiration:

James Hiu : his Power BI charts on sentiment analysis put my charts to shame. Thank you for proofreading this!

Ken Jee - Was Captain Marvel bad? A Sentiment Analysis of Twitter Data : starting to check out his YouTube videos for fun ideas and education

VADER Sentiment Analysis on Algorithmic Trading : reading this to get new ideas for future projects using sentiment analysis for algo trading