Twitter + Sentiment Analysis
(Written January 2021)
I wanted to try a fun project where I could learn about Natural Language Processing (NLP) and Sentiment Analysis to extract insights from unstructured data in social media.
More than ever, people have something to say on social media. Sentiment analysis is a way to decipher how they feel about it. Are they leaving a bad review of a restaurant? Retweeting or liking a post by their favourite sports hero? Merely stating a fact about something that has happened?
This area has so many interesting applications, from gauging viewers' opinions of movies and a product's reputation to measuring the impact of politicians' tweets and predicting the stock market.
The stretch goal (a.k.a. part 2) will be to incorporate cloud tools such as serverless compute (AWS Lambda) and a voice chatbot for the conversational interface (AWS Lex).
Steps:
1. Request Twitter Developer access to the API
2. Use Tweepy, a Python library for the Twitter API, to retrieve tweets
3. Map Twitter user locations with the ArcGIS library for Python (later discarded, because the Twitter user location is free-form text that requires data cleansing)
4. Run sentiment analysis on the tweets with VADER, a Python library
5. Present the results using Matplotlib, WordCloud, Squarify and Seaborn
Attempt #1 : Toronto Police crime data
My first attempt was for a course project at York University, where our team built unsupervised and supervised machine learning models for crime prediction in Toronto (more on that in an upcoming post).
Citizens' tweets about real or potential crimes, collected from Twitter, could be translated into an input variable for the model's predictions.
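One way such tweets might become a model input is a daily count of crime-related tweets, which could then be joined to the crime records by date. The sketch below assumes the tweets have already been retrieved as (date, text) pairs; the keyword list and the function names are hypothetical, and simple keyword matching stands in for whatever filtering a real model would need.

```python
from __future__ import annotations

from collections import Counter
from datetime import date

# Hypothetical keyword list for illustration; a real one would need tuning.
CRIME_KEYWORDS = {"robbery", "break-in", "stolen", "theft", "assault"}


def mentions_crime(text: str) -> bool:
    """Return True if any of the hypothetical crime keywords appears in the tweet."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return not CRIME_KEYWORDS.isdisjoint(words)


def daily_crime_tweet_counts(tweets: list[tuple[date, str]]) -> dict[date, int]:
    """Count crime-related tweets per day; days with no matches are simply absent."""
    return dict(Counter(day for day, text in tweets if mentions_crime(text)))


if __name__ == "__main__":
    # Made-up tweets, not real data.
    sample = [
        (date(2020, 11, 2), "Car stolen on my street last night!"),
        (date(2020, 11, 2), "Lovely sunset over the lake"),
        (date(2020, 11, 3), "Another break-in at the corner store #Toronto"),
    ]
    print(daily_crime_tweet_counts(sample))
```

Days missing from the result would be filled with zero before joining them to the crime data.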
Unfortunately, I could not find enough tweets of crimes being reported by citizens. The Toronto Police tweets were useful, but they were structured in a way that did not require sentiment analysis.
Attempt #2 : Korean dramas
I then turned my attention to Korean dramas, a (not so) secret indulgence during these COVID times.
I pick new shows based on recommendations from family and friends. Sometimes their reviews conflict, and I am too lazy to look up whatever the Korean equivalent of Rotten Tomatoes is.
I retrieved thousands of tweets and ran the VADER sentiment analyzer on them. For each tweet, VADER outputs neg (negative), neu (neutral) and pos (positive) scores, which are the proportions of the text falling into each category, plus a compound score: the sum of the word valences, normalized to range from -1 (most negative) to +1 (most positive).
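To make the compound score less abstract, here is a small standalone sketch of two pieces of VADER's behaviour, re-implemented from my reading of the vaderSentiment source and README rather than imported from the library. First, the raw sum of word valences is squashed into [-1, 1] with x / sqrt(x² + alpha), where alpha defaults to 15. Second, the README suggests treating a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. The `label` helper is my own name for that last step.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of word valences into [-1, 1], as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a label using the cut-offs suggested in the VADER README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for raw in (-4.0, 0.0, 0.5, 1.0, 4.0):
        c = normalize(raw)
        print(f"raw={raw:+.1f} compound={c:+.4f} label={label(c)}")
```

With the real library none of this is needed: `SentimentIntensityAnalyzer().polarity_scores(text)` from the vaderSentiment package already returns a dict with `neg`, `neu`, `pos` and `compound` keys, so only the thresholding is left to you. The squashing is also why the compound score approaches ±1 only gradually as strongly worded terms pile up.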
WordCloud on the show Sweet Home depicting keyword importance in the tweets:
Python dataframe storing tweets extracted from the Twitter API via Tweepy:
The drama show Sweet Home has more positive polarity than Crash Landing on You
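A word cloud sizes each word by how often it appears, so the counting step behind the Sweet Home word cloud can be sketched with the standard library alone. The stop-word list and the cleaning rules here (dropping URLs, @mentions and the # of hashtags) are my own illustrative assumptions, not what the WordCloud library does internally, and `word_frequencies` is a hypothetical helper name.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative stop words; the WordCloud library ships its own, much longer list.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "so", "to", "you", "have", "it", "i"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z']+")


def word_frequencies(tweets: list[str]) -> Counter:
    """Count words across tweets after removing URLs, @mentions and stop words."""
    counts: Counter = Counter()
    for text in tweets:
        text = URL_RE.sub(" ", text.lower())
        text = MENTION_RE.sub(" ", text)
        # WORD_RE keeps only letters and apostrophes, so "#sweethome" becomes "sweethome".
        counts.update(w for w in WORD_RE.findall(text) if w not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Made-up tweets for illustration only.
    sample = [
        "Sweet Home episode 3 was so tense! #SweetHome",
        "@friend you have to watch Sweet Home https://t.co/example",
    ]
    print(word_frequencies(sample).most_common(3))
```

The resulting counts can be handed to the wordcloud package's `WordCloud().generate_from_frequencies(...)` to draw the image, which bypasses the library's own tokenizing and stop-word handling.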
Attempt #2B : Stock predictions
My code can be applied to all sorts of search criteria, not just Korean dramas.
Coming from a Capital Markets technology background, I know there are sophisticated algorithms that place trade orders in reaction to news, such as the outcome of a company's quarterly report, and to sentiment drawn from analysts' reports and social media.
I checked Yahoo! Finance for recently reported earnings and found that Taiwan Semiconductor had reported on January 14, 2021 (disclosure: I own some of its stock in my portfolio).
Their Q4 (fourth quarter) results, reported that day, beat the expected Earnings Per Share (EPS) by $0.02. When a company outperforms what "the street" expects, its stock price generally goes up as a result.
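Whether a beat is "marginal" is easier to judge as a percentage of the estimate. A tiny helper shows the arithmetic; the function name is mine, and the $1.00 actual and $0.98 expected figures are hypothetical placeholders chosen to give a $0.02 beat, not Taiwan Semiconductor's reported numbers.

```python
from __future__ import annotations


def earnings_surprise(actual_eps: float, expected_eps: float) -> tuple[float, float]:
    """Return the EPS surprise in dollars and as a percentage of the estimate."""
    if expected_eps == 0:
        raise ValueError("percentage surprise is undefined for a zero estimate")
    diff = actual_eps - expected_eps
    return diff, 100.0 * diff / abs(expected_eps)


if __name__ == "__main__":
    # Hypothetical figures that produce a $0.02 beat; not TSMC's actual numbers.
    diff, pct = earnings_surprise(actual_eps=1.00, expected_eps=0.98)
    print(f"beat by ${diff:.2f}, or {pct:.1f}% above the estimate")
```

With these placeholder numbers the beat is about 2% of the estimate; the same $0.02 on a $0.20 estimate would be 10%.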
Here's a look at sentiment before and after their earnings were reported on January 14.
Positive sentiment went up from 38% to 46%, but given that expected earnings were only marginally beaten, the rise could also be due to other factors. For example, this stock has risen 140% since COVID was declared a pandemic in March 2020.
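The before/after comparison boils down to splitting the scored tweets at the announcement date and computing the share of each label on either side; the move from 38% to 46% is a rise of 8 percentage points, or about 21% in relative terms. A sketch, assuming each tweet has already been reduced to a (created_at, compound) pair and using the ±0.05 cut-offs suggested in the VADER README; the function names and the sample scores are made up.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

LABELS = ("positive", "neutral", "negative")


def label(compound: float, threshold: float = 0.05) -> str:
    """Classify a VADER compound score using the README's suggested cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_shares(scores: list[float]) -> dict[str, float]:
    """Return the percentage of scores in each label (all zeros for an empty list)."""
    if not scores:
        return {name: 0.0 for name in LABELS}
    counts = Counter(label(s) for s in scores)
    return {name: 100.0 * counts[name] / len(scores) for name in LABELS}


def before_after(
    tweets: list[tuple[datetime, float]], cutoff: datetime
) -> tuple[dict[str, float], dict[str, float]]:
    """Split (created_at, compound) pairs at `cutoff` and summarize each side.

    With a midnight cutoff, tweets from the announcement day itself count as "after".
    Keep all datetimes consistently naive or timezone-aware; Python cannot compare the two.
    """
    before = [c for created_at, c in tweets if created_at < cutoff]
    after = [c for created_at, c in tweets if created_at >= cutoff]
    return label_shares(before), label_shares(after)


if __name__ == "__main__":
    earnings_day = datetime(2021, 1, 14)
    # Made-up scores, not the data behind the charts in this post.
    sample = [
        (datetime(2021, 1, 12, 9), 0.40),
        (datetime(2021, 1, 13, 15), 0.00),
        (datetime(2021, 1, 14, 8), 0.65),
        (datetime(2021, 1, 15, 11), -0.20),
    ]
    before, after = before_after(sample, earnings_day)
    print("before:", before)
    print("after: ", after)
```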
Attempt #2C : Pororo vs Peppa Pig
My kid loves the Korean children's character called Pororo. His cousin is a huge fan of Peppa Pig.
Which one wins in the Twitter world?
It's really close when you look at the percentages on a pie chart, but the histogram shows that Pororo has more positive polarity.
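Why can a pie chart look like a tie while a histogram does not? A pie chart only shows the share of each label, whereas a histogram keeps the spread of the compound scores. The sketch below compares two made-up sets of scores (not the actual Pororo or Peppa Pig data) that share the same 60/20/20 positive/neutral/negative split; the helper names are hypothetical and the ±0.05 cut-offs follow the VADER README.

```python
from __future__ import annotations


def pie_shares(scores: list[float]) -> tuple[float, float, float]:
    """Percent positive / neutral / negative, using VADER's suggested ±0.05 cut-offs."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    pos = sum(s >= 0.05 for s in scores)
    neg = sum(s <= -0.05 for s in scores)
    return 100.0 * pos / n, 100.0 * (n - pos - neg) / n, 100.0 * neg / n


def histogram(scores: list[float], bins: int = 4) -> list[int]:
    """Count compound scores in equal-width bins covering [-1, 1]."""
    width = 2.0 / bins
    counts = [0] * bins
    for s in scores:
        # Clamp the index so that a score of exactly 1.0 falls in the last bin.
        idx = min(max(int((s + 1.0) / width), 0), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Made-up compound scores, not the real Pororo or Peppa Pig data.
    show_a = [0.9, 0.8, 0.7, 0.0, -0.3]
    show_b = [0.1, 0.1, 0.2, 0.0, -0.3]
    for name, scores in (("A", show_a), ("B", show_b)):
        print(name, "pie:", pie_shares(scores), "histogram:", histogram(scores))
```

Both sets give identical pie shares, but A's histogram piles up in the top bin [0.5, 1] while B's scores sit in the bin just above zero, which is the kind of difference a pie chart hides.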
I really enjoyed this exercise. It's amazing to see the variety of sentiment analyzers that have already been built, which make it very easy to code something quickly.
I ended up using VADER, which is attuned to sentiments expressed in social media. In the future I would like to try Stanford's CoreNLP, whose sentiment model was trained on movie reviews, where a reviewer might discuss both positive and negative aspects in the same sentence.
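VADER does try to handle mixed sentences, using hand-written rules. One of them, as I read the vaderSentiment source, scales the valence of words before a "but" by 0.5 and of words after it by 1.5. The sketch below re-implements only that rule, with a two-word lexicon whose scores are hypothetical (VADER's real lexicon has thousands of entries with their own values), to show how it can flip the sign of a sentence such as "The acting was good but the plot was boring."

```python
# Hypothetical valences for illustration; not VADER's actual lexicon values.
LEXICON = {"good": 2.0, "boring": -1.5}


def sentence_valence(sentence: str, apply_but_rule: bool = True) -> float:
    """Sum word valences, optionally applying a VADER-style 'but' weighting."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    but_index = words.index("but") if "but" in words else None
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        if apply_but_rule and but_index is not None:
            if i < but_index:
                valence *= 0.5  # the clause before "but" is downweighted
            elif i > but_index:
                valence *= 1.5  # the clause after "but" is emphasized
        total += valence
    return total


if __name__ == "__main__":
    s = "The acting was good but the plot was boring."
    print(sentence_valence(s, apply_but_rule=False))
    print(sentence_valence(s))
```

Unweighted, the two words net out slightly positive (0.5); with the rule applied, the clause after "but" dominates and the sum turns negative (-1.25). CoreNLP's sentiment model, as I understand it, aims to learn such effects from labelled parse trees of movie-review sentences rather than encoding them by hand.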
In the future I'd also like to incorporate cloud tools such as AWS Lambda for serverless compute and Amazon Lex for the voice interface.
I would like to acknowledge and thank the following for their influence and inspiration:
James Hiu : his Power BI charts on sentiment analysis put my charts to shame. Thank you for proofreading this!
Ken Jee - Was Captain Marvel bad? A Sentiment Analysis of Twitter Data : starting to check out his YouTube videos for fun ideas and education
VADER Sentiment Analysis on Algorithmic Trading : reading this to get new ideas for future projects using sentiment analysis for algo trading