GitHub link: https://github.com/developmentAC/sentiminer
Sentiment analysis is the semantic study of emotion in language. It is used to determine the level of satisfaction or dissatisfaction in textual data such as reviews, survey responses, and online or social media posts. Because this kind of feedback often arrives in high volume, sentiment analysis provides an automated way to determine the “feeling” of a text.
This program performs sentiment analysis on textual data using the AFinn sentiment analysis method (from Finn Årup Nielsen). The AFinn data file accompanying this project was extracted from the open-source R project Tidyverse (see the reference below).
Each English word that can convey optimism or pessimism has been paired with a curated numerical value called the sentiment score. Scores range from -5 (most pessimistic) to 5 (most optimistic). A short excerpt from the AFinn dictionary is given below.
Word,Score
abandon,-2
abandons,-2
abandoned,-2
absentee,-1
absentees,-1
aboard,1
abducted,-2
abduction,-2
abductions,-2
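As a rough illustration of how a `Word,Score` file like this could be read, the sketch below parses the rows shown above into a dictionary. The function name `load_lexicon` and the lower-casing of words are assumptions made for this example and are not taken from the Sentiminer source.

```python
import csv
import io

# Sample rows copied from the excerpt above; the real data file is much longer.
SAMPLE = """Word,Score
abandon,-2
abandons,-2
abandoned,-2
absentee,-1
absentees,-1
aboard,1
abducted,-2
abduction,-2
abductions,-2
"""


def load_lexicon(handle):
    """Read Word,Score rows into a dict mapping word -> integer score.

    Hypothetical helper: Sentiminer may load its data differently.
    """
    reader = csv.DictReader(handle)
    return {row["Word"].lower(): int(row["Score"]) for row in reader}


lexicon = load_lexicon(io.StringIO(SAMPLE))
print(lexicon["aboard"])   # 1
print(lexicon["abandon"])  # -2
```

To read from disk instead, the same function could be given an open file handle in place of the `io.StringIO` object.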
Sentiminer isolates each word of the user-supplied text and looks up its sentiment score. As shown below, the scores are summed over the body of text, and the sum is then divided by n, the total number of hits (i.e., words found in the dictionary with a score for a degree of pessimism or optimism).
summation = score(word_1) + … + score(word_n)
sentiment score of the text = (summation / n)
The resulting value measures the overall sentiment of the text sample: a negative value suggests pessimism and a positive value suggests optimism.
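The calculation described above can be sketched in a few lines of Python. This is a minimal sketch, not the Sentiminer implementation: the function name `sentiment_score`, the regular-expression tokenizer, and the toy dictionary are all assumptions made for illustration.

```python
import re


def sentiment_score(text, lexicon):
    """Return (mean score, hits) for the text.

    Only words present in the lexicon count as hits. The mean is None
    when no word of the text is found, to avoid dividing by zero.
    """
    # Assumed tokenization: lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    hits = [(word, lexicon[word]) for word in words if word in lexicon]
    if not hits:
        return None, hits
    summation = sum(score for _, score in hits)
    return summation / len(hits), hits


# Hypothetical toy dictionary, used only to keep the example self-contained.
toy_lexicon = {"good": 3, "bad": -3, "nice": 3}

score, hits = sentiment_score("A good day, not a bad one. Nice!", toy_lexicon)
print(hits)   # [('good', 3), ('bad', -3), ('nice', 3)]
print(score)  # 1.0
```

Words without a dictionary entry ("a", "day", "not", "one") are ignored, so they affect neither the sum nor the divisor n.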
Sentiminer is written in Python 3 and can be run with the commands shown below. The input may be a text file, or the user can be prompted to type a sentence. In both cases, the analysis is the same.
To be prompted for a sentence from the user:
python3 sentiminer.py -S
To analyze a text file, the following command may be used:
python3 sentiminer.py textFile.txt
For the sentence "share the load, share the love", the resulting score is 1.67, which is calculated from three words, as shown in the figure of the output below. The positive score suggests that the text is generally optimistic.
For the sentence "bullying buys depression", the resulting score is -2, which is calculated from the single word “bullying”, as shown in the figure of the output below. The negative score suggests that the text is generally pessimistic.
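The arithmetic behind the two reported scores can be checked with a short sketch. The per-word values for the first sentence (1 for each "share" and 3 for "love") are assumptions chosen to be consistent with the reported 1.67; the value -2 for "bullying" follows from the reported score and the single hit. The helper `mean_score` is hypothetical.

```python
def mean_score(scores):
    """Average the scores of the words that were found in the dictionary."""
    return sum(scores) / len(scores)


# Assumed per-word scores: share = 1, share = 1, love = 3 (three hits).
first = mean_score([1, 1, 3])

# Only "bullying" is a hit, with a score of -2 (one hit).
second = mean_score([-2])

print(f"{first:.2f}")   # 1.67
print(f"{second:.2f}")  # -2.00
```

In the second sentence the divisor is 1 because only one word was found, so the text score equals that word's score.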
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi: 10.21105/joss.01686.
Check back often to see the evolution of this project!! Sentiminer is a work-in-progress and updates are likely to come.
If you would like to contribute to this project, then please do! For instance, if you see some low-hanging fruit or a task that you could easily complete and that would add value to the project, then I would love to have your insight.
Otherwise, please create an Issue for bugs or errors. Since I am a teaching faculty member at Allegheny College, I may not have the time to fix bugs quickly, so I would be very happy to have any help or technical insight from the open-source community. Many thanks in advance.
If you appreciate this project, please consider clicking the project’s Star button. :-)