This article will cover the following:
- Streaming tweets matching certain hashtags (you may apply other filters if you want to).
- Saving these tweets in a text file for later analysis.
- Reading these tweets and analyzing them.
- Saving the results of step 3 in a CSV file.
import tweepy
import settingsTwitter

class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # append every incoming tweet to the text file for later analysis
        with open("#DDvKXIP.txt", 'a') as f:
            f.write(status.text + "\n")

    def on_error(self, status_code):
        # 420 means we are being rate limited; disconnect instead of retrying
        if status_code == 420:
            return False

auth = tweepy.OAuthHandler(settingsTwitter.TWITTER_APP_KEY, settingsTwitter.TWITTER_APP_SECRET)
# the streaming API also needs an access token; the attribute names here are illustrative
auth.set_access_token(settingsTwitter.TWITTER_ACCESS_TOKEN, settingsTwitter.TWITTER_ACCESS_SECRET)
api = tweepy.API(auth)

stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(languages=["en"], track=["#DDvKXIP", "#KXIPvDD"])
settingsTwitter is the file in which my Twitter auth keys are stored.
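For reference, settingsTwitter.py can be as simple as a module of constants. The variable names below match what the script imports; the values are placeholders, not real credentials:

```python
# settingsTwitter.py -- placeholder credentials; replace with the keys
# from your own Twitter developer app, and never commit the real ones.
TWITTER_APP_KEY = "your-consumer-key"
TWITTER_APP_SECRET = "your-consumer-secret"
```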
I am ignoring plain re-tweets; however, re-tweets with some additional text of their own (quote tweets) are accepted.
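One way to implement that filter inside on_status is to check the attributes Twitter sets on the status object: a bare re-tweet carries retweeted_status, while a quote tweet comes through as quoted_status instead. A minimal sketch (the helper name is mine, not Tweepy's):

```python
def is_plain_retweet(status):
    # Twitter sets `retweeted_status` only on bare RTs; quote tweets
    # (re-tweets with added commentary) carry `quoted_status` instead,
    # so they do NOT match here and are kept.
    return hasattr(status, "retweeted_status")
```

In on_status you would simply return early when is_plain_retweet(status) is true, before writing the tweet to the file.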
# -*- coding: utf-8 -*-
import os, re, csv, time
import settingsWatson
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions, CategoriesOptions

# credential attribute names are illustrative; the real values live in settingsWatson.py
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2018-03-16',
    username=settingsWatson.WATSON_USERNAME,
    password=settingsWatson.WATSON_PASSWORD)

filename = '#DDvKXIP.txt'
file = open(filename, 'r')
# start at the end of the file so only freshly streamed tweets are processed
file.seek(os.stat(filename).st_size)

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        print("no line found, waiting for 1 second")
        time.sleep(1)
        file.seek(where)
    elif re.search('[a-zA-Z]', line):
        print("the line is: " + line)
        response = natural_language_understanding.analyze(
            text=line,
            features=Features(
                entities=EntitiesOptions(),
                keywords=KeywordsOptions(),
                categories=CategoriesOptions()))
        response["tweet"] = line
        with open('#DDvKXIP.csv', 'a') as csv_file:
            writer = csv.writer(csv_file)
            for key, value in response.items():
                writer.writerow([key, value])
    else:
        print("found a line without any alphabet in it, hence not considering.")
You might have read your logs (be it of an application or a server) using tail -f to follow the tail end of the log file. Similarly, the aforementioned code reads the #DDvKXIP.txt file from the tail end, processes the tweets one by one and saves the results in #DDvKXIP.csv.
settingsWatson is the file that holds my Watson auth keys. Using these keys (which are generated after I create a developer account on IBM Cloud) enables me to use Watson.
if (re.search('[a-zA-Z]', line))
This if condition ensures that the line read from the txt file is not a bare newline (does not contain only \n or whitespace), so that only tweets which actually contain written text, something that can be comprehended by Natural Language Understanding, are analyzed.
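To see what that filter lets through, here are a few sample lines (the tweets are made up):

```python
import re

lines = ["\n", "   \n", "What a catch! #DDvKXIP\n", "!!! 4 6 4 !!!\n"]
# Only lines containing at least one letter survive; blank lines,
# whitespace-only lines and pure punctuation/digits are skipped.
kept = [l for l in lines if re.search('[a-zA-Z]', l)]
```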
This module streams tweets faster than Watson can process them. For example, if the match gets over at hour X, the processing of the tweets already generated (assuming streamingTwitter.py is switched off as soon as the match gets over) keeps running until around hour X+3.
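That lag is a classic producer/consumer mismatch: the text file acts as an unbounded buffer between the fast stream and the slow analyzer. A toy sketch of the same situation, with invented timings that exaggerate the effect:

```python
import queue
import threading
import time

buf = queue.Queue()          # stands in for #DDvKXIP.txt
processed = []

def stream(n):
    # tweets arrive quickly...
    for i in range(n):
        buf.put("tweet %d" % i)
        time.sleep(0.001)
    buf.put(None)            # sentinel: the match is over

def analyze():
    # ...but each analysis call is ~10x slower than the arrival rate,
    # so the worker keeps draining the buffer long after the stream stops
    while True:
        item = buf.get()
        if item is None:
            break
        time.sleep(0.01)
        processed.append(item)

worker = threading.Thread(target=analyze)
worker.start()
stream(20)
worker.join()
```

Every tweet still gets processed eventually; the buffer just means the analyzer finishes well after the stream does.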
The stream accepts English tweets only; however, Hindi written in Roman script also slips through, which is nothing but noise in my signal.
Thanks for the read!