Useful links:
  • Download Python
  • Download Tweepy
  • Using Twitter API & Installing Tweepy
  • Japanese characters in Unicode
There are two approaches you can take to extract tweets from users you either follow or are simply interested in.

1st option (retrieve_tweets_auto):
  • This approach requires you to follow the users whose tweets you want to retrieve
  • The API.friends_ids() method returns the user ID of every user that the specified Twitter account follows
  • The API.get_user() method then retrieves each user's screen name
I took the original code from this website; however, because it was written for an older version of Python, I had to modify parts of it.

You can download the scripts here
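
The two bullet points above boil down to a small loop. The following is a minimal sketch, assuming a Tweepy 3.x API object (in Tweepy 4.x, friends_ids() was renamed get_friend_ids()). The function name collect_followed_screen_names is hypothetical and is not taken from the downloadable scripts. Without paging (for example via tweepy.Cursor), friends_ids() returns only the first page of IDs, which holds up to 5,000 users.

```python
def collect_followed_screen_names(api, account):
    """Return the screen names of the users that `account` follows.

    Hypothetical helper: `api` is assumed to be a Tweepy 3.x API instance
    (or any object with the same two methods). Only the first page of
    friend IDs is fetched.
    """
    screen_names = []
    # API.friends_ids() -> list of numeric user IDs followed by `account`
    for user_id in api.friends_ids(screen_name=account):
        # API.get_user() -> User object; only its screen name is kept
        user = api.get_user(user_id=user_id)
        screen_names.append(user.screen_name)
    return screen_names
```

Each get_user() call is a separate request, so this loop spends one request per followed account. That is one reason why large "following" lists run into time-outs.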
2nd option (retrieve_tweets):
  • This approach does not require you to follow the users whose tweets you want to retrieve
This script simply retrieves the tweets of the users you list in "screenNamesList".

You can download the scripts here
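
For comparison, here is a minimal sketch of what this second approach does. It assumes a Tweepy API object and its user_timeline() method. The function name fetch_recent_tweets is hypothetical, and count=200 is simply the per-request maximum of that endpoint, not a value taken from the original script.

```python
def fetch_recent_tweets(api, screen_names, count=200):
    """Map each screen name to the text of its most recent tweets.

    Hypothetical helper: `api` is assumed to be a Tweepy API instance and
    `screen_names` plays the role of the script's "screenNamesList".
    """
    tweets = {}
    for name in screen_names:
        # tweet_mode="extended" makes the full text available in `full_text`
        statuses = api.user_timeline(screen_name=name, count=count,
                                     tweet_mode="extended")
        tweets[name] = [status.full_text for status in statuses]
    return tweets
```

Because no follow relationship is involved, the list of screen names is the only input. Adding or removing a user means editing that list.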
The tweets_analysis script analyzes the tweets extracted either from all the users you follow or from the users declared in the "screenNamesList" list. The result is a list of the most frequently used Japanese Kanji characters and Japanese words in those tweets.

You can download the scripts here
Read the following steps to understand the whole process:

Step 1 - Download and install both Python and Tweepy, and make sure you have access to the Twitter API (that is, a set of API keys and access tokens). Watch this YouTube video to familiarize yourself with these concepts and tools.
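
Once you have your keys, a script needs to turn them into an authenticated client. The sketch below is one way to do that, assuming the keys are stored in environment variables. The variable names and the helpers load_credentials and make_api are hypothetical. make_api follows the Tweepy 3.x style (OAuthHandler plus tweepy.API), and wait_on_rate_limit=True asks Tweepy to sleep when a rate limit is hit instead of failing.

```python
import os

# Hypothetical environment-variable names; use whatever your setup provides.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Read the four Twitter API keys, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}


def make_api(creds):
    """Build an authenticated Tweepy client (requires Tweepy to be installed)."""
    import tweepy  # imported lazily so load_credentials() works without Tweepy

    auth = tweepy.OAuthHandler(creds["TWITTER_CONSUMER_KEY"],
                               creds["TWITTER_CONSUMER_SECRET"])
    auth.set_access_token(creds["TWITTER_ACCESS_TOKEN"],
                          creds["TWITTER_ACCESS_TOKEN_SECRET"])
    # Sleep through rate-limit windows instead of raising an error
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys out of the source files also means you can share the scripts without exposing your credentials.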

Step 2 - Download all the files and place them in a folder.

Step 3 - Run either the "retrieve_tweets_auto" or the "retrieve_tweets" script. When it finishes, a "data" folder will have been created to store all the data extracted from the tweets. The more users you have, the more times you may need to re-run the script, mainly because of "time-out" errors.
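
Re-running is much less painful if a run can pick up where the previous one stopped. The sketch below shows one way to do that, assuming one text file per user inside "data". The file layout "data/<screen_name>.txt" and the helpers save_tweets and pending_users are hypothetical, not taken from the original scripts. Writing with an explicit UTF-8 encoding matters here, because the default encoding on some systems (for example Windows) cannot represent Japanese text.

```python
from pathlib import Path


def save_tweets(data_dir, screen_name, texts):
    """Write one user's tweets to data_dir/<screen_name>.txt, one per line."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / (screen_name + ".txt")
    # Flatten newlines inside a tweet so that each line holds one tweet
    lines = [text.replace("\n", " ") for text in texts]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def pending_users(data_dir, screen_names):
    """Return only the users that do not have a saved file yet.

    Calling this at the start of each run skips users already downloaded,
    so a re-run after a time-out resumes instead of starting over.
    """
    data_dir = Path(data_dir)
    return [name for name in screen_names
            if not (data_dir / (name + ".txt")).exists()]
```

A driver loop would call pending_users() first, then download and save_tweets() each remaining user. Deleting a user's file forces that user to be downloaded again on the next run.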

Step 4 - Run the "tweets_analysis" script. When it finishes, an "analysis" folder will have been created containing two subfolders: "kanji" and "words". Inside the "words" folder there is one more folder ("split_data") and three text files.

The "jp_words" text file stores all the data from your "data" folder in a single file; the "split_data" folder contains that same data split into several text files, which the program then reads one by one.
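
The split-then-read pattern can be sketched as follows. This is a minimal sketch under stated assumptions: the chunk size of 1,000 lines, the "part_000.txt" naming scheme, and the helpers split_file and read_chunks are all hypothetical, since the original script's chunking rules are not described here. The source file is streamed with itertools.islice, so it never has to fit in memory all at once.

```python
from itertools import islice
from pathlib import Path


def split_file(source, out_dir, lines_per_chunk=1000):
    """Split `source` into numbered files of at most `lines_per_chunk` lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(source, encoding="utf-8") as src:
        index = 0
        while True:
            # Take the next block of lines; an empty list means end of file
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break
            path = out_dir / f"part_{index:03d}.txt"
            path.write_text("".join(chunk), encoding="utf-8")
            paths.append(path)
            index += 1
    return paths


def read_chunks(out_dir):
    """Yield the contents of each chunk file, one file at a time."""
    for path in sorted(Path(out_dir).glob("part_*.txt")):
        yield path.read_text(encoding="utf-8")
```

Zero-padding the index keeps the files in their original order when sorted by name.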

The "jp_word_freq" text file contains the number of occurrences of each Japanese word, whereas the "300_jp_words" text file contains only the first 300 words of that list, that is, the most frequently used ones.
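
Producing these two files is essentially a frequency count followed by a cut-off. The sketch below uses collections.Counter and assumes the words have already been segmented. Japanese text has no spaces between words, so a real word list needs a tokenizer such as a morphological analyzer (MeCab or Janome, for example). The tokenizer the original script uses is not described here, so this sketch leaves that step out. The helper name write_frequency_files and the tab-separated "word, count" line format are hypothetical.

```python
from collections import Counter
from pathlib import Path


def write_frequency_files(words, freq_path, top_path, top_n=300):
    """Write the full frequency table and its top-N slice.

    Each output line is "<word>\t<count>", most frequent first. Returns
    the top-N (word, count) pairs.
    """
    counts = Counter(words)
    ranked = counts.most_common()  # sorted by count, highest first
    table = ["{}\t{}".format(word, n) for word, n in ranked]
    Path(freq_path).write_text("\n".join(table) + "\n", encoding="utf-8")
    Path(top_path).write_text("\n".join(table[:top_n]) + "\n", encoding="utf-8")
    return ranked[:top_n]
```

With top_n=300, the second file corresponds to "300_jp_words" and the first to "jp_word_freq". Words with equal counts keep the order in which they were first seen.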

Step 5 - The "kanji" folder, in turn, contains three text files. The "kanji_chars" text file stores all the Japanese Kanji characters found in the tweets.

The "kanji_freq" text file contains the number of occurrences of each Japanese Kanji character, whereas the "300_kanji" text file contains only the first 300 characters of that list, that is, the most frequently used ones.
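
Unlike words, Kanji can be picked out character by character using Unicode ranges (see the "Japanese characters in Unicode" link above). The following minimal sketch assumes that "Kanji" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF). That is an approximation, not necessarily the rule the original script uses. The helper name kanji_frequencies is hypothetical.

```python
import re
from collections import Counter

# CJK Unified Ideographs (U+4E00-U+9FFF) plus Extension A (U+3400-U+4DBF).
# Approximation: these ranges also contain characters used only in Chinese,
# and the iteration mark U+3005 is not included.
KANJI_RE = re.compile("[\u3400-\u4dbf\u4e00-\u9fff]")


def kanji_frequencies(text):
    """Return a Counter of every Kanji character found in `text`."""
    return Counter(KANJI_RE.findall(text))
```

Hiragana, katakana, and punctuation fall outside these ranges and are ignored. kanji_frequencies(text).most_common(300) then gives the kind of list stored in "300_kanji".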

Last Updated: August 15, 2017
© 2011-2017, Luis A. Hernandez