Useful links:
Download Python
Download Selenium
Japanese characters in Unicode
To extract the song lyrics, run the "retrieve_song_lyrics" script.

You can download the scripts here
The "song_lyrics_analysis" script analyzes the song lyrics that were retrieved from the website. The result is a list of the most frequently used Japanese Kanji characters and Japanese words in the songs.

You can download the scripts here
The following steps describe the whole process:

Step 1 - Download and install both Python and Selenium.

Step 2 - Download all the files and place them in a folder.



Step 3 - Run the "retrieve_song_lyrics" script. Once it finishes, a "jp_songs" folder will have been created, storing all the data that was extracted from the songs.
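
As a rough illustration of what Step 3 involves, the sketch below reads the text of one page element through a Selenium-style driver and saves it into the "jp_songs" folder. It is not the actual "retrieve_song_lyrics" script: the function names, the CSS selector, and the one-file-per-song layout are all assumptions made for this example.

```python
import os


def fetch_lyrics(driver, url, css_selector):
    """Open `url` and return the text of the first element matching `css_selector`.

    `driver` is expected to be a Selenium WebDriver; the selector depends on
    the lyrics website and is a hypothetical value supplied by the caller.
    """
    driver.get(url)
    # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
    element = driver.find_element("css selector", css_selector)
    return element.text


def save_lyrics(title, lyrics, out_dir="jp_songs"):
    """Write one song to `<out_dir>/<title>.txt` as UTF-8 and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, title + ".txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(lyrics)
    return path
```

In a real run, `driver` would be created with Selenium (for example `webdriver.Firefox()`) and closed with `driver.quit()` once every song has been saved. Writing the files as UTF-8 keeps the Japanese characters intact for the analysis step.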





Step 4 - Run the "song_lyrics_analysis" script. Once it finishes, an "analysis" folder will have been created containing two folders: "kanji" and "words". Inside the "words" folder, there will be another folder and three text files.

The "jp_words" text file stores all the data from your "data" folder. The "split_data" folder holds that same data split into several text files, which the program reads one by one.

The "official_jp_words" text file will contain the final list of the Japanese words along with their number of repetitions.
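
The split-then-count idea in Step 4 can be sketched as follows. This is only an illustration under stated assumptions, not the author's script: the helper names, the `part_0001.txt` file naming, and the tab-separated output are hypothetical, and the words are assumed to be already separated by whitespace (Japanese text normally is not, so the real script may segment words differently).

```python
import os
from collections import Counter


def split_into_files(lines, out_dir="split_data", lines_per_file=1000):
    """Write `lines` into numbered text files of at most `lines_per_file` lines."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(lines), lines_per_file):
        path = os.path.join(out_dir, "part_{:04d}.txt".format(len(paths) + 1))
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines[start:start + lines_per_file]))
        paths.append(path)
    return paths


def count_words(paths):
    """Read each file in turn and count every whitespace-separated word."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            counts.update(handle.read().split())
    return counts


def write_counts(counts, path):
    """Write one "word<TAB>count" line per word, most frequent first."""
    with open(path, "w", encoding="utf-8") as handle:
        for word, n in counts.most_common():
            handle.write("{}\t{}\n".format(word, n))
```

Reading the chunks one at a time keeps memory use low when the collected lyrics are large, while a single `Counter` accumulates the totals across all files.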










Step 5 - Inside the "kanji" folder, there will be three text files. The "kanji_chars" text file stores all the Japanese Kanji characters that were found in the songs.

The "official_kanji" text file will contain the final list of the Kanji characters along with their number of repetitions.
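
To show how Kanji can be picked out of lyrics and counted, here is a minimal sketch. It assumes that "Kanji" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers the commonly used characters but not the rarer extension blocks; the function names are hypothetical and the actual script may use a different range or method.

```python
from collections import Counter


def is_kanji(ch):
    """Return True if `ch` lies in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def count_kanji(text):
    """Count each Kanji in `text`, ignoring kana, Latin letters and punctuation."""
    return Counter(ch for ch in text if is_kanji(ch))
```

For example, `count_kanji("桜の花が咲く、桜").most_common()` lists 桜 first with a count of 2, since the hiragana and the punctuation mark fall outside the range and are skipped.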




Last Updated: October 27, 2017
© 2011-2017, Luis A. Hernandez