This blog presents several studies that were conducted on social media, song lyrics, websites, and novels. The analysis consisted of extracting tweets from Japanese idols, comments from Japanese YouTube channels, lyrics from Japanese songs, content from Japanese websites, and text from Japanese novels. The purpose was to retrieve not only the most common Japanese Kanji characters but also the most common Japanese words.
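As a rough illustration of the counting step (a sketch only, not the code used in the studies), the snippet below tallies Kanji characters across a few strings. It assumes that a Kanji is any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF); the names `is_kanji`, `count_kanji`, and `sample_texts` are hypothetical, and the sample sentences are made up for the example.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: treat the CJK Unified Ideographs block as "Kanji".
    # Hiragana, katakana, and punctuation fall outside this range.
    return '\u4e00' <= ch <= '\u9fff'


def count_kanji(texts):
    # Accumulate one shared tally over every text in the collection.
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text if is_kanji(ch))
    return counts


# Made-up sample data standing in for tweets, comments, or lyrics.
sample_texts = ["日本語を勉強します", "日本の歌が好きです"]
kanji_counts = count_kanji(sample_texts)

# Print the characters ordered from most to least frequent.
for kanji, count in kanji_counts.most_common():
    print(kanji, count)
```

Counting words rather than characters would additionally require splitting Japanese text into words, which this sketch does not attempt.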
What inspired you to research the most common Kanji characters?
Two different factors:
Watching this YouTube video about the history of the Japanese Kanji characters
The fact that some people claim to have come up with a list of Kanji characters without providing the collected data or the source code
Why focus on several areas at once and not just one?
Because each individual study has its own bias. For example, the social media analysis is limited to people I'm following, because most people don't list their location, which makes it impossible to select accounts by location.
Combining the results of all of the studies reduces that bias.
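One way to picture how combining the studies can offset a single source's bias is sketched below. It is an assumption for illustration, not the method used in the studies: each source's counts are converted to relative frequencies and then averaged, so that a very large source (such as tweets) does not drown out a small one (such as novels). The names `relative_frequencies`, `combine_sources`, and the toy counts are hypothetical.

```python
from collections import Counter


def relative_frequencies(counts):
    # Convert raw counts into shares of the source's total.
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kanji: n / total for kanji, n in counts.items()}


def combine_sources(sources):
    # Assumption: every source gets equal weight, regardless of its size.
    combined = Counter()
    for counts in sources.values():
        for kanji, freq in relative_frequencies(counts).items():
            combined[kanji] += freq / len(sources)
    return combined


# Toy counts, invented for the example.
sources = {
    "social_media": Counter({"日": 6, "本": 2}),
    "song_lyrics": Counter({"愛": 3, "日": 1}),
}
combined = combine_sources(sources)

for kanji, share in combined.most_common():
    print(kanji, round(share, 3))
```

With equal weighting, a character that is frequent in only one source ranks below a character that appears across several sources, which is the effect the combined ranking is after.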
Some people think learning the Kanji characters that are commonly used by Japanese people is useless. They base their argument on the idea that people still have to memorize uncommon characters. What is your opinion about this statement?
While their statement might be true, it is very important to realize that Kanji characters can also become obsolete.
Not everyone has the opportunity to study in Japan; therefore, many people have to rely on the Internet. With so many resources on the Internet, it is difficult to tell outdated content from up-to-date content.
Statistics
The projects were run using a PC with 16 GB of RAM
Python 3.4.3 was used for all of the studies
75,467,261 tweets and 25,168,139 YouTube comments were analyzed
50,472 Japanese song lyrics were analyzed
215 Japanese novels were analyzed
18,767 Japanese websites were analyzed
It took 145 days to process 75 million tweets, 60 days for 25 million YouTube comments, 76 days for the song lyrics, 2 days for the novels, and 192 days for the websites' text
The master project took 475 days in total, that is, 1 year, 3 months, and 20 days
Textbook
Author: Luis A. Hernandez
This textbook combines the results of four different studies that were conducted on social media, song lyrics, websites, and novels. For the purpose of the analysis, a combination of 75 million tweets in Japanese, 25 million YouTube comments in Japanese, 50,000 song lyrics in Japanese, 215 Japanese novels, and 18,000 Japanese websites was used as the sample data.
E-books
Author: Luis A. Hernandez
This document presents a study that was conducted on both Twitter and YouTube. For the purpose of the analysis, a combination of 75 million tweets and 25 million YouTube comments was used as the sample data.
Author: Luis A. Hernandez
This document presents a study that was conducted on Japanese song lyrics. For the purpose of the analysis, 50,000 song lyrics were used as the sample data.
Author: Luis A. Hernandez
This document presents a study that was conducted on Japanese websites. For the purpose of the analysis, 18,767 websites were used as the sample data.
Author: Luis A. Hernandez
This document presents a study that was conducted on Japanese novels. For the purpose of the analysis, 215 novels were used as the sample data.
References
Japanese dictionary
Japanese examples
Japanese sentences
Japanese to English translations