漢字プロジェクト - The Kanji Project

This blog presents several studies of Japanese text from social media, song lyrics, websites, and novels. The analysis consisted of extracting tweets from Japanese idols, comments from Japanese YouTube channels, lyrics from Japanese songs, content from Japanese websites, and text from Japanese novels. The purpose was to retrieve not only the most common Japanese Kanji characters but also the most common Japanese words.
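As a rough illustration of the core idea, the sketch below counts Kanji characters in a small list of strings. It is not the project's actual source code: the function names (`is_kanji`, `count_kanji`), the sample sentences, and the decision to treat only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as Kanji are all assumptions made for this example.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as Kanji.
    # Extension blocks and the iteration mark (U+3005) are ignored here.
    return '\u4e00' <= ch <= '\u9fff'


def count_kanji(texts):
    """Count how often each Kanji character appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text if is_kanji(ch))
    return counts


# Hypothetical sample data standing in for tweets, lyrics, or novel text.
sample = ["今日は日曜日です", "日本語を勉強します"]
counts = count_kanji(sample)

# Show the most frequent Kanji character and its count.
print(counts.most_common(1))
```

Hiragana and katakana fall outside the chosen range, so they are skipped automatically. Counting the most common words would additionally need a Japanese word segmenter, which this sketch does not attempt.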

What inspired you to research the most common Kanji characters?

Two different factors:

  • Watching this YouTube video about the history of the Japanese Kanji characters
  • The fact that some people claim to have come up with a list of Kanji characters without providing the data they collected or their source code

Why focus on several areas at once and not just one?

Because each individual study has its own bias. For example, the social media analysis is limited to people I'm following. This is because most people don't list their location, which makes it impossible to select accounts by location. By combining the results of all of the studies, the bias is reduced.
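One possible way to combine per-study results is sketched below. The blog does not say how the studies were actually merged, so this is only an assumption: each study's counts are converted to relative frequencies and then averaged, so that a very large corpus (such as the tweets) does not drown out a small one (such as the novels). The names `relative_frequencies` and `combine_studies` and the toy counts are hypothetical.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw counts into shares that sum to 1 for one study."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def combine_studies(studies):
    """Average each character's relative frequency across all studies."""
    combined = Counter()
    for counts in studies.values():
        for ch, freq in relative_frequencies(counts).items():
            combined[ch] += freq / len(studies)
    return combined


# Toy counts, not real results from the project.
studies = {
    "social_media": Counter({"日": 6, "人": 4}),
    "novels": Counter({"人": 1, "心": 1}),
}
combined = combine_studies(studies)

# Characters ranked by their averaged share across studies.
for ch, share in combined.most_common():
    print(ch, round(share, 3))
```

Simply adding the raw counts together would be an alternative, but then the largest data set would dominate the ranking.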

Some people think learning the Kanji characters that are commonly used by Japanese people is useless. They base their argument on the idea that people still have to memorize uncommon characters. What is your opinion about this statement?

While their statement might be true, it is important to realize that Kanji characters can also become obsolete. Not everyone has the opportunity to study in Japan, so many people have to rely on the Internet. With so many resources on the Internet, it is difficult to tell outdated content from current content.

Statistics

  • The projects were run on a PC with 16 GB of RAM
  • Python 3.4.3 was used for all of the studies
  • 75,467,261 tweets and 25,168,139 YouTube comments were analyzed
  • 50,472 Japanese song lyrics were analyzed
  • 215 Japanese novels were analyzed
  • 18,767 Japanese websites were analyzed
  • It took 145 days to process 75 million tweets, 60 days for 25 million YouTube comments, 76 days for the song lyrics, 2 days for the novels, and 192 days for the websites' text
  • The master project took 475 days in total, that is, roughly 1 year, 3 months, and 20 days

Textbook

    The Most Used Japanese Kanji Characters: An Analysis of Japanese Social Media, Song Lyrics, Websites, and Novels

    Author: Luis A. Hernandez



    This textbook combines the results of four different studies conducted on social media, song lyrics, websites, and novels. The sample data consisted of 75 million tweets in Japanese, 25 million YouTube comments in Japanese, 50,000 song lyrics in Japanese, 215 Japanese novels, and 18,000 Japanese websites.



E-books

    The Most Used Japanese Kanji Characters in Social Media

    Author: Luis A. Hernandez



    This document presents research conducted on both Twitter and YouTube. The sample data consisted of 75 million tweets and 25 million YouTube comments.



    The Most Used Japanese Kanji Characters in Song Lyrics

    Author: Luis A. Hernandez



    This document presents research conducted on Japanese song lyrics. The sample data consisted of 50,000 song lyrics.



    The Most Used Japanese Kanji Characters in Websites

    Author: Luis A. Hernandez



    This document presents research conducted on Japanese websites. The sample data consisted of 18,767 websites.



    The Most Used Japanese Kanji Characters in Novels

    Author: Luis A. Hernandez



    This document presents research conducted on Japanese novels. The sample data consisted of 215 novels.




