漢字プロジェクト - The Kanji Project

This webpage presents the results of a research project conducted on Japanese websites. Its purpose was to identify not only the most common Japanese Kanji characters but also the most common Japanese words used on websites.
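As a rough illustration of what counting Kanji frequency involves, here is a minimal sketch. It is not the project's actual code: the function names `is_kanji` and `count_kanji`, the sample strings, and the choice to treat only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Kanji are all assumptions made for this example. It also does not reproduce the project's exclusion of name-only and abbreviation characters.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: a "Kanji" is any character in the CJK Unified
    # Ideographs block. Hiragana, katakana, and punctuation fall outside it.
    return '\u4e00' <= ch <= '\u9fff'


def count_kanji(texts):
    # Tally every Kanji occurrence across an iterable of page texts.
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text if is_kanji(ch))
    return counts


# Hypothetical sample input standing in for extracted website text.
pages = ["日本語の本を読む", "日本に行く"]
counts = count_kanji(pages)
print(counts.most_common(2))  # [('本', 3), ('日', 2)]
```

`Counter.most_common(n)` returns the `n` most frequent characters, which is the kind of ranking the lists on this page present.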

Advantages of the research

  • Japanese websites reflect every aspect of Japanese society

Limitations of the research

  • Content could only be extracted from popular websites

  • It is important to point out that Kanji characters used exclusively for Japanese names, as well as characters used to abbreviate words, were not included in the analysis.

Statistics

  • This project was run using a PC with 16 GB of RAM
  • Python 3.4.3 was used for this research
  • 18,767 Japanese websites were analyzed
  • It took 192 days to process the websites' text

  • Download the complete list of the 2,136 Japanese Kanji characters and 1,027 Japanese words that are used the most in websites along with their pronunciations, meanings, examples, and sentences:





    The following list shows only the first 58 Kanji characters extracted from those websites; it is not the complete list.





    These were the first 58 Japanese words:





    The sentences provided for each Japanese Kanji character were modified according to the following guidelines:


  • Replaced いいえ for いや
  • Replaced ええ for はい
  • Replaced the first が (when used as a particle) for は
  • Removed あなたは and 私は when they were the first words in the sentence
  • Added ? after か when asking a question
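The guidelines above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code. It reads "Replaced X for Y" as "X was replaced with Y", which the page does not confirm. The function name `normalize_sentence` is hypothetical. The が rule is left out because deciding whether が is a particle requires morphological analysis, and dropping a trailing 。 when adding ? is also an assumption.

```python
import re


def normalize_sentence(sentence):
    # Assumed direction: いいえ becomes いや, and ええ becomes はい.
    # These are naive substring replacements with no word-boundary check.
    s = sentence.replace("いいえ", "いや")
    s = s.replace("ええ", "はい")
    # Remove あなたは or 私は only when it opens the sentence.
    s = re.sub(r"^(あなたは|私は)", "", s)
    # Add ? after a sentence-final か (assumption: an optional
    # trailing 。 is replaced by the question mark).
    s = re.sub(r"か。?$", "か?", s)
    # The が-to-は rule is intentionally not implemented here.
    return s


print(normalize_sentence("あなたは学生ですか"))  # 学生ですか?
print(normalize_sentence("ええ、そうです。"))    # はい、そうです。
```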

Highlights

    The following list compares the 2,136 Kanji characters taught in Japanese schools with the 2,136 Kanji characters most frequently used on websites.
    The characters in red are the Kanji from the websites.



    The following list compares the 2,211 Kanji characters taught for the JLPT with the 2,136 Kanji characters most frequently used on websites.
    The characters in red are the Kanji from the websites.









