The Japanese novels are found here; however, the website uses anti-scraping technology, so Selenium and BeautifulSoup can't be used to retrieve them.
The novels_analysis script analyzes the novels that were retrieved from the website. Its result is a list of the most used Japanese Kanji characters and Japanese words in the novels.
You can download the script here.
Read the following steps to understand the whole process:
Step 1 - Download all the files and place them in a folder.
Step 2 - Run the "novels_analysis" script. Once it finishes, an "analysis" folder will be created containing two folders: "kanji" and "words." Inside the "words" folder, there will be another folder and three text files.
The "jp_words" text file stores all the data from your "jp_novels" folder. The "split_data" folder holds that same data split into several text files, which the program reads one by one.
The "official_jp_words" text file will contain the final list of the Japanese words along with their number of repetitions.
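The splitting step described above can be pictured with a small sketch. This is not the code from novels_analysis; the function names, the "part_001.txt" naming scheme, and the choice to split by line count are all assumptions made for illustration.

```python
from pathlib import Path


def split_text(text, lines_per_file):
    """Split text into chunks of at most `lines_per_file` lines each."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_file])
        for i in range(0, len(lines), lines_per_file)
    ]


def write_chunks(chunks, out_dir):
    """Write each chunk to its own UTF-8 text file and return the paths.

    The file naming (part_001.txt, part_002.txt, ...) is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks, start=1):
        path = out / f"part_{index:03d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths


# Five lines split two at a time gives three chunks.
chunks = split_text("a\nb\nc\nd\ne", 2)
```

Splitting a large file this way keeps each read small, which may be why the script processes the files one by one.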
Step 3 - Inside the "kanji" folder, you will find three text files. The "kanji_chars" text file stores all the Japanese Kanji characters that were found in the novels.
The "official_kanji" text file will contain the final list of the Kanji characters along with their number of repetitions.
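As a rough illustration of how Kanji characters could be extracted and counted, here is a minimal sketch. It is not the code from novels_analysis: the function names are hypothetical, and it assumes that "Kanji" means characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out rarer extension characters and the iteration mark 々.

```python
import re
from collections import Counter

# Assumption: treat the basic CJK Unified Ideographs block as "Kanji".
KANJI_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def count_kanji(text):
    """Return a Counter mapping each Kanji character to its occurrences."""
    return Counter(KANJI_PATTERN.findall(text))


def format_counts(counts):
    """Render one 'character<TAB>count' line per Kanji, most frequent first."""
    return "\n".join(f"{char}\t{n}" for char, n in counts.most_common())


sample = "日本語の小説を読む。日本の本。"
counts = count_kanji(sample)
report = format_counts(counts)
```

Hiragana, katakana, and punctuation fall outside the pattern, so only the Kanji end up in the count.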