To extract the comments from a YouTube channel, four things are needed:
Where do I find the channel id?
Let's assume you have the following YouTube channel link:
The text in red represents your channel id; however, sometimes the channel id is not included in the link.
For example, if you have a link similar to the following:
You only have the user's username, not even the channel name or title.
In this case, right-click anywhere on the webpage and select "View Source" or "View Page Source" (depending on the browser you are using). Once the source code is displayed on your screen, search for the keyword "channelId"; your channel id is the value of the "content" attribute on that same tag.
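The manual lookup described above can also be scripted. The following is a minimal sketch that uses only Python's standard library. It assumes the page source contains a tag of the form `<meta itemprop="channelId" content="...">`, which may not match what YouTube currently serves. The names `ChannelIdParser` and `find_channel_id` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ChannelIdParser(HTMLParser):
    """Remembers the content attribute of the first <meta itemprop="channelId"> tag."""

    def __init__(self):
        super().__init__()
        self.channel_id = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching <meta> tag.
        if tag != "meta" or self.channel_id is not None:
            return
        attributes = dict(attrs)
        if attributes.get("itemprop") == "channelId":
            self.channel_id = attributes.get("content")


def find_channel_id(page_source):
    """Return the channel id found in the page source, or None if there is none."""
    parser = ChannelIdParser()
    parser.feed(page_source)
    return parser.channel_id


# Hypothetical page source; the id below is a placeholder, not a real channel.
sample = '<html><head><meta itemprop="channelId" content="UC_hypothetical_id"></head></html>'
print(find_channel_id(sample))  # prints UC_hypothetical_id
```

If the function returns None, the page most likely uses different markup and the manual search is still the safer route.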
How can I install the LXML, the CSS SELECT, and the API Client libraries for Python?
Before attempting to install the libraries, check whether they were already installed along with Python.
Go to the directory where the Python files are located. It is usually at this path: C:\Python34\Scripts
On your PC, search for Command Prompt and open it.
Try typing the following in the Command Prompt:
If you get an error, try updating only the library that failed to install:
If the "requests" library was not installed:
If the "lxml" library was not installed:
If the "cssselect" library was not installed:
If you did not get any errors after updating the libraries, you can proceed to create the YouTube API key using these instructions; otherwise, you will have to install the libraries individually.
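To see which libraries Python can already find, a short check like the one below may help. It is only a sketch: the import names are assumptions, and in particular `googleapiclient` is the import name of the google-api-python-client package, which may or may not be the API client this guide refers to. The function name `missing_modules` is made up for illustration.

```python
import importlib.util

# Assumed import names for the libraries mentioned in this guide.
REQUIRED_MODULES = ["requests", "lxml", "cssselect", "googleapiclient"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

Any name printed as not installed is a candidate for the individual installation described above.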
Once you have downloaded and installed Python with all of its required libraries, and you have created your YouTube API key, the last thing to do will be to run the script.
The YT_comments_analysis script analyzes the comments extracted from the YouTube channels declared in the "channelIds" list. The result is a list of the most frequently used Japanese Kanji characters and Japanese words in those comments.
I took the original code from this website, this website, and this website. I combined the pieces and modified parts of the code to make it friendlier for people with no prior programming experience.
You can download the script here
Read the following information in order to understand the whole process:
Step 1 - Download and install Python. Make sure you have created a YouTube API key (use these instructions).
Step 2 - Download all the files and place them in a folder.
Step 3 - Run the YT_comments_analysis script. Once it finishes, three folders will have been created: 1) playlists, 2) comments, and 3) analysis.
The "playlists" folder will store all the data that was extracted from the YouTube channels, including the playlist name; the video name; the uploader name; the date the video was uploaded to YouTube; the video id; and the number of views, likes, dislikes, and comments each video has at the time of the analysis.
The "comments" folder will store all the comments that were written in Japanese.
The "analysis" folder will contain two other folders: "kanji" and "words." Inside each of these folders, there will be another folder and three text files.
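The folder layout described above can be pictured with a short sketch. This is not the code from YT_comments_analysis; it only recreates the folder names mentioned in this guide under a base directory. The function name `create_output_folders` is hypothetical, and the demo writes to a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path


def create_output_folders(base_dir):
    """Create the playlists, comments and analysis folders under base_dir."""
    base = Path(base_dir)
    folders = [
        base / "playlists",
        base / "comments",
        base / "analysis" / "kanji",
        base / "analysis" / "words",
    ]
    for folder in folders:
        # parents=True also creates "analysis"; exist_ok=True allows re-running.
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_output_folders(tmp):
        print(folder.relative_to(tmp))
```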
The "jp_words" text file stores all the data from your "comments" folder. The "split_data" folder holds that same data split into several text files, which the program reads one by one.
The official_jp_words text file will contain the final list of the Japanese words along with their number of repetitions.
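The split-then-count idea can be sketched as follows. This is an illustration under a strong assumption, not the script's actual method: the words are taken to be already separated by spaces, whereas real Japanese text needs a tokenizer to find word boundaries. The names `split_into_chunks` and `count_words` are hypothetical.

```python
from collections import Counter


def split_into_chunks(lines, chunk_size):
    """Split a list of lines into consecutive chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_words(chunks):
    """Process the chunks one by one and add up how often each word occurs."""
    totals = Counter()
    for chunk in chunks:
        for line in chunk:
            # Assumption: words are separated by whitespace.
            totals.update(line.split())
    return totals


# Made-up sample data with the words already separated.
lines = ["猫 が 好き", "犬 が 好き", "猫 と 犬"]
chunks = split_into_chunks(lines, 2)
counts = count_words(chunks)
for word, repetitions in counts.most_common():
    print(word, repetitions)
```

Processing the data in chunks keeps each step small, which is presumably why the data is split into several files before being read.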
Step 4 - Inside the "kanji" folder, you will have three text files. The "kanji_chars" text file stores only the Japanese Kanji characters that were found in the YouTube comments.
The "official_kanji" text file will contain the final list of the Kanji characters along with their number of repetitions.
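Counting Kanji repetitions can be sketched in a few lines. This is a minimal illustration, not the code from the script. It assumes that "Kanji" means characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out rarer extension blocks. The name `count_kanji` and the sample comments are made up.

```python
import re
from collections import Counter

# Assumption: a Kanji is any character in the range U+4E00..U+9FFF.
KANJI_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def count_kanji(comments):
    """Return a Counter mapping each Kanji character to its number of repetitions."""
    totals = Counter()
    for comment in comments:
        totals.update(KANJI_PATTERN.findall(comment))
    return totals


# Made-up sample comments.
comments = ["日本語の動画が大好き", "この動画は面白い", "Nice video!"]
kanji_counts = count_kanji(comments)
for character, repetitions in kanji_counts.most_common():
    print(character, repetitions)
```

Hiragana, katakana and Latin letters fall outside the assumed range, so they are ignored by the pattern.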