Useful links:
Download Python
Download Python's lxml library
Download Python's cssselect library
Create YouTube API Key
Top 100 Most Subscribed Japanese YouTube Channels
To extract the comments from a YouTube channel, five things are needed:
  • The Channel ID
  • The lxml library
  • The cssselect library
  • The API Client library
  • The YouTube API Key

Where do I find the channel id?

Let's assume you have the following YouTube channel link:

https://www.youtube.com/channel/UCW_yDyDfu1bpqDNxeobW02Q/playlists

The segment that follows /channel/ in the link (here, UCW_yDyDfu1bpqDNxeobW02Q) is the channel id; however, the channel id is not always included in the link.

For example, if you have a link similar to the following:

https://www.youtube.com/user/aliceprojectcompany/playlists

This link contains only the user's username; it includes neither the channel id nor the channel name or title.

In this case, right-click anywhere on the web page and select "View Source" or "View Page Source" (depending on the browser you are using). Once the source code is displayed on your screen, search for the keyword "channelId"; the channel id is the value of the "content" attribute in the same tag.
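
Both lookups described above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original script: the function names are hypothetical, the URL pattern assumes channel ids start with "UC", and the page-source lookup assumes the id sits in a tag of the form <meta itemprop="channelId" content="...">, which may differ in the page you are viewing.

```python
import re
from html.parser import HTMLParser

# Assumption: channel ids in /channel/ links start with "UC" and use
# only letters, digits, "_" and "-".
CHANNEL_URL_RE = re.compile(r"youtube\.com/channel/(UC[0-9A-Za-z_-]+)")


def channel_id_from_url(url):
    """Return the channel id embedded in a /channel/ link, or None."""
    match = CHANNEL_URL_RE.search(url)
    return match.group(1) if match else None


class _ChannelIdFinder(HTMLParser):
    """Remembers the content attribute of the first tag with itemprop="channelId"."""

    def __init__(self):
        super().__init__()
        self.channel_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.channel_id is None and attributes.get("itemprop") == "channelId":
            self.channel_id = attributes.get("content")


def channel_id_from_source(html_text):
    """Return the channel id found in saved page source, or None."""
    finder = _ChannelIdFinder()
    finder.feed(html_text)
    return finder.channel_id
```

For a /user/ link, channel_id_from_url returns None, which is the signal to save the page source and pass its text to channel_id_from_source instead.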

How can I install the lxml, cssselect, and API Client libraries for Python?

Before attempting to install the libraries, check whether they were already installed along with Python.

Locate the directory that contains Python's scripts; it is usually at this path: C:\Python34\Scripts

On your PC, search for Command Prompt and open it.

Type the following in the Command Prompt:

  • Type cd C:\Python34\Scripts and press the "Enter" key

  • Type pip install requests and press the "Enter" key

  • Type pip install lxml and press the "Enter" key

  • Type pip install cssselect and press the "Enter" key

  • Type pip install --upgrade google-api-python-client and press the "Enter" key
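
To see which of these libraries Python can already find, before or after running the commands above, a short check like the following may help. It is only a sketch: the function name find_missing is hypothetical, and the list uses the import names of the packages (google-api-python-client is imported as "googleapiclient").

```python
import importlib.util

# Import names for the four packages installed with pip above.
REQUIRED_MODULES = ["requests", "lxml", "cssselect", "googleapiclient"]


def find_missing(module_names):
    """Return the names from module_names that Python cannot locate."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any name reported as not installed is one to install or upgrade with pip.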

If you get an error, try upgrading only the library that failed to install:

If the "requests" library was not installed:
  • Type pip install --upgrade requests and press the "Enter" key

  • Type pip install requests and press the "Enter" key

If the "lxml" library was not installed:
  • Type pip install --upgrade lxml and press the "Enter" key

  • Type pip install lxml and press the "Enter" key

If the "cssselect" library was not installed:
  • Type pip install --upgrade cssselect and press the "Enter" key

  • Type pip install cssselect and press the "Enter" key

If you did not get any errors after updating the libraries, you can proceed to create the YouTube API key using these instructions; otherwise, you will have to install the libraries individually.

Once you have downloaded and installed Python with all of its required libraries, and you have created your YouTube API key, the last thing to do will be to run the script.

The following script (YT_comments_analysis) analyzes the comments extracted from the YouTube channels declared in the "channelIds" list. The result is a list of the most used Japanese Kanji characters and Japanese words in the YouTube comments.

I got the original code from this website, also from this website, and from this website. I simply combined them and modified some parts of the code to make it more user-friendly for people who have no prior programming experience.
Read the following information in order to understand the whole process:

Step 1 - Download and install Python. Make sure you have created a YouTube API Key.

Step 2 - Download all the files and place them in a folder.

Step 3 - Run the "YT_comments_analysis" script. Once it finishes, three different folders will have been created: 1) playlists, 2) comments, and 3) analysis.

The "playlists" folder will store all the data that was extracted from the YouTube channels, including the playlist name; the video name; the uploader name; the date the video was uploaded to YouTube; the video id; and the number of views, likes, dislikes, and comments each video has at the time of the analysis.

The "comments" folder will store all the comments that were written in Japanese.
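
How the script decides that a comment is Japanese is not described here. One simple heuristic, shown purely as an assumption, is to keep a comment if it contains at least one hiragana, katakana, or kanji character. The function names below are hypothetical, and a comment made only of kanji could also be Chinese, so this is a rough filter at best.

```python
import re

# Hiragana and katakana (U+3040 to U+30FF) plus the CJK Unified
# Ideographs block (U+4E00 to U+9FFF), which contains the common kanji.
JAPANESE_RE = re.compile("[\u3040-\u30ff\u4e00-\u9fff]")


def is_japanese(comment):
    """Return True if the comment contains at least one kana or kanji."""
    return JAPANESE_RE.search(comment) is not None


def keep_japanese(comments):
    """Return only the comments that look Japanese, in their original order."""
    return [comment for comment in comments if is_japanese(comment)]
```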

The "analysis" folder will contain two other folders: "kanji" and "words." Inside each of these folders, there will be another folder and three text files.

The "jp_words" text file stores all the data from your "comments" folder. The "split_data" folder holds that same data split into several text files, which the program reads one by one.
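
Splitting one large text file into several smaller ones could look like the sketch below. The function name split_file, the chunk size of 1000 lines, and the "part_0001.txt" naming are all assumptions made for illustration; the actual script may split the data differently.

```python
from pathlib import Path


def split_file(source_path, output_dir, lines_per_file=1000):
    """Split a UTF-8 text file into numbered parts; return the part paths."""
    source_path = Path(source_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    lines = source_path.read_text(encoding="utf-8").splitlines(keepends=True)
    written = []
    for start in range(0, len(lines), lines_per_file):
        part_number = start // lines_per_file + 1
        part_path = output_dir / "part_{:04d}.txt".format(part_number)
        part_path.write_text("".join(lines[start:start + lines_per_file]),
                             encoding="utf-8")
        written.append(part_path)
    return written
```

Reading the parts one at a time keeps memory use low when the combined comments file is large.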

The "jp_word_freq" text file contains the number of occurrences of each Japanese word, whereas the "300_jp_words" text file contains only the first 300 Japanese words.

Inside the "kanji" folder, on the other hand, you will have three text files. The "kanji_chars" text file stores all the Japanese Kanji characters that were found in the YouTube comments.

The "kanji_freq" text file contains the number of occurrences of each Japanese Kanji character, whereas the "300_kanji" text file contains only the first 300 Japanese Kanji characters.
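
The kind of counting behind the kanji files can be sketched with collections.Counter. This is not the original script: the function names are hypothetical, kanji are approximated as characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF), and "the first 300" is assumed to mean the 300 most frequent characters.

```python
from collections import Counter


def is_kanji(char):
    """Return True if char falls in the CJK Unified Ideographs block."""
    return "\u4e00" <= char <= "\u9fff"


def kanji_frequencies(comments):
    """Count how many times each kanji appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(char for char in comment if is_kanji(char))
    return counts


def top_kanji(counts, limit=300):
    """Return up to `limit` (kanji, count) pairs, most frequent first."""
    return counts.most_common(limit)
```

Counting words the same way would first require splitting each comment into words, which for Japanese needs a separate tokenizer and is left out of this sketch.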

Last Updated: January 30, 2017
© 2011-2017, Luis A. Hernandez