26 Apr 2024
I love YouTube. You can find free educational content about almost anything, and it's how I have learned a ton about home renovation. But YouTube has become riddled with ads, and you can easily end up in a death scroll, wasting your time. Furthermore, videos get removed from YouTube all the time. Both creators and the platform can have their reasons for removing videos, so get them while they are available! I am also somewhat of a datahoarder, so this project is right up my alley.
import googleapiclient.discovery

youtube = googleapiclient.discovery.build(
    'youtube', 'v3', developerKey = MY_DEVELOPER_KEY
)
Replace MY_DEVELOPER_KEY with your own developer key.
request = youtube.channels().list(
    part = 'snippet',
    id = 'UCX6OQ3DkcsbYNE6H8uQQuVA'
)
response = request.execute()
channel_title = response['items'][0]['snippet']['title']
channel_description = response['items'][0]['snippet']['description']
channel_thumbnail_url = response['items'][0]['snippet']['thumbnails']['default']['url']
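Indexing `response['items'][0]` directly raises an error when the response contains no items, which can happen with a mistyped ID. Here is a minimal sketch of a more defensive lookup. `extract_snippet` is a hypothetical helper name of my own, not part of the Google client library, and the example response is hand-written to mimic the shape used above.

```python
def extract_snippet(response):
    """Return title, description and default thumbnail URL of the first
    item in a list response, or None if the response has no items."""
    items = response.get('items') or []
    if not items:
        return None
    snippet = items[0]['snippet']
    return {
        'title': snippet['title'],
        'description': snippet['description'],
        'thumbnail_url': snippet['thumbnails']['default']['url'],
    }


# Hand-written stand-in for the result of request.execute().
example_response = {
    'items': [{
        'snippet': {
            'title': 'Example channel',
            'description': 'An example description.',
            'thumbnails': {'default': {'url': 'https://example.com/t.jpg'}},
        }
    }]
}

channel = extract_snippet(example_response)
```

The same helper should work for a playlist response, since it reads the same three snippet fields.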
YouTube channels can also have playlists, which are typically lists of videos that belong together in some way (e.g. a project). I want to grab the playlists of the channels I am interested in. You can loop through a channel's playlists like this:
request = youtube.playlists().list(
    part = 'snippet',
    channelId = 'UCX6OQ3DkcsbYNE6H8uQQuVA'
)
response = request.execute()
for item in response['items']:
    playlist_title = item['snippet']['title']
    playlist_description = item['snippet']['description']
    playlist_thumbnail_url = item['snippet']['thumbnails']['default']['url']
But I am also interested in certain playlists where I am not necessarily interested in the channel. You can get information about specific playlists like this:
request = youtube.playlists().list(
    part = 'snippet',
    id = PLAYLIST_ID
)
response = request.execute()
playlist_title = response['items'][0]['snippet']['title']
playlist_description = response['items'][0]['snippet']['description']
playlist_thumbnail_url = response['items'][0]['snippet']['thumbnails']['default']['url']
The playlist ID can be found in the same way as the channel ID.
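One possible way to get the ID, assuming you start from a playlist URL copied from the browser, is to read the `list` query parameter. `playlist_id_from_url` is a hypothetical helper name, and the ID in the example is a placeholder rather than a real playlist.

```python
from urllib.parse import urlparse, parse_qs


def playlist_id_from_url(url):
    """Return the value of the 'list' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get('list')
    return values[0] if values else None


# PLAYLIST_ID_HERE is a placeholder, not a real playlist ID.
example_id = playlist_id_from_url(
    'https://www.youtube.com/playlist?list=PLAYLIST_ID_HERE'
)
```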
request = youtube.channels().list(
    part = 'contentDetails',
    id = CHANNEL_ID
)
response = request.execute()
# Grab the ID of the channel's uploads playlist.
playlistId = response['items'][0]['contentDetails']['relatedPlaylists']['uploads']
request = youtube.playlistItems().list(
    part = 'snippet',
    playlistId = playlistId,
)
response = request.execute()
for video in response['items']:
    video_id = video['snippet']['resourceId']['videoId']
    channel_title = video['snippet']['channelTitle']
    title = video['snippet']['title']
    thumbnail = video['snippet']['thumbnails']['default']['url']
    description = video['snippet']['description']
    published_at = video['snippet']['publishedAt']
For the playlists you are interested in, you already know the playlist ID, so you can skip the first request.
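A single playlistItems request returns only one page of results (5 items by default, at most 50 via `maxResults`), so longer playlists need paging with `nextPageToken`. Below is a minimal sketch of that loop. `iter_playlist_items` is a hypothetical helper name, and it assumes `youtube` is the client object built earlier.

```python
def iter_playlist_items(youtube, playlist_id):
    """Yield every item of a playlist, following nextPageToken page by page."""
    page_token = None
    while True:
        params = {
            'part': 'snippet',
            'playlistId': playlist_id,
            'maxResults': 50,  # largest page size the API accepts
        }
        if page_token:
            params['pageToken'] = page_token
        response = youtube.playlistItems().list(**params).execute()
        yield from response.get('items', [])
        # The last page has no nextPageToken, which ends the loop.
        page_token = response.get('nextPageToken')
        if not page_token:
            break
```

You would then loop with `for video in iter_playlist_items(youtube, playlistId):` and read the same snippet fields as before.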
import youtube_dl

ydl_opts = {
    'format': 'best',
    'writesubtitles': True,
    'subtitlesformat': 'srt',
    'subtitleslangs': ['da', 'en']
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=' + VIDEO_ID])
The option 'format': 'best' tells YouTube-DL to pick the best-quality single file that contains both video and audio. The remaining options are for subtitles, where I want Danish and English.
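If the download script runs repeatedly, it helps to skip videos that were already fetched. The sketch below adds two more youtube-dl options for that: `outtmpl` to name files after the video ID and `download_archive` to record finished downloads. The function names and both path arguments are hypothetical; choose whatever fits your setup.

```python
import os


def build_ydl_opts(video_dir, archive_file):
    """Build a youtube-dl options dict that skips already-downloaded videos."""
    return {
        'format': 'best',
        'writesubtitles': True,
        'subtitlesformat': 'srt',
        'subtitleslangs': ['da', 'en'],
        # Name each file after its video ID inside video_dir.
        'outtmpl': os.path.join(video_dir, '%(id)s.%(ext)s'),
        # youtube-dl records finished downloads here and skips them next run.
        'download_archive': archive_file,
    }


def download_videos(video_ids, video_dir, archive_file):
    """Download the given video IDs using the options above."""
    import youtube_dl  # imported here so build_ydl_opts works without it

    urls = ['https://www.youtube.com/watch?v=' + vid for vid in video_ids]
    with youtube_dl.YoutubeDL(build_ydl_opts(video_dir, archive_file)) as ydl:
        ydl.download(urls)
```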
import urllib.request

r = urllib.request.urlopen(THUMBNAIL_URL)
with open(IMAGE_LOCATION + VIDEO_ID + '.jpg', 'wb') as f:
    f.write(r.read())
THUMBNAIL_URL is the URL for the thumbnail that we grabbed from the YouTube API. IMAGE_LOCATION is the local folder where you want to save the image, and VIDEO_ID is the ID of the video. This is just the way I do it; you can save the images under whatever name you want.
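A small variation, sketched here as one possible approach, wraps the download in a function that skips thumbnails already saved by an earlier run. `download_thumbnail` is a hypothetical helper name, and using `pathlib` instead of string concatenation is my choice, not a requirement.

```python
import urllib.request
from pathlib import Path


def download_thumbnail(thumbnail_url, image_dir, video_id):
    """Save the thumbnail as <image_dir>/<video_id>.jpg unless it exists."""
    target = Path(image_dir) / (video_id + '.jpg')
    if target.exists():
        return target  # already downloaded on an earlier run
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(thumbnail_url) as r:
        data = r.read()
    target.write_bytes(data)
    return target
```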
0 6,18 * * * python3 /PATH_TO_MY_SCRIPTS/get_channels.py
5 6,18 * * * python3 /PATH_TO_MY_SCRIPTS/get_playlists.py
10 6,18 * * * python3 /PATH_TO_MY_SCRIPTS/get_videos.py
15 6,18 * * * python3 /PATH_TO_MY_SCRIPTS/download_images.py
30 6,18 * * * python3 /PATH_TO_MY_SCRIPTS/download_videos.py
Replace PATH_TO_MY_SCRIPTS with the location of your script(s).
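With staggered start times, a step that takes longer than its slot will overlap with the next one. One possible alternative, sketched below under the assumption that the scripts are meant to run strictly in order, is a single runner that starts each script only after the previous one has finished. `run_steps` and `build_commands` are hypothetical names.

```python
import subprocess
import sys

# Script names taken from the crontab above.
SCRIPTS = ['get_channels.py', 'get_playlists.py', 'get_videos.py',
           'download_images.py', 'download_videos.py']


def build_commands(script_dir):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, script_dir.rstrip('/') + '/' + name]
            for name in SCRIPTS]


def run_steps(commands):
    """Run each command in order and stop at the first one that fails.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            break
        completed.append(command)
    return completed
```

Calling `run_steps(build_commands('/PATH_TO_MY_SCRIPTS'))` from one scheduled script could then stand in for the five separate crontab entries.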
location /zfs {
    alias /zfs;
}
This alias makes the /zfs folder on the server available from my web server under the URL path /zfs (e.g. 192.168.1.10/zfs). The two don't have to match; you can set it up any way you want.