If you haven't seen it already, please check out my article Creating your own self-hosted YouTube, where I detail how I created my own local YouTube. This guide will be quite similar, but for the Instagram platform instead.
I love Instagram for the same reasons that I love YouTube. There is a bunch of creative content! Especially for my interest in construction and renovation, where e.g. carpentry_bymar is a profile that keeps cranking out quality content.
Again there are multiple reasons for creating your own local self-hosted Instagram, but the main ones for me are datahoarding and privacy. Even though we might like to think otherwise, content doesn't stay on the internet forever, and Facebook doesn't need to know everything about you and how you interact with the content. With this method you can also somewhat obfuscate your interests by interacting with random content you don't care about.
Anyway let's take a look at how we grab information from Instagram.
Instaloader
Instaloader is a Python tool to download pictures and videos along with their captions and other metadata from Instagram. It is possible to run it directly from the CLI, but I'll be running it from various Python scripts.
It is important to keep Instaloader updated. Instaloader has stopped working for me a couple of times, and an update saved the day.
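Upgrading is a one-liner with pip:

```
pip install --upgrade instaloader
```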
Setting Instaloader up
In order to use Instaloader properly, you need an Instagram account. A dummy account works fine. Put the login information into a .env file.
import os

import dotenv
import instaloader

dotenv.load_dotenv('.env')

L = instaloader.Instaloader()
L.login(os.environ.get('IG_USER'), os.environ.get('IG_PASSWORD'))
Now that we are logged in, we can start getting some information.
Getting information on an Instagram profile
I have a database table set up with information about profiles. I simply add the username of the profile that I want to download as a new row, and the following scripts grab the rest of the information. You can grab the basic information about a profile like this:
profile = instaloader.Profile.from_username(L.context, PROFILE_NAME)
print(profile.profile_pic_url) # URL to the profile picture.
print(profile.biography) # Biography text
print(profile.external_url) # If the profile has setup an URL.
Replace PROFILE_NAME with the Instagram handle of the profile you want to look at. Then you can run through all of the profile's posts like this:
import datetime

for index, post in enumerate(profile.get_posts()):
    print(datetime.datetime.now(), '-', profile.username, '- Post', index + 1, '/', profile.mediacount, end='\r')
    # Save post information:
    # post.shortcode, profile.username, post.date_utc, post.typename,
    # post.mediacount, post.caption, post.is_sponsored
    for caption_hashtag in post.caption_hashtags:
        pass  # Save the hashtags.
    for caption_mention in post.caption_mentions:
        pass  # Save the mentioned users.
    for tagged_user in post.tagged_users:
        pass  # Save the tagged users.
This just gives information about all of the profile's posts, but it doesn't download them.
Downloading Instagram posts
To download posts, you need their unique shortcode, which we just grabbed in the code above. With the shortcode we can download the post like this:
This downloads the post behind the shortcode into the download folder. A post consists of at least three files and can easily be more, because each post contains a TXT file with the post caption and each video has a thumbnail image.
From there I use a combination of glob.glob and shutil.move to move the files to their final folder.
import glob
import os
import pathlib
import shutil

if not os.path.exists(INSTAGRAM_LOCATION + PROFILE_NAME):
    os.makedirs(INSTAGRAM_LOCATION + PROFILE_NAME)
for file in glob.glob(str(pathlib.Path(__file__).parent.absolute()) + '/download/*'):
    filename = file.split('/')[-1]
    shutil.move(file, INSTAGRAM_LOCATION + PROFILE_NAME + '/' + filename)
Replace INSTAGRAM_LOCATION with the folder where you wish to store the downloaded posts. The if-statement ensures that a folder exists for the given profile in your Instagram folder.
The for-loop finds all the files in the download folder and moves them to the profile's folder within your Instagram folder. Instaloader names the posts with a timestamp when downloading, so I don't have to rename the files to keep them organized.
Throttling your downloads
Instaloader has built-in throttling, but I don't think it's working properly (at least not for me). Therefore I let my script sleep a couple of minutes between downloads. You'll have to figure out your own magic number, but you'll figure it out rather fast, because Facebook is aggressive about blocking too many requests.
import time
time.sleep(2 * 60)
You may want to notify yourself somehow, should your scripts run into any issues (for example throttling). I do this with Slack, which I use as a logging tool.
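As a sketch, a Slack notification only needs a small JSON POST to an incoming webhook. The webhook URL below is a placeholder; you get a real one by adding an "Incoming Webhook" to your Slack workspace.

```python
import json
import urllib.request

# Placeholder -- replace with your own incoming-webhook URL from Slack.
SLACK_WEBHOOK_URL = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'

def build_payload(message):
    # Slack's incoming webhooks accept a JSON body with a 'text' field.
    return json.dumps({'text': message}).encode('utf-8')

def notify_slack(message):
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=build_payload(message),
        headers={'Content-Type': 'application/json'},
    )
    urllib.request.urlopen(request)
```

You can then call notify_slack from an except-block whenever a download fails.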
Setting up your cronjobs
New content is available on Instagram all the time, so you may want to run your scripts on a schedule. You can do this on Linux by adding cronjobs. Type crontab -e into your terminal to edit them. I have mine running once a day.
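As an illustration, assuming the scripts live in /home/user/instagram (a placeholder path), the crontab entries could look like this:

```
# Grab new post information at 03:00, download the posts at 04:00.
0 3 * * * python3 /home/user/instagram/get_profiles.py
0 4 * * * python3 /home/user/instagram/download_posts_daily.py
```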
Remember to alter the path to match your scripts.
The “get_profiles.py” script checks for new content and saves information about posts to the database. A bit later, the “download_posts_daily.py” script actually downloads the images and videos from the posts. This combination runs daily, but you can run it however you like.
Remember to limit your download script to a number of posts if you schedule it. For example, if you throttle your script to run every 5 minutes, then you can only download 288 posts every 24 hours. If you download more, you may end up running two scripts at the same time, effectively bypassing your throttling. Alternatively, you can have your script kill already running instances of itself.
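One simple way to guard against overlapping runs (a sketch, not necessarily how you'd want to do it) is an exclusive lock file: if a previous instance still holds the lock, the new one exits immediately. The lock file path is a placeholder.

```python
import fcntl
import sys

# Placeholder lock file path -- any writable location works.
LOCK_FILE = '/tmp/instagram_downloader.lock'

def acquire_lock(path=LOCK_FILE):
    # Keep a reference to the file object: the lock is held as long as
    # it stays open, and released automatically when the process exits.
    lock = open(path, 'w')
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return None  # Another instance already holds the lock.
    return lock

if __name__ == '__main__':
    lock = acquire_lock()
    if lock is None:
        sys.exit('Another instance is already running.')
    # ... run the download script ...
```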
Setting up your webserver
Since I am already running an Nginx webserver on my local network, that's what I'll be using.
The Instagram videos and pictures get downloaded to a folder outside of the root Nginx folder, so I need to create an alias in the Nginx configuration file (e.g. /etc/nginx/sites-available/default.conf).
location /zfs {
    alias /zfs;
}
This alias states that the /zfs folder (location) on the server can be accessed from my webserver with the URL extension /zfs (e.g. 192.168.1.10/zfs). They don't have to match; you can set it up any way you want.
Creating your own Instagram website
Just like my YouTube website, the Instagram website is rather simple. However it is not quite as straightforward. My solution has a feed (obviously), a gallery of profiles, and a status list.
The feed shows posts in a reverse chronological order (showing newest posts first). Each post is shown with the main image/thumbnail, a timestamp, the profile name, and the first 200 characters of the post text. I can then click on the image and see the entire post.
The gallery of profiles is a list of the profiles in alphabetical order, where I can see each profile's thumbnail. Clicking on the image takes me to their profile, which shows a feed with only their posts.
The status list is just a table with all the profiles. Each row shows a profile name (with link), how many posts are downloaded, how many posts are not yet downloaded, and the total number of posts. This helps me keep an overview, especially when adding new profiles.
Different types of Instagram posts
There are 3 different types of Instagram posts.
GraphImage
GraphSidecar
GraphVideo
GraphImage is a single image. GraphVideo is a video with a thumbnail image. GraphSidecar is a post with more than one item, where each item can be an image or a video. Both GraphVideo and GraphSidecar come with thumbnails for their videos; keep this in mind when showing a post.
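For example, when picking a cover image for the feed, a hypothetical helper (the function and its naming are my own) can rely on every post having at least one image, since videos always come with a thumbnail:

```python
def feed_thumbnail(post_files):
    # post_files: all of the post's downloaded files (images, videos,
    # caption TXT). Every video has a thumbnail JPG, so the first image
    # in sorted order works as a cover for GraphImage, GraphVideo and
    # GraphSidecar posts alike.
    for file in sorted(post_files):
        if file.endswith(('.jpg', '.png', '.webp')):
            return file
    return None
```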
Wrapping up
So this is basically how I have made my own locally hosted Instagram website. With the Python package Instaloader you can easily grab information from Instagram and even download the content. Instaloader can be used for creating your own Instagram bot, but in this article I have mainly focused on downloading content.
Let me know in the comment section if you have done something similar, have any tips and tricks, or have any questions. Happy downloading!