Use the same command as when installing, but with the added --upgrade flag:
pip install --upgrade instaloader
Requirements.txt
Requirements.txt is a text file that is usually located in the root of a project. The file lists the pip packages (and their versions) that are required to run the project. Each package has its own line and is listed like this:
python-dotenv==0.19.2
To install all the packages for the project you can run the following command:
pip install -r requirements.txt
This will install the required packages and ensure that you are able to run the project.
Finding your packages for the requirements.txt
Running pip list will list all installed pip packages along with their versions. Grep'ing this list helps you quickly find your version numbers. E.g.:
pip list | grep dotenv
The above command will display something like this:
python-dotenv 0.19.2
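If you want to generate the requirements.txt itself, pip freeze outputs all installed packages in exactly that format, so you can write the file in one go (you may want to trim it down to the packages your project actually uses):
pip freeze > requirements.txt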
Using .env-files for storing credentials
I use the Python package python-dotenv to load my environment variables. It's used like this:
import dotenv
dotenv.load_dotenv()
The above loads variables saved in your .env-file, which contains credentials in the following format:
DB_PASSWORD=qwerty
You can then use the variables in your script like this:
import os
password = os.environ.get("DB_PASSWORD")
APIs
Several well-known APIs have Python packages that make working with them easier. Below I have outlined some of the packages I use for certain APIs.
DALL·E 2 was released behind a paywall by an extremely well-funded company. However, a group of independent researchers released their own model, Stable Diffusion, which you can use for free in just a few lines of code:
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# this will substitute the default PNDM scheduler for K-LMS
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
Converting a file from one encoding to another in Python
I had a project where I would receive .txt files from a Windows machine, encoded with windows-1252. In the beginning I converted them to utf-8 manually, but this quickly grew tedious, so I wrote the following script to convert windows-1252 encoded files to utf-8. Note that libmagic typically reports such files as iso-8859-1 (windows-1252 is a superset of it), which is the encoding the script checks for.
import glob
import magic

files = glob.glob('*.txt')
for file in files:
    blob = open(file, 'rb').read()
    # detect the file's encoding with libmagic
    m = magic.open(magic.MAGIC_MIME_ENCODING)
    m.load()
    encoding = m.buffer(blob)
    if encoding == 'iso-8859-1':
        # re-encode the contents as utf-8 and overwrite the file
        target = open(file, 'wb')
        target.write(blob.decode(encoding).encode('utf-8'))
        target.close()
Multithreading in Python
It's quite easy to start utilizing multithreading or multiprocessing in Python. You simply define a function and a list of inputs to feed it. Here's a simple example:
from multiprocessing import Pool

def myFunction(x):
    print(x * x)

myList = range(1, 11)

with Pool(processes=4) as p:
    p.map(myFunction, myList)
The script uses 4 cores, but you can define how many you want to use. On Ubuntu you can grab the number of CPU cores available with the nproc command. So to utilize the maximum available cores in your Python scripts, you can grab the number like this:
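from multiprocessing import Pool, cpu_count

# cpu_count() returns the same number as nproc does on the shell
with Pool(processes=cpu_count()) as p:
    p.map(myFunction, myList)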
eng-to-ipa
eng-to-ipa is a Python package that uses the CMU Pronouncing Dictionary, similar to the Pronouncing package described below. The package can be used to convert English words into the International Phonetic Alphabet (IPA). It's used like so:
import eng_to_ipa as ipa
ipa.convert("The quick brown fox jumped over the lazy dog.")
# 'ðə kwɪk braʊn fɑks ʤəmpt ˈoʊvər ðə ˈleɪzi dɔg.'
English words
The Python package english-words-py contains four sets of English words.
english_words_set: A set of English words containing both upper- and lower-case letters; with punctuation.
english_words_lower_set: A set of English words containing lower-case letters; with punctuation.
english_words_alpha_set: A set of English words containing both upper- and lower-case letters; with no punctuation.
english_words_lower_alpha_set: A set of English words containing lower-case letters; with no punctuation.
It's used like so:
from english_words import english_words_set
'ghost' in english_words_set
# True
NLTK
Edit distance
The edit distance between words can be found like this:
import nltk
nltk.edit_distance("humpty", "dumpty")  # 1
Wordnet
WordNet is a lexical database for the English language. It holds English words along with their definitions, usage examples, synonyms, antonyms and so on.
from nltk.corpus import wordnet
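A minimal sketch of looking a word up (this assumes you have downloaded the corpus first, e.g. with nltk.download('wordnet')):
from nltk.corpus import wordnet

syns = wordnet.synsets("dog")                 # every synset (word sense) for "dog"
print(syns[0].definition())                   # definition of the first sense
print(syns[0].examples())                     # example sentences, if any
print([l.name() for l in syns[0].lemmas()])   # lemma names (synonyms) in that sense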
Pronouncing
Pronouncing is a Python package that provides a simple interface for the CMU Pronouncing dictionary. It's a helpful tool for finding the syllables in words, words that rhyme, words that sound similar and so on.
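A minimal sketch of how it can be used (phones_for_word, syllable_count and rhymes are part of the package's documented interface):
import pronouncing

phones = pronouncing.phones_for_word("permit")  # ARPAbet pronunciations from the CMU dictionary
print(phones)                                   # e.g. ['P ER0 M IH1 T', ...]
print(pronouncing.syllable_count(phones[0]))    # number of syllables in that pronunciation
print(pronouncing.rhymes("climbing")[:5])       # a few words that rhyme with "climbing"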
OS-specific notes and notes relating to the os package.
How can I create a directory?
import os
if not os.path.exists('my_folder'):
os.makedirs('my_folder')
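On Python 3.2 and newer you can skip the existence check by passing exist_ok=True:
os.makedirs('my_folder', exist_ok=True)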
PDF
Splitting an x-page PDF into x one-page PDFs / Splitting a PDF into multiple PDFs
I don't really know how this situation occurred, but we had a 95-page PDF that we needed as 95 individual one-page PDFs. The solution is based on this StackOverflow answer.
from PyPDF2 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open("myfile.pdf", "rb"))

for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("myfile-%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)
Weasyprint
Weasyprint is a free and open source package that enables you to generate beautiful PDFs from HTML.
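A minimal sketch of rendering an HTML string to a PDF (the HTML class and write_pdf are part of Weasyprint's public API; the file name is just an example):
from weasyprint import HTML

html = "<h1>Hello</h1><p>Generated with Weasyprint.</p>"
HTML(string=html).write_pdf("hello.pdf")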