Goodbye os.path: 15 Pathlib Tricks to Quickly Master The File System in Python | by Bex T. | Apr, 2023

A robot pal. Via Midjourney.

Pathlib may be my favorite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that's saying something. Pathlib helps me turn code like this written in os.path:

import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)
         if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")]

into this:

from pathlib import Path

# The path must be a Path object (not a plain string) for glob to work
dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))

Pathlib came out in Python 3.4 as a replacement for the nightmare that was os.path. It also marked an important milestone for the Python language as a whole: they finally turned every single thing into an object (even nothing).

The biggest drawback of os.path was treating system paths as strings, which led to unreadable, messy code and a steep learning curve.

By representing paths as fully-fledged objects, Pathlib solves all these issues and introduces elegance, consistency, and a breath of fresh air into path handling.

And this long-overdue article of mine will outline some of the best functions, features, and tricks of pathlib to perform tasks that would have been truly horrible experiences in os.path.

Learning these features of Pathlib will make everything related to paths and files easier for you as a data professional, especially during data processing workflows where you have to move around thousands of images, CSVs, or audio files.

Let's get started!

Working with paths

1. Creating paths

Almost all features of pathlib are accessible through its Path class, which you can use to create paths to files and directories.

There are a few ways you can create paths with Path. First, there are class methods like cwd and home for the current working directory and the user's home directory:

from pathlib import Path

Path.cwd()

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib')
Path.home()
PosixPath('/home/bexgboost')

You can also create paths from string paths:

p = Path("documents")

p

PosixPath('documents')

Joining paths is a breeze in Pathlib with the forward slash operator:

data_dir = Path(".") / "data"
csv_file = data_dir / "file.csv"

print(data_dir)
print(csv_file)

data
data/file.csv

Please, don't let anyone ever catch you using os.path.join after this.
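
As a small sketch of what the slash operator is doing, the snippet below builds the same path three ways. It uses PurePosixPath so the result looks the same on any operating system, and the directory and file names are made up purely for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical directory and file names, chosen only for illustration
base = PurePosixPath("data")

with_operator = base / "raw" / "file.csv"               # chained slash operator
with_joinpath = base.joinpath("raw", "file.csv")        # equivalent method call
from_segments = PurePosixPath("data", "raw", "file.csv")  # segments passed to the constructor

print(with_operator)  # data/raw/file.csv
```

All three spellings produce equal path objects; the operator form just tends to read best when the segments are few.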

To check whether a path exists, you can use the boolean function exists:

data_dir.exists()
True
csv_file.exists()
True

Sometimes, the entire Path object won't be visible, and you have to check whether it is a directory or a file. So, you can use the is_dir or is_file functions to do it:

data_dir.is_dir()
True
csv_file.is_file()
True
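
To see these checks flip from False to True, here is a minimal sketch that runs inside a temporary directory so nothing is left behind on disk. The data and file.csv names mirror the ones above; touch is used here only as a quick way to create an empty file.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted automatically
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    csv_file = data_dir / "file.csv"

    # Nothing has been created yet, so the path does not exist
    existed_before = data_dir.exists()

    data_dir.mkdir()
    csv_file.touch()  # create an empty file

    # A directory is not a file, and a file is not a directory
    dir_checks = (data_dir.is_dir(), data_dir.is_file())
    file_checks = (csv_file.is_dir(), csv_file.is_file())

print(existed_before, dir_checks, file_checks)
```

Note that is_dir and is_file simply return False for paths that don't exist, so they can be called without checking exists first.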

Most paths you work with will be relative to your current directory. But there are cases where you have to provide the exact location of a file or a directory to make it accessible from any Python script. This is when you use absolute paths:

csv_file.absolute()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/data/file.csv')

Lastly, if you have the misfortune of working with libraries that still require string paths, you can call str(path):

str(Path.home())
'/home/bexgboost'

Most libraries in the data stack have long supported Path objects, including sklearn, pandas, matplotlib, seaborn, etc.

2. Path attributes

Path objects have many useful attributes. Let's see some examples using this path object that points to an image file.

image_file = Path("images/midjourney.png").absolute()

image_file

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png')

Let's start with parent. It returns a path object that is one level up from the path itself (here, the directory containing the image).

image_file.parent
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images')

Sometimes, you may want only the file name instead of the whole path. There is an attribute for that:

image_file.name
'midjourney.png'

which returns only the file name with the extension.

There is also stem for the file name without the suffix:

image_file.stem
'midjourney'

Or the suffix itself with the dot for the file extension:

image_file.suffix
'.png'
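
One detail worth knowing is how these attributes behave when a file has more than one extension. The sketch below uses a made-up archive name and PurePosixPath (so no real file is needed) to show that stem and suffix only deal with the last extension, while suffixes lists all of them.

```python
from pathlib import PurePosixPath

# Hypothetical file name with two extensions
archive = PurePosixPath("backups/archive.tar.gz")

name = archive.name          # 'archive.tar.gz'
stem = archive.stem          # 'archive.tar' (only the last suffix is dropped)
suffix = archive.suffix      # '.gz'
suffixes = archive.suffixes  # ['.tar', '.gz']

print(name, stem, suffix, suffixes)
```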

If you want to divide a path into its components, you can use parts instead of str.split('/'):

image_file.parts
('/',
 'home',
 'bexgboost',
 'articles',
 '2023',
 '4_april',
 '1_pathlib',
 'images',
 'midjourney.png')

If you want these components to be Path objects themselves, you can use the parents attribute, which gives you an iterable sequence of the path's ancestors:

for i in image_file.parents:
    print(i)
/home/bexgboost/articles/2023/4_april/1_pathlib/images
/home/bexgboost/articles/2023/4_april/1_pathlib
/home/bexgboost/articles/2023/4_april
/home/bexgboost/articles/2023
/home/bexgboost/articles
/home/bexgboost
/home
/
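
Besides looping, the ancestors can also be picked by position. This is a small sketch with a shortened, made-up path; it sticks to non-negative indexes, which work on every Python version that ships pathlib.

```python
from pathlib import PurePosixPath

# Shortened, hypothetical path used only for illustration
image = PurePosixPath("/home/user/images/pic.png")

ancestors = list(image.parents)   # closest ancestor first, root last
closest = image.parents[0]        # same as image.parent
two_up = image.parents[1]         # /home/user
depth = len(image.parents)        # number of ancestors, including the root

print(two_up, depth)
```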

Working with files

Classified files. Via Midjourney.

To create files and write to them, you don't have to use the open function anymore. Just create a Path object and call write_text or write_bytes on it:

markdown = data_dir / "file.md"

# Create (overwrite) and write text
markdown.write_text("# This is a test markdown")

Or, if you already have a file, you can call read_text or read_bytes:

markdown.read_text()
'# This is a test markdown'
len(image_file.read_bytes())
1962148

However, note that write_text or write_bytes overwrites the existing contents of a file.

# Write new text to existing file
markdown.write_text("## This is a new line")
# The file is overwritten
markdown.read_text()
'## This is a new line'

To append new information to existing files, you should use the open method of Path objects in a (append) mode:

# Append text
with markdown.open(mode="a") as file:
    file.write("\n### This is the second line")

markdown.read_text()

'## This is a new line\n### This is the second line'
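
The difference between overwriting and appending can be checked end to end with a short sketch like the one below. It writes to a temporary directory, and the note.md file name and its contents are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"   # hypothetical file name

    note.write_text("# First draft")
    note.write_text("# Second draft")   # replaces the previous contents entirely

    # Opening in append mode keeps what is already there
    with note.open(mode="a") as f:
        f.write("\n## Appended section")

    content = note.read_text()

print(content)
```

Only the second write_text call and the appended line survive; the first draft is gone.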

It is also common to rename files. The rename method accepts the destination path for the renamed file.

To create the destination path in the same directory, i.e. rename the file, you can use with_stem (available since Python 3.9) on the existing path, which replaces the stem of the original file:

renamed_md = markdown.with_stem("new_markdown")

markdown.rename(renamed_md)

PosixPath('data/new_markdown.md')

Above, file.md is turned into new_markdown.md.
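
Here is a self-contained sketch of the same rename that can be run anywhere, since it works in a temporary directory. It also shows with_name as an alternative way to build the destination path for interpreters older than Python 3.9, where with_stem isn't available; the file contents are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "file.md"
    original.write_text("# Draft")

    # Build the destination path; neither call touches the disk
    target = original.with_stem("new_markdown")                         # Python 3.9+
    same_target = original.with_name("new_markdown" + original.suffix)  # older alternative

    # rename performs the actual move and returns the new path (Python 3.8+)
    renamed = original.rename(target)

    old_exists = original.exists()
    new_exists = renamed.exists()
    new_name = renamed.name

print(new_name, old_exists, new_exists)
```

Keep in mind that with_stem and with_name only compute a new path object. Nothing changes on disk until rename is called.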

Let's check the file size through stat().st_size:

# Display file size
renamed_md.stat().st_size
49 # in bytes

or the last time the file was modified, which was a few seconds ago:

from datetime import datetime

modified_timestamp = renamed_md.stat().st_mtime

datetime.fromtimestamp(modified_timestamp)

datetime.datetime(2023, 4, 3, 13, 32, 45, 542693)

st_mtime returns a timestamp, which is the count of seconds since January 1, 1970. To make it readable, you can use the fromtimestamp function of datetime.
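
If you need both the size and the modification time, one stat call is enough. The sketch below writes a tiny throwaway file (the report.txt name and its five-byte content are invented) and reads both values from the same result object.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"   # hypothetical file name
    report.write_bytes(b"hello")        # exactly five bytes

    info = report.stat()                # one system call, many attributes
    size_in_bytes = info.st_size
    modified_at = datetime.fromtimestamp(info.st_mtime)  # local time

print(size_in_bytes, modified_at.isoformat())
```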

To remove unwanted files, you can unlink them:

renamed_md.unlink(missing_ok=True)

Setting missing_ok to True won't raise any alarms if the file doesn't exist.
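
To make the effect of missing_ok concrete, this sketch deletes a throwaway file and then tries to delete it again, first without the flag and then with it. The obsolete.txt name is made up, and missing_ok requires Python 3.8 or newer.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "obsolete.txt"   # hypothetical file name
    target.touch()
    target.unlink()                       # first delete succeeds

    # A second delete without missing_ok raises FileNotFoundError
    try:
        target.unlink()
        raised = False
    except FileNotFoundError:
        raised = True

    # With missing_ok=True the call is a silent no-op
    target.unlink(missing_ok=True)
    still_exists = target.exists()

print(raised, still_exists)
```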

Working with directories

Folders in an office. Via Midjourney.

There are a few neat tricks to work with directories in Pathlib. First, let's see how to create directories recursively.

new_dir = (
    Path.cwd()
    / "new_dir"
    / "child_dir"
    / "grandchild_dir"
)

new_dir.exists()

False

The new_dir doesn't exist, so let's create it with all its children:

new_dir.mkdir(parents=True, exist_ok=True)

By default, mkdir creates only the last child of the given path. If the intermediate parents don't exist, you have to set parents to True.

To remove empty directories, you can use rmdir. If the given path object is nested, only the last child directory is deleted:

# Removes the last child directory
new_dir.rmdir()
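
The behaviour of parents, exist_ok, and rmdir can be verified with a short sketch that runs in a temporary directory. The nested directory names are the same ones used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "new_dir" / "child_dir" / "grandchild_dir"

    # Without parents=True, the missing intermediate directories cause an error
    try:
        nested.mkdir()
        failed_without_parents = False
    except FileNotFoundError:
        failed_without_parents = True

    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)   # exist_ok makes a repeat call harmless

    # rmdir removes only the last (empty) directory of the path
    nested.rmdir()
    leaf_exists = nested.exists()
    parent_exists = nested.parent.exists()

print(failed_without_parents, leaf_exists, parent_exists)
```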

To list the contents of a directory like ls on the terminal, you can use iterdir. The result is a generator object, yielding directory contents as separate path objects one at a time:

for p in Path.home().iterdir():
    print(p)
/home/bexgboost/.python_history
/home/bexgboost/word_counter.py
/home/bexgboost/.azure
/home/bexgboost/.npm
/home/bexgboost/.nv
/home/bexgboost/.julia
...

To capture all files with a specific extension or a name pattern, you can use the glob function with a glob pattern (shell-style wildcards such as * and ?, not a full regular expression).

For example, below, we will find all text files inside my home directory with glob("*.txt"):

home = Path.home()
text_files = list(home.glob("*.txt"))

len(text_files)

3 # Only three

To search for text files recursively, meaning inside all child directories as well, you can use recursive glob with rglob:

all_text_files = [p for p in home.rglob("*.txt")]

len(all_text_files)

5116 # Now much more
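
To compare the two on a directory tree of known shape, the sketch below builds a tiny throwaway tree and counts matches both ways. The file names are invented; the last line also shows that a "**/" prefix passed to glob gives the same recursive behaviour as rglob.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)

    # Hypothetical files spread over three directory levels
    for rel in ["a.txt", "b.csv", "sub/c.txt", "sub/deeper/d.txt"]:
        (root / rel).touch()

    top_level = sorted(p.name for p in root.glob("*.txt"))          # this directory only
    recursive = sorted(p.name for p in root.rglob("*.txt"))         # all levels
    also_recursive = sorted(p.name for p in root.glob("**/*.txt"))  # same as rglob

print(top_level, recursive)
```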


You can also use rglob('*') to list directory contents recursively. It is like the supercharged version of iterdir().

One of the use cases of this is counting the number of file formats that appear inside a directory.

To do this, we import the Counter class from collections and provide it with all file suffixes within the articles folder of home:

from collections import Counter

file_counts = Counter(
    path.suffix for path in (home / "articles").rglob("*")
)

file_counts

Counter({'.py': 12,
         '': 1293,
         '.md': 1,
         '.txt': 7,
         '.ipynb': 222,
         '.png': 90,
         '.mp4': 39})
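
The large count for the empty suffix is worth a second look: rglob("*") yields directories too, and directories usually have no suffix. As a sketch of one way to count files only, the snippet below adds an is_file filter; the small directory tree is made up for the example.

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notebooks").mkdir()

    # Hypothetical files, one of them without any extension
    for rel in ["a.py", "b.py", "notebooks/c.ipynb", "LICENSE"]:
        (root / rel).touch()

    # Counts every entry, so the 'notebooks' directory lands under ''
    all_entries = Counter(p.suffix for p in root.rglob("*"))

    # Counts regular files only
    files_only = Counter(p.suffix for p in root.rglob("*") if p.is_file())

print(all_entries[""], files_only[""])
```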

Operating system differences

Sorry, but we have to talk about this nightmare of an issue.

Up until now, we have been dealing with PosixPath objects, which are the default for UNIX-like systems:

type(Path.home())
pathlib.PosixPath

If you were on Windows, you would get a WindowsPath object:

from pathlib import WindowsPath

# Use raw strings that start with r to write Windows paths
path = WindowsPath(r"C:\users")
path

NotImplementedError: cannot instantiate 'WindowsPath' on your system

Instantiating another system's path raises an error like the above.

But what if you were forced to work with paths from another system, like code written by coworkers who use Windows?

As a solution, pathlib offers pure path objects like PureWindowsPath or PurePosixPath:

from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"C:\users")
path

PureWindowsPath('C:/users')

These are primitive path objects. You have access to some path methods and attributes, but essentially, the path object remains a string. Anything that needs to touch the file system is unavailable:

path / "bexgboost"
PureWindowsPath('C:/users/bexgboost')
path.parent
PureWindowsPath('C:/')
path.stem
'users'
path.rename(r"C:\losers") # Unsupported
AttributeError: 'PureWindowsPath' object has no attribute 'rename'
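
As a sketch of how pure paths can help with a coworker's Windows path, the snippet below takes it apart and rebuilds the relative portion as a POSIX path. The directory layout is hypothetical, and this is only one possible approach, not an official conversion recipe.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical Windows path received from a coworker
win_path = PureWindowsPath(r"C:\users\bexgboost\data\file.csv")

drive = win_path.drive        # 'C:'
forward = win_path.as_posix() # same path written with forward slashes

# Strip the machine-specific prefix, then rebuild the rest for a POSIX system
relative = win_path.relative_to(r"C:\users\bexgboost")
posix_relative = PurePosixPath(*relative.parts)

print(drive, forward, posix_relative)
```

Because none of this touches the file system, it runs the same way on Linux, macOS, and Windows.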

Conclusion

If you have noticed, I lied in the title of the article. Instead of 15, I believe the count of new tricks and functions was 30ish.

I didn't want to scare you off.

But I hope I've convinced you enough to ditch os.path and start using pathlib for much easier and more readable path operations.

Forge a new path, if you will 🙂

Path. Via Midjourney.

