7) c. Wildcards and glob

When you’re working with files, you will sometimes want to search for a file. You can do this with the glob method of a Path object from Python’s pathlib module. You provide a starting folder on your system, and glob looks inside it for names that match a pattern.

The name glob can be hard to remember, so a little history might help. Early Unix systems had a “global” command that applied a pattern to every match instead of stopping at the first one: it kept applying the pattern “globally” until it had found all matching results. Unix also had a utility named “glob” (short for “global”) that expanded a file pattern into a list of matching files. To put it simply, glob was used to match file patterns in Unix.

7) c. 1. Searching for files with an unknown name or extension

Sometimes you want to apply a filter to many files at once rather than to a single one. One way to select the files you need is to use wildcards in the file name pattern.

In the following example, the pattern tells glob to look for every file whose name ends in “.docx” in the “Documents” folder.

from pathlib import Path

# supply the path from where you want to begin the search
folder = Path("C:/Documents")

# use the glob method and provide a pattern
results1 = list(folder.glob("*.docx"))

for element in results1:
    print(element)
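Two details are easy to miss here: glob can also return folders whose names match the pattern, and the order of its results is not guaranteed. The sketch below shows one way to handle both, assuming you only want regular files listed in alphabetical order. It builds a throwaway folder with tempfile so it runs anywhere; the helper name find_docx and the sample file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_docx(folder: Path) -> list[Path]:
    """Hypothetical helper: .docx files directly inside folder, sorted by path."""
    # is_file() drops any folder whose name happens to end in ".docx";
    # sorted() gives a stable order, which glob() itself does not promise
    return sorted(p for p in folder.glob("*.docx") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Made-up sample content for the demonstration
    (folder / "letter.docx").touch()
    (folder / "cv.docx").touch()
    (folder / "old.docx").mkdir()  # a folder, not a document
    (folder / "notes.txt").touch()
    names = [p.name for p in find_docx(folder)]

print(names)  # ['cv.docx', 'letter.docx']
```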




What if you didn’t know the file extension, or only knew that the file name contained the word “report”?

from pathlib import Path

folder = Path("C:/Downloads")

# Notice how the asterisks are used in the pattern.
results1 = list(folder.glob("*report*.*"))

for element in results1:
    print(element)


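Notice how an asterisk ( * ) in the pattern acts as a wildcard: it matches any sequence of characters, including none, within a single file or folder name. Because the pattern ends in “.*”, the name must also contain a dot somewhere after “report”.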
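To see exactly what “*report*.*” picks up, here is a small self-contained check. It is only a sketch: it creates a few invented file names in a temporary folder and prints the ones the pattern matches. “reports” is not matched because it has no dot, and “notes.txt” because it does not contain “report”.

```python
import tempfile
from pathlib import Path

# Invented sample names, used only for this demonstration
sample_names = ["q3_report.pdf", "report.docx", "annual_report_v2.xlsx", "reports", "notes.txt"]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in sample_names:
        (folder / name).touch()
    # '*' matches any run of characters (even none); '.*' requires a dot after "report"
    matches = sorted(p.name for p in folder.glob("*report*.*"))

print(matches)  # ['annual_report_v2.xlsx', 'q3_report.pdf', 'report.docx']
```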

7) c. 2. Searching for files whose names contain special characters

What if you had to search for a file name that contained the characters “\n\” ?

Python treats a backslash in a string literal as the start of an escape sequence, so \n is interpreted as a single newline character.

So how do you stop Python from treating the backslashes in your string as escape characters? Use a “raw string”: if you put an r before the opening quote, Python keeps every backslash in the string exactly as written.

In other words, the string "test\n" evaluates to test followed by a newline character.

But r"test\n" evaluates to the literal text test\n (a backslash followed by the letter n).
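You can confirm the difference by comparing lengths: the escape sequence \n is one character, while the raw version keeps the backslash and the n as two separate characters.

```python
normal = "test\n"   # \n is turned into one newline character
raw = r"test\n"     # backslash and n are kept as two characters

print(len(normal))       # 5
print(len(raw))          # 6
print(raw == "test\\n")  # True: "\\" is the escaped way to write one backslash
```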

Here is an example that searches for files whose names contain the string “\n\”.

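# You want to match files whose names contain the string "\n\", for example:
# "exam_papers\n\.docx"
# "Weird file \n\ame.mp3"
# This only works where backslashes are allowed in file names (Linux, macOS).
# On Windows the backslash is a path separator, so no file name can contain one.

from pathlib import Path

folder = Path.home() / "Downloads"

# Notice the raw string (r before the opening quote) and the asterisks on
# both sides, so "\n\" can appear anywhere in the name.
results1 = list(folder.glob(r"*\n\*"))

for element in results1:
    print(element)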
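If you want to test a pattern like this without creating any files, the standard fnmatch module applies shell-style wildcard rules to plain strings. The sketch below assumes fnmatch behaves closely enough to pathlib’s glob for this purpose; both understand the * wildcard, though details such as case sensitivity can differ by platform (fnmatchcase is always case-sensitive). It also shows why the r prefix matters. Note that a raw string cannot end in a single backslash, so r"\n\" on its own would be a syntax error; the trailing * avoids that.

```python
from fnmatch import fnmatchcase

# The example names from above, kept as plain strings (no files are created)
names = [r"exam_papers\n\.docx", r"Weird file \n\ame.mp3", "notes.txt"]

raw_pattern = r"*\n\*"   # backslash, n, backslash, with * on both sides
plain_pattern = "*\n*"   # no r: \n becomes a newline, so this looks for a newline

raw_matches = [n for n in names if fnmatchcase(n, raw_pattern)]
plain_matches = [n for n in names if fnmatchcase(n, plain_pattern)]

print(raw_matches)    # the two example names
print(plain_matches)  # []
```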


7) c. 3. Searching through folders recursively

What if you wanted to search through all the folders and files inside a folder? You can do that by putting a double asterisk followed by a forward slash (**/) in the pattern.

So what does glob do when it spots “**/” in the search pattern? It searches recursively: “**” matches the starting folder itself and every folder nested inside it, at any depth.

The following example searches for the file “my_precious.txt” in the folder “Caverns” and all of its subfolders.

from pathlib import Path

folder = Path("C:/Caverns")

# Notice the placement of the double asterisk.
results1 = list(folder.glob("**/my_precious.txt"))

for element in results1:
    print(element)
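Here is a self-contained sketch of the recursive search, using an invented folder layout inside a temporary directory. It also calls rglob, a pathlib method that behaves like glob with “**/” added to the front of the pattern, so both calls should find the same files. Because “**” also matches zero folders, the copy sitting directly in Caverns is found too.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    caverns = Path(tmp) / "Caverns"
    # Invented nested layout used only for this demonstration
    (caverns / "deep" / "deeper").mkdir(parents=True)
    (caverns / "my_precious.txt").touch()
    (caverns / "deep" / "deeper" / "my_precious.txt").touch()
    (caverns / "deep" / "rock.txt").touch()

    # "**/" means: this folder and every folder below it
    found = sorted(p.relative_to(caverns).as_posix()
                   for p in caverns.glob("**/my_precious.txt"))
    # rglob(pattern) works like glob("**/" + pattern)
    found_r = sorted(p.relative_to(caverns).as_posix()
                     for p in caverns.rglob("my_precious.txt"))

print(found)    # ['deep/deeper/my_precious.txt', 'my_precious.txt']
print(found_r)  # same list
```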



You can combine these searching techniques, depending on your needs, to create powerful searches.
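As one illustration of combining techniques, the sketch below joins the recursive “**/” with the “*report*” wildcard to find Word documents with “report” in their names anywhere under a folder. The folder layout and file names are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2023" / "q4").mkdir(parents=True)
    # Invented sample files spread across several levels
    for rel in ["sales_report.docx", "2023/annual_report.docx",
                "2023/q4/report_draft.docx", "2023/q4/report.pdf", "2023/notes.docx"]:
        (root / rel).touch()

    # Recursive (**/) plus wildcards (*report*) plus a fixed extension (.docx)
    hits = sorted(p.relative_to(root).as_posix()
                  for p in root.glob("**/*report*.docx"))

print(hits)  # ['2023/annual_report.docx', '2023/q4/report_draft.docx', 'sales_report.docx']
```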

See if you can understand the code shown in the image below. Ignore the spaces inside glob(" my* "): they were added for readability only, so do NOT add them in your actual program.