Matching Groups

Watch This Lesson [Course Video]
( https://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiMzg3MzQxNiIsImRlbGl2ZXJ5X2lkIjoibmVtbDF6MHBpOHU3cHE1cXBmY3UiLCJ1cmwiOiJodHRwczovL2Jsb2cuZmlueHRlci5jb20vcHl0aG9uLXJlLWdyb3Vwcy8_dGxfaW5ib3VuZD0xXHUwMDI2dGxfdGFyZ2V0X2FsbD0xXHUwMDI2dGxfZm9ybV90eXBlPTFcdTAwMjZ0bF9wZXJpb2RfdHlwZT0zXHUwMDI2X19zPWg3ZzE2bGo4amI0aHB5NWRqNzNnIn0 )

This tutorial explains everything you need to know about matching
groups in Python's re package for regular expressions. You may
have also read the term "capture groups", which refers to the same
concept.

What's a matching group?

Like you use parentheses to structure mathematical expressions,
(2 + 2) * 2 versus 2 + (2 * 2), you use parentheses to structure
regular expressions.

An example regex that does this is 'a(b|c)'.

The whole content enclosed in the opening and closing parentheses
is called a matching group (or capture group). You can have
multiple matching groups in a single regex. And you can even have
nested matching groups, for example 'a(b|(cd))'.
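
To make the numbering of nested groups concrete, here is a small sketch using the two example patterns from above. The input strings 'ac', 'acd', and 'ab' are illustrative assumptions, not part of the lesson:

```python
import re

# Single group: the alternation b|c is captured as group 1.
simple_groups = re.fullmatch(r'a(b|c)', 'ac').groups()

# Nested groups are numbered by the position of their opening parenthesis:
# group 1 is (b|(cd)), group 2 is the inner (cd).
nested_cd = re.fullmatch(r'a(b|(cd))', 'acd').groups()

# If the inner group does not take part in the match, its value is None.
nested_b = re.fullmatch(r'a(b|(cd))', 'ab').groups()

print(simple_groups)  # ('c',)
print(nested_cd)      # ('cd', 'cd')
print(nested_b)       # ('b', None)
```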

One big advantage of a matching group is that it captures the
matched substring. You can retrieve it in other parts of the
regular expression, or afterwards when you analyze the result of
the whole match.

Let's have a short example for the most basic use of a matching
group---to structure the regex.

Say you create the regex b?(a.)* with the matching group (a.). It
matches zero or one occurrence of the character 'b' followed by
an arbitrary number of two-character sequences, each starting
with the character 'a'.

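Hence, the strings 'bacacaca', 'aaaa', '' (the empty string), and
'Xababababab' all match your regex when you use re.match(). Note
that a match need not cover the whole string: 'aaaa' and '' match
in full, 'bacacaca' matches only up to 'bacacac', and in
'Xababababab' only the empty string at the start matches because
every part of the regex is optional.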
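
The following sketch checks what the regex actually matches in each of the four strings. It compares re.match(), which may stop before the end of the string, with re.fullmatch(), which requires the whole string to match. The variable names are illustrative only:

```python
import re

pattern = r'b?(a.)*'
strings = ['bacacaca', 'aaaa', '', 'Xababababab']

# re.match anchors at the start of the string but may stop before the end.
prefix_matches = {s: re.match(pattern, s).group(0) for s in strings}

# re.fullmatch succeeds only if the whole string matches the pattern.
full_matches = {s: re.fullmatch(pattern, s) is not None for s in strings}

for s in strings:
    print(repr(s), '->', repr(prefix_matches[s]), full_matches[s])
```

Because both b? and (a.)* may match nothing, re.match() never fails for this pattern. It simply returns an empty match when the string does not start with 'b' or 'a'.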

The use of the parentheses for structuring the regular expression
is intuitive and should come naturally to you because the same
rules apply as for arithmetic operations. However, there's a more
advanced use of regex groups: retrieval.

You can retrieve the matched content of each matching group. So
the next question naturally arises:

How to Get the First Matching Group?

There are two scenarios when you want to access the content of
your matching groups:

* Access the matching group in the regex pattern to reuse
partially matched text from one group somewhere else.
* Access the matching group after the whole match operation to
analyze the matched text in your Python code.

In the first case, you simply get the first matching group with
the special sequence \1, a backslash followed by the group number:

>>> import re
>>> re.search(r'(j.n) is \1', 'jon is jon')
<re.Match object; span=(0, 10), match='jon is jon'>

Note: Some email providers don't show the backslash in this
example. Visit the blog to see correctly-displayed code.
( https://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiMzg3MzQxNiIsImRlbGl2ZXJ5X2lkIjoibmVtbDF6MHBpOHU3cHE1cXBmY3UiLCJ1cmwiOiJodHRwczovL2Jsb2cuZmlueHRlci5jb20vcHl0aG9uLXJlLWdyb3Vwcy8_dGxfaW5ib3VuZD0xXHUwMDI2dGxfdGFyZ2V0X2FsbD0xXHUwMDI2dGxfZm9ybV90eXBlPTFcdTAwMjZ0bF9wZXJpb2RfdHlwZT0zXHUwMDI2X19zPWg3ZzE2bGo4amI0aHB5NWRqNzNnIn0 )

You'll use this feature a lot because it gives you much more
expressive power: for example, you can search for a name in a
text and then look for exactly this name in the rest of the
text (and not all other names that would also fit the pattern).
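
As a rough illustration of that point, the sketch below reuses the pattern with a backreference to group 1. The second input string, 'jon is jan', is an assumed counterexample: 'jan' fits the group pattern j.n but is not the text that group 1 captured, so the search fails:

```python
import re

# \1 must repeat exactly the text captured by group 1.
pattern = r'(j.n) is \1'

hit = re.search(pattern, 'jon is jon')   # same name twice: match
miss = re.search(pattern, 'jon is jan')  # different name: no match

print(hit.group(0))  # jon is jon
print(miss)          # None
```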

Note that the numbering of the groups starts with 1 and not with 0
-- a rare exception to the rule that in programming, numbering
usually starts with 0.

In the second case, you want to know the contents of the first
group after the whole match. How do you do that?

The answer is also simple: use the m.group(1) method on the
match object m. Here's an example:

>>> import re
>>> m = re.search(r'(j.n)', 'jon is jon')
>>> m.group(1)
'jon'

The numbering works consistently with the previously introduced
regex group numbering: start with identifier 1 to access the
contents of the first group.
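
In real code, re.search() returns None when nothing matches, so calling m.group(1) directly would raise an AttributeError. A minimal sketch of a guarded lookup follows. The non-matching input 'tim is tom' is an assumption chosen for illustration:

```python
import re

m = re.search(r'(j.n)', 'jon is jon')

# Guard against None before asking for the group.
first = m.group(1) if m else None       # text of group 1
first_span = m.span(1) if m else None   # (start, end) indices of group 1

# No 'j' in this string, so there is no match object at all.
no_match = re.search(r'(j.n)', 'tim is tom')

print(first, first_span)  # jon (0, 3)
print(no_match)           # None
```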

How to Get All Other Matching Groups?

Again, there are two different intentions when asking this
question:

* Access the matching group in the regex pattern to reuse
partially matched text from one group somewhere else using the
backslash-number sequence.
* Access the matching group after the whole match operation to
analyze the matched text in your Python code.

Here's an example:

>>> import re
>>> re.search(r'(j..) (j..)\s+\2', 'jon jim jim')
<re.Match object; span=(0, 11), match='jon jim jim'>
>>> re.search(r'(j..) (j..)\s+\2', 'jon jim jon')
>>>

As you can see, the special sequence "backslash 2" refers to the
matching contents of the second group 'jim'.
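
To see that the number after the backslash really selects the group, this sketch swaps the backreference from the second group to the first. The variant pattern with \1 is an assumption added for comparison and is not part of the lesson's example:

```python
import re

text_a = 'jon jim jim'
text_b = 'jon jim jon'

# \2 repeats the second group ('jim'), \1 repeats the first group ('jon').
second_a = re.search(r'(j..) (j..)\s+\2', text_a)
second_b = re.search(r'(j..) (j..)\s+\2', text_b)
first_a = re.search(r'(j..) (j..)\s+\1', text_a)
first_b = re.search(r'(j..) (j..)\s+\1', text_b)

print(second_a is not None, second_b is not None)  # True False
print(first_a is not None, first_b is not None)    # False True
```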

In the second case, you can simply increase the identifier to
access the other matching groups in your Python code:

>>> import re
>>> m = re.search(r'(j..) (j..)\s+\2', 'jon jim jim')
>>> m.group(0)
'jon jim jim'
>>> m.group(1)
'jon'
>>> m.group(2)
'jim'

This code also shows an interesting feature: if you pass the
identifier 0 as an argument to the m.group() method, the regex
module will give you the contents of the whole match. You can
think of the whole match as group number 0.
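
If you need several groups at once, the match object offers a few shortcuts beyond calling m.group() once per identifier. The sketch below uses the example pattern with the backslash-2 backreference restored and shows m.groups() and the multi-argument form of m.group():

```python
import re

m = re.search(r'(j..) (j..)\s+\2', 'jon jim jim')

whole = m.group()         # no argument is the same as m.group(0)
all_groups = m.groups()   # tuple of groups 1..n, without the whole match
pair = m.group(1, 2)      # several identifiers return a tuple

print(whole)       # jon jim jim
print(all_groups)  # ('jon', 'jim')
print(pair)        # ('jon', 'jim')
```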

If you want to learn more about groups, check out this tutorial
on the Finxter blog.
( https://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiMzg3MzQxNiIsImRlbGl2ZXJ5X2lkIjoibmVtbDF6MHBpOHU3cHE1cXBmY3UiLCJ1cmwiOiJodHRwczovL2Jsb2cuZmlueHRlci5jb20vcHl0aG9uLXJlLWdyb3Vwcy8_dGxfaW5ib3VuZD0xXHUwMDI2dGxfdGFyZ2V0X2FsbD0xXHUwMDI2dGxfZm9ybV90eXBlPTFcdTAwMjZ0bF9wZXJpb2RfdHlwZT0zXHUwMDI2X19zPWg3ZzE2bGo4amI0aHB5NWRqNzNnIn0 )

In the next email, you'll get a few new regex puzzles so that you
can strengthen your regex power - so stay tuned!
Regex to the Rescue! :D
Chris

