Wrapping subsections of text with tags in BeautifulSoup

I want the BeautifulSoup equivalent of this jQuery question.

I’d like to find a particular regex match in BeautifulSoup text and then replace that segment of text with a wrapped version. I can do this with plaintext wrapping:

# wrap every word ending in "ug" in quotes,
# replacing its "ug" with "ook"

>>> import re
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("Snug as a bug in a rug")
>>> soup
<html><body><p>Snug as a bug in a rug</p></body></html>
>>> for text in soup.findAll(text=True):
...   if re.search(r'ug\b',text):
...     text.replaceWith(re.sub(r'(\w*)ug\b',r'"\1ook"',text))
...
u'Snug as a bug in a rug'
>>> soup
<html><body><p>"Snook" as a "book" in a "rook"</p></body></html>

But what if I want boldface rather than quotes? For example, the desired result is:

<html><body><p><b>Snook</b> as a <b>book</b> in a <b>rook</b></p></body></html>

Best answer

for text in soup.findAll(text=True):
    if re.search(r'ug\b', text):
        text.replaceWith(BeautifulSoup(re.sub(r'(\w*)ug\b', r'<b>\1ook</b>', text), 'html.parser'))

soup
Out[117]: <html><body><p><b>Snook</b> as a <b>book</b> in a <b>rook</b></p></body></html>

The idea here is that we're replacing a text node (a NavigableString, not a tag) with a fully formed parse tree. The easiest way to build that tree is to call BeautifulSoup on the regex-substituted string.
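For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same replacement. The function name bold_ug_words is made up for illustration, and it uses the snake_case bs4 spellings find_all and replace_with (with string=True in place of text=True) rather than the older aliases above. It also parses the input with 'html.parser' so the result does not depend on lxml being installed.

```python
import re

from bs4 import BeautifulSoup


def bold_ug_words(html):
    """Hypothetical helper: wrap each word ending in "ug" in <b>,
    with its "ug" replaced by "ook"."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes while looping is safe.
    for text in soup.find_all(string=True):
        if re.search(r"ug\b", text):
            marked_up = re.sub(r"(\w*)ug\b", r"<b>\1ook</b>", text)
            # Parse the substituted string into nodes and swap them in
            # for the original text node.
            text.replace_with(BeautifulSoup(marked_up, "html.parser"))
    return soup


result = bold_ug_words("<p>Snug as a bug in a rug</p>")
print(result)
# <p><b>Snook</b> as a <b>book</b> in a <b>rook</b></p>
```

One caveat with this approach: the text is re-parsed as HTML, so a text node that contains a literal "<" or "&" may be interpreted as markup.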

The somewhat magical 'html.parser' argument to the inner BeautifulSoup call prevents it from wrapping the fragment in <html><body><p> tags, as bs4 normally does when it picks lxml as its parser (the wrapping comes from lxml, not from bs4 itself).
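A quick way to see the parser difference is to parse the same fragment both ways. This is only a sketch: the variable names are illustrative, and the lxml half is guarded because lxml may not be installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

fragment = "<b>Snook</b> as a <b>book</b>"

# html.parser leaves a fragment as a fragment: no wrapper tags are added.
with_html_parser = str(BeautifulSoup(fragment, "html.parser"))
print(with_html_parser)
# <b>Snook</b> as a <b>book</b>

# lxml builds a complete document around the fragment, if it is available.
try:
    with_lxml = str(BeautifulSoup(fragment, "lxml"))
    print(with_lxml)
except FeatureNotFound:
    with_lxml = None
    print("lxml is not installed; skipping the comparison")
```

If the lxml output is inserted into an existing tree, those extra wrapper tags come along with it, which is why the inner call names 'html.parser' explicitly.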