To get correct self-closing tags, use the `xml` parser when creating the new soup that will replace the old tag.
Also, to preserve the `ac` and `ri` namespaces, the `xml` parser requires the `xmlns:ac` and `xmlns:ri` attributes to be defined. We define them on a dummy tag that is removed after processing.
For example:
from bs4 import BeautifulSoup
# Note: the 'xml' parser used below requires the lxml package to be installed
txt = '''
<div class="my-class">
    <a src="some address">
        <img src="attlasian_logo.gif" />
    </a>
</div>
<div class="my-class">
    <a src="some address2">
        <img src="other_logo.gif" />
    </a>
</div>
'''
template = '''
<div class="_remove_me" xmlns:ac="http://namespace1/" xmlns:ri="http://namespace2/">
    <ac:image>
        <ri:attachment ri_filename="{img_src}" />
    </ac:image>
</div>
'''
soup = BeautifulSoup(txt, 'html.parser')
for a in soup.select('a'):
    # Parse the filled-in template with the `xml` parser; the template
    # needs the xmlns:* attributes to preserve the namespaces
    a.replace_with(BeautifulSoup(template.format(img_src=a.img['src']), 'xml'))
# Remove the dummy wrapper tags, keeping their children in place
for div in soup.select('div._remove_me'):
    div.unwrap()
print(soup.prettify())
Prints:
<div class="my-class">
<ac:image>
<ri:attachment ri_filename="attlasian_logo.gif"/>
</ac:image>
</div>
<div class="my-class">
<ac:image>
<ri:attachment ri_filename="other_logo.gif"/>
</ac:image>
</div>
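
As a side note on why the `xmlns:*` declarations are needed: in XML, a prefix such as `ac` must be bound to a namespace before it is used. The sketch below illustrates that rule with the standard library's strict `xml.etree.ElementTree` parser, not the lxml-backed `xml` parser that BeautifulSoup uses, so it shows the underlying XML rule rather than BeautifulSoup's exact behaviour. The `parses` helper is a hypothetical name introduced for illustration, and the namespace URI is the same placeholder used in the template above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, mirroring the one used in the template.
AC_NS = "http://namespace1/"


def parses(markup: str) -> bool:
    """Return True if the strict stdlib XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Same element, without and with the namespace declaration on the wrapper.
undeclared = "<div><ac:image/></div>"
declared = f'<div xmlns:ac="{AC_NS}"><ac:image/></div>'

print(parses(undeclared))  # False: the "ac" prefix is unbound
print(parses(declared))    # True

# Once declared, the prefix resolves to the namespace URI.
root = ET.fromstring(declared)
print(root[0].tag)  # {http://namespace1/}image
```

This is the reason the dummy wrapper carries the declarations: they only need to be present while the template is parsed, and the wrapper can be unwrapped afterwards.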