Solution 1:

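"I want to filter out groups that begin with <strong> and end with </strong>"

The regex below matches only paragraphs whose content contains no <strong> tag, because the character class in the capture group does not allow < or >: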

<p class="p p[1-9][0-9]{0,1}">([a-zA-Z0-9, -]+?)</p>
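A minimal sketch of how this pattern behaves, assuming Python's re module and assuming the paragraphs in the sample text are separated by newline characters. The variable names are illustrative. The second pattern is not part of the answer above: it is an alternative that uses a negative lookahead, shown because the character class also rejects names containing other punctuation such as periods, apostrophes or ampersands.

```python
import re

# Sample text from the question; the "\n" separators are an assumption.
html = (
    '<p class="p p1"><strong>Analysts</strong></p>\n'
    '<p class="p p1">Mark Troman - BofA Merrill Lynch, Research Division</p>\n'
    '<p class="p p1">Ben Uglow - Morgan Stanley, Research Division</p>'
)

# Pattern from the answer: the capture group accepts only letters, digits,
# commas, spaces and hyphens, so content starting with "<strong>" cannot match.
class_pattern = re.compile(
    r'<p class="p p[1-9][0-9]{0,1}">([a-zA-Z0-9, -]+?)</p>'
)

# Alternative (an assumption, not from the answer): keep the permissive (.+?)
# but reject paragraphs whose content starts with "<strong>".
lookahead_pattern = re.compile(
    r'<p class="p p[1-9][0-9]{0,1}">(?!<strong>)(.+?)</p>'
)

names = class_pattern.findall(html)
names_lookahead = lookahead_pattern.findall(html)

for name in names:
    print(name)
```

On the sample text as written, both patterns skip the <strong>Analysts</strong> paragraph and capture the two analyst lines. The lookahead variant only checks how the content begins, so it would still accept a paragraph with a <strong> tag later in its content.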

Problem:

The text I would like to parse is as follows:

<p class="p p1"><strong>Analysts</strong></p>\n<p class="p p1">Mark Troman - BofA Merrill Lynch, Research Division</p>\n<p class="p p1">Ben Uglow - Morgan Stanley, Research Division</p>

Using reg = <p class="p p[1-9][0-9]{0,1}">(.+?)</p>, I can get two groups:

  • <strong>Analysts</strong>
  • Ben Uglow - Morgan Stanley, Research Division

However, I want to filter out groups that begin with <strong> and end with </strong>, and just keep
Ben Uglow - Morgan Stanley, Research Division.

Is there a way to rewrite the regular expression so that it does the filtering in a single pattern?
