Solution 1 :

If the string “TOTAL : number” is unique, use a regular expression to first search for this substring and then extract the number from it.

import re

string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

reg__expr = r'TOTAL\s:\s\d+'  # TOTAL<whitespace>:<whitespace><number>
# find the substring
result = re.findall(reg__expr, string)
if result:
    substring = result[0]

    reg__expr = r'\d+'  # <number>
    # extract the digits from the matched substring
    result = re.findall(reg__expr, substring)
    number = int(result[0])


You can test your own regular expressions here
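
The two-step search above can also be collapsed into a single pattern with a capturing group. This is only a minimal sketch: the variable name html_string and the use of `\s*` (any amount of whitespace around the colon) are assumptions, not part of the original answer.

```python
import re

# Sample input taken from the question.
html_string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

# \s* tolerates any amount of whitespace around the colon;
# the parentheses capture only the digits.
match = re.search(r'TOTAL\s*:\s*(\d+)', html_string)

# re.search returns None when nothing matches, so guard before converting.
number = int(match.group(1)) if match else None
print(number)
```

Checking for None keeps the code from raising an AttributeError when the label is missing from the page.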

Solution 2 :

In your document you can try this:

import re
my_string="TOTAL : 286"
int(re.search(r'\d+', my_string).group())


Solution 3 :

You can try something like the following:

    line = "TOTAL : 286"
    if line.startswith('TOTAL : '):
        # keep only what follows the prefix
        print(line[len('TOTAL : '):])

Output :

    286


Solution 4 :

You can use string partitioning to extract a “number” string from the whole HTML string like this (assuming HTML code is in html_string variable):
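
A minimal sketch of that partitioning, assuming the label is written exactly as "TOTAL :" and the number ends at the next "<". The sample value of html_string is taken from the question, and after_label is a hypothetical helper name.

```python
# Sample input taken from the question.
html_string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

# str.partition returns (before, separator, after);
# index 2 is everything after the first "TOTAL :".
after_label = html_string.partition("TOTAL :")[2]

# Index 0 is everything before the next "<"; strip() drops the surrounding spaces.
num_string = after_label.partition("<")[0].strip()
print(num_string)
```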


This gives you num_string with the number as a string; then simply convert it to an integer or whatever you need.
Keep in mind that this processes the first occurrence of anything that looks like “TOTAL : anything_goes_here<”, so make sure that this pattern is unique.
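
One way to check that uniqueness before extracting anything is to count the label. This is a sketch only: the exact label text "TOTAL :" and the choice to raise a ValueError are assumptions.

```python
# Sample input taken from the question.
html_string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

# str.count returns the number of non-overlapping occurrences of the label.
occurrences = html_string.count("TOTAL :")
if occurrences != 1:
    raise ValueError(f"expected exactly one 'TOTAL :' label, found {occurrences}")
```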

Solution 5 :

If your HTML string is this:

html_string = """<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>"""

Try this:

int(html_string.split("</test>")[0].split(":")[-1].replace(" ", ""))

Problem :

I want to extract a number from an HTML string (I usually do not know the number).

The crucial part looks like this:

<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>

I want to extract the “286”. I want to do something like: start after “L :” and stop before “<”.
How can I do this? Thank you very much in advance.


Comment posted by Extract string from HTML String

Does this answer your question?