Solution 1 :

If I understand your question correctly, you could put the code in a function that takes the name you need as an argument, and use that argument to build your search string.
For example:

def func(name_to_find):
    # some code
    for answer in soup.select('p:contains("Question-and-Answer Session") ~ strong:contains("{n}") + p'.format(n=name_to_find)):
        # some other code
        pass

and call it like so:

func('Dror Ben Asher')
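
To avoid calling the function by hand for every executive, the names can be read from the participants block at the top of each file and looped over. The following is only a minimal sketch under some assumptions: the helper names `executive_names` and `answers_for` are made up for illustration, the sample HTML is a shortened, invented stand-in for the real files, and participant lines are assumed to have the form `Name - Title`. It also walks the siblings directly instead of using the `:contains` selector, so it does not restrict matches to the part after "Question-and-Answer Session".

```python
from bs4 import BeautifulSoup, Tag


def executive_names(soup):
    """Return the names from 'Name - Title' lines in the participants block."""
    block = soup.find(id='article_participants')
    names = []
    if block is None:
        return names
    for p in block.find_all('p'):
        text = p.get_text(' ', strip=True)
        # Assumption: only lines shaped like 'Dror Ben Asher - CEO' are people.
        if ' - ' in text:
            names.append(text.split(' - ', 1)[0].strip())
    return names


def answers_for(soup, name):
    """Collect the <p> text following each <strong> that contains `name`."""
    results = []
    for strong in soup.find_all('strong'):
        if name not in strong.get_text():
            continue
        parts = []
        for sibling in strong.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip whitespace/text nodes between elements
            # Stop at the next speaker label.
            if sibling.name == 'strong' or sibling.find('strong'):
                break
            if sibling.name == 'p':
                parts.append(sibling.get_text(strip=True))
        if parts:
            results.append(' '.join(parts))
    return results


# Invented sample, shortened from the structure shown in the question.
html = """
<DIV id=article_participants class="content_part hid">
<P>Redhill Biopharma Ltd. (NASDAQ:RDHL)</P>
<P>Dror Ben Asher - CEO</P>
<P>Ori Shilo - Deputy CEO, Finance and Operations</P>
</DIV>
<STRONG> Dror Ben Asher </STRONG>
<P>First answer.</P>
<P>More detail.</P>
<STRONG> Ori Shilo </STRONG>
<P>Second answer.</P>
"""

soup = BeautifulSoup(html, 'html.parser')
for name in executive_names(soup):
    for answer in answers_for(soup, name):
        print('{:<30} {}'.format(name, answer))
```

Inside the loop over the files, the same two calls could be made on each file's `soup`, so that every file supplies its own list of names. Note that the participants block may also list people other than executives in the same `Name - Title` form, and they would be picked up as well.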

Problem :

In my (downloaded) HTML files, the executives are listed at the top of every file (like “Dror Ben Asher” in the code below):

<DIV id=article_participants class="content_part hid">
<P>Redhill Biopharma Ltd. (NASDAQ:<A title="" href="" symbolSlug="RDHL">RDHL</A>)</P>
<P>Q4 2014 <SPAN class=transcript-search-span style="BACKGROUND-COLOR: yellow">Earnings</SPAN> Conference <SPAN class=transcript-search-span style="BACKGROUND-COLOR: #f38686">Call</SPAN></P>
<P>February 26, 2015 9:00 AM ET</P>
<P>Dror Ben Asher - CEO</P>
<P>Ori Shilo - Deputy CEO, Finance and Operations</P>
<P>Guy Goldberg - Chief Business Officer</P>

Further along in the HTML, each executive's name reoccurs multiple times, and each occurrence is followed by a text element that I want to parse:

<STRONG> Dror Ben Asher </STRONG>
<P>Yeah, in terms of production in first quarter, we’re going to be lower than we had forecasted mainly due to our grade.  We’ve had a couple of higher grade stopes in our Seabee complex that we’ve had some significant problems in terms of ground failures and dilution effects.  In addition, not helping out, we’ve had some equipment downtime on some of our smaller silt development, so the combination of those two issues are affecting us.

For now I have code (see below) that identifies one executive, “Dror Ben Asher”, and grabs all the text that occurs after the name in the P elements. But I would like this to work for all executives, and for multiple HTML files in which different executives are mentioned (different companies).

import textwrap
import os
from bs4 import BeautifulSoup

directory ='C:/Research syntheses - Meta analysis/SeekingAlpha/out'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory,filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(),'html.parser')

print('{:<30} {:<70}'.format('Name', 'Answer'))
print('-' * 101)
for answer in soup.select('p:contains("Question-and-Answer Session") ~ strong:contains("Dror Ben Asher") + p'):
    txt = answer.get_text(strip=True)

    s = answer.find_next_sibling()
    while s:
        if s.name == 'strong' or s.find('strong'):
            break
        if s.name == 'p':
            txt += ' ' + s.get_text(strip=True)
        s = s.find_next_sibling()

    txt = ('\n' + ' '*31).join(textwrap.wrap(txt))

    print('{:<30} {:<70}'.format('Dror Ben Asher - CEO', txt), file=open("output.txt", "a"))

Does anyone have a suggestion to tackle this challenge?


Comment posted by nikos

OK, but could it also run automatically for all names mentioned in the executives overview at the top of the HTML? As it stands, I would need to call the function for every executive in all my HTML files.

Comment posted by tohanov

I don’t exactly understand what your code does. Do you want to take the names of the executives from the header of the html and search for text with their names in the rest of the html?

Comment posted by nikos

Yes, and I would like this to be possible for multiple HTML files (from different companies with different executives). Thank you for your help!

Comment posted by tohanov

you can first read all the names of the executives to a

Comment posted by nikos

Oh, I don’t know how to translate this into my code. Could you give an example?