Solution 1 :

String From innerHTML

  • This is a bad solution from top to bottom, but my investigation led to it and hopefully it should do the trick.

The Code

Option Explicit

Sub String_Checker()

' I only ran this from VBE. Sometimes the following error would occur:
' Run-time error '2125463506 (8150002e)':
' The text associated with this error code could not be found.
' I don't know why.
    
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    'IE.Visible = True
    IE.navigate "https://www.wikipedia.com"
    Do Until (IE.readyState = 4 And Not IE.Busy)
        DoEvents
    Loop
    Dim objdoc As Object
    Set objdoc = IE.document
    Dim strMyPage As String
    strMyPage = objdoc.body.innerHTML
    IE.Quit
    
    Const pLeft As Long = 20
    
    Dim wb As Workbook
    Set wb = ThisWorkbook ' The workbook containing this code.
    Dim ws As Worksheet
    Set ws = wb.Worksheets("Sheet1")
    
    Dim cel As Range
    Set cel = ws.Range("A2")
    Dim s As String
    Dim pStart As Long
    Dim pLen As Long
    
    Do Until IsEmpty(cel)
        s = cel.Offset(0, 1).Value
        pStart = InStr(1, strMyPage, s, vbTextCompare) - pLeft
        If pStart > 0 Then
        ' The string ('s') was found.
            pLen = InStr(1, strMyPage, s, vbTextCompare) + Len(s) - pStart
            s = Mid(strMyPage, pStart, pLen)
            On Error Resume Next
            ' Here I would receive the following error:
            ' Run-time error '1004': Application-defined or object-defined error
            ' It would occur when the first character would be "=".
            cel.Offset(0, 2).Value = s
            If Err.Number <> 0 Then
                cel.Offset(0, 2).Value = "'" & s ' Maybe this can always be used.
            End If
            On Error GoTo 0
        Else
        ' The string ('s') was NOT found.
        End If
        Set cel = cel.Offset(1)
    Loop

End Sub

Problem :

I need to have VBA look into the HTML of a website, find a certain string within the text, and place into an Excel cell a value containing that string, and X characters to the left of that string, let’s say 20 for the sake of example.

For example, if I need to find the string “elit” in a site containing the following string:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

The code would need to return a value of “sectetur adipiscing elit” to a specified cell. That is, the string itself, and 20 characters to the left of the string.

Here’s what I’ve come up with so far (I know .select is not best practice but it works for me):

Sub String_Checker()
Sheets("Sheet1").Range("a2").Select
Dim IE As Object
Do Until IsEmpty(ActiveCell)
    Set IE = CreateObject("internetexplorer.Application")
    IE.Visible = True
    IE.navigate "https://website.com"
     Do Until (IE.readyState = 4 And Not IE.Busy)
    
          DoEvents
Loop
Set objDoc = IE.document
strMyPage = objDoc.body.innerHTML
Dim s As String: s = ActiveCell.Offset(0, 1).Value
ActiveCell.Offset(0, 2).Value = Left(strMyPage, 20)

IE.Quit
ActiveCell.Offset(1, 0).Select
Loop
End Sub

That gives me the last 20 characters of the HTML, but I need to get the code to start “looking” at the specified string, which would consistently be ActiveCell.Offset(0,1).Value in the Excel. Any help would be appreciated. Thanks!

Comments

Comment posted by VBasic2008

When the string is found, what would you like to return in the cell? The part left of or the part right of the found string?

Comment posted by Foxfire And Burns And Burns

I think you should use

Comment posted by Learner04390

@VBasic2008: The part left of the found string, as well as the string itself.

Comment posted by QHarr

Is there an actual url to go with this please? Are you using string functions rather than an html parser because you are working with different webpages?

Comment posted by Learner04390

@QHarr: Unfortunately I cannot provide the specific website, as it is confidential. While the website I’m going to be pulling data from will be the same each time, there is a wildcard at the end of the URL that will change with each use. I have already accounted for this wildcard in my code so the script will navigate to the correct webpage. If an HTML parser makes more sense to you (it certainly sounds like what I should be using), I’d appreciate more info on how to do that.

Comment posted by Learner04390

Hey, a bad solution that works is better than a good one that doesn’t! This did the trick with a couple of minor tweaks. I set the workbook definitions above the Loop instructions and it ran like a charm. Thanks to you and everyone who commented for your insights!

By