Not all web-scraping has to be challenging (such as a massive set of python scripts). Some of it can be done just by going to a URL with a GET request. (Some of the best APIs respond to the GET HTTP verb)
The challenge is comparing the data on all the pages at once. It would be easier if all the information was in something like Excel - and it can be! Most pages use HTML, which is easily understood using the Document Object Model (DOM). For these cases that don't need security, it is very easy to load data using excel macros.
The first step is to save your file as an XLSM (macro enabled). Then open the visual basic editor (alt+f11) and create a new module. Inside the module, you can use the following code:
Option Explicit ' forces you to declare all the variables Function GetValueFromURL(url As String, id As String) As Variant ' to refer to the running copy of Internet Explorer Dim ie As InternetExplorer ' to refer to the HTML document returned Dim html As HTMLDocument ' open Internet Explorer in memory, and go to website Set ie = New InternetExplorer ' turn off visibility so the IE window does not open ie.Visible = False ie.navigate url ' Wait until IE is done loading page Do While ie.READYSTATE <> READYSTATE_COMPLETE Application.StatusBar = "Trying to go to " & url & " and get the value of " & id & " ..." DoEvents Loop ' get the HTML document returned Set html = ie.document ' close down IE Set ie = Nothing ' and reset status bar Application.StatusBar = "" ' In VBA, set the result equal to the method name to return the value GetValueFromURL = html.getElementById(id).innerText End Function
Nice! Now I can do something like get the date of every blog post, and make a histogram to see when I was busiest. Well, that's just one example.
So, the above example will work with HTML, but what about loading things like JSON that are NOT HTML documents. We don't want to parse anything, just get a string
Function GetWebText(url As String) As String Dim objHttp As Object Set objHttp = CreateObject("MSXML2.ServerXMLHTTP") ' use the HTTP GET verb (POST is recommended to send data) ' when calling the url ' asynchronously = False objHttp.Open "GET", url, False ' since this is GET, we don't send data to the server objHttp.send "" GetWebText = objHttp.responseText End Function
This method is a little simplified, but you don't have access to the DOM. However, changing the GET verb to POST, one can send data to a server using this macro.