Guide to Scraping Data from Websites to Excel with Web Query

Flipnode on May 22 2023

blog-image

Microsoft Excel provides a convenient feature called Web Query, which enables data extraction from websites directly to Excel. This article explores the various methods for utilizing Web Query to extract public web data effectively.

How web scraping in Excel works

Microsoft Web Query relies on the Operating System's Web Browser, such as Internet Explorer or Edge, to load websites. This ensures seamless rendering of JavaScript-intensive websites. After the website is fully loaded, Web Query automatically scans the page, identifies relevant static HTML tables, and highlights them for selection. Users can then choose the desired table and extract its data using Web Query.

How to scrape website data using Web Query Excel

In this tutorial, we will walk through the process of scraping data from the "books.toscrape.com" website and saving it to an Excel spreadsheet using Web Query. Before we begin, make sure you have the following:

  1. An active internet connection.
  2. Microsoft Office is installed on your computer to use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from here.

After installing Microsoft Office, you can proceed with the following steps:

Step 1: Start by opening a new, empty spreadsheet.

Open Microsoft Excel and click on the "Data" tab in the menu.

Step 2: Choose the "From Web" option.

After clicking on "Data," a new menu will appear offering several data extraction options. Select the "From Web" option and click on it to open a new window.

Step 3: In the address bar of the new window, enter the URL of the web page and click on the "Go" button.

In the "New Web Query" window, enter the URL "https://books.toscrape.com" in the address bar and click the "Go" button.

Step 4: Navigate to the book page

Once you click the "Go" button, a website will appear in the mini web browser. You can interact with and navigate the website within this browser. Scroll down the page and click on a book link to open the corresponding book page.

Step 5: Select the desired table to scrape

As you scroll down the book page, you'll come across a table. Look for a small yellow arrow icon next to the table. Clicking on this icon will select the table. Once the table is selected, simply click the "Import" button located below it.

Step 6: Select the "Existing worksheet" option and click "OK"

After clicking the "Import" button, a small window will appear. Make sure to select the "Existing worksheet" option and click "OK" to import the data.

And there you have it! Web Query will initiate a background process to execute the web request and retrieve the website. Once the website is fetched, it will automatically parse the table and extract the data into the Excel columns. You will see the output similar to the example provided below.

Output

You can cross-reference it with the website to ensure the accuracy of the table data. Each column and row will be linked to the web query, allowing Microsoft Excel to update the corresponding rows and columns whenever you manually or automatically refresh the data. The following section delves into various methods for refreshing and updating the data.

How to update and refresh data

There are two primary methods for refreshing the Web Query:

  1. Automatic Refresh:
    In automatic mode, Excel will regularly retrieve the data in the background and update the sheet accordingly. You can customize the refresh frequency to suit your requirements, which will be explained later in this article.
  2. Manual Refresh:
    Now, let's explore different ways to manually refresh the data. We'll start by deleting a few items from the extracted data to ensure that the refresh function operates as expected.

Notice that the price, tax, and availability data have been removed.

Option 1: Click "Refresh" from the top menu

In the top menu, under the "Data" submenu, there is a "Refresh" button. You can update the data by clicking this button or using the shortcut on your keyboard: "CTRL + ALT + F5". Once you perform the refresh, the missing price, tax, and availability values will be updated with the latest information.

Option 2: Refresh from the context menu

To test a different method of data refresh, let's remove a few fields once again. This time, we'll use the "Refresh" button from the context menu instead of the menu button. Right-click on a cell that is managed by Web Query and select "Refresh." Keep in mind that the "Refresh" button will only appear when you right-click on cells managed by Web Query, not on other cells.

Option 3: Edit & rerun the Query

To update the data using another method, let's remove a few fields again. After removing the data, right-click on the cell and select the "Edit Query" option.

It will open a new window called "Edit Web Query." By clicking "Import," the web query will initiate a background process to retrieve the latest updates from the website and refresh the existing data, replacing outdated information with new data.

This method is particularly useful when you need to update the scraper, such as modifying the website URL or adjusting the query to fetch a different table or page.

Now, let's explore how to automate the refresh/update process.

Configure automatic refresh from properties

The steps to configure the automatic refresh process from properties are straightforward. To begin, right-click on a cell linked to the Web Query to open the context menu. Then, choose "Data Range Properties" from the menu.

This will open the "External Data Range Properties" window

In this window, locate the "Refresh Controls" section. By adjusting the settings in this section, you can enable automatic data refresh. Firstly, select the "Enable background refresh" checkbox. Then, using the second checkbox, you can specify the frequency at which the data should be refreshed. For instance, if you set it to refresh every 5 minutes, Excel will automatically retrieve the data at 5-minute intervals, utilizing a background process to update the table automatically.

If you activate the third checkbox, "Refresh data when opening the file," Excel will retrieve the web data and update the sheet every time you open the spreadsheet.

Conclusion

Web Query streamlines the process of extracting web data in Excel, particularly for websites containing tables. It offers automation capabilities, allowing you to extract web data with minimal or no manual intervention. Moreover, Web Query supports scraping data from dynamic websites that utilize JavaScript.

News and updates

Stay up-to-date with the latest web scraping guides and news by subscribing to our newsletter.

Subscribe

Related articles

thumbnail
ProxiesWhat is a Private Proxy? [Quick Guide 2023]

A private proxy, also referred to as a dedicated proxy, is exclusively assigned to a single user. It offers complete control over its usage, as it is not shared with any other individuals or entities.

Flipnode
author avatar
Flipnode
9 min read
thumbnail
ProxiesHow to Use Chrome Browser Proxy Settings

Learn how to configure and utilize Chrome browser proxy settings effectively for enhanced privacy and optimized web browsing.

Flipnode
author avatar
Flipnode
8 min read
thumbnail
How to Use DataXPath vs CSS Selectors

Read this article to learn what XPath and CSS selectors are and how to create them. Find out the differences between XPath vs CSS, and know which option to choose.

Flipnode
author avatar
Flipnode
12 min read