Python Web Scraping Tutorial: Step-By-Step

Flipnode on May 31, 2023

In the realm of web scraping, getting started can be straightforward, but sometimes it can be challenging, which is why you're here seeking guidance. Python offers a user-friendly approach, serving as an ideal starting point with its object-oriented nature. Compared to other programming languages, Python's classes and objects are remarkably intuitive to work with. Moreover, Python boasts a wide range of libraries that simplify the process of building web scraping tools.

In this tutorial, we will provide a comprehensive overview of web scraping using Python. We'll walk you through the essential steps to create a basic application that extracts text-based data from web pages, saves it to a file, and organizes the output based on specified parameters. Towards the end, we'll touch upon more advanced features and offer suggestions for their implementation. By following the outlined steps in this tutorial, you'll gain a solid understanding of web scraping using Python.

What is web scraping?

Web scraping refers to the automated extraction of publicly available data. With the help of a webpage scraper, vast amounts of data can be extracted from target websites within seconds.

This Python web scraping tutorial is compatible with all operating systems. The installation process for Python and development environments may vary slightly, but the core concepts remain consistent across platforms.

Building a web scraper: Python prepwork

This web scraping tutorial is based on Python version 3.4 or higher. Specifically, we used version 3.8.3, but any version from 3.4 onwards should work perfectly fine.

For Windows installations, it is important to select the "PATH installation" option during the Python installation process. This ensures that the executables are added to the default Windows Command Prompt search, allowing you to use commands like "pip" or "python" without manually specifying the executable's directory (e.g., C:/tools/python/.../python.exe). If you have already installed Python without selecting this option, you can simply rerun the installation and choose the "modify" option. On the second screen, select "Add to environment variables" to enable the PATH installation.

Getting to the libraries

Python offers a wide range of powerful libraries designed for web scraping. With over 300,000 projects available on PyPI (the Python Package Index), there is no shortage of libraries to choose from. Here are some notable Python web scraping libraries to consider:

  • Requests
  • Beautiful Soup
  • lxml
  • Selenium

These libraries provide various functionalities and features that make web scraping tasks more efficient and effective. Whether you need to make HTTP requests, parse HTML documents, extract data, or automate browser interactions, these libraries offer the tools you need to accomplish your web scraping goals.

Requests library

The Requests library is a valuable tool for web scraping, particularly when it comes to making HTTP requests. It simplifies the process by reducing the amount of code required, making it easier to understand and debug. To install the library, you can use the following command:

pip install requests

With the Requests library, you can easily send HTTP GET and POST requests. For instance, the get() function allows you to send an HTTP GET request:

import requests

response = requests.get('https://flipnode.io/')
print(response.text)

If you need to post form data, you can use the post() method and provide the data as a dictionary:

form_data = {'key1': 'value1', 'key2': 'value2'}

response = requests.post('https://flipnode.io/', data=form_data)
print(response.text)

The Requests library also offers convenient support for proxies that require authentication. You can specify the proxy information as follows:

proxies = {'http': 'http://user:password@proxy.example.com:7777'}

response = requests.get('https://ip.flipnode.io/', proxies=proxies)
print(response.text)

However, it's important to note that the Requests library does not parse the extracted HTML data into a more readable format for analysis. Additionally, it may not be suitable for scraping websites that heavily rely on JavaScript for their content.

Beautiful Soup

Beautiful Soup is a powerful Python library that specializes in parsing HTML and extracting data from it, even if the markup is invalid. However, it should be noted that Beautiful Soup is primarily focused on parsing and does not have the capability to request HTML data from web servers. As a result, it is commonly used in conjunction with the Python Requests library.

To demonstrate its usage, let's go through an example using Beautiful Soup along with the html.parser module from the Python Standard Library.

First, we use the Requests library to retrieve the HTML content of a webpage:

import requests

url = 'https://flipnode.io/blog'
response = requests.get(url)

Next, we import Beautiful Soup and create a BeautifulSoup object to parse the HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title)

This will print the title element of the webpage:

<title>Flipnode Blog | Flipnode</title>

Beautiful Soup simplifies the process of navigating, searching, and modifying the parse tree, making it particularly useful for developers, even those with limited experience. For example, we can use the findAll() method to print all the blog titles on the page. In this case, the blog titles are contained within <h3>, <h4>, and <h5> elements. We can specify these elements to the findAll() method:

blog_titles = soup.findAll(['h3', 'h4', 'h5'])
for title in blog_titles:
    print(title.text)

Beautiful Soup also provides convenient support for CSS selectors. Developers familiar with CSS selectors can use them directly instead of the find() or find_all() methods. Here's an example that selects the blog titles while excluding the "Most popular articles" heading, which is contained within an <h5> element:

blog_titles = soup.select('h3, h4, h5:not(:contains("Most popular articles"))')
for title in blog_titles:
    print(title.text)

Aside from its ability to parse broken HTML, Beautiful Soup offers various functions, including the ability to detect page encoding, which enhances the accuracy of the extracted data.

Furthermore, it can be easily configured with just a few lines of code to extract custom publicly available data or identify specific data types.
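
To illustrate the encoding detection mentioned above, here is a minimal sketch that passes the raw response bytes (rather than response.text) to Beautiful Soup and prints the encoding it detected; it reuses the example blog URL from earlier:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://flipnode.io/blog')
# Passing bytes lets Beautiful Soup detect the page encoding on its own.
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.original_encoding)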

lxml

lxml is a high-performance parsing library that supports both HTML and XML files. It is known for its speed, power, and user-friendly interface, making it a popular choice for data extraction tasks, especially when dealing with large datasets. However, compared to Beautiful Soup, lxml's parsing capabilities can be affected by poorly structured HTML, causing some limitations.

To install the lxml library, you can use the following pip command in the terminal:

pip install lxml

Once installed, you can import the html module from lxml to work with HTML content. Assuming you have already obtained the HTML string using the Requests library as explained in the previous section, you can create an HTML tree using the fromstring method:

from lxml import html

tree = html.fromstring(response.text)

With the tree object in place, you can now utilize XPath to query and extract specific elements. Building on the example from earlier, if you want to retrieve the blog titles excluding the "Most popular articles" title, you can use the following XPath expression:

blog_titles = tree.xpath('//h3 | //h4 | //h5[not(contains(text(), "Most popular articles"))]')
for title in blog_titles:
    print(title.text)

In the above code, the XPath expression //h3 | //h4 | //h5[not(contains(text(), "Most popular articles"))] selects all <h3> and <h4> elements, together with any <h5> elements that do not contain the text "Most popular articles".

If you are interested in learning more about using the lxml library and incorporating it into your web scraping workflows, we recommend checking out our comprehensive lxml tutorial, which provides detailed guidance and insights to enhance your expertise.

Selenium

When it comes to websites built with JavaScript, traditional Python libraries struggle to extract data from dynamic web pages. However, Selenium comes to the rescue and thrives in this scenario.

Selenium is an open-source browser automation tool, often used for test automation, that can execute JavaScript and render web pages just like a regular browser. This makes it a powerful tool for web scraping, as it can handle JavaScript-driven content that standard web crawlers cannot. Selenium has gained widespread popularity among developers due to its versatility.

To get started with Selenium, you need three components:

  • Web Browser - Selenium supports popular browsers like Chrome, Edge, Firefox, and Safari.
  • Browser Driver - You can find the appropriate driver for your chosen browser on the Selenium website.
  • The Selenium package itself - Install it using the following command:
pip install selenium

Once installed, you can import the necessary class for your chosen browser and create an object of that class. Here's an example using the Chrome browser:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

driver = Chrome(executable_path='/path/to/driver')
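
Note that executable_path is the Selenium 3 style. In Selenium 4 and newer, the driver path is wrapped in a Service object instead, and from version 4.6 onwards Selenium Manager can usually locate a matching driver automatically. A minimal sketch of the newer style, assuming a local Chrome installation:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service

# Explicit driver path via a Service object (Selenium 4+);
# with Selenium 4.6+ you can often just call Chrome() with no arguments.
driver = Chrome(service=Service('/path/to/driver'))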

You can then load any web page using the get() method:

driver.get('https://flipnode.io/blog')

Selenium allows you to extract elements using CSS selectors or XPath. The following example prints all the blog titles using CSS selectors, excluding the "Most popular articles" title:

blog_titles = driver.find_elements(By.CSS_SELECTOR, 'h3, h4, h5')
for title in blog_titles:
    if title.text != 'Most popular articles':
        print(title.text)

Finally, don't forget to close the browser when you're done:

driver.quit()

It's important to note that Selenium's execution of JavaScript and dynamic rendering can slow down the scraping process, making it less suitable for large-scale data extraction. However, if you're working on smaller-scale projects or prioritize accuracy over speed, Selenium is an excellent choice with its ability to mimic human behavior and handle JavaScript-driven content.

WebDrivers and browsers

In web scraping, a browser is necessary to connect to the target URL. For beginners, it is highly recommended to use a regular browser (non-headless) during testing. This allows you to observe how the written code interacts with the application, making troubleshooting and debugging simpler and providing a better understanding of the entire process.

Headless browsers can be utilized for more advanced and complex tasks at a later stage. In this web scraping tutorial, we will be using the Chrome web browser, although the process is almost identical with Firefox.
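
When you are ready to go headless, the switch happens through the browser options. A minimal sketch for Chrome, assuming a reasonably recent Chrome and Selenium build (older Chrome versions use the plain --headless flag, and older Selenium versions still need the driver path as shown later in this tutorial):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless=new')  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
driver.get('https://flipnode.io/blog')
print(driver.title)
driver.quit()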

To begin, use your preferred search engine to find the appropriate "webdriver for Chrome" (or Firefox) based on your browser's current version. Once found, download the corresponding webdriver.

If necessary, select the required package, download it, and unzip it. Copy the executable file of the webdriver to a readily accessible directory. We will verify if everything is set up correctly in the subsequent steps.

Finding a cozy place for our Python web scraper

Before we dive into the programming part of this web scraping tutorial, there is one final step to take: setting up a suitable coding environment. You have various options to choose from, ranging from a basic text editor where you can create a *.py file and write your code directly, to a comprehensive Integrated Development Environment (IDE).

If you already have Visual Studio Code installed, opting for this IDE would be the easiest choice. However, for newcomers, I highly recommend using PyCharm due to its user-friendly interface and minimal learning curve. For the remainder of this web scraping tutorial, we will assume that PyCharm is being used.

In PyCharm, you can create a new Python file by right-clicking in the project area and selecting "New -> Python File". Be sure to give your file a descriptive name! This will be your workspace for writing and running the web scraping code.

Importing and using libraries

Now it's time to make use of the Python packages we installed earlier. Let's proceed with the following code:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

Note that in PyCharm, these import statements might appear in grey, as the IDE automatically marks unused libraries. However, I recommend not accepting its suggestion to remove unused libraries for now.

Next, we need to define our browser based on the webdriver we selected earlier in the "WebDriver and browsers" section. Depending on your choice, you can use one of the following lines of code:

For Chrome:

driver = webdriver.Chrome(executable_path=r'c:\path\to\windows\webdriver\executable.exe')

For Firefox:

driver = webdriver.Firefox(executable_path='/nix/path/to/webdriver/executable')

Make sure to replace the paths with the actual paths to your webdriver executables. This will initialize the browser and prepare it for web scraping tasks.

Picking a URL

Before running our first test, let's choose a suitable URL. Since this web scraping tutorial focuses on creating a basic application, it's recommended to select a simple target URL. Keep the following considerations in mind:

  • Avoid scraping data hidden within JavaScript elements, as they often require specific actions to display the desired data. Scraping such elements requires more advanced Python techniques and logic.
  • Image scraping is not covered in this tutorial; if you do need images, Selenium allows you to download them directly.
  • Always ensure that you are scraping public data and not violating any third-party rights. Additionally, check the website's robots.txt file for any scraping guidelines (a quick programmatic check is sketched right after this list).
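
For the robots.txt check mentioned in the last point, Python's standard library ships urllib.robotparser. A minimal sketch, using the placeholder URL from this tutorial:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://your.url/robots.txt')
rp.read()
# True if the rules allow fetching this path with any user agent.
print(rp.can_fetch('*', 'https://your.url/here?yes=brilliant'))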

Once you have chosen a landing page to visit, input the URL as a parameter in the driver.get('URL') method. Remember to include the connection protocol (e.g., "http://" or "https://") in the URL.

Example:

driver.get('https://your.url/here?yes=brilliant')

To perform a test run, click the green arrow at the bottom left of your coding environment or right-click the coding environment and select "Run".

If you encounter an error stating that a file is missing, double-check that the path provided in the webdriver.* matches the location of the webdriver executable. If you receive a version mismatch error, make sure to redownload the correct webdriver executable.

Defining objects and building lists

In Python, objects can be created without specifying an exact type. By assigning a value to a variable, an object is created.

Example:

# Object is "results", and the brackets create an empty list.
# We will store our data in this list.
results = []

Lists in Python are ordered, mutable, and allow duplicate elements. While other collections like sets or dictionaries can be used, lists are the simplest to work with. Now, let's create more objects!

# Add the page source to the variable "content".
content = driver.page_source

# Load the page's source into BeautifulSoup, which analyzes the HTML as a nested data structure.
# It allows us to select elements using various selectors.
soup = BeautifulSoup(content)

Let's recap our code so far:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
content = driver.page_source
soup = BeautifulSoup(content)

Try running the application again. There should be no errors displayed. If any errors occur, refer back to the earlier chapters for troubleshooting options. Also, don't forget to specify the parser that the BeautifulSoup function should use.

Extracting data with our Python web scraper

Now we have reached the fun and challenging part - extracting data from the HTML file. Since we want to retrieve small sections from different parts of the page and store them in a list, we need to process each section individually and then add it to the list:

# Loop over all elements returned by the `findAll` call with the specified filter.
for element in soup.findAll(attrs={'class': 'list-item'}):
    ...

The soup.findAll method accepts various arguments. In this tutorial, we are using the attrs argument to filter elements based on their attributes. By specifying {'class': 'list-item'}, we limit the data returned to elements with a specific class. Classes are commonly used and easy to work with, so we'll use them in this example.

Before proceeding, let's visit the chosen URL in a real browser. Open the page source using CTRL+U (Chrome) or right-click and select "View Page Source." Find the closest class where the desired data is nested. Alternatively, you can press F12 to open DevTools and use the Element Picker. For example, the nested structure might look like this:

<h4 class="title">
    <a href="...">This is a Title</a>
</h4>

In this case, the attribute we are interested in is class, and its value is title. If you have selected a simple target, the data is likely nested in a similar way. Complex targets may require more effort to extract the data. Let's continue coding and add the class we found in the source:

# Change 'list-item' to 'title'.
for element in soup.findAll(attrs={'class': 'title'}):
    ...

Now our loop will iterate over all elements with the class "title" in the page source. We will process each of them:

name = element.find('a')

Let's see how our loop navigates the HTML:

<h4 class="title">
    <a href="...">This is a Title</a>
</h4>

The loop iterates over every element whose "class" attribute contains "title". Inside the loop, element.find('a') then searches within that element and returns the first <a> tag it encounters (an exact tag-name match, so unrelated tags such as <span> are not picked up). Finally, the object is assigned to the variable "name".

If we directly assign the "name" object to our previously created list, "results", it will include the entire <a href...> tag with the text inside it as one element. In most cases, we only need the text itself without any additional tags.

# Add the text of the "name" object to the "results" list.
# `<element>.text` extracts the text within the element, excluding HTML tags.
results.append(name.text)

Our loop will traverse the entire page source, find all occurrences of the specified classes, and append the nested data to our list. Here's the updated code:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
content = driver.page_source
soup = BeautifulSoup(content)

for element in soup.findAll(attrs={'class': 'title'}):
    name = element.find('a')
    results.append(name.text)

Note that the two statements inside the loop are indented. Loops require indentation to indicate nesting. Any consistent indentation will be considered valid. If you omit the indentation, you will encounter an "IndentationError" with an arrow pointing to the offending statement.

Exporting the data to CSV

Even if there are no syntax or runtime errors when running our program, there could still be semantic errors. It's important to check if the data is correctly assigned to the right objects and if it's being stored in the array properly.

One simple way to verify if the acquired data is collected correctly is by using the print function. Since arrays have multiple values, a loop is often used to print each entry on a separate line:

for x in results:
    print(x)

Both print and for should be familiar by now. We use this loop for quick testing and debugging purposes. Alternatively, we can directly print the results array:

print(results)

So far, our code should look like this:

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
content = driver.page_source
soup = BeautifulSoup(content)

for a in soup.findAll(attrs={'class': 'class'}):
    name = a.find('a')
    if name.text not in results:
        results.append(name.text)

for x in results:
    print(x)

Running our program now should not display any errors and should show the acquired data in the console. While print is useful for testing, it may not be the best approach for parsing and analyzing data.

You might have noticed that the import pandas statement is still greyed out. We will finally make use of the library. I recommend removing the print loop for now, as we will be doing something similar but moving our data to a CSV file.

df = pd.DataFrame({'Names': results})
df.to_csv('names.csv', index=False, encoding='utf-8')

These two new statements rely on the pandas library. The first statement creates a DataFrame, df, which is a two-dimensional data table. "Names" is the name of our column, while results is the list whose values fill it. Note that pandas can create multiple columns; we simply don't have enough lists to use them (yet).

The second statement writes the data from the df variable to a specific file type, in this case a CSV file. The first parameter assigns a name and extension to our file. Including an extension is necessary, as pandas will otherwise output a file without one, requiring manual renaming. Setting the index parameter to False prevents pandas from writing row numbers into the file. The encoding parameter specifies the character encoding format; UTF-8 is sufficient for most cases.

The updated code will look like this:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
content = driver.page_source
soup = BeautifulSoup(content)

for a in soup.findAll(attrs={'class': 'class'}):
    name = a.find('a')
    if name.text not in results:
        results.append(name.text)

df = pd.DataFrame({'Names': results})
df.to_csv('names.csv', index=False, encoding='utf-8')

No imports should be greyed out now, and running our application will output a "names.csv" file in our project directory. Note that a "Guessed At Parser" warning may still appear, but for the purposes of this Python web scraping tutorial, the default HTML parser will suffice.

Exporting the data to Excel

The Pandas library provides a convenient function to export data to Excel, which simplifies the process of transferring data to an Excel file. However, before using this function, you need to install the openpyxl library. You can install it by running the following command in your terminal:

pip install openpyxl

Now, let's see how we can use Pandas to write data to an Excel file:

df = pd.DataFrame({'Names': results})
df.to_excel('names.xlsx', index=False)

In the above code, we create a DataFrame, which is a two-dimensional tabular data structure. The column label is "Names," and the rows include data from the results array. Although Pandas can handle multiple columns, we only have a single column of data in this case.

The second statement converts the DataFrame into an Excel file (".xlsx"). The first argument specifies the filename as "names.xlsx". We set the index argument to False to avoid numbering the rows. Note that recent pandas releases no longer accept an encoding argument in to_excel(); the .xlsx format handles Unicode on its own, so no encoding needs to be specified.

Here's the updated code including the Excel export functionality:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

content = driver.page_source
soup = BeautifulSoup(content)
results = []

for a in soup.findAll(attrs={'class': 'class'}):
    name = a.find('a')
    if name.text not in results:
        results.append(name.text)

df = pd.DataFrame({'Names': results})
df.to_excel('names.xlsx', index=False)

In summary, the above code creates a "names.xlsx" file with a "Names" column that includes all the data we have in the results array so far.

More lists. More!

When performing web scraping, it is often necessary to extract multiple sets of data to gather meaningful information and draw conclusions. In this tutorial, we will extract data from a different class while maintaining the structure of our table.

To store the additional data, we need another list:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
other_results = []

content = driver.page_source
soup = BeautifulSoup(content)

for b in soup.findAll(attrs={'class': 'otherclass'}):
    # Assume that data is nested in 'span'.
    name2 = b.find('span')
    other_results.append(name2.text)

Since we are extracting an additional data point from a different part of the HTML, we need an additional loop. If needed, we can also add another conditional statement (if) to control for duplicate entries.
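
For instance, here is a minimal sketch of that second loop with a duplicate check added; the guard for None also covers elements that contain no <span> at all:

for b in soup.findAll(attrs={'class': 'otherclass'}):
    name2 = b.find('span')
    # Skip missing spans and values we have already collected.
    if name2 is not None and name2.text not in other_results:
        other_results.append(name2.text)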

Next, we need to update how our data table is formed:

df = pd.DataFrame({'Names': results, 'Categories': other_results})

Now, the updated code should look something like this:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
other_results = []
content = driver.page_source
soup = BeautifulSoup(content)

for a in soup.findAll(attrs={'class': 'class'}):
    name = a.find('a')
    if name.text not in results:
        results.append(name.text)

for b in soup.findAll(attrs={'class': 'otherclass'}):
    name2 = b.find('span')
    other_results.append(name2.text)

df = pd.DataFrame({'Names': results, 'Categories': other_results})
df.to_csv('names.csv', index=False, encoding='utf-8')

If you are lucky, running this code will not produce any errors. However, in some cases, you may encounter a "ValueError: arrays must all be the same length" message. This occurs when the length of the results and other_results lists is unequal, preventing Pandas from creating a two-dimensional table.

To address this issue, we can create two series:

series1 = pd.Series(results, name='Names')
series2 = pd.Series(other_results, name='Categories')
df = pd.DataFrame({'Names': series1, 'Categories': series2})
df.to_csv('names.csv', index=False, encoding='utf-8')

By using two Series, pandas aligns them by index and pads the shorter column with empty (NaN) values instead of raising the "ValueError", so both data points fit into one table. The final code should look something like this:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/nix/path/to/webdriver/executable')
driver.get('https://your.url/here?yes=brilliant')

results = []
other_results = []
content = driver.page_source
soup = BeautifulSoup(content)

for a in soup.findAll(attrs={'class': 'class'}):
    name = a.find('a')
    if name.text not in results:
        results.append(name.text)

for b in soup.findAll(attrs={'class': 'otherclass'}):
    name2 = b.find('span')
    other_results.append(name2.text)

series1 = pd.Series(results, name='Names')
series2 = pd.Series(other_results, name='Categories')
df = pd.DataFrame({'Names': series1, 'Categories': series2})
df.to_csv('names.csv', index=False, encoding='utf-8')

Running this code will create a CSV file named "names" with two columns of data.

Web scraping with Python best practices

Now that our basic web scraper is fully functional, it's time to explore some additional features and enhancements to make it more robust and versatile. Here are some recommendations:

  • Create matched data extraction by ensuring that your lists are of equal length. This can be achieved by checking the length of each data list and handling any mismatch appropriately. For example, you can skip or discard trailing elements if the lengths don't match.
  • Scrape multiple URLs in one go by implementing a loop and an array of URLs to visit. Instead of repeating the code for each URL, iterate over the array and perform the scraping operation for each one (a short sketch follows this list).
  • Store different sets of data in separate arrays and output them into one file with different rows. This allows you to scrape and organize multiple types of information simultaneously. Create separate arrays for each type of data, and then combine them into a single DataFrame or CSV file.
  • Switch to headless versions of Chrome or Firefox browsers to reduce load times. Headless browsers operate without a graphical user interface, making them faster and more efficient for web scraping. You can use tools like Selenium with headless mode enabled.
  • Develop a scraping pattern that emulates a regular user's browsing behavior. Think about how a user would interact with the website and automate those actions. Utilize additional libraries like "time" and "random" to introduce wait times between page requests, simulate scrolling, or send specific key inputs.
  • Implement a monitoring process to regularly check and scrape data from specific URLs at set intervals. This is useful for websites with time-sensitive or dynamically changing data. Create a loop that periodically rechecks the URLs and scrapes fresh data to ensure your dataset is always up to date.
  • Leverage the Python Requests library for optimized HTTP requests. Requests is a powerful library for making HTTP requests, and it can be a valuable addition to your web scraping toolkit. It provides flexibility and control over the request methods and parameters sent to the servers.
  • Integrate proxies into your web scraper to access location-specific data or bypass restrictions. Proxies allow you to route your requests through different IP addresses, enabling you to acquire data that might be otherwise inaccessible due to geographical or access limitations.
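
To illustrate the second and fifth recommendations, here is a minimal sketch that loops over several URLs with randomized pauses between visits. It assumes the driver and Beautiful Soup setup from earlier in this tutorial, and the URLs and the 'title' class are placeholders:

import random
import time

urls = ['https://your.url/page1', 'https://your.url/page2']
all_results = []

for url in urls:
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for element in soup.findAll(attrs={'class': 'title'}):
        link = element.find('a')
        if link is not None and link.text not in all_results:
            all_results.append(link.text)
    # Randomized delay between page visits to mimic a human visitor.
    time.sleep(random.uniform(2, 5))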

By incorporating these recommendations, you can enhance your web scraper's capabilities and make it more efficient and effective in acquiring the data you need.

Conclusion

From this point onward, the journey is yours to embark upon. Building web scrapers in Python, acquiring data, and extracting insights from vast amounts of information is undeniably an intriguing yet intricate process. It requires a combination of technical expertise, problem-solving skills, and a deep understanding of the data you seek to uncover. As you delve further into the world of web scraping, remember to stay curious, explore different techniques, and adapt your approach to suit the unique challenges each project presents. Embrace the intricacies of this field, for it is through overcoming these complexities that you will unlock the full potential of web scraping and uncover valuable insights hidden within the digital landscape.
