Is Web Scraping Legal?
Flipnode on Apr 11 2023
The demand for big data has surged in recent years, and according to Statista, revenue in the big data market keeps growing every year. As a result, web scraping has gained popularity as one of the most common data collection methods. However, the legality of web scraping is a much-debated topic among developers and others who work in data gathering.
In this article, we will explore the legal questions surrounding web scraping, as well as the potential legal issues one may encounter when scraping certain websites. It is important to note that this article is for informational purposes only and does not constitute legal advice. Therefore, before engaging in any scraping activities, it is essential to seek professional legal advice regarding your specific situation.
Is web scraping legal or illegal?
The legality of web scraping is a complex topic, and the answer is not a straightforward yes or no. Scraping your own website is generally acceptable, but many businesses that use bots for their own benefit do not want others to run web scrapers against them. If you are concerned about the legal implications of web scraping, it is crucial to understand which scraping activities are likely to get you into legal trouble.
Here are some examples of web scraping activities that may be prohibited:
- Logging into websites or web pages with your web scraper and downloading data can expose you to legal risk. Users must agree to the website's Terms of Service (ToS) when logging in, and these terms may prohibit automated data collection.
- There is a common misconception that publicly available data can be used however you want. Although there may be fewer restrictions on scraping publicly available data compared to private information, you must ensure that you are not breaking any laws that may be applicable to the data, such as downloading copyrighted content. This can include designs, layouts, articles, videos, and other creative works.
- Even if the data is for personal use, the website's Terms of Service may forbid any type of automated data collection. In this case, the scraping activity itself may be illegal, not just the use of the data.
Why is web scraping sometimes viewed negatively?
Web scraping can be legal, provided that the process adheres to the rules and applicable laws surrounding the targeted websites or data being gathered. However, web scraping can be abused by malicious actors or hackers, which can raise suspicions about its legitimacy. Here are some reasons why web scraping may be considered suspicious:
- Gathering public data to improve your own business may not seem unethical, but it can be perceived differently when someone else is scraping your website for the same reasons.
- There are instances where individuals or companies abuse web scraping, violating ToS, copyright laws, or other applicable regulations. In such cases, web scraping can be viewed as malicious and unethical. This can make it challenging to explain and prove that the primary goal of web crawling and scraping is to make data-driven decisions from publicly available information.
- Web scraping involves sending multiple requests to websites to obtain the desired information. As the process is automated, scraping tools can make more requests than a regular user would, potentially causing a heavy load on the website. Websites have security measures in place to prevent this, which is why excessive scraping can raise red flags.
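One common way to avoid placing a heavy load on a website is to enforce a minimum delay between requests to the same host. The snippet below is a minimal sketch of that idea, not a definitive implementation: the class name `HostThrottle` and the two-second default are assumptions made for illustration, and an appropriate delay depends on the website in question.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; the default delay is an assumption.
    """

    def __init__(self, min_delay_s=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause until `url` may be requested; return the seconds waited."""
        host = urlsplit(url).netloc
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay_s - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment this request is allowed to go out.
        self._last_request[host] = self._clock()
        return waited
```

Calling `throttle.wait(url)` before each request keeps the scraper from sending requests to one host faster than the configured interval, while requests to different hosts are not delayed by each other.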
How do privacy laws affect web scraping?
When scraping publicly available data, it is crucial to take into account various privacy laws.
The GDPR and CCPA
The General Data Protection Regulation (GDPR), which took effect on May 25, 2018, is a data privacy and security law implemented by the EU. Its primary goal is to give individuals in the EU control over their personally identifiable information by regulating how organizations collect and process such data. Web scraping itself is not illegal under the GDPR; however, the regulation restricts how businesses can use the personal data they collect. For instance, in some cases, businesses must obtain explicit consent from the data subjects before collecting and using their personal data.
Similarly, California enacted the California Consumer Privacy Act (CCPA), which imposes similarly stringent requirements on businesses that collect personal data (e.g., consumers have the right to delete their personal information and to opt out of the sale of their data, as well as the right to non-discrimination for exercising their CCPA rights).
General advice on web scraping best practices
We strongly recommend seeking legal advice before engaging in any scraping activities. However, to ensure compliance when web scraping, here are some practical tips to follow:
- If possible, use the provided API instead of scraping data. Keep in mind that using an API is not the same as web scraping.
- Always adhere to the Terms of Service (ToS) for each website.
- Follow the rules outlined in the robots.txt file. If you need data from a website that forbids automatic data collection, consider requesting permission from the site owner.
- Before using scraped data, ensure that it is not copyrighted. If necessary, obtain written permission from the copyright holder before publishing the data.
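The robots.txt rules mentioned above can be checked programmatically before a scraper requests a page. The snippet below is a minimal sketch using Python's standard `urllib.robotparser` module. The sample rules, the `example-bot` user agent, and the `example.com` URLs are made up for illustration; passing a robots.txt check does not by itself settle any ToS or copyright question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/public/page",
            "https://example.com/private/data"):
    print(url, is_allowed(parser, "example-bot", url))

# The requested delay between requests, if the file declares one.
print(parser.crawl_delay("example-bot"))
```

In real use, the parser would typically load the live file with `set_url()` followed by `read()` instead of parsing a hard-coded string.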
Web scraping cases
Ryanair v. PR Aviation (2018)
The Ryanair v. PR Aviation case sheds light on how web scraping may be viewed in European courts. Ryanair's website has Terms of Use (ToU) that expressly forbid scraping. PR Aviation scraped Ryanair's site, and Ryanair sued the company for breach of contract in the Netherlands.
The Dutch court ruled against Ryanair, finding that there was no valid contract between the companies. The court used an interesting analogy, likening the situation to a poster in a shop window that says "Whoever reads further, must pay € 5," which cannot be enforced.
However, this ruling doesn't mean that ToU wouldn't be enforceable in a different scenario. Several factors worked against Ryanair in this case, including the fact that its ToU were presented in a browsewrap format, which courts generally don't consider legally binding, and that the scraped data was freely available to the public.
Ryanair v. Expedia (2019)
After receiving a cease-and-desist (C&D) letter, Expedia, a U.S. flight comparison company, continued scraping Ryanair's data, and Ryanair sued it for violating the Computer Fraud and Abuse Act (CFAA). Expedia argued that Ryanair, being an Irish company, could not invoke the CFAA, which is a U.S. statute. The court ruled that the CFAA may apply to U.S. companies operating internationally. Following this, Ryanair and Expedia reached a confidential settlement, and Expedia's website no longer offers Ryanair flights.
hiQ Labs v. LinkedIn (2019)
hiQ Labs scraped data from LinkedIn profiles to provide businesses with tools and insights about their employees. In 2017, after hiQ had been collecting data for several years, LinkedIn issued a C&D letter to hiQ and launched a tool with functionality similar to hiQ's. hiQ sought an injunction, and the court ordered LinkedIn to withdraw the C&D letter and stop applying any blocking measures against hiQ.
LinkedIn appealed the decision, arguing that hiQ's scraping breached the CFAA. However, the Ninth Circuit Court of Appeals decided that hiQ did not breach the CFAA because the data scraped from LinkedIn was public. The court noted that companies should not be able to revoke authorization where it was not needed in the first place, and that allowing companies like LinkedIn to decide who can collect and use publicly available data would be against the public interest.
The decision in hiQ Labs v. LinkedIn was favorable to scraping companies. It reconsidered some previous court practice on the applicability of the CFAA and narrowed its relevance to public data. However, scraping that is not done with caution still carries a risk of liability under the CFAA or on other grounds, such as trespass to chattels, copyright infringement, or breach of contract.
Later, in 2022, the district court ruled that hiQ's use of fake accounts, created by contractors referred to as "Turkers", to scrape LinkedIn's data violated LinkedIn's User Agreement. As a result, in December 2022, LinkedIn and hiQ reached a settlement in which hiQ agreed to a permanent injunction requiring it to stop scraping LinkedIn.
Next, we will examine several recent legal cases related to web scraping that warrant your consideration.
Meta
Meta, the parent company of Facebook and Instagram, filed two separate legal actions on July 5, 2022, against Octopus and Ekrem Ateş, accusing them of unlawfully scraping data from its social media platforms. The allegations against both parties are similar: that they extracted the personal information of individuals without authorization for inappropriate use.
Meta v. Octopus
According to Meta, Octopus, which is a U.S. subsidiary of a Chinese high-tech enterprise, provides web scraping services and software that allows customers to collect data from any website. Meta alleges that Octopus' software enables the extraction of personal information from Facebook and Instagram users, including gender, date of birth, email address, user profile URL, and location, thereby violating Meta's terms and conditions.
Meta v. Ekrem Ateş
Meta filed a second legal action against Ekrem Ateş, a Turkish national who used automated Instagram profiles to scrape data from over 350,000 users. Ateş then posted this information on a clone site, which is a third-party website that copies and displays data from original websites without authorization. While Meta claims that the conduct in both cases violates its terms and conditions, the legal implications of web data scraping are intricate. It is therefore worth staying informed about these cases, as they could shape the future of web scraping legality.
Wrapping up
Determining the legality of web scraping is not straightforward: it depends on whether the scraping itself, or the way the scraped data is used, violates any applicable laws. This article is intended solely for informational and educational purposes and should not be substituted for independent legal advice and judgment. The statements and opinions presented herein are those of the authors and do not necessarily represent the views or opinions of Flipnode unless expressly stated otherwise.