Questions tagged [web-scraping]

Web scraping is the process of extracting specific information from websites that do not readily provide an API or other methods of automated data retrieval. Questions about "How To Get Started With Scraping" (e.g. with Excel VBA) should be *thoroughly researched* as numerous functional code samples are available. Web scraping methods include 3rd-party applications, development of custom software, or even manual data collection in a standardized way.
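As a rough illustration of "extracting specific information" from a page, here is a minimal sketch that uses only the Python standard library. The `HeadingCollector` class, the `extract_headings` helper, and the `SAMPLE_HTML` string are hypothetical names made up for this example. A real scraper would fetch the HTML over the network and would often use a third-party parser instead.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h3> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []    # one string per <h3> encountered
        self._in_h3 = False   # True while the parser is inside an <h3>

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <em>) is still appended to the heading.
        if self._in_h3:
            self.headings[-1] += data


def extract_headings(html):
    """Return the stripped text of all <h3> elements in the given HTML string."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [heading.strip() for heading in parser.headings]


# Hypothetical inline document standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h3>First title</h3><p>ignored</p>
  <h3>Second <em>title</em></h3>
</body></html>
"""

print(extract_headings(SAMPLE_HTML))
```

The sketch parses static HTML only. Pages that build their content with JavaScript generally need a browser-automation tool instead.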

43,960 questions
0 votes
0 answers
6 views

How to loop over a list of URLs in Scrapy and save only the response body to an XML/TXT file

I have tried the pipeline approach, but I am not sure I am doing it right, since most tutorials extract only parts of response.body using selectors. I however can ...
0 votes
1 answer
17 views

Can't scrape more than 2 Indeed pages

I am doing some web scraping in Python to find certain keywords in the job descriptions of Indeed job postings. However, I can only scrape through 2 pages. If I increment the number of pages to 3 (variable ...
0 votes
0 answers
19 views

Web scraping an XML page with BeautifulSoup, but the structure is incorrect

I am trying to scrape Shazam, but I am having difficulty with the structure it returns. My code is: url= shazam_page requested= requests.get(url) soup= BeautifulSoup(...
0 votes
0 answers
6 views

Heroku external web request limitations

We are using a Heroku Hobby dyno to do web scraping in Python, through a proxy, and we have the impression that Heroku limits the number of calls we can make from their Hobby dynos. We are ...
0 votes
0 answers
19 views

How to get all the reviews and reviewers of every product on Amazon using Selenium in Python

I want to scrape all the reviewers of Amazon products, along with their reviews, using Selenium in Python. For example, suppose I search for 'Smartphones' in the search bar; there will be a list of all product ...
-1 votes
0 answers
13 views

Scrape large-scale data from Wikipedia

I am training a large machine learning model and need to scrape a lot of data for it. I want to train my model on domain-specific tasks and hence, given a domain, I will need to scrape Wikipedia ...
1 vote
1 answer
31 views

Struggling to Scrape a Table Using the rvest Package

I've recently started using R again after a long hiatus and I'm extremely rusty, especially when it comes to HTML and scraping data (with rvest). My main issue right now is identifying the correct nodes/...
-2 votes
1 answer
27 views

Python | Web Scraping: Scraping when the HTML mostly reuses the same classes, without any ID or name attributes

The page I'm trying to scrape is private. It uses two-way authentication, which will not let me open the link through Selenium. When I open the page manually I'm not asked for extra ...
1 vote
2 answers
39 views

BeautifulSoup cannot find <h3> tags

The URL in this question is: https://www.empireonline.com/movies/features/best-movies-2/ As you can see, the h3 tags are present in the page, but Beautiful Soup doesn't print them.
-1 votes
1 answer
19 views

Connection aborted error when scraping with the requests module

I'm using the requests module to do some web scraping in Python, but every time I send the requests with headers and proxies, I get a connection aborted error, even though I've been told that it would ...
-2 votes
2 answers
60 views

XPath: Different inputs for multiple elements with the same XPath, based on the value of the parent element

I have a relatively complex problem with XPath in Python. Here is my HTML layout: <label name=A> <span name=B> "Your Salary" <div name=C> <div name=D> ...
0 votes
1 answer
26 views

Scrapy: following and scraping a third page

After trying to add a third page to these shenanigans, I got the error "You can't mix str and non-str arguments". My goal is to use the URL from 'website' and scrape data from it. How do I do it? Here ...
1 vote
3 answers
33 views

Unable to obtain table info through Python Selenium

I am a newbie in the Python Selenium environment. I am trying to get the SQL version table from the linked page. from selenium.webdriver.common.by import By from selenium import webdriver # ...
-1 votes
0 answers
14 views

Changing IP addresses in a Python scraper

Can someone tell me how to change IP addresses in Python? I wrote my code like this: proxies = { 'http': 'http://' + proxy, 'https': 'http://' + proxy } response = ...
0 votes
2 answers
41 views

Can't download a file with Playwright Chromium or WebKit

I want to download a file, for example sitemap.xml.gz, using only Playwright 1.22. I tried it with the Chromium browser, but it fails. It doesn't work with WebKit either. With WebKit ...