2

How do I get access to this API?

import requests

url = 'https://b2c-api-premiumlabel-production.azurewebsites.net/api/b2c/page/menu?id_loja=2691'
print(requests.get(url))

I'm trying to retrieve data from this site via its API. I found the URL above and I can see its data, but my request fails with status code 403. This is the website URL: https://www.nagumo.com.br/osasco-lj46-osasco-ayrosa-rua-avestruz/departamentos

I'm trying to retrieve the item categories. They are visible to me, but I'm unable to fetch them. Later I'll use these categories to iterate over the products API.

API Category

Note: please be gentle, it's my first post here =]

  • What are you trying to get from that website? Is there a product(s)?
    – QHarr
    Jun 15 at 4:36
  • @QHarr I just edited my post; please check the image "API Category", which shows the product category data that I can use to iterate over products later. The call to this category endpoint should be the same as the call for the products list, because their request headers are the same or at least very similar.
    Jun 15 at 20:57

3 Answers

1

To get the data shown in your image, the following headers and endpoint are needed:

import requests

headers = {
    'sm-token': '{"IdLoja":2691,"IdRede":884}',
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.nagumo.com.br/osasco-lj46-osasco-ayrosa-rua-avestruz/departamentos',
}

params = {
    'id_loja': '2691',
}

r = requests.get('https://www.nagumo.com.br/api/b2c/page/menu', params=params, headers=headers)
r.json()
  • Thank you very much!! I tried out some headers myself, but I didn't succeed. How did you find the right request headers, the ones that get through? I'm asking to know whether you tried them one by one manually or have some kind of trick to find the right ones.
    Jun 15 at 22:42
  • I removed the ones that I knew from experience were unlikely to be needed, then commented out the others one by one. Once I had the needed set, I also tested removing parameters within the headers. You could also use a tool like Postman, Wireshark or Insomnia.
    – QHarr
    Jun 16 at 2:13
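
The one-by-one elimination described in the comment above can be automated. What follows is only a rough sketch, not the commenter's actual procedure: `minimal_headers` and `make_checker` are hypothetical helper names, and the success test (any 2xx status) is an assumption that may need tightening for a given site.

```python
from typing import Callable, Dict


def minimal_headers(
    headers: Dict[str, str],
    works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop headers one at a time, keeping only those the request fails without."""
    if not works(headers):
        raise ValueError("the full header set does not work; nothing to minimise")
    needed = dict(headers)
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # still succeeds, so this header was not needed
    return needed


def make_checker(url: str, params: Dict[str, str]) -> Callable[[Dict[str, str]], bool]:
    """Build a `works` predicate that sends a real GET request.

    Assumption: a 2xx status counts as success.
    """
    import requests  # third-party; imported lazily so minimal_headers stays stdlib-only

    def works(headers: Dict[str, str]) -> bool:
        return requests.get(url, params=params, headers=headers, timeout=10).ok

    return works
```

It could be called as `minimal_headers(browser_headers, make_checker(url, params))`, where `browser_headers` is the full header set copied from the browser's network tab. Each candidate header costs one request, so it is worth keeping the starting set small.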
0

Not sure exactly what your issue is here, but if you want to see the content of the response and not just the 200/400 status, you need to add '.content' to your print.

E.g.

import requests

# Create a session so cookies persist across requests
s = requests.Session()

# Example connection variables, probably not required for your use case
setCookieUrl = 'https://www...'
headersJson = {'Accept-Language': 'en-us'}
bodyJson = {"__type": "xxx", "applicationName": "xxx", "userID": "User01", "password": "password2021"}

# GET request
p = s.get(setCookieUrl, json=bodyJson, headers=headersJson)
print(p)  # Print the response status (200 etc.)
# print(p.headers)
# print(p.content)  # Print the content of the response
# print(s.cookies)
  • This is the problem: the status code is 403.
    Jun 15 at 1:36
  • I see. If you check the response headers, content, etc., can you see why it is responding that way? Usually it will say something like 'malformed header'. Then you will need to find out why.
    Jun 15 at 1:38
  • That is why I put in the 'setCookieUrl' etc. Each site you interact with may handle requests differently and expect certain headers, cookies, etc. to be set when making a request. Or it may respond with a cookie that you will need for future requests. It can be a pain, but if you persist you will get there.
    Jun 15 at 1:40
  • HTTP 403 is a status code meaning that access to the requested resource is forbidden. The server understood the request but will not fulfill it. This usually means the request is missing something the server expects, such as a header or cookie.
    Jun 15 at 1:43
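
The comments above suggest looking at the response headers and content to see why the server answers 403. As a small sketch of that check, the hypothetical helper `summarize_response` below formats the status line, the headers and the first characters of the body. It relies only on the `status_code`, `reason`, `headers` and `text` attributes, which a `requests.Response` provides.

```python
def summarize_response(resp, body_chars: int = 300) -> str:
    """Return the status line, headers and start of the body as one string.

    `resp` can be any object with `status_code`, `reason`, `headers` and
    `text` attributes, for example a `requests.Response`.
    """
    lines = [f"{resp.status_code} {resp.reason}"]
    lines += [f"{name}: {value}" for name, value in resp.headers.items()]
    lines.append("")  # blank line between headers and body
    lines.append(resp.text[:body_chars])  # only the beginning of the body
    return "\n".join(lines)
```

With the session from the answer, this would be used as `print(summarize_response(s.get(url)))`. Comparing the output with the same request in the browser's network tab can help show which header or cookie is missing.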
0

I'm also new here haha, but besides the requests library, you'll also need another one like Beautiful Soup for what you're trying to do.

bs4 installation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Once you install and import it, you just continue what you were doing to get your data.

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

This gets the entire HTML content of the page, so you can extract your data from it using CSS selectors, like this:

site_data = soup.select('selector') 

site_data is a list of the elements matching that 'selector', so a simple for loop and a list to add your items to would suffice (for example, getting the links for each book on a bookstore site).

For example, if I were trying to get links from a site:

import requests
from bs4 import BeautifulSoup

sites = []
url = 'https://b2c-api-premiumlabel-production.azurewebsites.net/api/b2c/page/menu?id_loja=2691'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
links = soup.select("a")  # list of all elements matching this selector
for link in links:
    sites.append(link)

Also, a helpful tip: when you inspect the page (right-click and choose 'Inspect' at the bottom of the menu), you can see the code for the page. Find the data you want in the HTML, right-click it and select Copy -> Copy selector. This makes it easy to get the data you want from that site.

Helpful sites:
https://oxylabs.io/blog/python-web-scraping
https://realpython.com/beautiful-soup-web-scraper-python/

  • Oh thanks, but it is still showing the 403 error. print(soup) outputs in the terminal: <h1 id="unavailable">Error 403 - Forbidden</h1>
    Jun 15 at 1:40
  • Ah, I thought you meant something else. That's strange: I tried the requests code and it worked for me. Your error means the server is rejecting your GET request. Try visiting this, I think it'll help: stackoverflow.com/questions/38489386/…
    – kayak
    Jun 15 at 1:45
  • There is no need for BeautifulSoup when the API response is JSON.
    – Hanna
    Jun 15 at 23:22
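
As the last comment notes, a JSON response can be handled without an HTML parser. The sketch below decodes a JSON string with the standard library and collects every value stored under a given key. The payload and the key `name` are hypothetical, chosen only for illustration; the real response of the menu endpoint may be shaped differently.

```python
import json
from typing import Any, List


def collect_values(data: Any, key: str) -> List[Any]:
    """Recursively collect every value stored under `key` in decoded JSON."""
    found: List[Any] = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            found.extend(collect_values(v, key))  # look inside nested values too
    elif isinstance(data, list):
        for item in data:
            found.extend(collect_values(item, key))
    return found


# Hypothetical payload, NOT the real API response.
sample = json.loads(
    '{"menu": [{"name": "Bebidas", "children": [{"name": "Sucos"}]},'
    ' {"name": "Padaria"}]}'
)
names = collect_values(sample, "name")
```

With requests, `r.json()` already returns the decoded data, so `collect_values(r.json(), key)` would work the same way once the actual key is known from the response.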
