+ 1

How can we get the URLs (a sitemap) from a website if it's a JSON website?

Please point me toward writing a script for this. If possible, share a script that fetches all the URLs from the JSON pages, or otherwise help me fetch all the intermediate links.

4th Feb 2017, 6:00 PM
RajeshKanna P
1 Answer
+ 3
In Python, you can start from this:

import json, urllib.parse, urllib.request

absolute_url = '(enter the proper json link here)'
params = {}  # query parameters, if the endpoint needs any
url = absolute_url
if params:
    # encode the query parameters and append them to the URL
    url += '?' + urllib.parse.urlencode(params)
# fetch the response and decode the bytes into text
data = urllib.request.urlopen(url).read().decode()
# parse the JSON text into Python objects
content = json.loads(data)

The rest depends on the content and shape of the JSON tree. You can traverse the tree using brackets []. For example, if the absolute links are held under a tree like:

LINKS:
    Name: Name1 - webpage name
    Link: Link1 - link to be retrieved
    Description: Description1 - some other data

    Name: Name2
    Link: Link2
    Description: Description2
    ...

then, assuming LINKS is a list of such entries, you can access the first link with:

link = content["LINKS"][0]["Link"]

Of course, if you don't know the number of links to retrieve, you can use a while True loop with an increasing index and catch the error raised when there is no more content to retrieve (an IndexError for a list), or simply iterate over the entries with a for loop. I recommend first fetching all the content and printing it to see how it's structured.
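When the shape of the JSON is not known in advance, one option is to walk the whole parsed tree and collect every string value that looks like a URL. Below is a minimal sketch of that idea, not a definitive solution. It assumes the response has already been parsed with json.loads; the function name collect_urls and the sample data are hypothetical and only stand in for a real response.

```python
import json


def collect_urls(node):
    """Recursively gather every string value that looks like an http(s) URL."""
    urls = []
    if isinstance(node, dict):
        # visit every value of a JSON object
        for value in node.values():
            urls.extend(collect_urls(value))
    elif isinstance(node, list):
        # visit every element of a JSON array
        for item in node:
            urls.extend(collect_urls(item))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        urls.append(node)
    return urls


# Hypothetical sample standing in for a fetched and decoded response
sample = json.loads('''
{"LINKS": [
  {"Name": "Name1", "Link": "https://example.com/a", "Description": "d1"},
  {"Name": "Name2", "Link": "https://example.com/b", "Description": "d2"}
]}
''')

for url in collect_urls(sample):
    print(url)
```

This only finds absolute links stored as plain string values. Relative links, or links embedded inside longer text, would need extra handling such as urllib.parse.urljoin or a regular expression.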
4th Feb 2017, 8:54 PM
Kuba Siekierzyński