0

A bytes-like object is required?

Hi, I've tried this web scraping code:

import urllib.request
page = urllib.request.urlopen('http://www.robpercival.co.uk/sampledata.html')
website = page.read()
row_list = website.split("<tr>")
print(row_list)

However, I'm getting this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-626463755c02> in <module>
      2 page = urllib.request.urlopen('http://www.robpercival.co.uk/sampledata.html')
      3 website = page.read()
----> 4 row_list = website.split("<tr>")
      5 print(row_list)

TypeError: a bytes-like object is required, not 'str'

What is wrong?

19th Oct 2019, 9:26 AM
Terry
4 Answers
+ 3
I ran a test with type(). It seems that read() returns bytes, not a string, so split() expects a bytes separator and rejects the str "<tr>". Adding .decode('utf-8') after page.read() should fix it.
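A minimal sketch of that fix, for anyone who finds this later. To keep it runnable offline, the `website` bytes literal below is a made-up stand-in for what `page.read()` returns, not the real page content. It shows two ways to make the separator and the data agree in type: decode to str first, or keep the bytes and split on a bytes separator.

```python
# Hypothetical stand-in for page.read(); urlopen(...).read() returns bytes.
website = b"<table><tr><td>1</td></tr><tr><td>2</td></tr></table>"

print(type(website))  # confirms the data is bytes, not str

# Option 1: decode the bytes to a str, then split on a str separator.
# Assumes the page is UTF-8 encoded.
rows_as_text = website.decode("utf-8").split("<tr>")

# Option 2: leave the data as bytes and split on a bytes separator.
rows_as_bytes = website.split(b"<tr>")

print(rows_as_text)
print(rows_as_bytes)
```

Option 1 is usually more convenient if you want to keep working with the rows as text afterwards.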
19th Oct 2019, 10:36 AM
Taste
+ 1
both methods worked, thanks for the answers!
31st Oct 2019, 1:42 AM
Terry
0
The reason for this error is that in Python 3, strings are Unicode text, but data transmitted over the network arrives as bytes. page.read() therefore returns a bytes object, and calling split() on bytes with a str separator raises the TypeError. You need to decode the bytes object to produce a string, using the decode() method of the bytes class. In Python 3 the default encoding for decode() and encode() is "utf-8", so you can write b"python byte to string".decode("utf-8") or simply omit the argument.

Python makes a clear distinction between bytes and strings. Bytes objects contain raw data, a sequence of octets, whereas strings are sequences of Unicode characters. Conversion between the two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. These conversions can fail, for example when the bytes are not valid in the chosen encoding, so you should consider how to handle such failures.

http://net-informations.com/python/iq/byte.htm
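To illustrate the round trip and the failure case, here is a small sketch. The helper name `safe_decode` and the sample values are hypothetical, chosen only for this example; replacing undecodable bytes with U+FFFD is one possible way to handle failures, not the only one.

```python
def safe_decode(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes to str; on failure, substitute U+FFFD for bad bytes.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # errors="replace" swaps each undecodable byte sequence for U+FFFD.
        return data.decode(encoding, errors="replace")


text = "café"
data = text.encode("utf-8")      # str -> bytes
round_trip = safe_decode(data)   # bytes -> str

# b"caf\xe9" is "café" encoded as Latin-1, which is not valid UTF-8.
damaged = safe_decode(b"caf\xe9")

print(data)
print(round_trip)
print(damaged)
```

If you know the real encoding of the data (for example from the response's Content-Type header), passing it as `encoding` is better than relying on replacement characters.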
29th Mar 2021, 5:43 AM
romankris