18 answers
+ 3
All have to be the same work?? 😁😁 see https://www.sololearn.com/Discuss/1046025/?ref=app
+ 1
It's related... If he parses a 4 GB file, the parsing process will probably consume too much memory (string allocations, copies, and reallocations)... I suggested a more memory-conservative way.
+ 1
This is a common topic in batch processing. I haven't practiced it myself because there are ETL tools on the market that handle such requests. The concept you might want to research is chunking. First, understand the data. Are all lines related in some way? What must be done to a set of lines for the work on them to be considered done? After some work is done, record the progress in storage, so that if the process is killed it can resume where it left off. Also monitor memory and garbage collection.
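To make the chunking idea concrete, here is a minimal sketch in Python. The names `process_in_chunks`, `handle_line`, and the checkpoint file are hypothetical, made up for illustration, and it assumes each line can be processed independently and that the file is UTF-8 text:

```python
import os

def process_in_chunks(path, checkpoint_path, handle_line, lines_per_chunk=1000):
    """Process a file line by line, saving the byte offset after each chunk."""
    # Resume from the saved offset if a previous run was interrupted.
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as cp:
            offset = int(cp.read().strip() or 0)

    # Binary mode, so tell()/seek() work with plain byte offsets.
    with open(path, "rb") as f:
        f.seek(offset)
        count = 0
        while True:
            raw = f.readline()  # only one line is held in memory at a time
            if not raw:
                break
            handle_line(raw.decode("utf-8").rstrip("\r\n"))
            count += 1
            if count % lines_per_chunk == 0:
                # A chunk is done: record how far we got.
                with open(checkpoint_path, "w") as cp:
                    cp.write(str(f.tell()))
        # Record the final position once the end of the file is reached.
        with open(checkpoint_path, "w") as cp:
            cp.write(str(f.tell()))
```

If the process is killed mid-chunk, up to `lines_per_chunk` lines will be handled again on the next run, so `handle_line` should tolerate repeats.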
0
Read blocks of a fixed size... If you are parsing data, you must then keep track of the parsing context between blocks.
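A rough sketch of that idea in Python, under some assumptions: the data consists of simple `<key>value</key>` elements without attributes, and the "context" carried between blocks is just the unparsed tail of the buffer. `iter_pairs` and `PAIR` are names invented for this example:

```python
import re

# Matches a simple leaf element such as <name>value</name>
# (assumption: no attributes and no '<' or '>' inside the value).
PAIR = re.compile(r"<(\w+)>([^<>]*)</\1>")

def iter_pairs(path, block_size=64 * 1024):
    """Yield (key, value) pairs while holding only a small buffer in memory."""
    buffer = ""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            block = f.read(block_size)  # fixed-size read, not the whole file
            if not block:
                break
            buffer += block
            last_end = 0
            for m in PAIR.finditer(buffer):
                yield m.group(1), m.group(2)
                last_end = m.end()
            # A leaf element contains exactly two '<', so an element that is
            # still incomplete can start no earlier than the second-to-last '<'.
            cut = buffer.rfind("<")
            if cut > 0:
                prev = buffer.rfind("<", 0, cut)
                if prev != -1:
                    cut = prev
            if cut == -1:
                cut = len(buffer)  # no '<' at all: nothing worth keeping
            # Carry over only the part that may still complete in the next block.
            buffer = buffer[max(last_end, cut):]
```

This works even if the file has no line breaks at all, because it never depends on where a line ends.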
0
I have to read the file line by line. At the same time, I need to avoid loading the whole file into memory. Could you show me an example, please? @KrOW
0
Does the file contain data to be parsed?
0
The file has no delimiters such as a space or a comma; it is XML tags in a txt file, so I need to read a full line in order to parse it.
0
OK, but is the data to parse sequential? (With HTML/XML you can parse the file sequentially if you store the parsing context.)
0
I wrote code to parse one line, using a small file I made separately. However, when my code deals with the real, big file, it raises a memory error!
0
Is the line format complex? Can you send an example?
0
My problem is not with parsing.
The problem is that when my code requests the file to be read, a massive file causes a memory error.
I need my code not to keep the whole file in memory, but rather to read one line at a time, process it, then read the second line, and so on.
0
I understand that, but RAM is finite, and if you parse very large files (or very long lines) you must consider parsing them in a more memory-conservative way.
0
This is how I’m parsing
# read the text file
line_cntr = 0
with open(filename, 'r') as daF:
    # read one line at a time from the text file
    for line in daF:
        # ---- BEGINNING OF LINE PROCESSING ----
        lineLeng = len(line)
        if lineLeng > 0:  # to skip empty lines (usually at the end of the file)
            line_cntr += 1
            ch_cntr = 0  # restart the character position for each line
            # read a character from the line
            for ch in line:
                ch_cntr += 1
                # test if ch == '/' (beginning of a closing tag)
                # '>' is the end of the closing tag
                if ch == deli2:
                    pos1 = ch_cntr  # position just after '/'
                    pos2 = line.find(deli3, pos1)  # '>' position
                    key = line[pos1:pos2]  # key = the tag name between '/' and '>'
                    keyLeng = len(key)
                    # search backwards from the '/' position to the beginning of the line for a match of 'key'
                    pos3 = line.rfind(key, 0, pos1)
                    if line[pos3 + keyLeng + 1] != deli1:  # if <key_match>the_value</key>
                        value = line[pos3 + keyLeng + 1:pos1 - 2]  # value = the_value
0
note:
< is deli1
/ is deli2
> is deli3
0
It works perfectly with a file of a few lines, but not with a big file!
0
Try looking up Python lxml and read the docs... In your case, also search the docs for incremental parsing.
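For a feel of what incremental parsing looks like, here is a small sketch. Note that it uses `iterparse` from the standard library's `xml.etree.ElementTree` rather than lxml (lxml offers a similarly named `etree.iterparse`), and it assumes the file is well-formed XML with a single root element, which may not hold for your txt file. `iter_leaf_values` is a made-up helper name:

```python
import xml.etree.ElementTree as ET

def iter_leaf_values(path):
    """Yield (tag, text) for each element that has text and no children."""
    # iterparse reads the file incrementally and reports each element
    # as soon as its closing tag has been seen ("end" event).
    for event, elem in ET.iterparse(path, events=("end",)):
        if len(elem) == 0 and elem.text and elem.text.strip():
            yield elem.tag, elem.text.strip()
        # Drop the element's text and children so memory does not grow
        # with the file. Emptied elements stay attached to their parent,
        # so a root with a huge number of children still accumulates
        # small empty nodes.
        elem.clear()
```

Usage would be something like `for key, value in iter_leaf_values("big.xml"): ...`, where `big.xml` is a placeholder file name.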
0
Any update from anyone?
0
Hi guys, please note that the question is about dealing with memory, not XML parsing.