0

How to read a 4GB txt file line by line in Python without loading the whole file into RAM?

I have a massive txt file, 4GB or more in size ->600 line-. I need to read it line by line without loading the whole file into RAM, to prevent errors, using Python 3.x.

1st Feb 2018, 7:33 AM
Bkeefk
18 Answers
+ 3
Do you all have to do the same work?? 😁😁 See https://www.sololearn.com/Discuss/1046025/?ref=app
3rd Feb 2018, 6:44 PM
KrOW
+ 1
It's related... If he parses a 4GB file, the parsing process will probably consume too much memory (string allocations, copies and reallocations)... I gave a more memory-conservative way.
3rd Feb 2018, 7:21 PM
KrOW
+ 1
This is a common topic in batch processing. I haven't practiced it myself because there are ETL tools on the market that handle such requests. The concept you might want to research is chunking. First understand the data: are all lines related in some way? What must be done to some lines for the work to be considered done? After some work is done, keep track of the progress in storage, so that if the process is killed it can resume where it left off. Also monitor memory and garbage collection.
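A minimal sketch of the "keep track of progress so the process can resume" idea, assuming UTF-8 text and that each line can be handled independently. The names `process_with_checkpoint`, `checkpoint_path` and `handle_line` are made up for illustration.

```python
import os


def process_with_checkpoint(path, checkpoint_path, handle_line):
    """Process a file line by line, saving the byte offset after each line."""
    offset = 0
    if os.path.exists(checkpoint_path):
        # Resume from the offset stored by a previous (possibly killed) run.
        with open(checkpoint_path, "r") as cp:
            offset = int(cp.read().strip() or 0)

    # Binary mode, so tell()/seek() work with plain byte offsets.
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            raw = f.readline()
            if not raw:
                break  # end of file
            handle_line(raw.decode("utf-8").rstrip("\r\n"))
            # Record progress only after the line was handled successfully.
            with open(checkpoint_path, "w") as cp:
                cp.write(str(f.tell()))
```

Writing the checkpoint after every line is simple but slow. Saving it every few thousand lines is a reasonable trade-off if repeating a little work after a crash is acceptable.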
13th Apr 2018, 2:40 AM
Wei Chen
0
Read blocks of fixed size... If you are parsing some data, you must keep track of the parsing context in this case.
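A rough sketch of reading fixed-size blocks while still producing whole lines. Here the "parsing context" is just the unfinished line carried over from one block to the next. The function name and the default `block_size` are assumptions, not something from this thread.

```python
def iter_lines_from_blocks(path, block_size=1024 * 1024, encoding="utf-8"):
    """Yield lines (without the newline), reading at most block_size characters at a time."""
    leftover = ""  # context: the incomplete line at the end of the previous block
    with open(path, "r", encoding=encoding) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            pieces = (leftover + block).split("\n")
            leftover = pieces.pop()  # the last piece may be an unfinished line
            for line in pieces:
                yield line
    if leftover:
        yield leftover  # final line without a trailing newline
```

Note that if a single line is far larger than `block_size`, `leftover` still grows until that line ends, so this bounds memory by the longest line, not by the block size.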
1st Feb 2018, 8:38 AM
KrOW
0
I have to read line by line. At the same time I need to avoid loading the whole file into memory. Could you show me an example please? @KrOW
1st Feb 2018, 8:41 AM
Bkeefk
0
Does the file contain data to be parsed?
1st Feb 2018, 8:42 AM
KrOW
0
The file has no delimiters, like a space or a comma or anything; it's XML tags in a txt file, so I need to read a full line in order to parse the line.
1st Feb 2018, 8:43 AM
Bkeefk
0
OK, but is the data to parse sequential? (In HTML/XML you can parse the file sequentially if you store the parsing context.)
1st Feb 2018, 8:45 AM
KrOW
0
I wrote code to parse one line in a file I made separately. However, when my code deals with the real big file, it shows a memory error!
1st Feb 2018, 8:48 AM
Bkeefk
0
Is the line format complex? Can you send an example?
1st Feb 2018, 8:52 AM
KrOW
0
My problem is not with parsing. The problem is that when my code requests the file to be read, a massive file creates a memory error. I need my code not to hold the whole file itself, but rather to read one line at a time, process it, then read the second line, and so on.
1st Feb 2018, 8:57 AM
Bkeefk
0
I understand this, but RAM is finite, and if you parse very large files (or very large lines) you must consider parsing them in a more memory-conservative way.
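One possible way to stay memory-conservative even when individual lines are very large is to cap how much of a line is read at once with `readline(size)`. This is only a sketch; `iter_line_pieces` and `max_chars` are illustrative names.

```python
def iter_line_pieces(path, max_chars=64 * 1024, encoding="utf-8"):
    """Yield (line_number, piece) pairs; no piece is longer than max_chars characters."""
    with open(path, "r", encoding=encoding) as f:
        line_number = 1
        while True:
            # readline(size) returns at most `size` characters and stops at a newline.
            piece = f.readline(max_chars)
            if not piece:
                break  # end of file
            yield line_number, piece
            if piece.endswith("\n"):
                line_number += 1  # the current line is complete
```

For lines of ordinary length, plain `for line in f:` already reads one line at a time. The cap only matters when a single line could be too big to hold comfortably.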
1st Feb 2018, 9:02 AM
KrOW
0
This is how I'm parsing:

# read text file
with open(filename, 'r') as daF:
    # read line from text file
    for line in daF:
        # BEGINNING OF LINE PROCESSING
        lineLeng = len(line)
        if lineLeng > 0:  # to escape empty lines, usually at the end of file
            line_cntr += 1
            # read a character from the line
            for ch in line:
                ch_cntr += 1
                # test if ch == '/' (beginning of closing tag)
                # '/' beginning of closing tag
                # '>' end of closing tag
                if ch == deli2:
                    pos1 = ch_cntr  # get ch position
                    pos2 = line.find(deli3, pos1)  # '>' position
                    key = line[pos1:pos2]  # key = \key>
                    keyLeng = len(key)
                    # search backwards from the '/' position (pos1) up to the beginning of the line to find a match of 'key'
                    pos3 = line.rfind(key, 0, pos1)
                    if line[pos3 + keyLeng + 1] != deli1:  # if <key_match>the_value</key>
                        value = line[pos3 + keyLeng + 1:pos1 - 2]  # value = the_value
1st Feb 2018, 9:05 AM
Bkeefk
0
Note: deli1 is '<', deli2 is '/', deli3 is '>'.
1st Feb 2018, 9:06 AM
Bkeefk
0
it works perfectly with a file of a few lines, but not with a big file!
1st Feb 2018, 9:07 AM
Bkeefk
0
Try looking up Python lxml on Google and read the docs... In your case, also search for incremental parsing in the docs.
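To give an idea of what incremental parsing looks like, here is a sketch using `iterparse` from the standard library's `xml.etree.ElementTree` rather than lxml (lxml documents a similar `iterparse`). It assumes the input is one well-formed XML document whose repeated elements are direct children of the root. The tag name `record` and the function name `iter_records` are made up for the example, so this may not match the real file's layout.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield a {child_tag: text} dict for each <tag> element, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            # Drop elements that were already processed so the tree does not grow.
            root.clear()
```

`source` can be a filename or a file object opened in binary mode, so the whole file never has to be in memory as one string.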
1st Feb 2018, 9:12 AM
KrOW
0
Any update from anyone?
3rd Feb 2018, 6:42 PM
EBA
0
Hi guys, please note that the question is about dealing with memory, not XML parsing.
3rd Feb 2018, 7:09 PM
EBA