How to read a 4GB txt file line by line in Python without loading the whole file into RAM?

I have a massive txt file of 4GB or more (more than 600 lines). I need to read it line by line without loading the whole file into RAM, to prevent errors, using Python 3.x.

python file txt massive

1st Feb 2018, 7:33 AM

Bkeefk

18 answers


Is everyone working on the same task?? 😁😁 See https://www.sololearn.com/Discuss/1046025/?ref=app

3rd Feb 2018, 6:44 PM

KrOW


It's related... If he parses a 4 GB file, the parsing process will probably consume too much memory (string allocations, copies and reallocations)... I gave a more memory-conservative approach there.

3rd Feb 2018, 7:21 PM

KrOW


This is a common topic in batch processing. I haven't practiced it myself, because there are ETL tools on the market that handle such requests. The concept you might want to research is chunking. First, understand the data: are all lines related in some way? What must be done to a line for its work to count as done? As work gets done, record the progress in persistent storage so that, if the process is killed, it can resume where it left off. Also monitor memory usage and garbage collection.
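One way to sketch the "resume where it left off" idea: read the file line by line in binary mode and every so often save the byte offset of the next unprocessed line to a small checkpoint file; on restart, `seek()` to that offset. Binary mode is used because in text mode `tell()` returns an opaque cookie and is disabled while iterating with `for line in f`. The checkpoint file, the `handle_line` callback and the `every` interval are hypothetical, and UTF-8 text is assumed.

```python
import os

def save_checkpoint(checkpoint_path, offset):
    # write to a temporary file and rename it, so a crash never leaves a half-written checkpoint
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as cp:
        cp.write(str(offset))
    os.replace(tmp, checkpoint_path)

def process_with_checkpoint(path, checkpoint_path, handle_line, every=1000):
    """Process `path` line by line, resuming from the byte offset stored in
    `checkpoint_path` (if any). Returns the number of lines handled in this run."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as cp:
            start = int(cp.read().strip() or 0)

    processed = 0
    with open(path, "rb") as f:           # binary mode: tell()/seek() are plain byte offsets
        f.seek(start)
        while True:
            line = f.readline()            # one line in memory at a time
            if not line:
                break
            handle_line(line.decode("utf-8"))
            processed += 1
            if processed % every == 0:
                save_checkpoint(checkpoint_path, f.tell())
        save_checkpoint(checkpoint_path, f.tell())   # whole file done
    return processed
```

Because the offset is only saved every `every` lines, a crash can cause up to that many lines to be handled again, so `handle_line` should tolerate repeats.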

13th Apr 2018, 2:40 AM

Wei Chen

Read blocks of a fixed size... If you are parsing data, in this case you must keep track of the parsing context across blocks.
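A minimal sketch of the fixed-size block approach: read `block_size` bytes at a time and carry the incomplete line at the end of each block over to the next one. That leftover piece is the "context" that has to be tracked. This assumes UTF-8 text with `\n` line endings; `block_size` is an arbitrary choice. Splitting on `b"\n"` before decoding is safe because that byte never occurs inside a multi-byte UTF-8 character.

```python
def iter_lines_in_blocks(path, block_size=64 * 1024):
    """Yield complete lines (without the trailing newline) while reading the
    file in blocks of at most `block_size` bytes."""
    leftover = b""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            pieces = (leftover + block).split(b"\n")
            leftover = pieces.pop()        # last piece may be an unfinished line
            for raw in pieces:
                yield raw.decode("utf-8")
    if leftover:                           # file did not end with a newline
        yield leftover.decode("utf-8")
```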

1st Feb 2018, 8:38 AM

KrOW

I have to read it line by line, and at the same time I need to avoid loading the whole file into memory. Could you show me an example, please? @KrOW
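A minimal example of plain line-by-line reading: in Python 3 a file object returned by `open()` is an iterator over its lines, so `for line in f` keeps only the current line (plus a small read buffer) in memory, whereas `f.read()` or `f.readlines()` load the whole file. `process_line` is a hypothetical placeholder for the real work, and UTF-8 encoding is an assumption.

```python
def process_line(line):
    """Hypothetical placeholder for the per-line work (e.g. extracting tags)."""
    return len(line)

def process_file(path):
    """Process the file one line at a time; returns the number of non-empty lines."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:                 # reads the next line lazily from disk
            line = line.rstrip("\n")
            if not line:               # skip blank lines
                continue
            process_line(line)
            count += 1
    return count
```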

1st Feb 2018, 8:41 AM

Bkeefk

Does the file contain data to be parsed?

1st Feb 2018, 8:42 AM

KrOW

The file has no delimiters such as spaces or commas; it's XML tags in a txt file, so I need to read a full line in order to parse it.
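If every line is self-contained (each tag that opens on a line also closes on it, as the comment above seems to imply), the standard library's `xml.etree.ElementTree` can parse one line at a time instead of scanning characters by hand. This is a sketch under that assumption: the `<line>` wrapper is a hypothetical root added so that several sibling elements on one line form a well-formed fragment, and a malformed line raises `ET.ParseError`.

```python
import xml.etree.ElementTree as ET

def parse_line(line):
    """Return the (tag, text) pairs of the leaf elements on one line,
    e.g. '<key>the_value</key>' -> [('key', 'the_value')]."""
    root = ET.fromstring("<line>" + line.strip() + "</line>")  # hypothetical wrapper root
    return [(el.tag, el.text) for el in root.iter()
            if el is not root and len(el) == 0 and el.text]

def parse_file(path):
    """Yield the parsed pairs of each non-empty line, one line in memory at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_line(line)
```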

1st Feb 2018, 8:43 AM

Bkeefk

OK, but is the data to be parsed sequential? (With HTML/XML you can parse the file sequentially if you store the parsing context.)
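To illustrate "store the parsing context" without hand-written bookkeeping, the standard library's `xml.etree.ElementTree.XMLPullParser` can be fed the file in chunks: it remembers open tags and half-read text between `feed()` calls, so an element may span lines or chunk boundaries. A sketch assuming UTF-8 XML fragments without an `<?xml ...?>` declaration (one would break the hypothetical `<wrapper>` root fed first); `chunk_size` is arbitrary.

```python
import xml.etree.ElementTree as ET

def iter_leaf_values(path, chunk_size=64 * 1024):
    """Yield (tag, text) for every leaf element that has text,
    e.g. <key>the_value</key> -> ('key', 'the_value'), reading the file in chunks."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<wrapper>")   # hypothetical root, so the file may hold many top-level elements
    root = None                # the wrapper element, once the parser reports it
    depth = 0                  # number of currently open elements

    def drain():
        nonlocal root, depth
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
            else:  # "end"
                depth -= 1
                if depth >= 1 and len(elem) == 0 and elem.text:
                    yield elem.tag, elem.text
                if depth == 1:
                    root.clear()   # a top-level element is done: drop it to keep memory flat

    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)     # a chunk may end mid-tag; the parser keeps that state
            yield from drain()
    parser.feed("</wrapper>")
    parser.close()
    yield from drain()
```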

1st Feb 2018, 8:45 AM

KrOW

I wrote code that parses one line, testing it on a separate file I made. However, when my code runs on the real big file, it shows a memory error!

1st Feb 2018, 8:48 AM

Bkeefk

Is the line format complex? Can you send an example?

1st Feb 2018, 8:52 AM

KrOW

My problem is not the parsing. The problem is that when my code requests the file to be read, the massive file causes a memory error. I need my code to leave the whole file where it is and read one line at a time: process it, then read the next line, and so on.

1st Feb 2018, 8:57 AM

Bkeefk

I understand this, but RAM is finite, and if you parse very large files (or very long lines) you must consider parsing them in a more memory-conservative way.
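On the "too long lines" point: if the file really has only a few hundred lines in 4 GB, each line averages several megabytes, so it may help to measure the longest line first. This sketch does that without ever holding a whole line: text-mode `readline(size)` returns at most `size` characters, and a piece ending in `\n` completes a line. UTF-8 encoding and `piece_size` are assumptions.

```python
def longest_line_length(path, piece_size=1 << 20):
    """Return (line_number, length) of the longest line, newline excluded,
    reading each line in pieces of at most `piece_size` characters."""
    best_no, best_len = 0, 0
    line_no, cur_len = 0, 0
    with open(path, "r", encoding="utf-8") as f:
        while True:
            piece = f.readline(piece_size)   # stops at '\n' or after piece_size characters
            if not piece:
                break
            if piece.endswith("\n"):         # this piece completes the current line
                cur_len += len(piece) - 1
                line_no += 1
                if cur_len > best_len:
                    best_no, best_len = line_no, cur_len
                cur_len = 0
            else:
                cur_len += len(piece)
    if cur_len:                              # last line without a trailing newline
        line_no += 1
        if cur_len > best_len:
            best_no, best_len = line_no, cur_len
    return best_no, best_len
```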

1st Feb 2018, 9:02 AM

KrOW

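This is how I'm parsing:

```python
deli1, deli2, deli3 = '<', '/', '>'   # see the note on deli1..deli3
filename = 'data.txt'                 # hypothetical path to the big file
line_cntr = 0

# read the text file; iterating over the file object yields one line at a time
with open(filename, 'r') as daF:
    for line in daF:
        # ---- beginning of line processing ----
        # skip empty lines (usually at the end of the file); a blank line still
        # contains '\n', so test the stripped line instead of len(line) > 0
        if not line.strip():
            continue
        line_cntr += 1
        # read the line one character at a time; ch_cntr is the 1-based
        # position within the current line (restarted for every line)
        for ch_cntr, ch in enumerate(line, start=1):
            # '/' marks the beginning of a closing tag, '>' its end
            if ch == deli2:
                pos1 = ch_cntr                    # index just after '/'
                pos2 = line.find(deli3, pos1)     # position of '>'
                if pos2 == -1:
                    continue                      # no '>' after '/': not a closing tag
                key = line[pos1:pos2]             # closing tag </key> -> key
                keyLeng = len(key)
                # search backwards from '/' to the beginning of the line for the opening tag
                pos3 = line.rfind(key, 0, pos1)
                if pos3 == -1:
                    continue
                # <key_match>the_value</key>: the value starts right after the opening tag's '>'
                if line[pos3 + keyLeng + 1] != deli1:
                    value = line[pos3 + keyLeng + 1:pos1 - 2]   # value = the_value
                    # ... use key and value here ...
```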

1st Feb 2018, 9:05 AM

Bkeefk

Note: deli1 is '<', deli2 is '/' and deli3 is '>'.

1st Feb 2018, 9:06 AM

Bkeefk

It works perfectly with a file of a few lines, but not with a big file!

1st Feb 2018, 9:07 AM

Bkeefk

Try looking up Python's lxml library and reading its docs... In your case, also search the docs for incremental parsing.
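A sketch of incremental parsing with `iterparse`, using the standard library's `xml.etree.ElementTree` so it runs without installing lxml (`lxml.etree.iterparse` offers a very similar interface). It assumes the whole file is one well-formed XML document whose records are direct children of the root; `record_tag` is a hypothetical parameter. Clearing the root after each record keeps finished records from piling up in memory.

```python
import xml.etree.ElementTree as ET

def iter_records(path, record_tag):
    """Yield a dict of child tag -> text for every <record_tag> element,
    parsing the file incrementally instead of building the whole tree."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)            # first event: start of the document root
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            root.clear()               # discard finished records
```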

1st Feb 2018, 9:12 AM

KrOW

Any updates from anyone?