+ 1

Python - a script to improve the size of a text file

Hi there, I'm fairly new to Python. At work I have a script (written by my predecessor) that is supposed to reduce the size of an input text file. From what I can see after going through 4 modules of learning here today, all it seems to do is append the existing content to a new file. Does this actually reduce the size of the file, or is it doing something else as well? Please see the script below.

    #!usr/bin/env
    from sys import argv

    data_file = raw_input("Data_File: ")
    output_file = raw_input("Output_File: ")
    data = open(data_file,"r")
    line = data.readline()
    new_file = open(output_file,"a")
    while line:
        values = line.split(",")
        new_line = []
        for value in values:
            entry = value.strip()
            new_line.append(entry)
        inew_line = ",".join(new_line)
        new_file.write(inew_line + "\n")
        line = data.readline()
    new_file.close()

Also, it may be a silly question, but what does "from sys import argv" do? Thanks!!

5th Jan 2019, 11:49 AM
Jin
3 Answers
+ 3
When you launch a program, the command line can pass arguments to it (for example a -help flag or a path to a file). The sys module lets you read these arguments and use them. To see the arguments given to your program, write: print(" ".join(argv))
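As a minimal sketch of the above (Python 3, whereas the posted script uses Python 2's raw_input): the list in sys.argv holds the script name first, followed by whatever was typed after it on the command line. The helper name describe_args and the file names in the usage note are made up for illustration. Note that the posted script imports argv but never actually uses it.

```python
import sys


def describe_args(args):
    """Return a readable summary of a sys.argv-style list."""
    # The first element is the script name; the rest are the arguments.
    script, *rest = args
    return "script={!r}, arguments={!r}".format(script, rest)


if __name__ == "__main__":
    print(describe_args(sys.argv))
```

If this were saved as a hypothetical shrink.py and started with "python shrink.py in.csv out.csv", the arguments part would contain 'in.csv' and 'out.csv'.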
5th Jan 2019, 12:31 PM
Théophile
+ 3
The code reads each line from the input file, splits it at the commas, removes whitespace from the beginning and the end of each value (but not from the middle) and saves the result to the output file. For a line without commas, it will change this:

    this   line    contains  a lot   of whitespace       #<- line ends here

to this:

    this   line    contains  a lot   of whitespace#<- line ends here

This is because of entry = value.strip(). So, yes, the output file might be smaller than the original file. It could be reduced even further by adding something like:

    while '  ' in entry:
        entry = entry.replace('  ', ' ')

after entry = value.strip() (note that the first string contains two spaces and the second only one). This will lead to:

    this line contains a lot of whitespace#<- line ends here

/Edit: Yes, it keeps appending text to the output file every time you run the program. That's because the output file is opened in "a" mode (a = append). If you want it to replace the existing output file (and create a new one in case it doesn't exist), open the output file in "w" mode (w = write). Please note that this will delete the content of the previous file without any warning, so if this is an actual script you use at work, better be careful...
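Putting these points together, here is a rough Python 3 sketch, not the original script: it strips each comma-separated value, optionally collapses inner whitespace, and opens the output in "w" mode so repeated runs do not pile up. The names clean_line, clean_file and collapse_inner are made up for illustration, and " ".join(entry.split()) is swapped in for the while/replace loop (it also collapses tabs, which the loop would not).

```python
def clean_line(line, collapse_inner=False):
    """Strip whitespace around each comma-separated value in one line."""
    cleaned = []
    for value in line.rstrip("\n").split(","):
        entry = value.strip()
        if collapse_inner:
            # Replace every run of whitespace inside the value with one space.
            entry = " ".join(entry.split())
        cleaned.append(entry)
    return ",".join(cleaned)


def clean_file(src_path, dst_path, collapse_inner=False):
    """Write a cleaned copy of src_path to dst_path."""
    # "w" overwrites dst_path on every run; "a" would append to it instead.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(clean_line(line, collapse_inner) + "\n")
```

Because of the "w" mode, running clean_file twice on the same paths leaves a single cleaned copy in the output file rather than two.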
5th Jan 2019, 1:14 PM
Anna
+ 1
Thank you Anna and Théophile for your help! If I may ask just one more question: in the "while" loop, which repeats its actions for every line, why does it have to do "line = data.readline()" at the end? Thanks

Edit: OK, I figured it out. Each call to readline() reads the next line, and that is what keeps the while loop moving through the file. Thanks!
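To illustrate the readline() pattern, here is a small Python 3 sketch with two made-up helper functions. The first mirrors the while loop from the script; the second does the same with a for loop, which advances through the file by itself. io.StringIO is used only so the example runs without a real file.

```python
import io


def read_with_readline(handle):
    """Collect all lines using the while/readline pattern."""
    lines = []
    line = handle.readline()      # read the first line before the loop
    while line:                   # an empty string means end of file
        lines.append(line)
        line = handle.readline()  # advance; without this the loop never ends
    return lines


def read_with_for(handle):
    """Collect all lines by iterating over the file object directly."""
    return [line for line in handle]


sample = "a\n\nb\n"
print(read_with_readline(io.StringIO(sample)) == read_with_for(io.StringIO(sample)))
```

A blank line inside the file is read as "\n", which is not an empty string, so the while loop does not stop early on it.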
5th Jan 2019, 11:50 PM
Jin