+ 1

Automatically remove all duplicate words from a text file

I have a file named "test.txt" that contains a lot of usernames. Some usernames are repeated, and I don't know which ones. I just want to remove similar usernames from that file using Python. Can anyone help me, please?

25th Oct 2022, 4:05 AM
Rohit
9 Answers
+ 8
You did not specify the structure of the file: does it contain only usernames, each on a new line? Or are they separated by some other character? Is there any other data that the search should ignore? Is the order of the lines important? And you did not provide your code attempt, so you can only get generic advice. There is no support at the filesystem level for simply removing content from the middle of a file and "shifting" the rest of the file. If the file is not too big, you can read it into memory, make your modifications there, and save the result under the same name. As a general idea, a set data structure can be a good way to remove duplicates.
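A minimal sketch of the read, modify, and save-back idea, in case the order of the lines does matter. It assumes one username per line and a file small enough to fit in memory; the function names `dedupe_lines` and `dedupe_file` are made up for illustration. A plain set does not keep order, so this variant uses `dict.fromkeys`, which keeps the first occurrence of each line.

```python
from pathlib import Path


def dedupe_lines(text: str) -> str:
    """Return text with repeated lines removed, keeping first occurrences in order."""
    # Strip surrounding whitespace so "bob" and "bob " count as the same name,
    # and skip blank lines (both are assumptions about the file format).
    names = (line.strip() for line in text.splitlines())
    unique = dict.fromkeys(name for name in names if name)
    return ("\n".join(unique) + "\n") if unique else ""


def dedupe_file(path: str) -> None:
    """Rewrite the file at `path` in place with duplicate lines removed."""
    p = Path(path)
    p.write_text(dedupe_lines(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling `dedupe_file("test.txt")` would overwrite the file, so it is worth keeping a backup copy before trying it.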
25th Oct 2022, 5:09 AM
Tibor Santa
+ 6
Not sure if this suits your situation, but if the names appear as rows in the file, you can read the lines and put each one into a 'set', which will keep only the unique lines. You can then join the 'set' contents into a new string and save it back to the file. Just an idea.
25th Oct 2022, 4:36 AM
Ipang
+ 4
Also you wrote that you want to "remove similar usernames". Similar has a different meaning than "same". You need to be more specific. Checking for similarity is more complicated than checking for equality. Do you consider any difference in uppercase / lowercase as similar? What if one character is different?
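To show why similarity is harder than equality, here is a rough sketch of one possible definition. It treats two names as similar when they match after case folding or when the `difflib.SequenceMatcher` ratio reaches a threshold. The helper names and the 0.85 threshold are assumptions chosen for illustration, not something the question specifies.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical rule: compare case-insensitively, allow small differences."""
    a, b = a.casefold(), b.casefold()
    # ratio() is 1.0 for identical strings and drops as they diverge.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def drop_similar(names, threshold: float = 0.85):
    """Keep a name only if it is not similar to any name already kept."""
    kept = []
    for name in names:
        if not any(is_similar(name, other, threshold) for other in kept):
            kept.append(name)
    return kept
```

With this rule, "rohit1" is dropped if "rohit" was already kept, which may or may not be what you want. Every name is compared against all kept names, so this gets slow for large files, unlike the exact-match check with a set.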
25th Oct 2022, 5:13 AM
Tibor Santa
+ 3
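with open('usernames.txt', 'r') as file:
    # splitlines() drops the line endings, so a last line without '\n'
    # is not treated as different from the same name followed by '\n'
    names = file.read().splitlines()

unique_names = set(names)  # a set keeps each name once; order is not preserved
concat_names = '\n'.join(unique_names)

with open('usernames.txt', 'w') as file:
    file.write(concat_names + '\n')

# I have not tested this, make sure to create a backup of your file before you try it :)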
25th Oct 2022, 5:24 AM
Tibor Santa
+ 2
okay tysm Tibor Santa
25th Oct 2022, 5:25 AM
Rohit
+ 1
Tibor Santa each name starts on a new line, and I just want to remove names that are similar/duplicate. The order of the lines is not important; every username should just start on a new line.
25th Oct 2022, 5:12 AM
Rohit
+ 1
Tibor Santa all usernames are in lowercase, and if there's a small difference between two usernames, the search can ignore it.
25th Oct 2022, 5:16 AM
Rohit
+ 1
"Similar usernames" is too abstract... If you want to remove duplicates (and the file is not very large), you can do as suggested. Otherwise, you have to be more precise about what "similar" means.
25th Oct 2022, 5:46 AM
KrOW
+ 1
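Use the sort and uniq commands from Linux. Something like this, after "import os": os.system("sort fname | uniq > foo.txt"). Here fname is the input file and foo.txt is a separate output file; using ">" instead of ">>" overwrites foo.txt, so running it again does not append the names a second time.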
26th Oct 2022, 4:44 PM
Ashok314