+ 1
Auto Remove all duplicate words from a text file
I have a file named "test.txt" which contains a lot of usernames. Some usernames are repeated, and I don't know which ones. I just want to remove the similar usernames from that file using Python. Can anyone help me, please?
9 Answers
+ 8
You did not specify the structure of the file: does it contain only usernames, each in a new line? Or separated by some other character? Is there any other data that the search should ignore? Is the order of the lines important?
And you did not provide your code attempt. So you can get only generic advice.
There is no support at the filesystem level for simply removing content from the middle of a file and "shifting" the rest of the file.
If the file is not too big, you can recreate it in memory, do some modifications, and save with the same name.
As a general idea, using a set data structure can be a good solution to remove duplicates.
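To make the set idea a bit more concrete, here is a minimal sketch that works on a list of lines already read into memory. The function name `dedupe_lines` and the sample names are made up for illustration. It uses `dict.fromkeys` instead of a plain `set` so that the first occurrence of each line keeps its position, in case the order does turn out to matter.

```python
def dedupe_lines(lines):
    """Return lines with exact duplicates removed, keeping first occurrences in order."""
    # strip() removes the trailing newline and stray spaces so equal names compare equal
    cleaned = (line.strip() for line in lines)
    # dict keys are unique and remember insertion order; blank lines are skipped
    return list(dict.fromkeys(name for name in cleaned if name))


# Hypothetical sample data, as readlines() might return it
sample = ["alice\n", "bob\n", "alice\n", "carol\n", "bob"]
print(dedupe_lines(sample))  # ['alice', 'bob', 'carol']
```

If the order really is irrelevant, `set(cleaned)` would do the same job with less ceremony.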
+ 6
Not sure if this suits your situation, but if the names appear as rows in the file, you can read the lines and put each line into a 'set', which will only keep unique lines. You can then join the 'set' contents into a new string and save it back to the file. Just an idea.
+ 4
Also you wrote that you want to "remove similar usernames".
Similar has a different meaning than "same". You need to be more specific. Checking for similarity is more complicated than checking for equality. Do you consider any difference in uppercase / lowercase as similar? What if one character is different?
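To illustrate the difference, here is a rough sketch of one possible similarity check, using `difflib.SequenceMatcher` from the standard library. The helper name `is_similar` and the threshold of 0.8 are assumptions picked for the example, not a recommendation; what counts as "similar" is exactly the part you would have to define yourself.

```python
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.8):
    """Return True if two names are close enough, ignoring upper/lower case.

    The threshold is an arbitrary illustrative value between 0 and 1.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


print(is_similar("Alice", "alice"))        # True: equal once case is ignored
print(is_similar("john_doe", "john_d0e"))  # True: only one character differs
print(is_similar("alice", "bob"))          # False: no characters in common
```

Note that comparing every name against every other name this way gets slow for large files, while an equality check with a set stays fast.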
+ 3
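with open('usernames.txt', 'r') as file:
    # splitlines() drops the newline at the end of each line, so a name on the
    # last line (which may have no newline) still matches its duplicates
    names = file.read().splitlines()
unique_names = set(names)
concat_names = '\n'.join(unique_names) + '\n'
with open('usernames.txt', 'w') as file:
    file.write(concat_names)
# I have not tested this, make sure to create a backup of your file before you try it :)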
+ 2
okay tysm Tibor Santa
+ 1
Tibor Santa each name starts on a new line, and I just want to remove names which are similar/duplicate.
The order of the lines is not important; every username should just start on a new line.
+ 1
Tibor Santa all usernames are in lowercase, and if there's a small difference between two usernames, the search can ignore it
+ 1
"Similar usernames" is too abstract... If you want to remove duplicates (and the file is not very large), you can do as suggested; otherwise you have to be more precise about what "similar" means.
+ 1
Use the sort and uniq commands from Linux.
Something like this:
import os
# ">" overwrites foo.txt, so running it twice does not append the names again
os.system("sort fname | uniq > foo.txt")
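If the shell tools are not available (on Windows, for example), roughly the same effect as sorting and then dropping repeated lines can be sketched in plain Python. The function name `sort_unique` is made up for this example, and Python sorts by character code rather than by locale, so the ordering may differ slightly from what `sort` produces.

```python
def sort_unique(text):
    """Return the unique lines of text in sorted order, one per line."""
    unique_sorted = sorted(set(text.splitlines()))
    # Re-add the newline that splitlines() removed from each line
    return "".join(line + "\n" for line in unique_sorted)


print(sort_unique("bob\nalice\nbob\n"), end="")
# alice
# bob
```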