+ 2
Developing a web-based similarity or plagiarism checker between submitted documents [HELP]
Good day everyone. I'm working on a web-based similarity (plagiarism) checker that compares multiple submitted Word documents stored locally in a database, and I'm unsure which algorithm to use. The main technologies I'm using for this project are Node.js and MongoDB. I would really appreciate any input on how to get this working. Note that I'm only comparing the submitted Word documents stored in the database against each other, not against the general internet. Thank you. CC: Sololearn family
2 answers
0
Hi. I don't know the common approach people use to solve this, but here is what I would do.
1. Split the text into sentences by the dot symbol.
2. Compare each sentence in a document with every sentence from the other documents.
I would use this for the comparison: https://en.m.wikipedia.org/wiki/Levenshtein_distance
3. Summarize the comparison results somehow.
I would use the mean of the "Levenshtein distance / sentence length" ratio. This ratio represents how different two sentences are: the lower the ratio, the higher the similarity. The documents with the smallest mean ratio are the most similar.
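The three steps above could be sketched roughly like this in Node.js. This is only a minimal sketch, not a tested plagiarism checker: the function names (`levenshtein`, `splitSentences`, `meanDistanceRatio`) are made up for illustration, dividing by the longer of the two sentence lengths is my assumption for "sentence length", and taking the best match per sentence before averaging is just one possible reading of "summarize somehow". It also assumes the text has already been extracted from the Word files.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a, b) {
  // Before row i is computed, prev[j] is the distance between
  // the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Step 1: naive split on the dot symbol; trims and drops empty pieces.
function splitSentences(text) {
  return text.split('.').map((s) => s.trim()).filter((s) => s.length > 0);
}

// Steps 2 and 3: for each sentence of docA, find its closest sentence in docB,
// normalise the distance by the longer sentence length (assumption), and
// average the ratios. 0 means identical sentences, values near 1 mean unrelated.
function meanDistanceRatio(docA, docB) {
  const sentencesA = splitSentences(docA);
  const sentencesB = splitSentences(docB);
  if (sentencesA.length === 0 || sentencesB.length === 0) return 1;
  let total = 0;
  for (const a of sentencesA) {
    let best = Infinity;
    for (const b of sentencesB) {
      const ratio = levenshtein(a, b) / Math.max(a.length, b.length);
      best = Math.min(best, ratio);
    }
    total += best;
  }
  return total / sentencesA.length;
}

// Usage: the same sentences in a different order still score as identical.
console.log(meanDistanceRatio('The cat sat. It was warm.', 'It was warm. The cat sat.')); // 0
```

Keep in mind that comparing every sentence with every other sentence is quadratic, so for many large documents it could get slow.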
The weak point of this algorithm is changes in word order. But it will detect unmodified copy-paste.
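If word order changes turn out to matter, one possible complement (not part of the approach above, just a different technique named plainly: Jaccard similarity over word sets) is to compare which words two sentences share and ignore their order. A rough sketch, with hypothetical helper names `wordSet` and `jaccard`:

```javascript
// Lower-cases the sentence and returns the set of its distinct words
// (runs of Unicode letters and digits).
function wordSet(sentence) {
  return new Set(sentence.toLowerCase().match(/[\p{L}\p{N}]+/gu) || []);
}

// Jaccard similarity: |intersection| / |union|.
// 0 means no shared words, 1 means exactly the same set of words.
function jaccard(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const w of setA) {
    if (setB.has(w)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}

// Usage: a reordered sentence still scores as fully similar.
console.log(jaccard('the quick brown fox', 'brown fox the quick')); // 1
```

Note the direction is flipped compared with the distance ratio: here a higher value means more similar. It also ignores how often a word repeats, so it is a coarse signal at best.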
0
Thanks for the response. I'll be looking forward to other suggestions. Thank you once again.