+ 3

Python string clean-up

How would you go about cleaning up a bunch of dirty OCR text files from archive.org by throwing out every word that is not included in a dictionary word list? I want to do that in Python... Any help appreciated! ;)

16th May 2019, 8:41 PM
::sк::
4 Answers
+ 5
words = [word for word in word_list if word in dictionary]
Steven, I think those are both syntactically wrong 🤔
17th May 2019, 5:03 AM
Anna
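For anyone reading along, here is a minimal sketch of how the list comprehension above might be applied to a chunk of OCR text. The function name keep_known_words, the regex tokenizer, and the tiny sample word list are all assumptions made for illustration; a real run would load a full dictionary file instead.

```python
import re

def keep_known_words(text, dictionary):
    """Return the words of `text` that appear in `dictionary`, in order."""
    # A set makes each membership test fast, even for a large word list.
    known = {w.lower() for w in dictionary}
    # Crude tokenizer (an assumption): runs of letters and apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() in known]

# Hypothetical stand-in for a real dictionary word list.
sample_dictionary = ["the", "quick", "brown", "fox"]
ocr_text = "Tbe quick br0wn fox, the qu1ck fox"

cleaned = keep_known_words(ocr_text, sample_dictionary)
print(" ".join(cleaned))
```

Note that the tokenizer splits a damaged word like "br0wn" into fragments, which are then dropped because they are not in the word list. Converting the dictionary to a set first is probably worth it for large files, since checking membership in a plain list gets slow.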
+ 2
Steven, thank you for taking the time to answer in an abstract yet detailed way. I will see what I can do!
17th May 2019, 4:44 AM
::sк::
+ 2
Hehe thanks Anna
17th May 2019, 6:16 AM
::sк::
+ 2
Following up on my own question: I figured out a way that works for me without much hassle, comparing the text as a set using difference(): https://code.sololearn.com/cQ12Vw72r4ro/?ref=app
5th Jun 2019, 6:30 AM
::sк::
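The linked code is not reproduced here, so the following is only a guess at what a set-based approach with difference() could look like, not the poster's actual solution. The function names unknown_words and drop_unknown, the whitespace tokenizing, and the sample data are hypothetical.

```python
def unknown_words(text, dictionary):
    """Return the set of words in `text` that are missing from `dictionary`."""
    words = set(text.lower().split())    # unique lowercase words in the text
    return words.difference(dictionary)  # everything the dictionary lacks

def drop_unknown(text, dictionary):
    """Rebuild the text in its original order without the unknown words."""
    bad = unknown_words(text, dictionary)
    return " ".join(w for w in text.split() if w.lower() not in bad)

# Hypothetical stand-in for a real dictionary word list.
sample_dictionary = {"the", "cat", "sat", "on", "mat"}
ocr_text = "The cat s4t on the rnat"

rejected = unknown_words(ocr_text, sample_dictionary)
cleaned = drop_unknown(ocr_text, sample_dictionary)
print(sorted(rejected))
print(cleaned)
```

A set loses word order and duplicates, so the sketch uses the difference only to find the bad words and then filters the original text with it. Splitting on whitespace leaves punctuation attached to words, so real OCR text would likely need stripping or a regex tokenizer first.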