How to process the text in a document

Question

Tokenizing a text document
I have the text file, but I can't work out how to process it in PyCharm.

Answer

Do you simply want to break the text into words, or do you need a more complex analysis?

In the first case, you can use the string method split() to break a string into a list of words; with no arguments it splits on any run of whitespace. You can also strip punctuation beforehand by calling replace() for each punctuation mark.
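A minimal sketch of that approach, assuming a short sample sentence and a hand-picked set of punctuation marks (both are illustrative and not from the original question):

```python
# Sample input; in practice this would be the text read from your file
text = "Hello, world! This is a test."

# Strip a few common punctuation marks with replace()
for mark in ",.!?;:":
    text = text.replace(mark, "")

# split() with no arguments splits on any run of whitespace
words = text.split()
print(words)
```

For a longer list of punctuation marks, str.translate() together with string.punctuation does the same job in one call.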

If you need more advanced text-processing tools, Python most likely has a module for them. NLTK may help:

http://www.nltk.org

Answer

Exactly what Pedro said. NLTK's tokenizer is really good, and the library itself can take you through the whole process. If you also want to do semantic analysis, you can use word2vec (available, for example, in the gensim library), which works well with the corpora that ship with NLTK.
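As a rough illustration of NLTK tokenization, here is a sketch using wordpunct_tokenize, which is regex-based and does not need any extra data download. The sample sentence is made up, and the fallback function is only an assumption added so the snippet still runs if NLTK is not installed:

```python
import re

try:
    # Regex-based tokenizer from NLTK; splits words and punctuation apart
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback (not part of NLTK): a hand-written split of the same kind
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

sentence = "Hello, world! Tokenize me."
tokens = wordpunct_tokenize(sentence)
print(tokens)
```

NLTK's word_tokenize generally handles contractions and abbreviations better, but it requires downloading tokenizer data with nltk.download() first.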

Answer

Your question is a bit vague, so please explain what you need. In the meantime, I hope this helps.

with open('text_file.txt', encoding='utf-8') as f:
    # read() returns the whole file as a single string
    file_contents = f.read()

# Print the contents of the file named 'text_file.txt'
print(file_contents)
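Putting the reading and the splitting together, one possible sketch looks like this. The helper name tokenize_file and the sample file contents are hypothetical; the demo writes its own small file to a temporary directory so that it runs without any existing 'text_file.txt':

```python
import os
import tempfile
from collections import Counter

def tokenize_file(path):
    """Return a list of lowercase words from the text file at `path`."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Lowercase each line and split it on whitespace
            words.extend(line.lower().split())
    return words

# Demo: create a small sample file so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "text_file.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("the cat sat\non the mat\n")
    words = tokenize_file(sample_path)

# Count how often each word occurs
counts = Counter(words)
print(counts.most_common(1))
```

Reading line by line keeps memory use low for large files; for your own data, pass the real file path to tokenize_file instead of the temporary one.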
