Working with natural languages

Question

I would like to analyze large amounts of text for the vocabulary they contain.

For that, I'd like a tool that recognizes the different inflected forms of a word and maps them back to the base word, so that each word is only counted once.

For example, the words "counting", "count", "counted", "counts" would all be recognized as "count" and... counted only once.

Is there some framework with the appropriate databases that can do that sort of thing, preferably an easy-to-use one?

Accepted Answer

https://code.sololearn.com/cUNN85EmXzRN/?ref=app

Answer

So you have a text and want to extract the word stems from its sentences?

Did you try NLTK (Python)? It should enable you to do something like that for English at least...
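
A rough sketch of what that could look like, assuming NLTK is installed (pip install nltk). The function name count_stems and the simple regex tokenizer are my own illustration, not something NLTK provides, and a stemmer only strips suffixes by rule, so treat the result as an approximation of the base word:

```python
import re
from collections import Counter

from nltk.stem.snowball import SnowballStemmer


def count_stems(text, language="english"):
    """Count how often each word stem occurs in text (hypothetical helper)."""
    stemmer = SnowballStemmer(language)
    # Crude tokenizer: lowercase the text and take runs of letters as words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(stemmer.stem(word) for word in words)


counts = count_stems("Counting counts: she counted, they count.")
print(counts["count"])  # all four forms of "count" land on the same stem
print(len(counts))      # size of the stemmed vocabulary
```

The stemmers need no extra data download, which makes them an easy first thing to try.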

Answer

Simon Sauter, because Snowball can be used for different languages.
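
A small sketch of what I mean, again assuming NLTK is installed. The German example words are just ones I picked for illustration, and I'm not claiming they all reduce to the same stem (as far as I know, Snowball only strips suffixes, so a prefix like "ge-" stays):

```python
from nltk.stem.snowball import SnowballStemmer

# Languages that NLTK's Snowball wrapper ships with.
print(SnowballStemmer.languages)

# Pick a language by name; the API is the same as for English.
german = SnowballStemmer("german")
for word in ["zählen", "zählt", "zählte", "gezählt"]:
    print(word, "->", german.stem(word))
```

German is in that list, Japanese is not, so Japanese would need a different tool.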

Answer

I've never used it myself, but this looks like it might do what you're looking for:
https://machinelearningknowledge.ai/learn-lemmatization-in-ntlk-with-examples/

Answer

Vitaly Sokol, wow, thank you, the example shows it clearly!

Answer

Arif Dastager That's exactly the code posted above, isn't it?

Answer

Hm, cool, that does look like the general thing I need...

Would be great if it worked for other languages, above all German and Japanese.

Thanks, I'll check that out!

Answer

Vitaly Sokol is there a reason why you used stemming instead of lemmatization?

Answer

That's real nice 👍
