+ 18
Working with natural languages
I would like to analyze large amounts of text for the vocabulary they contain. I'm looking for a tool that recognizes the different inflected forms of a word and maps them back to the base word, so that each word is counted only once. For example, the words "counting", "count", "counted", and "counts" would all be recognized as "count" and... counted only once. Is there a framework with the appropriate databases that can do that sort of thing, preferably an easy-to-use one?
9 Answers
+ 9
So you have a text and want to extract the word stems from its sentences?
Have you tried NLTK (Python)? It should let you do something like that, at least for English...
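To make the idea concrete, here is a minimal sketch of counting vocabulary by stem, using only the standard library. The `toy_stem` function and the name `count_vocabulary` are made up for illustration; `toy_stem` is a deliberately crude suffix stripper, not a real stemming algorithm. If NLTK is installed, you could pass something like `nltk.stem.PorterStemmer().stem` as the `stem` argument instead.

```python
import re
from collections import Counter


def toy_stem(word):
    """Hypothetical placeholder stemmer: strips a few common English suffixes.

    This is far too crude for real text; it only stands in for a proper
    stemmer so that the counting logic can be shown end to end.
    """
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_vocabulary(text, stem=toy_stem):
    """Tokenize the text and count occurrences per stem.

    The tokenizer only keeps ASCII letters, which is an assumption that
    fits simple English text and nothing else.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens)


counts = count_vocabulary("Counting counts: I count, you counted.")
print(counts["count"])  # all four forms fall under the stem "count"
print(len(counts))      # number of distinct stems
```

The size of the vocabulary is then just `len(counts)`, and swapping in a better stemmer does not change the counting code.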
+ 7
Simon Sauter, there is also the ability to use Snowball for different languages.
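A rough sketch of what that could look like with NLTK's `SnowballStemmer`, assuming NLTK is installed (`pip install nltk`). The helper names `stem_words` and `supported_languages` are made up for this example. As far as I know, the Snowball stemmers need no extra corpus download, and German is among the supported languages while Japanese is not, so Japanese would need a different tool.

```python
# Assumption: NLTK is an optional dependency here; the import is guarded
# so the sketch still loads on a machine without it.
try:
    from nltk.stem.snowball import SnowballStemmer
except ImportError:
    SnowballStemmer = None


def supported_languages():
    """Return the languages the Snowball stemmer knows, or [] without NLTK."""
    if SnowballStemmer is None:
        return []
    return sorted(SnowballStemmer.languages)


def stem_words(words, language="english"):
    """Return the Snowball stem of each word for the given language."""
    if SnowballStemmer is None:
        raise RuntimeError("NLTK is required: pip install nltk")
    stemmer = SnowballStemmer(language)
    return [stemmer.stem(word) for word in words]
```

For instance, `stem_words(["counting", "count", "counted", "counts"])` should map all four forms to the same stem, and `stem_words(["zählen", "zählt"], language="german")` does the same kind of thing for German.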
+ 5
I've never used it myself, but this looks like it might do what you're looking for:
https://machinelearningknowledge.ai/learn-lemmatization-in-ntlk-with-examples/
+ 4
Vitaly Sokol, wow, thank you, the example shows it clearly!
+ 4
Arif Dastager, that's exactly the code posted above, isn't it?
+ 2
Hm, cool, that does look like the general thing I need...
Would be great if it worked for other languages, foremost German and Japanese.
Thanks, I'll check that out!
+ 1
Vitaly Sokol, is there a reason why you used stemming instead of lemmatization?
+ 1
That's real nice 👍