
How to implement word count if my file has hyphens?

Language: Scala or Java

I want to count every word, and each word's frequency, in a text file (for example, *.txt). However, the file can contain a word broken by a hyphen. For example, the .txt reads:

    hello everyone, I am Bill Ga-
    tes

My aim is to print out each word and its frequency in the *.txt using a Map[String, Int]. So how do I deal with the word "Gates"? I am a rookie in Scala, and the only method I know for reading the file is source.getLines(), so my output contains two words instead of one "Gates". How can I handle a hyphen "-" in the file so that it counts as one word, not two?

I appreciate your help! Scala would be better, but Java is also accepted. Thanks!

12th Sep 2020, 3:39 PM
Tinm jac
6 Answers
A hyphen splits one word into two. So if a hyphenated word gets counted as two words, we can subtract the number of hyphens from the final result. Say you have this text: "Hello I am Ay-mane and I wa-nt to say some-thing." It contains 10 words. A word-count method that treats the hyphen as a separator will count 13; subtracting the 3 hyphens gives 10.
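A minimal Java sketch of this subtraction idea, assuming the naive counter treats every run of letters as one word (so "Ay-mane" is counted as "Ay" and "mane"). The class and method names are hypothetical. Note that it only corrects the total; it cannot tell you which words were split.

```java
// Sketch of "count, then subtract the hyphens" for the TOTAL word count.
// Assumption: the naive counter treats any maximal run of letters as a word.
class HyphenAdjustedCount {

    // Naive count: every maximal run of letters is one word.
    static int naiveCount(String text) {
        int count = 0;
        boolean inWord = false;
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                if (!inWord) {      // a new run of letters starts here
                    count++;
                    inWord = true;
                }
            } else {
                inWord = false;     // any non-letter ends the current run
            }
        }
        return count;
    }

    // Number of '-' characters in the text.
    static int hyphenCount(String text) {
        return (int) text.chars().filter(ch -> ch == '-').count();
    }

    // Each hyphen added one extra word to the naive count, so subtract them.
    static int adjustedCount(String text) {
        return naiveCount(text) - hyphenCount(text);
    }

    public static void main(String[] args) {
        String text = "Hello I am Ay-mane and I wa-nt to say some-thing.";
        System.out.println(naiveCount(text));    // 13
        System.out.println(hyphenCount(text));   // 3
        System.out.println(adjustedCount(text)); // 10
    }
}
```

This assumes every hyphen joins two word fragments; a free-standing dash such as "a - b" would still be subtracted even though it never added an extra word.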
12th Sep 2020, 3:57 PM
Aymane Boukrouh
Tinm jac, you can follow these steps:
- read the whole file
- count the hyphens, then remove them
- remove the "\n" characters (they represent newlines)

I'm sure there are methods to do that. You can use what Aleksandrs Kalinins suggested; it will be something like string.replaceAll("-", ""), which removes all your hyphens. The same goes for "\n". I think that gives the result you want, but I do not know Scala, so I can't write a working script (I can probably help you in DMs, and then you can share the final solution here).
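A minimal Java sketch of these steps applied to text already in memory, with hypothetical class and method names. It contrasts the literal order (remove every "-", then every line break) with a safer order, assuming that a hyphen directly before a line break marks a word split across lines, as in "Ga-" / "tes". In Java regular expressions, "\R" matches any line break, including "\r\n".

```java
// Compares two orders of replaceAll calls on in-memory text.
// Assumption: "-" immediately followed by a line break marks a split word.
class NormalizeText {

    // Literal version of the steps: drop every '-' and every line break.
    // Problem: the last word of each line gets glued to the first word of the next.
    static String naiveNormalize(String text) {
        return text.replaceAll("-", "").replaceAll("\\R", "");
    }

    // Safer order:
    // 1. remove "-" + line break, which rejoins "Ga-\ntes" into "Gates";
    // 2. turn every remaining line break into a space so other words stay apart.
    static String safeNormalize(String text) {
        return text.replaceAll("-\\R", "").replaceAll("\\R", " ");
    }

    public static void main(String[] args) {
        String text = "hello everyone, I am Bill Ga-\ntes\nand this is line three";
        System.out.println(naiveNormalize(text)); // hello everyone, I am Bill Gatesand this is line three
        System.out.println(safeNormalize(text));  // hello everyone, I am Bill Gates and this is line three
    }
}
```

The safer order also leaves genuine compounds such as "well-known" untouched, whereas removing every "-" would turn them into "wellknown".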
12th Sep 2020, 4:15 PM
Aymane Boukrouh
Well, in Java, before putting a string into the map, you can call the replaceAll method to remove any characters you want. The same method is available in Scala, since Scala strings are java.lang.String. https://code.sololearn.com/cijFhzvfH35Z/?ref=app
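A minimal Java sketch of cleaning each word with replaceAll before putting it into a map. The class and method names are hypothetical, and it assumes that words are separated by whitespace and that anything other than a letter or digit is punctuation to strip (so "everyone," is counted as "everyone").

```java
import java.util.HashMap;
import java.util.Map;

// Per-word cleanup before putting words into the map (names are hypothetical).
// Assumptions: words are separated by whitespace, and any character that is
// not a letter or digit is punctuation to strip.
class TokenFrequency {

    // "everyone," -> "everyone"; note that "Ga-" -> "Ga" as well.
    static String cleanToken(String token) {
        return token.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            String word = cleanToken(token);
            if (!word.isEmpty()) {
                // merge stores 1 for a new word, otherwise adds 1 to its count
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(frequencies("hello everyone, hello Bill"));
    }
}
```

This per-word cleanup alone does not rejoin a word split across two lines: "Ga-" becomes "Ga" and "tes" stays "tes". The hyphen-plus-line-break pairs have to be removed from the text before it is split into words.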
12th Sep 2020, 3:55 PM
Aleksandrs
But what if the file is very long? I cannot find every hyphen and replaceAll them one by one 😂
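Two points may help with long files. First, replaceAll takes a regular expression and replaces every match in a single call, so there is no need to find hyphens one by one. Second, if the file is too large to hold in memory as one String, it can be read line by line while a hyphenated fragment is carried over to the next line. Below is a minimal Java sketch of that second idea; the class and method names are hypothetical, and it assumes that a line ending in "-" always means its last word continues on the next line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Streaming sketch for large files (names are hypothetical). Assumption: a line
// that ends with "-" means its last word continues on the next line.
class StreamingWordCount {

    // Split on anything that is not a letter or digit, then count each word.
    static void addWords(String text, Map<String, Integer> counts) {
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {              // split may yield a leading ""
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    static Map<String, Integer> count(BufferedReader reader) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        String carry = ""; // first half of a word split at the previous line break
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String text = carry + line;
                carry = "";
                if (text.endsWith("-")) {
                    // Walk back to the start of the last token and hold it back,
                    // minus its hyphen, so it is glued onto the next line.
                    int start = text.length() - 1;
                    while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
                        start--;
                    }
                    carry = text.substring(start, text.length() - 1);
                    text = text.substring(0, start);
                }
                addWords(text, counts);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        addWords(carry, counts); // the file ended right after a hyphen
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .txt file; it is read one line at a time.
        try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            count(reader).forEach((word, n) -> System.out.println(word + " " + n));
        }
    }
}
```

Because it splits on every non-letter, non-digit character, a hyphenated compound in the middle of a line ("well-known") is counted as two words; adjust the regular expression if that is not what you want.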
12th Sep 2020, 4:00 PM
Tinm jac
Aymane's method is brilliant for counting the total number of words, but what if I want more? 😂😂 Each word and its frequency! Maybe I'm asking for too much 😂😂
12th Sep 2020, 4:07 PM
Tinm jac
My idea is to convert the whole content of this file into one huge String that can be printed on one line! Then I can replace each hyphen with "" so the two halves merge into one word. BUT HOW DO I IMPLEMENT THAT? That's hard for me 😂😂😂😂
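A minimal Java sketch of the "one huge String" idea, with hypothetical names. It reads the whole file into one String (via Files.readAllBytes, so the file must fit in memory), removes each hyphen that is directly followed by a line break, and counts words with a stream. It does not actually need to put everything on one line: once each "-" plus line break is gone, the remaining line breaks simply act as separators when the text is split.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of the "read everything into one String" idea (names are hypothetical).
class WholeFileWordCount {

    static Map<String, Long> wordFrequencies(String wholeFile) {
        // Rejoin words broken as "Ga-" + line break + "tes"; the other line breaks
        // stay and are treated as separators by the split below.
        String joined = wholeFile.replaceAll("-\\R", "");
        return Arrays.stream(joined.split("[^\\p{L}\\p{N}]+"))
                .filter(word -> !word.isEmpty())   // split can yield a leading ""
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .txt file; the whole file is loaded into memory.
        String wholeFile = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        wordFrequencies(wholeFile).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

The counts are Long because Collectors.counting() produces Long, and the TreeMap keeps the words in sorted order. As with any split on non-letters, "well-known" in the middle of a line is counted as two words.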
12th Sep 2020, 4:08 PM
Tinm jac