+ 2
How to output an array of words from a string?
I need a function that returns an array of the words in a string, without punctuation marks. The string can be in any language.
EXAMPLE:
Input: "This is Jo! Jo - is my friend. He speaks леотуту language."
Output: ["This", "is", "Jo", "Jo", "is", "my", "friend", "He", "speaks", "леотуту", "language"]
THANK YOU!
22 Answers
+ 2
You can try this approach.
https://code.sololearn.com/cEpBh4lO44kc
const WORDPATTERN = /[^\s!?.,']+/gu;
const toWords = (text) => {
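// match() returns null when nothing matches, so fall back to an empty array
const words = text.match(WORDPATTERN) ?? [];
// drop tokens that consist only of - or & (tokens like Tom&Jerry are kept)
return words.filter(w => w.replace(/[-&]/g, "") !== "");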
}
1. match groups of characters that exclude whitespace and the listed punctuation marks
2. filter the result to remove words which only consist of special chars such as - or &
+ 5
// Hope this helps you
let str = "HTML is the standard markup language for Web pages"
let arrStr = str.split(" ")
for (let w of arrStr) {
console.log(w)
document.write(w + '<br />')
}
arrStr.forEach((item, index, array) => {
console.log(item, index);
});
+ 3
// Try this code
for (let w of arrStr) {
// the hyphen goes last: inside [...] the sequence ,-? is a range that also matches digits
w = w.replaceAll(/[.,?!-]/g, '')
if (w == "") continue
console.log(w)
document.write(w + '<br />')
}
+ 2
SoloProg Thank you, but unfortunately, this is not exactly what I need ((
- The output must be an array.
- Punctuation must not end up in the array.
- The user can enter a string in any language.
For example input:
"This is Jo! Jo - is my friend. He speaks леотуту language."
Output: ["This", "is", "Jo", "Jo", "is", "my", "friend", "He", "speaks", "леотуту", "language"]
+ 2
Use the array.filter(...) function to remove unwanted items from an array.
https://code.sololearn.com/WK2887l09r4H
+ 2
User-made regex lessons on Sololearn:
https://www.sololearn.com/learn/9704/?ref=app
+ 1
SoloProg Better, thank you :) But the output must be an array.
+ 1
This can be solved with regular expressions too.
const sentence = "This is Jo! Jo - is my friend. He speaks леотуту language.";
const pattern = /\p{Letter}+/gu;
const words = sentence.match(pattern);
console.log(words);
To understand how \p works with Unicode, see this:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape
It's needed because of the Cyrillic characters.
The result of match() is an array of the regex match results, or null if no match was found.
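Since match() gives null instead of an empty array when nothing matches, code that always expects an array may want a fallback. A minimal sketch of that idea; `toWordArray` is a made-up helper name, not something from the posts above:

```javascript
// Hypothetical helper: always returns an array, even when no word is found.
const toWordArray = (text) => text.match(/\p{Letter}+/gu) ?? [];

console.log(toWordArray("He speaks леотуту language.")); // ["He", "speaks", "леотуту", "language"]
console.log(toWordArray("!!! ... ???"));                 // []
```

The `??` operator replaces only null or undefined, so a successful match passes through unchanged.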
+ 1
SoloProg Thank you!!!
+ 1
Tibor Santa Thank you for solution and website!!! That is what I need!
But how can I make the match include digits and underscores, or any other characters that are part of a word (before or after a letter)?
For example, words like: queue#3, 7Eleven, Samsung_a52, Xiaomi-12, Tom&Jerry, ...
But if we have "Tom & Jerry", then & is not a word.
P.S.
I also tried sentence.match(/\w+/g), but it doesn't work for Cyrillic characters. So is match() the only method for this?
And maybe you know some more websites/apps/YouTube channels for learning regex as a newbie? It's really cool to know how to use them!
+ 1
Thank you, SoloProg!!!
+ 1
PR PRGR Yes, the pattern /\w+/g works nicely for text that has only English characters: \w matches ASCII letters, digits and the underscore, and nothing else.
The \p{Letter} category also captures letters from other languages. It must be used together with the 'u' modifier, which enables Unicode mode.
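A quick side-by-side may make the difference easier to see. This is only an illustration; the sample strings and variable names are my own:

```javascript
const sample = "He speaks леотуту language";

// \w is ASCII-only, so the Cyrillic word is skipped entirely.
const asciiWords = sample.match(/\w+/g);
console.log(asciiWords);   // ["He", "speaks", "language"]

// \p{Letter} with the u flag matches letters of any script.
const unicodeWords = sample.match(/\p{Letter}+/gu);
console.log(unicodeWords); // ["He", "speaks", "леотуту", "language"]

// But \p{Letter} alone stops at digits and underscores.
const brokenUp = "Samsung_a52".match(/\p{Letter}+/gu);
console.log(brokenUp);     // ["Samsung", "a"]
```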
Also \S matches any non-whitespace characters. For this case this would not really work because you want to exclude some punctuation.
To make it more precise you can use a "character set" in square brackets, where you list all applicable characters that can be part of the word.
/[\d\p{Letter}#&_]+/gu
\d means a digit which can also be expressed as [0-9]
But in this case the & character in "Tom & Jerry" would be considered an individual word, and I would find it really complex to handle this problem inside the regex world. So I would apply some post processing on the result array and remove or adjust words which do not really meet your conditions. (There could be tricky edge cases, like what if the word ends with &)
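One possible way to do that post-processing, as a rough sketch rather than a complete solution: match with the character set, then keep only tokens that contain at least one letter. The name `extractWords` is hypothetical, and adding `-` to the set (so that Xiaomi-12 stays in one piece) is my assumption:

```javascript
// Assumption: "-" is added to the set from the pattern above so that Xiaomi-12 stays whole.
const TOKEN = /[\d\p{Letter}#&_-]+/gu;
// No g flag here, so test() keeps no state between calls.
const HAS_LETTER = /\p{Letter}/u;

// Hypothetical helper: match candidate tokens, then drop those without any letter.
function extractWords(text) {
  const tokens = text.match(TOKEN) ?? [];
  return tokens.filter((token) => HAS_LETTER.test(token));
}

console.log(extractWords("Tom & Jerry met Tom&Jerry at 7Eleven, queue#3, Samsung_a52, Xiaomi-12."));
// ["Tom", "Jerry", "met", "Tom&Jerry", "at", "7Eleven", "queue#3", "Samsung_a52", "Xiaomi-12"]
```

Note that this filter also drops tokens made only of digits. If plain numbers should count as words, the HAS_LETTER test would need to accept digits too.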
+ 1
Tibor Santa maybe we can do something like this:
A token is a word if:
- one or more other characters are followed by one or more letters,
- or one or more letters are followed by other characters.
If a token has other characters (apart from digits) but not a single letter, then it is not a word.
But I don't know how to code it..
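A rule like this could perhaps be written as a single pattern: optional symbols, then at least one letter or digit, then any mix of letters, digits and symbols. This is only a sketch. The allowed symbols (# & _ -) and the name `wordsOf` are assumptions, and edge cases such as a trailing & stay attached to the word:

```javascript
// Assumption: only # & _ - may be glued to a word; adjust the set as needed.
// Shape: optional symbols, one required letter or digit, then any allowed characters.
const WORD = /[#&_-]*[\p{Letter}\d][\p{Letter}\d#&_-]*/gu;

// Hypothetical helper: returns [] instead of null when nothing matches.
const wordsOf = (text) => text.match(WORD) ?? [];

console.log(wordsOf("Tom & Jerry watch Tom&Jerry at 7Eleven - queue#3, room 12!"));
// ["Tom", "Jerry", "watch", "Tom&Jerry", "at", "7Eleven", "queue#3", "room", "12"]
```

A lone & or - has no letter or digit to anchor on, so it never matches, while numbers on their own still count as words.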
+ 1
let anytext = "I love coding";
let arr = Array.from(anytext); // splits the string into single characters, not words
console.log(arr); // ["I", " ", "l", "o", "v", "e", " ", ...]
// Hope this helps
+ 1
With indexing and slicing in Python:
string = 'this is good'
words = []  # named "words" so the built-in list type is not shadowed
words.insert(0, string[0:4])  # 'this'
words.append(string[5:7])  # 'is'
words.append(string[8:12])  # 'good'
Now you have a list, but if you want it to be more automatic, look into the split() function.
+ 1
Tibor Santa Thanks a lot!!! That's what I need. Thank you very much for help!
0
Hhh
0
@Mariano Thank you, but this approach only works for one particular string. We don't know what string the user will enter, so we need more universal code (in JS).
0
@Aradhna Thank you, but unfortunately, this does not fit the task.