+ 2

How to output an array of words from a string?

I need a function that outputs an array of words from a string, without punctuation marks. The string can be in any language.
Example:
Input: "This is Jo! Jo - is my friend. He speaks леотуту language."
Output: ["This", "is", "Jo", "Jo", "is", "my", "friend", "He", "speaks", "леотуту", "language"]
Thank you!

8th Jul 2023, 12:55 PM

PR PRGR

22 Answers

+ 2

You can try this approach. https://code.sololearn.com/cEpBh4lO44kc
    const WORDPATTERN = /[^\s!?.,']+/gu;
    const toWords = (text) => {
      // match() returns null when nothing matches, so fall back to an empty array
      const words = text.match(WORDPATTERN) ?? [];
      // keep a token only if something is left after removing - and &
      return words.filter((w) => w.replace(/[-&]/g, "") !== "");
    };
1. Match groups of characters that exclude whitespace and the listed punctuation marks.
2. Filter the result to remove words that consist only of special characters such as - or &.

9th Jul 2023, 4:32 PM

Tibor Santa

+ 5

// Hope this helps you
    let str = "HTML is the standard markup language for Web pages"
    let arrStr = str.split(" ")
    for (let w of arrStr) {
      console.log(w)
      document.write(w + '<br />')
    }
    arrStr.forEach((item, index, array) => {
      console.log(item, index);
    });

8th Jul 2023, 2:11 PM

SoloProg

+ 3

// Try this code
    for (let w of arrStr) {
      // the hyphen is escaped here: an unescaped ",-?" is a character range that also strips digits
      w = w.replaceAll(/[.,\-?!]/g, '')
      if (w == "") continue
      console.log(w)
      document.write(w + '<br />')
    }

8th Jul 2023, 2:56 PM

SoloProg

+ 2

SoloProg Thank you, but unfortunately this is not exactly what I need:
- The output must be of the array data type.
- Punctuation should not be included in the array.
- The user can write a string in any language.
For example:
Input: "This is Jo! Jo - is my friend. He speaks леотуту language."
Output: ["This", "is", "Jo", "Jo", "is", "my", "friend", "He", "speaks", "леотуту", "language"]

8th Jul 2023, 2:27 PM

PR PRGR

+ 2

Use array.filter(...) function to remove unwanted items from an array. https://code.sololearn.com/WK2887l09r4H
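The linked code is not reproduced in this thread, so the following is only a minimal sketch of what a split-and-filter approach could look like. The function name splitAndFilter and the list of punctuation marks are assumptions, not taken from the link.

```javascript
// Hypothetical helper: split on spaces, strip a fixed set of punctuation
// marks from each piece, then use filter() to drop pieces that end up empty.
function splitAndFilter(text) {
  return text
    .split(" ")
    .map((w) => w.replace(/[.,\-?!]/g, "")) // hyphen escaped so it is not a range
    .filter((w) => w !== "");               // removes leftovers such as a lone "-"
}

const sample = "This is Jo! Jo - is my friend. He speaks леотуту language.";
console.log(splitAndFilter(sample));
```

Unlike the console.log loop, this returns an array. It only handles the punctuation marks that are listed explicitly.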

8th Jul 2023, 7:19 PM

SoloProg

+ 2

https://regexlearn.com

8th Jul 2023, 10:40 PM

SoloProg

+ 2

User-made regex lessons on Sololearn: https://www.sololearn.com/learn/9704/?ref=app

9th Jul 2023, 5:43 AM

Tibor Santa

+ 1

SoloProg Better, thank you :) But the data type of the output must be an array.

8th Jul 2023, 3:18 PM

PR PRGR

+ 1

This can be solved with regular expressions too.
    const sentence = "This is Jo! Jo - is my friend. He speaks леотуту language.";
    const pattern = /\p{Letter}+/gu;
    const words = sentence.match(pattern);
    console.log(words);
To understand how \p works with Unicode, see this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape
It's needed because of the Cyrillic characters. The result of match() is an array of the regex match results, or null if no match was found.
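Because match() returns null rather than an empty array when nothing matches, a caller that always expects an array may want a small wrapper. This is a sketch only; the name lettersOnlyWords is hypothetical, while the pattern is the one from the post.

```javascript
// Pattern from the post: one or more Unicode letters ('u' flag required).
const LETTERS = /\p{Letter}+/gu;

// Hypothetical wrapper that always returns an array.
function lettersOnlyWords(text) {
  // match() yields null when there is no match, so fall back to [].
  return text.match(LETTERS) ?? [];
}

console.log(lettersOnlyWords("This is Jo! Jo - is my friend. He speaks леотуту language."));
console.log(lettersOnlyWords("... !!!")); // → []
```

Note that this pattern keeps letters only, so a token such as "Jo's" would come out as two entries, "Jo" and "s".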

8th Jul 2023, 5:03 PM

Tibor Santa

+ 1

SoloProg Thank you!!!

8th Jul 2023, 10:24 PM

PR PRGR

+ 1

Tibor Santa Thank you for the solution and the website! That is what I need! But how can I make "match" also include numbers and underscores, or any characters that are inside a word (before or after a letter)? For example, words like: queue#3, 7Eleven, Samsung_a52, Xiaomi-12, Tom&Jerry, ... But if we have "Tom & Jerry", the & there is not a word.
P.S. I also tried sentence.match(/\w+/g), but it doesn't work for Cyrillic characters. So is "match" the only method for this? And maybe you know some more websites/apps/YouTube channels for newbies to learn regex? It's really cool to know how to use them!

8th Jul 2023, 10:26 PM

PR PRGR

+ 1

Thank you, SoloProg!!!

8th Jul 2023, 11:18 PM

PR PRGR

+ 1

PR PRGR Yes, the pattern /\w+/g would work nicely for text that has only English characters; it captures ASCII letters, digits and the underscore. The \p{Letter} category also captures letters of other languages, and it must be used together with the 'u' modifier, which enables Unicode mode.
Also, \S matches any non-whitespace character. For this case it would not really work, because you want to exclude some punctuation.
To make it more precise, you can use a "character set" in square brackets, where you list all applicable characters that can be part of a word:
    /[\d\p{Letter}#&_]+/gu
\d means a digit, which can also be expressed as [0-9].
But in this case the & character in "Tom & Jerry" would be considered an individual word, and I would find it really complex to handle this problem inside the regex world. So I would apply some post-processing to the result array and remove or adjust words that do not really meet your conditions. (There could be tricky edge cases, like what if the word ends with &.)
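A minimal sketch of the character set plus post-processing idea follows. It rests on assumptions not stated in the post: the hyphen is added to the set so that "Xiaomi-12" stays in one piece, a token counts as a word only if it contains at least one letter or digit, and leading or trailing special characters are trimmed. The function name toWords is reused here for illustration.

```javascript
// Character set from the post, with '-' added (assumption) for "Xiaomi-12".
const TOKEN = /[\d\p{Letter}#&_-]+/gu;
// A token qualifies as a word only if it has a letter or digit (assumption).
const HAS_LETTER_OR_DIGIT = /[\d\p{Letter}]/u;
// Special characters hanging at the start or end of a token.
const EDGE_SPECIALS = /^[#&_-]+|[#&_-]+$/g;

function toWords(text) {
  const tokens = text.match(TOKEN) ?? []; // match() returns null on no match
  return tokens
    .filter((t) => HAS_LETTER_OR_DIGIT.test(t)) // drops a lone "&" or "-"
    .map((t) => t.replace(EDGE_SPECIALS, ""));  // "R&" becomes "R"
}

console.log(toWords("Tom & Jerry, Tom&Jerry, queue#3, 7Eleven, Samsung_a52, Xiaomi-12, R&"));
```

Whether a trailing & should be trimmed or kept is a design choice; the map() step is the place to change that.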

9th Jul 2023, 5:19 AM

Tibor Santa

+ 1

Tibor Santa Maybe we can do something like this. It is a word:
- if there is a character (or several characters) followed by one or more letters,
- and if there are one or more letters followed by another character (or several characters).
If there are characters (other than numbers) without a single letter, then it is not a word. But I don't know how to code it.
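One possible reading of this rule, sketched without an explicit list of allowed characters: split the text on whitespace, trim everything that is not a letter or digit from both ends of each piece, and drop pieces that end up empty. The name toWordsByRule and this interpretation are assumptions; the rule as written leaves some cases open.

```javascript
// Runs of non-letter, non-digit characters at the start or end of a token.
const EDGE_JUNK = /^[^\p{Letter}\p{Number}]+|[^\p{Letter}\p{Number}]+$/gu;

// Hypothetical helper implementing the interpretation described above.
function toWordsByRule(text) {
  return text
    .split(/\s+/u)                                 // whitespace separates tokens
    .map((token) => token.replace(EDGE_JUNK, "")) // "Jo!" -> "Jo", "&" -> ""
    .filter((token) => token !== "");             // tokens with no letter or digit vanish
}

console.log(toWordsByRule("Tom & Jerry met Tom&Jerry, queue#3 and Xiaomi-12."));
```

Characters inside a token are kept as they are, so "Tom&Jerry" and "queue#3" survive, while the & in "Tom & Jerry" is dropped.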

9th Jul 2023, 1:32 PM

PR PRGR

+ 1

    let anytext = "I love coding";
    // Array.from() on a string yields its individual characters, not words
    let arr = Array.from(anytext);
    console.log(arr);
    // Hope this helps

9th Jul 2023, 2:13 PM

Aradhna Pandey

+ 1

With indexing and slicing in Python:
    string = 'this is good'
    words = []
    words.insert(0, string[0:4])
    words.append(string[5:7])
    words.append(string[8:12])
Now you have a list, but if you want it to be more automatic, just learn more about the split() function.

9th Jul 2023, 3:46 PM

Mariano BONOUGBO

+ 1

Tibor Santa Thanks a lot!!! That's what I need. Thank you very much for help!

11th Jul 2023, 4:23 PM

PR PRGR

Hhh

9th Jul 2023, 6:48 AM

Deepali Shukla

@Mariano Thank you, but this way is only suitable for one particular string. We don't know what string the user will enter, so we need more general code (in JS).

9th Jul 2023, 3:58 PM

PR PRGR

@Aradhna Thank you, but unfortunately, this does not fit the task.

9th Jul 2023, 4:02 PM

PR PRGR