+ 1

Text Parsing

The ArticlesDataset.txt file contains all the metadata of the documents. unigramCount contains all unique words and their number of occurrences for each document. There are 1500 publications recorded in the txt file. Find the total frequency of all the unigrams used in all publications and print the top 10 most frequent words in these documents. Here is an example entry for a document:

{"creator":["Romain Allais","Julie Gobert"],"datePublished":"2018-05-30","docType":"article","doi":"10.1051\/mattech\/2018010","id":"ark:\/\/27927\/phz10hn2bh3","isPartOf":"Mat\u00e9riaux & Techniques","issueNumber":"5-6","language":["eng"],"outputFormat":["unigram","bigram","trigram"],"pageCount":7,"pagination":"pp. null-null","provider":"portico","publicationYear":2018,"publisher":"EDP Sciences","sequence":3.0,"tdmCategory":["Applied sciences -Engineering"],"title":"Environmental assessment of PSS","url":"http:\/\/doi.org\/10.1051\/mattech\/2018010","volumeNumber":"105","wordCount":4446,"unigramCount":{"others":1,"air":1,"networks,":1,"conventional":1,"IEEE":1}}

My purpose is to pull out the unigram counts for each document and store them in a suitable array. How can I do this using the fstream library?

22nd Jan 2022, 9:41 PM
titan
1 Answer
+ 3
It looks like JSON, so I'd try a JSON parsing library. I've only used one, so it's the only one I can recommend: https://github.com/nlohmann/json
22nd Jan 2022, 10:02 PM
inxanedev!