+ 1

How to extract the "charan" text from the XML file below using Python (minidom)?

How can I extract the contents between the text tags (<text:p> ... </text:p>)? Can anyone please help me?

<office:body>
  <office:spreadsheet>
    <table:table table:name="Sheet1" table:style-name="ta1">
      <table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
      <table:table-row table:style-name="ro1">
        <table:table-cell calcext:value-type="string" office:value-type="string">
          <text:p>charan</text:p>
        </table:table-cell>
      </table:table-row>
    </table:table>
    <table:named-expressions/>
  </office:spreadsheet>
</office:body>

https://code.sololearn.com/cKy0QbW5n71G/?ref=app

6th Feb 2019, 3:01 AM
Raj Charan
4 Answers
+ 2
Hi Raj. OK, cool, I think I made some progress in understanding your data structure. This XML seems to be only the "guts" of a complete file, as the XML headers are missing. Based on the tag names, it looks like an OpenOffice document (OASIS OpenDocument). Its specification is here: http://www.datypic.com/sc/odf/ss.html To understand what a full XML document should look like, check this: https://www.w3schools.com/xml/xml_namespaces.asp If you don't have the headers with the xmlns references, I think the solution could be to trick Python's minidom parser by adding some headers to the data before it is processed. I will continue when I have some more time :)
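A minimal sketch of that "add some headers" idea, not a definitive solution: the fragment is wrapped in a made-up <root> element that declares the four prefixes it uses, and then minidom can parse it. The wrapper element, the helper names, and the namespace URIs below are assumptions for illustration. The parser only needs each prefix to be bound to some URI; if you have the original file, take the xmlns values from its real header instead.

```python
from xml.dom import minidom

# The fragment from the question, without any xmlns declarations.
FRAGMENT = """<office:body> <office:spreadsheet>
<table:table table:name="Sheet1" table:style-name="ta1">
<table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
<table:table-row table:style-name="ro1">
<table:table-cell calcext:value-type="string" office:value-type="string">
<text:p>charan</text:p>
</table:table-cell> </table:table-row> </table:table>
<table:named-expressions/> </office:spreadsheet> </office:body>"""

# Assumed prefix-to-URI bindings (hypothetical here; copy the real ones
# from the document header when it is available).
NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "calcext": "urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0",
}


def wrap_with_namespaces(fragment, namespaces):
    """Wrap the fragment in a dummy root element that declares every prefix."""
    declarations = " ".join(
        'xmlns:{}="{}"'.format(prefix, uri) for prefix, uri in namespaces.items()
    )
    return "<root {}>{}</root>".format(declarations, fragment)


def extract_paragraph_texts(fragment):
    """Return the text content of every <text:p> element in the fragment."""
    doc = minidom.parseString(wrap_with_namespaces(fragment, NAMESPACES))
    texts = []
    for paragraph in doc.getElementsByTagName("text:p"):
        # Join the direct text children of the paragraph element.
        parts = [
            child.data
            for child in paragraph.childNodes
            if child.nodeType == child.TEXT_NODE
        ]
        texts.append("".join(parts))
    return texts


print(extract_paragraph_texts(FRAGMENT))  # ['charan']
```

Note that getElementsByTagName matches on the prefixed tag name as written ("text:p"), so it only works as long as the document keeps using the same prefixes.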
7th Feb 2019, 5:42 AM
Tibor Santa
+ 1
Hi, I tried to replicate your scenario by calling parseString on the XML you pasted, and I got an expat error. It seems to be caused by missing namespace references: any XML tag that has a : (colon) in it must have its namespace prefix declared beforehand. Is this the complete XML? Or do you have any headers that contain xmlns? If not, it might ultimately be easier to parse the file with regular expressions instead of using minidom.
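To illustrate both points, here is a rough sketch, assuming a shortened copy of the fragment from the question. The helper names are made up for this example. The first function shows minidom rejecting the undeclared prefixes, and the second is the regular-expression workaround. The regex is only a quick fix under the assumption that <text:p> holds plain text: it does not decode entities such as &amp; and it would return raw markup if the paragraph contained nested tags.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Shortened copy of the fragment from the question (no xmlns declarations).
FRAGMENT = """<office:body> <office:spreadsheet>
<table:table table:name="Sheet1" table:style-name="ta1">
<table:table-row table:style-name="ro1">
<table:table-cell calcext:value-type="string" office:value-type="string">
<text:p>charan</text:p>
</table:table-cell> </table:table-row> </table:table>
</office:spreadsheet> </office:body>"""


def try_minidom(fragment):
    """Return the parser's error message, or None if parsing succeeded."""
    try:
        minidom.parseString(fragment)
    except ExpatError as err:
        return str(err)
    return None


def find_paragraphs_with_regex(fragment):
    """Return the raw text between each <text:p ...> and </text:p> pair."""
    # Non-greedy match so that several paragraphs are captured separately;
    # DOTALL lets a paragraph span more than one line.
    pattern = re.compile(r"<text:p\b[^>]*>(.*?)</text:p>", re.DOTALL)
    return pattern.findall(fragment)


print(try_minidom(FRAGMENT))                 # an expat error message
print(find_paragraphs_with_regex(FRAGMENT))  # ['charan']
```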
6th Feb 2019, 2:43 PM
Tibor Santa
7th Feb 2019, 2:48 AM
Raj Charan
0
Tibor Santa, thank you :) I'll keep trying as well.
7th Feb 2019, 8:54 AM
Raj Charan