+ 1
How do I extract the "charan" text from the XML file below using Python (minidom)?
How do I extract the contents between the text tags (<text:p>...</text:p>)? Can anyone please help?
<office:body>
  <office:spreadsheet>
    <table:table table:name="Sheet1" table:style-name="ta1">
      <table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
      <table:table-row table:style-name="ro1">
        <table:table-cell calcext:value-type="string" office:value-type="string">
          <text:p>charan</text:p>
        </table:table-cell>
      </table:table-row>
    </table:table>
    <table:named-expressions/>
  </office:spreadsheet>
</office:body>
https://code.sololearn.com/cKy0QbW5n71G/?ref=app
4 Answers
+ 2
Hi Raj
Okay cool, I think I made some progress understanding your data structure.
This XML seems to be only the "guts" of a complete file, because the XML headers are missing.
Based on the tag names, it looks like part of an OpenDocument (OASIS) file, the format used by OpenOffice. Its specification is here:
http://www.datypic.com/sc/odf/ss.html
To understand what a full XML document should look like, check this:
https://www.w3schools.com/xml/xml_namespaces.asp
If you don't have the headers with the xmlns references, I think the solution is to trick Python's minidom parser by adding those headers to the data before it is processed.
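Here is a minimal sketch of that idea, assuming the fragment really has no xmlns declarations. The wrapper element name `root` and the helper names are made up for illustration. The namespace URIs are the ones I believe OpenDocument uses, but minidom only needs each prefix to be bound to some URI, so treat them as placeholders.

```python
from xml.dom import minidom

# The fragment from the question, without any xmlns declarations.
FRAGMENT = """
<office:body>
  <office:spreadsheet>
    <table:table table:name="Sheet1" table:style-name="ta1">
      <table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
      <table:table-row table:style-name="ro1">
        <table:table-cell calcext:value-type="string" office:value-type="string">
          <text:p>charan</text:p>
        </table:table-cell>
      </table:table-row>
    </table:table>
    <table:named-expressions/>
  </office:spreadsheet>
</office:body>
"""

# Assumption: URIs as used by OpenDocument files. The parser only requires
# that every prefix used in the fragment is bound to *some* URI.
NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "calcext": "urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0",
}


def wrap_with_namespaces(xml_fragment):
    """Wrap the fragment in a hypothetical <root> element that declares the prefixes."""
    declarations = " ".join(
        'xmlns:{}="{}"'.format(prefix, uri) for prefix, uri in NAMESPACES.items()
    )
    return "<root {}>{}</root>".format(declarations, xml_fragment)


def extract_paragraph_texts(xml_fragment):
    """Return the text content of every <text:p> element in the fragment."""
    dom = minidom.parseString(wrap_with_namespaces(xml_fragment))
    texts = []
    for element in dom.getElementsByTagName("text:p"):
        # Join only the direct text children of the element.
        parts = [
            child.data
            for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        ]
        texts.append("".join(parts))
    return texts


print(extract_paragraph_texts(FRAGMENT))
```

If your real file already has a root element with the xmlns attributes, you can skip the wrapping step and pass the file contents to `minidom.parseString` directly.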
I will continue when I have some more time :)
+ 1
Hi
I tried to replicate your scenario by using parseString on the XML you pasted, and I got an expat error. It seems to be caused by missing namespace references. Any XML tag that has a : (colon) in its name needs its namespace prefix declared beforehand.
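A small sketch to reproduce the problem, using only one element from your XML. The helper name `try_parse` is just for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# One element from the question; the "text" prefix is never declared.
SNIPPET = "<text:p>charan</text:p>"


def try_parse(xml_text):
    """Return None if parsing succeeds, otherwise the parser's error message."""
    try:
        minidom.parseString(xml_text)
        return None
    except ExpatError as err:
        return str(err)


message = try_parse(SNIPPET)
print(message)  # the message should mention an unbound prefix
```

The same element parses fine once the prefix is declared, for example with an `xmlns:text="..."` attribute on the element or on one of its ancestors.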
Is this the complete XML? Or do you have any headers that contain xmlns?
Ultimately, it would be much easier to parse this file with regular expressions than with minidom.
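A rough sketch of the regular expression approach, assuming the <text:p> elements are simple and not nested. The pattern and the function name are my own, and this is not a general XML parser: it will not handle self-closing tags, nested elements, or entities such as &amp;amp; correctly.

```python
import re

# Part of the XML from the question; no namespace declarations are needed here.
XML_TEXT = """
<table:table-cell calcext:value-type="string" office:value-type="string">
  <text:p>charan</text:p>
</table:table-cell>
"""

# Match an opening <text:p ...> tag, capture everything up to the closing tag.
TEXT_P_PATTERN = re.compile(r"<text:p\b[^>]*>(.*?)</text:p>", re.DOTALL)


def extract_with_regex(xml_text):
    """Return the raw contents of every <text:p>...</text:p> pair."""
    return TEXT_P_PATTERN.findall(xml_text)


print(extract_with_regex(XML_TEXT))
```

This works on the raw string, so the missing xmlns headers do not matter.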
0
Tibor Santa
Please find the attachment
https://code.sololearn.com/Wmd6OTQ4F8FZ/?ref=app
0
Tibor Santa Thank you :) I'll try it as well.