The data comes from a dwr.engine interface. How can I analyze it with Python, or parse the XML inside it?

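As the title says, this is data scraped from my school's news interface. I found the response under portalAjax.getNewsXml.dwr in the browser's network monitor. Calling it with the post method of Python's requests returns this text: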

//#DWR-INSERT //#DWR-REPLY dwr.engine._remoteHandleCallback('0','0',"\n<list><pagecount>3641</pagecount><item>\n\n<link></link>\n<description></description>\n<category></category>\n<pubdate>Thu, 17 Nov 2016 00:23:21 GMT</pubdate>\n<guid></guid>\n<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"></dc:creator>\n<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">Thu, 17 Nov 2016 00:23:21 GMT</dc:date>\n<xwbh>147934233182128368</xwbh>\n<color>null</color>\n<spanpic>pic</spanpic>\n<lmmc></lmmc>\n<enclosure url="http://www.ynnu.edu.cn/UserFiles/Image/147934226721288544.png" type="image/pjpeg"/>\n</item></list>");

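How can I extract the XML from this in Python? I tried locating the start and end of the XML with string searches and then parsing the result with xml.etree, but constructing the XML fails with: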

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 784

How can I fix this?

5 replies • 2016-12-01 17:39:39 +08:00