DevHeads.net

PyXML package - deprecate it?

Hi all,
it looks like the PyXML package is deprecated, since Python itself provides
XML mechanisms.
When you look deeper,
python's xml provides:
"dom", "parsers", "sax", "etree"
and PyXML provides:
'dom', 'marshal', 'parsers', 'sax', 'schema', 'utils', 'xpath', 'xslt'

So, PyXML duplicates dom, parsers and sax (and it looks like Python's are in
better shape). Is any package using marshal, schema or any other module not in
Python itself?

Deprecate PyXML or just remove duplicated parts?

RR

Comments

Re: PyXML package - deprecate it?

By Marian Ganisin at 03/23/2012 - 05:45

On Tue, Feb 21, 2012 at 06:48:11PM +0100, Roman Rakus wrote:
PyXML has not been maintained upstream for many years, so it should not be
used. The distribution-specific PyXML-0.8.4-python2.6.patch included in the
srpm is a warning sign of ongoing issues.

Anyway it provides many features which are missing in python stdlib as
far as I know (for example xpath or magnificent HTML-to-DOM reader).

There are alternatives outside of stdlib described here:
http://wiki.python.org/moin/PythonXml

(lxml is my personal favorite, however it is not compatible with PyXML and
it isn't pure python).

I believe PyXML should be kept unchanged (my personal code relies on it
as well) but deprecated, and its users should be strongly encouraged to
switch to some alternative if the stdlib doesn't satisfy their requirements.

Re: PyXML package - deprecate it?

By Toshio Kuratomi at 02/23/2012 - 11:54

On Tue, Feb 21, 2012 at 06:48:11PM +0100, Roman Rakus wrote:
Looking at the sourceforge page, the authors of PyXML are heavily involved
in python core development so it's likely that they worked to merge the
useful bits of PyXML into the stdlib before they abandoned it:
http://sourceforge.net/projects/pyxml/

However, it also looks like PyXML is a collection of works that the
sourceforge authors didn't necessarily originate. With that in mind, they
may not have been able to get permission of the various original authors to
merge a particular module into the python stdlib. However, there may exist
independent upstream versions of those modules that would be better to ship
than shipping PyXML in those cases.

The best way to proceed is likely similar to how I looked at whether it
would be okay to retire python-sqlite2: Take all the packages that depend on
PyXML and grep through their sources to find where they use the PyXML
modules (rpm -ql PyXML shows that everything in PyXML is in an "_xmlplus"
python package so you'll see things like "import _xmlplus.dom" and "from
_xmlplus import dom". grepping for _xmlplus will probably work). In some
cases, you'll likely find this was an old dep and the source no longer uses
it. Others may be conditionalized:

try:
    from xml import sax
except ImportError:
    from _xmlplus import sax

Testing these packages to know that they behave properly is good but at
least you can be confident that the upstream code is intended to work with
the stdlib modules instead of the PyXML modules.

If you find any code that's using _xmlplus unconditionally, you'll have to
write patches to use the stdlib or separate modules. Then test your changes
and send the patches upstream. Given the upstream note that PyXML is
deprecated and the authors do not intend for people to use it, this category
would hopefully be very small. But you won't know until you look.
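
The grep step described above can be sketched in Python. This is only an illustration, not an existing tool: the function name find_xmlplus_imports and the decision to scan only *.py files are assumptions made for the example.

```python
import os
import re

# Matches lines such as "import _xmlplus.dom" or "from _xmlplus import dom".
XMLPLUS_RE = re.compile(r"^\s*(?:import|from)\s+_xmlplus\b")


def find_xmlplus_imports(root):
    """Return (path, line_number, line) for each direct _xmlplus import under root.

    Hypothetical helper: it only looks at *.py files and only at explicit
    _xmlplus imports.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Old sources may not be valid UTF-8, so replace undecodable bytes.
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if XMLPLUS_RE.match(line):
                        hits.append((path, number, line.strip()))
    return hits
```

Each hit still needs a manual look to see whether it sits inside a try/except fallback like the one shown earlier or is used unconditionally.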

-Toshio

Re: PyXML package - deprecate it?

By Matej Cepl at 02/23/2012 - 13:00

On 23.2.2012 16:54, Toshio Kuratomi wrote:
Completely agree with what you were writing, just to note that IIRC
PyXML was the first Python XML library, so I would believe plenty of
projects use it just because of conservatism/laziness/fear of change.

Matěj

Re: PyXML package - deprecate it?

By Roman Rakus at 02/23/2012 - 12:19

On 02/23/2012 04:54 PM, Toshio Kuratomi wrote:
Anyway, you're right that it is better to ask upstream how different the
stdlib and _xmlplus are.

RR

Re: PyXML package - deprecate it?

By Toshio Kuratomi at 02/23/2012 - 13:09

On Thu, Feb 23, 2012 at 05:19:01PM +0100, Roman Rakus wrote:
This is untrue, but many of the changes won't affect what the code does:
"if value in dict" vs "if dict.has_key(value)" type stuff and workarounds
for older python releases. There are a few changes which might affect the
output of the code but someone would need to analyze it more to know for
sure.

The method used also makes it harder for people to simply grep the sources
and tell if code really depends on PyXML or not as an import of the stdlib's
xml module may be getting _xmlplus instead of the stdlib code.
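
A rough way to see which implementation an "import xml" actually returned is to look at where the module object came from. This is a sketch only: xml_package_origin is a made-up helper, and it assumes that a replaced module would show "_xmlplus" in its __file__ or __name__.

```python
import xml


def xml_package_origin(module=xml):
    """Guess whether a module object was loaded from PyXML's _xmlplus package.

    Assumption: a module replaced by PyXML carries "_xmlplus" in its file
    path or module name; anything else is reported as the stdlib.
    """
    path = getattr(module, "__file__", None) or ""
    name = getattr(module, "__name__", "")
    if "_xmlplus" in path or name.startswith("_xmlplus"):
        return "_xmlplus"
    return "stdlib"


if __name__ == "__main__":
    print(xml_package_origin())
```

The check has to run with the same interpreter and site-packages that the package under test uses, otherwise it says nothing about that package.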

-Toshio

Re: PyXML package - deprecate it?

By Toshio Kuratomi at 07/20/2012 - 19:28

On Thu, Feb 23, 2012 at 9:09 AM, Toshio Kuratomi <a. ... at gmail dot com> wrote:
Did you get anywhere with this? I've just found that the latest
version of docutils doesn't run if pyxml is installed (due to the
stdlib replacing its own implementations with pyxml's implementation
if pyxml is installed.) For now, in rawhide, I've added a Conflicts
on PyXML, but that's not going to work in the future (not least
because you can't install packages that (perhaps bogusly) require
pyxml at the same time as docutils.)

I can see several ways forward --

* Deprecate pyxml as you were thinking of doing
* Locally patch the python stdlib to not replace its implementation of
xml with pyxml's. I don't know if upstream python will take that as
python-2.7 is in maintenance mode but they may as it's somewhat
agreed that this importing of pyxml is not kosher in upstream (and has
been removed in python-3.x)
* I can file a bug and we can figure out how to fix PyXML so that
docutils works when it's installed. (We may have to do this anyway as
there's a chance I may need to push a docutils update to older Fedora
for some bugfixes.)

Thanks,
-Toshio

Re: PyXML package - deprecate it?

By Toshio Kuratomi at 07/23/2012 - 14:05

On Fri, Jul 20, 2012 at 4:28 PM, Toshio Kuratomi <a. ... at gmail dot com> wrote:
Some followup. I've made a page for removing pyxml:

https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML

dmalcolm added the dep tree. I remember that fedora-business-cards is
a false dep from when this came up in February. I'll remove the dep
shortly. Since SOAPpy affects so much stuff I had a look. It seems
like the dep is false there as well. The README says that PyXML is
needed but inspection of the code and the ChangeLog, ReleaseNotes, and
the upstream scm point to that being a documentation bug; PyXML
requirement was supposedly removed in 2003.

-Toshio

Re: PyXML package - deprecate it?

By Toshio Kuratomi at 07/24/2012 - 18:16

Just finished analyzing all of the packages that claim a PyXML dependency:

https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML#Dep_analysis

There seem to be a hefty number of packages where we can remove the
dependency (the Easy fixes section). I've opened bugs for those in
case the maintainers know something about the PyXML dep that I'm
unaware of.

Due to the fact that PyXML overwrites the python stdlib xml module the
possibility does exist that I've missed some feature that PyXML adds
to some portion of the stdlib xml module. As an example of what level
of replacement that can occur: Both modules have sax.handler. The sax
handlers can both be configured by setting sax.handler.feature_*
attributes. However, PyXML has feature_namespace_prefixes while the
stdlib module does not. If you run across one of these in the course of
trying to remove the PyXML dependency, please let me know, update the
wiki page, add to the bug report what's blocking things, etc. I can
try to check for these things across all of the PyXML-using packages
if I'm made aware that they exist.
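
One way to look for differences of this kind is to list the feature_* names that a given sax.handler module defines and compare them against the names a package relies on. A minimal sketch; both helper names are invented for this example, and it should be run once with PyXML installed and once without to get something to compare.

```python
import xml.sax.handler


def sax_feature_names(handler_module=xml.sax.handler):
    """Return the sorted feature_* attribute names defined by a sax.handler module."""
    return sorted(name for name in dir(handler_module) if name.startswith("feature_"))


def missing_features(reference_names, handler_module=xml.sax.handler):
    """Return the names in reference_names that handler_module does not define."""
    available = set(sax_feature_names(handler_module))
    return sorted(set(reference_names) - available)


if __name__ == "__main__":
    for name in sax_feature_names():
        print(name)
```

A name being defined does not mean the underlying parser accepts it, so this only narrows down where to look.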

The Require Coding section is more problematic. I'll list them all here:

* bkchem: uses xpath. If this is all, it's probably doable to write a
patch that uses lxml's xpath instead. Since it's a plugin, it's also
possible to stop shipping that plugin.
* libopensync-plugin-google-calendar: uses xpath. Writing a patch
should be as doable as for bkchem.
* python-ZSI: I think we could fix with a package update. Upstream
has a 2.1alpha1 release that Debian and Gentoo both ship that would
fix this for us.
* subscription-manager looks like it just needs a port to a different
library (the code isn't even xml related.. it's date parsing)
* spacewalk-backend: both uses of PyXML-only code were in test cases.
One looks like it could be ported to some other library (it's for
converting between character encodings). The other one is for writing
xml using a SAX api. I haven't looked at that one too hard yet.
* openxcap makes use of a non-stdlib feature of the sax reader. I
haven't looked into this one too hard either but I suspect there's not
just a drop-in replacement for this. Some sort of custom code will
have to be written to handle what they're trying to do.
* comoonics* -- these packages make heavy use of the PyXML-only API.
I think someone will need to spend a while getting to know the code
and porting it to use a different library. We could also drop the
comoonics packages until upstream ports.
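
For the xpath cases, simple queries might not need a third-party library at all: the stdlib's xml.etree.ElementTree supports a limited XPath subset. The sketch below uses that subset rather than lxml. The sample document, its element names and the helper atoms_of_element are made up for illustration, and whether the real bkchem or opensync queries fit into the subset has not been checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
SAMPLE = """
<molecule>
  <atom id="a1" element="C"/>
  <atom id="a2" element="O"/>
  <bond start="a1" end="a2"/>
</molecule>
"""


def atoms_of_element(xml_text, symbol):
    """Return the ids of <atom> elements whose element attribute equals symbol."""
    root = ET.fromstring(xml_text)
    # ElementTree understands a subset of XPath, including [@attr='value'] predicates.
    query = ".//atom[@element='%s']" % symbol
    return [atom.get("id") for atom in root.findall(query)]


if __name__ == "__main__":
    print(atoms_of_element(SAMPLE, "O"))
```

Queries that need axes, functions or other full XPath 1.0 features would still call for a library such as lxml.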

rrakus, dmalcolm, others -- how do you want to proceed? I can open
bugs for the remaining packages and we can plow forward on this but
some of the packages needing code changes may not make it for F18 and
have to be blocked. We can also pursue one of the alternatives (for
instance, making the python2 stdlib not replace itself with PyXML and
patching those packages which can't be ported in time to import
_xmlplus instead of import xml).

-Toshio

Unsubscribe remove delete

By Otto Rey at 07/23/2012 - 14:18

Re: PyXML package - deprecate it?

By Matej Cepl at 02/22/2012 - 06:11

On 21.2.2012 18:48, Roman Rakus wrote:
What packages require PyXML? Could they be rebuilt just with xml tools
in stock Python (I think so)? Did you try?

Matěj

Re: PyXML package - deprecate it?

By Roman Rakus at 02/22/2012 - 06:21

On 02/22/2012 11:11 AM, Matej Cepl wrote:
$ repoquery --whatrequires PyXML
SOAPpy-0:0.11.6-12.fc16.noarch
bkchem-0:0.14.0-3.pre2.fc15.noarch
comoonics-cdsl-py-0:0.2-18.noarch
comoonics-cluster-py-0:0.1-24.noarch
fedora-business-cards-0:0.2.4.3-2.fc15.noarch
grc-0:0.70-7.fc15.noarch
heartbeat-0:3.0.4-1.fc15.1.x86_64
inksmoto-0:0.7.0-5.fc15.noarch
libopensync-plugin-google-calendar-1:0.22-5.fc15.x86_64
openxcap-0:1.1.2-3.fc15.noarch
pida-0:0.5.1-13.fc15.x86_64
pypar2-0:1.4-7.fc15.noarch
python-MythTV-0:0.24.1-4.fc16.x86_64
python-MythTV-0:0.24.2-1.fc16.x86_64
python-ZSI-0:2.0-9.fc15.noarch
python-nova-0:2011.3-4.fc16.noarch
python-nova-0:2011.3.1-2.fc16.noarch
python-webdav-library-0:0.3.0-1.fc16.noarch
salt-0:0.9.6-2.fc16.noarch
spacewalk-backend-tools-0:1.4.39-1.fc16.noarch
subscription-manager-0:0.99.4-1.fc16.x86_64
synce-sync-engine-0:0.15.1-1.fc16.x86_64
xen-0:4.1.1-8.fc16.x86_64
xen-0:4.1.2-6.fc16.x86_64
zeroinstall-injector-0:1.2-1.fc16.noarch

RR

Re: PyXML package - deprecate it?

By Matej Cepl at 02/26/2012 - 20:03

On 22.2.2012 11:21, Roman Rakus wrote:
I wonder why this is on the list: if I am not mistaken, it doesn't use
anything other than xml.dom.minidom (in generate.py), which was already
present in python 2.4 (the oldest Python currently living in Fedora/EPEL
universe). And yes PyXML is hard-coded in Requires: (maybe this is just
one more example why hard coded Requires are evil).

Putting the maintainer on Cc: to ask for the reasons why this package
wants PyXML at all.

Best,

Matěj

Re: PyXML package - deprecate it?

By Ian Weller at 02/27/2012 - 01:33

On Mon, Feb 27, 2012 at 01:03:32AM +0100, Matej Cepl wrote:
I'll remove the dependency sometime this week.

Re: PyXML package - deprecate it?

By Stanislav Ochotnicky at 02/24/2012 - 08:32

Quoting Roman Rakus (2012-02-22 11:21:38)
Fixed in rawhide/F17. Thanks for pointing it out.