31 January 2012

default value for text function using lxml

Say we need to parse this XML

<pack xmlns="">

If you want to parse it and retrieve the values as tuples, one per packitem:

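from lxml import etree

root = etree.fromstring(xml)
namespaces = {'i': ""}
# each query returns a flat list of the text nodes it matched
packitems_duration = root.xpath('//i:pack/i:packitem/i:duration/text()',
    namespaces=namespaces)
packitems_max_count = root.xpath('//i:pack/i:packitem/i:max_count/text()',
    namespaces=namespaces)
packitems = list(zip(packitems_duration, packitems_max_count))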

>>> packitems

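The problem is that the zip result is missing a value. That's because text() returns nothing at all for an empty element, instead of None or an empty string, so the two lists have different lengths and zip stops at the shorter one. Let's change that.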
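To make the failure concrete, here is a small sketch in plain Python. The two lists are hypothetical stand-ins for what the queries would return if one of two packitems had an empty duration element; they are not taken from the original document.

```python
# Hypothetical query results: two packitems, one with an empty <duration>.
# The duration query then yields one string, the max_count query yields two.
durations = ['520']
max_counts = ['14', '23']

# zip stops as soon as the shorter input is exhausted
packitems = list(zip(durations, max_counts))
print(packitems)  # [('520', '14')]
```

Note that the result looks the same whichever packitem had the empty duration, so besides losing a tuple, the remaining values may be paired with the wrong item.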

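def lxml_empty_str(context, nodes):
    # XPath extension function: give empty elements an empty-string text
    # so that a following text() step returns '' for them
    for node in nodes:
        node.text = node.text or ""
    return nodes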

ns = etree.FunctionNamespace('')
ns['lxml_empty_str'] = lxml_empty_str

namespaces = {'i': "",
              'f': ""}
packitems_duration = root.xpath('f:lxml_empty_str(//i:pack/i:packitem/i:duration)/text()',
    namespaces=namespaces)
packitems_max_count = root.xpath('f:lxml_empty_str(//i:pack/i:packitem/i:max_count)/text()',
    namespaces=namespaces)
packitems = list(zip(packitems_duration, packitems_max_count))

>>> packitems
[('520','14'), ('','23')]
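If modifying the tree is not wanted, a different technique (not the extension-function approach above) is to loop over the packitem elements and read each child with findtext, which returns an empty string for an element that exists but has no text. This is only a sketch: the namespace URI and the sample document below are made up for illustration, and the helper name text_of is hypothetical.

```python
try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical namespace URI and sample document, for illustration only.
NS = "http://example.com/pack"
xml = (
    '<pack xmlns="%s">'
    '<packitem><duration>520</duration><max_count>14</max_count></packitem>'
    '<packitem><duration/><max_count>23</max_count></packitem>'
    '</pack>' % NS
)

root = etree.fromstring(xml)

def text_of(item, tag):
    # findtext gives '' when the child exists but is empty,
    # and the default when the child is missing altogether
    return item.findtext("{%s}%s" % (NS, tag), default="")

# one tuple per packitem, so values can never shift between items
packitems = [
    (text_of(item, "duration"), text_of(item, "max_count"))
    for item in root.findall("{%s}packitem" % NS)
]
print(packitems)  # [('520', '14'), ('', '23')]
```

Because each tuple is built from a single packitem, a missing or empty child cannot cause values from different items to be paired together.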

For more info on extending lxml, see the lxml documentation on XPath extension functions.