http-stream-xml

Parse XML in an HTTP response on the fly, chunk by chunk.

This is essential if you need only the beginning of a huge document.

For example, suppose you work with the NCBI PubMed corpus of biomedical articles through the Entrez API. The Entrez API tends to return very big documents (megabytes), and even if you need just some headers, you normally have to download the whole document just to parse it.

The http-stream-xml library lets you download only part of the response and parse it as it arrives.

It does not matter whether the server uses HTTP chunked transfer encoding.
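The general idea of parsing a response chunk by chunk and stopping early can be sketched with the Python standard library alone. This is an illustrative sketch, not the http-stream-xml API: the function extract_first_tags, the generator demo_chunks, and the sample XML are all hypothetical, and the chunks come from memory rather than from a real HTTP response.

```python
from xml.etree.ElementTree import XMLPullParser


def extract_first_tags(chunks, wanted_tags):
    """Feed byte chunks to an incremental parser and stop once every wanted tag is seen.

    Hypothetical helper for illustration; not part of http-stream-xml.
    Returns the collected tag texts and the number of chunks consumed.
    """
    parser = XMLPullParser(events=("end",))
    found = {}
    consumed = 0
    for chunk in chunks:
        consumed += 1
        parser.feed(chunk)
        # Collect every element that has been fully closed so far.
        for _event, element in parser.read_events():
            if element.tag in wanted_tags and element.tag not in found:
                found[element.tag] = (element.text or "").strip()
        if len(found) == len(wanted_tags):
            break  # stop early; the remaining chunks are never requested
    return found, consumed


def demo_chunks():
    """Stand-in for a streamed HTTP body, split at arbitrary byte positions."""
    yield b"<Gene><Name>TP"
    yield b"53</Name><Summ"
    yield b"ary>Tumor suppressor</Summary>"
    yield b"<Sequence>" + b"ACGT" * 1000
    yield b"</Sequence></Gene>"


result, chunks_read = extract_first_tags(demo_chunks(), {"Name", "Summary"})
print(result, chunks_read)
```

With a real HTTP client the chunks would come from the response body instead, for example from requests with stream=True and response.iter_content(chunk_size=...), and the connection would be closed after the loop breaks. Because the chunk iterator is lazy, the part of the document after the last wanted tag is never pulled from it.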

Installation

pip install http-stream-xml --upgrade

Usage sample

The sample retrieves data from the NCBI PubMed corpus of biomedical articles through the Entrez API.

The code downloads only a small part of the Entrez response, just enough to extract some summary data, so you do not have to download the whole huge Entrez answer to get a basic gene description.

python -m http_stream_xml.entrez

API

Source code

GitHub
