http-stream-xml

Parse XML in an HTTP response on the fly, chunk by chunk.

This is essential when you need only the beginning of a huge document.

For example, suppose you work with the NCBI PubMed corpus of biomedical articles through the Entrez API. The Entrez API tends to return very large documents (megabytes), and even if you need only a few headers, you normally have to download the whole document just to parse it.

The http-stream-xml library lets you download only part of a response and parse it as it arrives.

It does not matter whether the server uses HTTP chunked transfer encoding.
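
The idea of parsing a response piece by piece and stopping early can be illustrated with the Python standard library alone. The sketch below is not the http-stream-xml API; the function names, tag names, and sample document are made up for illustration. It feeds byte chunks to an incremental parser and stops as soon as every wanted tag has been seen.

```python
import xml.etree.ElementTree as ET


def extract_first_tags(chunks, wanted_tags):
    """Feed byte chunks to an incremental parser; stop once all wanted tags are seen."""
    parser = ET.XMLPullParser(events=("end",))
    found = {}
    consumed = 0
    for chunk in chunks:
        consumed += len(chunk)
        parser.feed(chunk)
        # Collect the text of every wanted element that has been fully parsed so far.
        for _event, element in parser.read_events():
            if element.tag in wanted_tags and element.tag not in found:
                found[element.tag] = (element.text or "").strip()
        if len(found) == len(wanted_tags):
            break  # everything we need is here; the rest is never requested
    return found, consumed


def iter_chunks(data, size):
    """Hypothetical stand-in for a network stream: yield fixed-size byte slices."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Made-up document: two small header tags followed by a large body.
document = (
    b"<Gene><Name>MYO5B</Name><Description>myosin VB</Description>"
    + b"<Body>" + b"x" * 10000 + b"</Body></Gene>"
)
result, consumed = extract_first_tags(iter_chunks(document, 64), {"Name", "Description"})
print(result, f"{consumed} of {len(document)} bytes consumed")
```

How many bytes are consumed before the loop stops depends on the chunk size and on the parser's internal buffering, but it stays far below the full document size when the wanted tags sit near the beginning.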

Installation

pip install http-stream-xml --upgrade

Usage sample

Implement custom loader

XML streaming chunks load

Step-by-step instructions on implementing a custom XML loader, using the Entrez gene database as an example.

Use Entrez PubMed loader

Entrez gene streaming with http-stream-xml

Use the entrez class to receive data from the NCBI PubMed corpus of biomedical articles while downloading only a small part of the Entrez response, just enough to extract some summary data.

This way you do not have to download the whole huge Entrez response to get just a basic gene description.
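
For comparison, here is a rough standard-library sketch of the same early-stop idea applied to an Entrez gene request. It does not use the library's entrez class. The E-utilities efetch URL, the query parameters, the tag names `Gene-ref_locus` and `Gene-ref_desc`, and the helper names are assumptions made for illustration. Reading with `response.read(size)` works the same whether or not the server uses chunked transfer encoding, because `http.client` decodes it transparently.

```python
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint and tag names, for illustration only.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
WANTED_TAGS = ("Gene-ref_locus", "Gene-ref_desc")


def read_until_found(stream, wanted_tags, block_size=1024):
    """Read fixed-size blocks from a file-like object until every wanted tag is parsed."""
    parser = ET.XMLPullParser(events=("end",))
    found = {}
    bytes_read = 0
    while len(found) < len(wanted_tags):
        block = stream.read(block_size)
        if not block:  # end of document: return whatever was found
            break
        bytes_read += len(block)
        parser.feed(block)
        for _event, element in parser.read_events():
            if element.tag in wanted_tags and element.tag not in found:
                found[element.tag] = (element.text or "").strip()
    return found, bytes_read


def fetch_gene_fields(gene_id, timeout=10):
    """Hypothetical helper: request one gene record and stop reading early."""
    query = urllib.parse.urlencode({"db": "gene", "id": gene_id, "retmode": "xml"})
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}", timeout=timeout) as response:
        # Leaving the `with` block closes the response; this code reads no further.
        return read_until_found(response, WANTED_TAGS)


# Offline demonstration with an in-memory stand-in for the HTTP response.
fake_response = io.BytesIO(
    b"<Entrezgene><Gene-ref_locus>MYO5B</Gene-ref_locus>"
    b"<Gene-ref_desc>myosin VB</Gene-ref_desc>"
    + b"<Entrezgene_comments>" + b"x" * 50000 + b"</Entrezgene_comments></Entrezgene>"
)
fields, bytes_read = read_until_found(fake_response, WANTED_TAGS)
print(fields, bytes_read)
```

The network helper `fetch_gene_fields` is defined but not called here, so the example runs without internet access; against a real response the tag names may need adjusting.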

API

Source code

GitHub
