http-stream-xml
Parse XML in an HTTP response on the fly, chunk by chunk.
This is essential when you need only the beginning of a huge document.
For example, suppose you work with the NCBI PubMed corpus of biomedical articles through the Entrez API. The Entrez API tends to return very large documents (megabytes), so even if you need only a few header fields, you normally have to download the whole document before you can parse it.
The http-stream-xml library lets you download only part of a response and parse it as it arrives.
It works whether or not the server uses HTTP chunked transfer encoding.
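The general idea of parsing a document before it has fully arrived can be sketched with the Python standard library alone. This is not the http-stream-xml API. It is a minimal illustration that uses `xml.etree.ElementTree.XMLPullParser`, and the function name `first_text` and the sample tags are made up for the example.

```python
import xml.etree.ElementTree as ET


def first_text(chunks, tag):
    """Return (text, chunks_used) for the first <tag> element found.

    `chunks` is any iterable of bytes; iteration stops as soon as the
    element has been fully parsed, so later chunks are never requested.
    """
    parser = ET.XMLPullParser(events=("end",))
    used = 0
    for chunk in chunks:
        used += 1
        parser.feed(chunk)  # hand the parser whatever has arrived so far
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                return elem.text, used
    return None, used  # input ended without the element appearing


# Hypothetical document split into three pieces, as if received over the network.
chunks = [
    b"<doc><title>Hello</title><body>",
    b"a very long body ...",
    b"</body></doc>",
]
text, used = first_text(iter(chunks), "title")
print(text, used)
```

Because the function returns as soon as the closing tag of the requested element has been seen, the rest of the input is never read.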
Installation
pip install http-stream-xml --upgrade
Usage sample
Implement custom loader
Step-by-step instructions for implementing a custom XML loader, using the Entrez gene database as an example.
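As a rough sketch of what such a loader does, the function below reads a response in fixed-size pieces and stops once every requested field has been seen. It uses only the standard library rather than the http-stream-xml classes. The name `load_fields`, the tag names and the sample document are hypothetical and do not reflect the real Entrez schema.

```python
import io
import xml.etree.ElementTree as ET


def load_fields(response, wanted, chunk_size=1024):
    """Read `response` in pieces until every tag in the set `wanted` is seen.

    `response` is any object with a read(n) method that returns bytes.
    Returns (found, bytes_read), where `found` maps tag name to element text.
    """
    parser = ET.XMLPullParser(events=("end",))
    found = {}
    bytes_read = 0
    while len(found) < len(wanted):
        chunk = response.read(chunk_size)
        if not chunk:
            break  # document ended before every tag was seen
        bytes_read += len(chunk)
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag in wanted and elem.tag not in found:
                found[elem.tag] = elem.text
    return found, bytes_read


# Stand-in for a large HTTP response body (made-up structure).
document = (
    b"<gene><name>MYO5B</name><summary>short text</summary><sequence>"
    + b"ACGT" * 5000
    + b"</sequence></gene>"
)
found, bytes_read = load_fields(io.BytesIO(document), {"name", "summary"}, chunk_size=64)
print(found, bytes_read, len(document))
```

For a real request, the object returned by `urllib.request.urlopen` also exposes `read(n)`, so it could be passed in place of the `io.BytesIO` stand-in. Closing it after the function returns abandons the rest of the body.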
Use Entrez PubMed loader
Entrez gene streaming with http-stream-xml
Use the entrez class to receive data from the NCBI PubMed biomedical articles corpus while downloading only a small part of the Entrez response, just enough to extract some summary data.
That way you do not have to download the whole huge Entrez response just to get a basic gene description.