Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Exporting NLP-progress into a structured format

Parse and export the unstructured information from Markdown into a structured JSON format.


Requires Python 3.6+.

Create a virtualenv and install requirements (you can also use conda):

virtualenv -p python3 venv
source venv/bin/activate

pip install -r requirements.txt


From the NLP-progress root directory (where the LICENCE file is), run:

python structured/ <one or more directories or files>

For example, to export all the data in the english/ directory:

python structured/ english

By default the output is written to structured.json; use the --output parameter to write it to a different file.
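
Once the export has run, a quick sanity check of the resulting file can be useful. The following is a minimal sketch, not part of the exporter: the function name summarize_export is hypothetical, and because the export schema is not described here, it only reports the top-level JSON type and size and assumes nothing about field names.

```python
import json
from pathlib import Path


def summarize_export(path="structured.json"):
    """Load the exported JSON and describe its top-level shape.

    Hypothetical helper: the export schema is not documented here, so
    this reports only the top-level type and size, not any field names.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Only containers have a meaningful size; scalars report None.
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    export_path = Path("structured.json")  # default output name from above
    if export_path.exists():
        kind, size = summarize_export(export_path)
        print(f"{export_path}: top-level {kind} with {size} entries")
    else:
        print(f"{export_path} not found; run the export first")
```

If you passed a different file name to --output, pass the same path to summarize_export.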