Odqa (!130) · Merge requests · DeepPavlov / DeepPavlov

Closed Andrei Glinskii requested to merge odqa into dev Mar 21, 2018

Created by: my-master

Major changes in core:

Rename DatasetBasicIterator -> DatasetLearningIterator
Add class DatasetFittingIterator
Add fit_batches() method to Estimator
Add fit_batches() to train.py
Make DataReader optional in JSON config in train_model_from_config()
Rename method batch_generator() -> gen_batch() in DatasetLearningIterator (former DatasetBasicIterator)
Rename method iter_all() -> get_instances() in DatasetLearningIterator (former DatasetBasicIterator)

The core changes address gaps in functionality: there was no way to fit on batches, no way to fit on batches in parallel (to be solved later), and the iterator was not available during fitting. Vocabs and iterators, as they stand, are not practical for large datasets. A few errors in the project architecture remain and should be addressed in the near future.
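To illustrate the batch-wise fitting described above, here is a minimal sketch. It is not the code from this merge request: the class bodies, constructor arguments, and the `batch_size` parameter are assumptions, and only the names `DatasetFittingIterator`, `gen_batch()`, `get_instances()`, and `fit_batches()` are taken from the change list. `CountingEstimator` is a hypothetical estimator invented for the example.

```python
from collections import Counter
from typing import Iterator, List


class DatasetFittingIterator:
    """Hypothetical stand-in: serves raw instances to an estimator during fitting."""

    def __init__(self, data: List[str]) -> None:
        self.data = data

    def get_instances(self) -> List[str]:
        # Return every instance at once (only sensible for small datasets).
        return list(self.data)

    def gen_batch(self, batch_size: int) -> Iterator[List[str]]:
        # Yield consecutive slices so only one batch is handed out at a time.
        for start in range(0, len(self.data), batch_size):
            yield self.data[start:start + batch_size]


class CountingEstimator:
    """Hypothetical estimator whose state (token counts) is updated per batch."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def fit_batches(self, iterator: DatasetFittingIterator, batch_size: int) -> None:
        # Consume the iterator batch by batch instead of loading all data up front.
        for batch in iterator.gen_batch(batch_size):
            for text in batch:
                self.counts.update(text.lower().split())


docs = ["the cat sat", "the dog ran", "a cat ran"]
estimator = CountingEstimator()
estimator.fit_batches(DatasetFittingIterator(docs), batch_size=2)
```

Because the estimator only ever sees one batch, the same loop could in principle be fed from a generator over data that does not fit in memory, which is the use case the core changes are aiming at.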

Other changes:

Update all project code to work with the new core changes. These are mostly minor edits.
Add ODQA model classes (models.vectorizers.HashingTfidfVectorizer, models.odqa.TfidfRanker)
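For readers unfamiliar with the idea behind a hashing TF-IDF vectorizer paired with a TF-IDF ranker, the following is a rough standard-library sketch of the general technique. It does not reproduce `HashingTfidfVectorizer` or `TfidfRanker` from this merge request: `TinyTfidfRanker`, `hash_token`, `bucket_counts`, the `HASH_SIZE` value, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter
from typing import Dict, List

HASH_SIZE = 2 ** 20  # assumed number of hash buckets


def hash_token(token: str) -> int:
    # crc32 is deterministic across runs, unlike the built-in hash() for str.
    return zlib.crc32(token.encode("utf-8")) % HASH_SIZE


def bucket_counts(text: str) -> Counter:
    # Map each whitespace token to a bucket and count occurrences per bucket.
    return Counter(hash_token(tok) for tok in text.lower().split())


class TinyTfidfRanker:
    """Hypothetical ranker: scores documents by a TF-IDF weighted dot product."""

    def __init__(self, docs: List[str]) -> None:
        self.doc_counts: List[Counter] = [bucket_counts(d) for d in docs]
        # Document frequency per bucket: in how many documents it occurs.
        df: Counter = Counter()
        for counts in self.doc_counts:
            df.update(counts.keys())
        n = len(docs)
        # Smoothed inverse document frequency (one common variant).
        self.idf: Dict[int, float] = {
            b: math.log((n + 1) / (f + 1)) + 1.0 for b, f in df.items()
        }

    def rank(self, query: str) -> List[int]:
        # Return document indices ordered from best to worst match.
        q = bucket_counts(query)
        scores = [
            sum(q[b] * counts[b] * self.idf.get(b, 0.0) ** 2 for b in q)
            for counts in self.doc_counts
        ]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


corpus = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "cats sleep a lot",
]
ranker = TinyTfidfRanker(corpus)
best = ranker.rank("capital of france")[0]
```

Hashing tokens into a fixed number of buckets avoids building an explicit vocabulary, which fits the concern raised above about vocabs not scaling to large datasets; the trade-off is that distinct tokens can occasionally collide in the same bucket.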

All tests pass except the last one, for the Ranking model (not enough memory on my GPU).