Created by: my-master
Major changes in core:
- Rename DatasetBasicIterator -> DatasetLearningIterator
- Add class DatasetFittingIterator
- Add fit_batches() method to Estimator
- Add fit_batches() to train.py
- Make DataReader optional in JSON config in train_model_from_config()
- Rename method batch_generator() -> gen_batch() in DatasetLearningIterator (former DatasetBasicIterator)
- Rename method iter_all() -> get_instances() in DatasetLearningIterator (former DatasetBasicIterator)
These core changes address three gaps: there was no way to fit on batches, no way to fit on batches in parallel (to be solved later), and the iterator was not available during the fitting process. Vocabs and iterators, in their current form, are not practical for large datasets. There are also a few errors in the project architecture that should be addressed in the near future.
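To illustrate the intent of batch fitting, here is a minimal sketch, not the code from this PR. The class and method names `DatasetFittingIterator`, `gen_batch()`, `get_instances()` and `fit_batches()` come from the list above; the constructor arguments, the `batch_size` parameter, and the `VocabEstimator` class are assumptions made only for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, List


class DatasetFittingIterator:
    """Stand-in iterator for this sketch; the signatures are assumed."""

    def __init__(self, data: Iterable[str]) -> None:
        self.data = list(data)

    def get_instances(self) -> List[str]:
        # Return every instance at once (only feasible for small datasets).
        return list(self.data)

    def gen_batch(self, batch_size: int) -> Iterator[List[str]]:
        # Yield consecutive slices so the estimator never sees the full dataset.
        for start in range(0, len(self.data), batch_size):
            yield self.data[start:start + batch_size]


class VocabEstimator:
    """Hypothetical estimator that counts tokens one batch at a time."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def fit_batches(self, iterator: DatasetFittingIterator, batch_size: int) -> None:
        # Update the state incrementally instead of loading all data in memory.
        for batch in iterator.gen_batch(batch_size):
            for text in batch:
                self.counts.update(text.split())


iterator = DatasetFittingIterator(["a b", "b c", "c"])
estimator = VocabEstimator()
estimator.fit_batches(iterator, batch_size=2)
```

Because the estimator receives the iterator itself rather than a pre-built list, it can pull batches on demand while fitting, which is what makes this approach workable for large datasets.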
Other changes:
- Update the rest of the project code to work with the new core changes. These are mostly minor edits.
- Add ODQA model classes (models.vectorizers.HashingTfidfVectorizer, models.odqa.TfidfRanker)
All tests pass except the last test for the Ranking model (not enough memory on my GPU).