
Odqa

Closed. Andrei Glinskii requested to merge odqa into dev on Mar 21, 2018.

Created by: my-master

Major changes in core:

  • Rename DatasetBasicIterator -> DatasetLearningIterator
  • Add class DatasetFittingIterator
  • Add fit_batches() method to Estimator
  • Add fit_batches() to train.py
  • Make DataReader optional in JSON config in train_model_from_config()
  • Rename method batch_generator() -> gen_batch() in DatasetLearningIterator (former DatasetBasicIterator)
  • Rename method iter_all() -> get_instances() in DatasetLearningIterator (former DatasetBasicIterator)
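The split between batched fitting and whole-dataset access can be illustrated with a minimal sketch. It assumes that a fitting iterator exposes `gen_batch()` and `get_instances()` (the method names come from the list above) and that an estimator's `fit_batches()` consumes any iterable of batches. `TokenCounter` and the constructor signatures are hypothetical stand-ins, not DeepPavlov's actual classes.

```python
from collections import Counter
from typing import Iterable, Iterator, List


class DatasetFittingIterator:
    """Hypothetical sketch: serves raw instances to an estimator during fitting."""

    def __init__(self, data: List[str]):
        self.data = data

    def gen_batch(self, batch_size: int) -> Iterator[List[str]]:
        # Yield consecutive slices lazily, so the estimator never has to
        # hold more than one batch of instances at a time.
        for start in range(0, len(self.data), batch_size):
            yield self.data[start:start + batch_size]

    def get_instances(self) -> List[str]:
        # Return every instance at once (the non-batched path).
        return list(self.data)


class TokenCounter:
    """Hypothetical estimator supporting both fit() and fit_batches()."""

    def __init__(self):
        self.counts: Counter = Counter()

    def fit(self, instances: List[str]) -> None:
        # Fitting on the full dataset is just fitting on a single batch.
        self.fit_batches([instances])

    def fit_batches(self, batches: Iterable[List[str]]) -> None:
        # Update the statistics incrementally, one batch at a time.
        for batch in batches:
            for text in batch:
                self.counts.update(text.lower().split())


iterator = DatasetFittingIterator(["the cat sat", "the dog ran", "a cat ran"])
estimator = TokenCounter()
estimator.fit_batches(iterator.gen_batch(batch_size=2))
print(estimator.counts)
```

Because `gen_batch()` is a generator, memory use during fitting is bounded by the batch size rather than the dataset size. That bound is the property that matters for the large datasets discussed below.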

These core changes were needed because the framework lacked support for fitting on batches and for parallel fitting on batches (the latter should be solved later), and because the iterator was not available during the fitting process. In their current form, vocabs and iterators are not suitable for large datasets. A few problems in the project architecture remain and should be addressed in the near future.

Other changes:

  • Update all project code to work with the new core changes. These are mostly minor edits.
  • Add ODQA model classes (models.vectorizers.HashingTfidfVectorizer, models.odqa.TfidfRanker)
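The idea behind a hashing TF-IDF vectorizer paired with a TF-IDF ranker can be shown in a minimal sketch. This sketch assumes that tokens are hashed into a fixed number of buckets (here with `zlib.crc32`), that document frequencies are accumulated through a `fit_batches()`-style method, and that documents are ranked by the dot product of sparse TF-IDF vectors with an sklearn-style smoothed IDF. `HashingTfidf`, `TfidfRankerSketch` and `n_buckets` are hypothetical names. The sketch is not the implementation of `models.vectorizers.HashingTfidfVectorizer` or `models.odqa.TfidfRanker`, which may tokenize, weight and normalize differently.

```python
import math
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class HashingTfidf:
    """Hypothetical hashing TF-IDF vectorizer (illustration only)."""

    def __init__(self, n_buckets: int = 2 ** 20):
        self.n_buckets = n_buckets
        self.doc_freq: Counter = Counter()
        self.n_docs = 0

    def _hash(self, token: str) -> int:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        return zlib.crc32(token.encode("utf-8")) % self.n_buckets

    def _term_counts(self, text: str) -> Counter:
        # Naive whitespace tokenization; a real vectorizer would do more.
        return Counter(self._hash(tok) for tok in text.lower().split())

    def fit_batches(self, batches: Iterable[List[str]]) -> None:
        # Document frequencies can be accumulated one batch at a time,
        # so no vocabulary has to be built in memory.
        for batch in batches:
            for doc in batch:
                self.doc_freq.update(self._term_counts(doc).keys())
                self.n_docs += 1

    def transform(self, text: str) -> Dict[int, float]:
        # Sparse vector {bucket: tf * idf} with a smoothed idf.
        tf = self._term_counts(text)
        return {
            b: c * (math.log((1 + self.n_docs) / (1 + self.doc_freq[b])) + 1)
            for b, c in tf.items()
        }


class TfidfRankerSketch:
    """Hypothetical ranker: scores documents by sparse dot product."""

    def __init__(self, vectorizer: HashingTfidf, docs: List[str]):
        self.vectorizer = vectorizer
        self.docs = docs
        self.doc_vectors = [vectorizer.transform(d) for d in docs]

    def rank(self, query: str, top_k: int = 1) -> List[Tuple[str, float]]:
        q = self.vectorizer.transform(query)
        scores = [sum(w * vec.get(b, 0.0) for b, w in q.items())
                  for vec in self.doc_vectors]
        order = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return [(self.docs[i], scores[i]) for i in order[:top_k]]


docs = ["Paris is the capital of France",
        "Berlin is the capital of Germany",
        "The cat sat on the mat"]
vectorizer = HashingTfidf()
vectorizer.fit_batches([docs[:2], docs[2:]])
ranker = TfidfRankerSketch(vectorizer, docs)
print(ranker.rank("capital of France", top_k=2))
```

Hashing trades exact token identity (distinct tokens may collide in one bucket) for a fixed memory footprint that does not grow with the vocabulary. That trade-off suits the large corpora that ODQA retrieval works over.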

All tests passed except the last test for the Ranking model, which could not run because my GPU does not have enough memory.

Source branch: odqa