Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • D DeepPavlov
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 18
    • Issues 18
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 22
    • Merge requests 22
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • DeepPavlov
  • DeepPavlov
  • Merge requests
  • !962

feat: Imdb sentiment dataset reader

  • Review changes

  • Download
  • Email patches
  • Plain diff
Merged Andrei Glinskii requested to merge github/fork/Huawei-MRC-OSI/pr-imdb-clean into dev Aug 08, 2019
  • Overview 31
  • Commits 8
  • Pipelines 0
  • Changes 5

Created by: sgrechanik-h

This PR implements a dataset reader for the IMDb sentiment classification dataset. It also includes a json configuration for BERT (en, cased) which is mostly the same as the configuration for rusentiment except for the max seq length and batch size (which I set to values such that I don't get out-of-memory on my hardware).

This PR also includes a fix for the sets_accuracy metric which should now correctly work for string labels (i.e. wrap them into sets instead converting them to sets). Also I added reporting of cached files in download_decompress.

Assignee
Assign to
Reviewers
Request review from
Time tracking
Source branch: github/fork/Huawei-MRC-OSI/pr-imdb-clean