Created by: sgrechanik-h
This PR implements a dataset reader for the IMDb sentiment classification dataset. It also includes a JSON configuration for BERT (en, cased), which is mostly the same as the configuration for rusentiment except for the maximum sequence length and batch size (I set these to values low enough to avoid out-of-memory errors on my hardware).
This PR also includes a fix for the `sets_accuracy` metric, which should now work correctly for string labels (i.e. it wraps them into sets instead of converting them to sets). I also added reporting of cached files in `download_decompress`.
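To illustrate the string-label issue: calling `set()` on a string splits it into characters, so two different labels made of the same letters would compare as equal. The snippet below is only a minimal sketch of the wrapping idea, not the code from this PR; the names `as_label_set` and `sets_accuracy_sketch` are hypothetical and the real metric's signature may differ.

```python
def as_label_set(labels):
    """Wrap a single string label in a set; convert other iterables to a set."""
    if isinstance(labels, str):
        # set("pos") would give {"p", "o", "s"}, so wrap the string instead.
        return {labels}
    return set(labels)


def sets_accuracy_sketch(y_true, y_pred):
    """Fraction of examples whose predicted label set equals the true label set."""
    if not y_true:
        return 0.0
    matches = sum(
        as_label_set(true) == as_label_set(pred)
        for true, pred in zip(y_true, y_pred)
    )
    return matches / len(y_true)
```

With plain conversion, the labels `"pos"` and `"sop"` would be treated as the same set of characters; with wrapping they are compared as whole labels, while multi-label inputs such as `["a", "b"]` are still compared without regard to order.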