Created by: dimakarp1996
Added 8-bit optimizers. In my experiments, they reduce memory usage during training by ~9-11%. The change is fully backward compatible.