Continuum: A Data Loader for Continual Learning

Introduction

Continual learning, incremental learning, lifelong learning, and online learning are closely related research fields that aim to learn an ever-growing amount of knowledge while forgetting as little as possible.

One implementation difficulty in these fields is creating the stream of data that feeds new knowledge to an algorithm. Continuum aims to make this simple and to spare researchers from data loader problems. The goal is to stop wasting time reproducing continual learning settings and to start working on the algorithm directly.

Continuum provides several existing scenarios. It is also designed to make it easy to create your own custom scenarios.

Installation

Continuum is available on PyPI and can be installed with:

pip3 install continuum

The Continuum project is also available here.

Organization

To create continual learning scenarios, Continuum decomposes data management into three levels of data structures: datasets, tasksets, and scenarios.

  • Datasets: Datasets are the raw data that will be used to create tasks and scenarios.
  • Tasksets: A taskset contains the data specific to one task. The data are selected from the original dataset and possibly transformed.
  • Scenarios: A scenario is a sequence of tasks. It defines the curriculum of learning experiences fed to the algorithm.
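
To make the three levels concrete, here is a minimal, library-free sketch of how a dataset could be split into tasksets that together form a scenario. It does not use Continuum and is not how Continuum is implemented; the helper name split_into_tasks and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def split_into_tasks(samples, increment):
    """Group (x, y) samples into tasks of `increment` classes each.

    Hypothetical helper for illustration only, not part of Continuum.
    """
    classes = sorted({y for _, y in samples})
    # Map each class label to the index of the task that introduces it.
    class_to_task = {c: i // increment for i, c in enumerate(classes)}

    tasks = defaultdict(list)
    for x, y in samples:
        t = class_to_task[y]
        # Each taskset entry carries the data, its label, and its task index.
        tasks[t].append((x, y, t))

    # The "scenario" is the ordered sequence of tasksets.
    return [tasks[t] for t in sorted(tasks)]


# Toy "dataset": 10 classes, 3 samples per class.
dataset = [(f"img_{label}_{i}", label) for label in range(10) for i in range(3)]

scenario = split_into_tasks(dataset, increment=2)
for task_id, taskset in enumerate(scenario):
    labels = sorted({y for _, y, _ in taskset})
    print(task_id, labels, len(taskset))
```

With 10 classes and an increment of 2, the sketch yields 5 tasksets of 2 classes each, which mirrors the split MNIST setting.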

Example

from torch.utils.data import DataLoader

from continuum import ClassIncremental
from continuum.datasets import MNIST

# First we get a dataset that will be used to compose tasks and the continuum
dataset = MNIST("my/data/path", download=True, train=True)

# Then the dataset is provided to a scenario class that will process it to create the sequence of tasks
# Here, we create split mnist with 5 tasks of 2 classes
scenario = ClassIncremental(dataset, increment=2)

# The continuum can then enumerate the 5 tasks
for task_id, taskset in enumerate(scenario):
    # taskset can be used as a PyTorch Dataset to load the task data
    loader = DataLoader(taskset)

    for x, y, t in loader:
        # data, label, task index
        # train on the task here
        break

Main Supported Scenarios:

Continuum supports various types of scenarios and covers most of those found in the continual learning literature.

  • Class-incremental scenarios (similar to the disjoint, new-classes, or split scenarios from the literature)
  • Transformation-incremental scenarios, e.g. Permutation MNIST, Rotation MNIST
  • More scenarios in Continuum documentation
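
As a rough illustration of the transformation-incremental idea, the sketch below builds a permutation-style scenario in plain Python: every task sees the same data, but with a different fixed pixel ordering. The helper names, the toy image, and the choice to keep the first task unpermuted are assumptions for this example; they do not describe Continuum's API.

```python
import random


def make_permutation(num_pixels, seed):
    """Return a fixed random ordering of pixel indices for one task."""
    indices = list(range(num_pixels))
    # A seeded generator makes the permutation reproducible per task.
    random.Random(seed).shuffle(indices)
    return indices


def permute(image, permutation):
    """Reorder the pixels of a flattened image according to `permutation`."""
    return [image[i] for i in permutation]


# Toy flattened "image" of 6 pixels.
image = [10, 20, 30, 40, 50, 60]

# Assumption: task 0 keeps the original ordering,
# and each later task gets its own seeded permutation.
identity = list(range(len(image)))
permutations = [identity] + [make_permutation(len(image), seed=t) for t in (1, 2)]

# One transformed view of the same image per task.
tasks = [permute(image, p) for p in permutations]
for task_id, view in enumerate(tasks):
    print(task_id, view)
```

The labels stay the same across tasks; only the input transformation changes, which is what distinguishes this family from class-incremental scenarios.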

Supported Datasets:

Continuum supports the basic datasets from torchvision.datasets (MNIST, CIFAR10, CIFAR100) as well as larger datasets such as ImageNet or CORe50. We also provide tools to create new datasets manually. For example, the fellowship classes make it possible to concatenate several datasets into one for specific scenarios. You can find a complete list of supported datasets here.
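
The idea of concatenating several datasets into one can be sketched as follows. This is a plain-Python illustration, not Continuum's fellowship implementation: the function name concatenate_datasets and the label-shifting strategy (so that classes from different datasets do not collide) are assumptions made for the example.

```python
def concatenate_datasets(datasets):
    """Merge several lists of (x, y) samples into one list.

    Hypothetical helper: labels of each dataset are shifted so that
    classes coming from different datasets never share a label.
    """
    merged = []
    offset = 0
    for samples in datasets:
        labels = sorted({y for _, y in samples})
        # Remap this dataset's labels to a contiguous range starting at `offset`.
        remap = {label: offset + i for i, label in enumerate(labels)}
        merged.extend((x, remap[y]) for x, y in samples)
        offset += len(labels)
    return merged


# Two toy datasets whose labels both start at 0.
digits = [("digit_a", 0), ("digit_b", 1)]
clothes = [("shirt", 0), ("shoe", 1), ("bag", 2)]

merged = concatenate_datasets([digits, clothes])
print(merged)
```

The merged list can then be split into tasks like any single dataset, for example one task per source dataset.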

Conclusion

Continuum is an open-source project that aims to simplify data management for continual learning algorithms. It is designed to be easily adaptable to specific needs. If you have an idea for a new scenario that should be added, don't hesitate to open an issue or a pull request on the Continuum GitHub repository.

Continuum is made to save you time, reduce the amount of code in your project, and spare you development problems! We hope you will enjoy it :)

Arthur Douillard, PhD Student @ Sorbonne + Research Scientist @ Heuritech
Timothée LESORT, Postdoctoral Researcher @ MILA