Dataset module

The dataset module provides an interface for accessing datasets and sequences.

It also provides a set of utility functions for downloading and extracting datasets.

class vot.dataset.BasedSequence(name: str, loader: callable, metadata: dict = None)

This class implements the caching of the sequence data.

The sequence data is loaded only when it is needed.
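The load-on-demand behaviour described above can be illustrated with a minimal, self-contained sketch. The class `LazySequence`, the `fake_loader` function and the shape of the loaded data are hypothetical and are not part of the toolkit; only the idea of calling the loader on first access and reusing the result comes from the description.

```python
class LazySequence:
    """Hypothetical stand-in: loads data on first access, then reuses it."""

    def __init__(self, name, loader, metadata=None):
        self.name = name
        self._loader = loader            # callable that produces the sequence data
        self._metadata = metadata or {}
        self._data = None                # nothing is loaded yet

    def _ensure_loaded(self):
        # Call the loader only once; later calls reuse the cached result.
        if self._data is None:
            self._data = self._loader()
        return self._data

    def channels(self):
        return list(self._ensure_loaded()["channels"])


calls = []


def fake_loader():
    # Records each invocation so that the caching can be observed.
    calls.append(1)
    return {"channels": ["color", "depth"]}


seq = LazySequence("example", fake_loader)
loaded_before_access = len(calls)    # the loader has not run yet
first = seq.channels()               # triggers the loader
second = seq.channels()              # served from the cache
```

Constructing the object does not touch the loader; only the first call that needs the data does.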

channel(channel: str = None) Channel

Returns the channel with the specified name. If the channel name is not specified, the default channel is returned.

Parameters

channel (str, optional) – Channel name. Defaults to None.

Returns

Channel

Return type

Channel

channels() List[str]

Returns a list of channel names in the sequence.

Returns

List of channel names

Return type

List[str]

frame(index)

Returns the frame with the specified index in the sequence as a Frame object.

Parameters

index (int) – Frame index

Returns

Frame

Return type

Frame

groundtruth(index=None)

Returns the groundtruth of the single object in the sequence. If the index is specified, the groundtruth for that frame is returned as a Region object. If the sequence contains more than one object, an exception is raised; special objects are ignored when making this check.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

Groundtruth region

Return type

Region

property height

Returns the sequence height.

metadata(name: str = None, default=None)

Returns the metadata value with the specified name.

Parameters
  • name (str, optional, if None, returns the entire metadata dictionary) – Metadata name

  • default (object, optional) – Default value. Defaults to None.

Returns

Metadata value

Return type

object

object(oid, index=None) Region

Returns the object with the specified id. If the index is specified, the object is returned as a Region object.

Parameters
  • oid (str) – Object id

  • index (int, optional) – Frame index. Defaults to None.

Returns

Object region

Return type

Region

objects() List[str]

Returns a list of object ids in the sequence.

Returns

List of object ids

Return type

List[str]

property size

Returns the sequence size as a tuple (width, height)

Returns

Sequence size

Return type

tuple

tags(index: int = None) List[str]

Returns a list of tags in the sequence. If the index is specified, only the tags that are present in the frame with the specified index are returned.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

List of tags

Return type

List[str]

values(index: int = None) List[float]

Returns a list of values in the sequence. If the index is specified, only the values that are present in the frame with the specified index are returned.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

List of values

Return type

List[float]

property width

Returns the sequence width.

class vot.dataset.Channel

Abstract representation of an individual image channel, a sequence of images with uniform dimensions.

abstract filename(index: int) str

Returns filename for the given index of the channel sequence.

Parameters

index (int) – Index of the frame

Returns

Filename of the frame

Return type

str

abstract frame(index: int) Frame

Returns frame object for the given index.

Parameters

index (int) – Index of the frame

Returns

Frame object

Return type

Frame

abstract property size: int

Returns the size of the channel in bytes.

class vot.dataset.Dataset(sequences: Mapping[str, Sequence])

Base abstract class for a tracking dataset, a list of image sequences addressable by their names and iterable.

keys() List[str]

Returns a list of unique sequence names.

Returns

List of sequence names

Return type

List[str]

list() List[str]

Returns a list of unique sequence names.

Returns

List of sequence names

Return type

List[str]
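As a rough illustration of a name-addressable, iterable collection of sequences, the sketch below uses a hypothetical `TinyDataset` class with plain strings standing in for sequence objects. What iteration yields (sequence objects rather than names) is an assumption made for the example, not something the description above states.

```python
class TinyDataset:
    """Hypothetical container mapping sequence names to sequence objects."""

    def __init__(self, sequences):
        self._sequences = dict(sequences)

    def __getitem__(self, name):
        # Sequences are addressed by their names.
        return self._sequences[name]

    def __len__(self):
        return len(self._sequences)

    def __iter__(self):
        # Assumption: iteration yields the sequence objects, not their names.
        return iter(self._sequences.values())

    def keys(self):
        return list(self._sequences.keys())


# Strings are used as placeholders for real sequence objects.
dataset = TinyDataset({"ball": "sequence-ball", "car": "sequence-car"})
names = dataset.keys()
members = list(dataset)
```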

exception vot.dataset.DatasetException

Dataset and sequence related exceptions.

class vot.dataset.Frame(sequence, index)

Frame object represents a single frame in the sequence.

It provides access to the image data, groundtruth, tags and values as a wrapper around the sequence object.
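The wrapper idea can be sketched as follows. `TaggedSequence` and `FrameView` are hypothetical classes written for this example; they only show how a frame object might remember its sequence and index and forward queries to the sequence.

```python
class TaggedSequence:
    """Hypothetical sequence holding per-frame tags for two frames."""

    def __init__(self):
        self._tags = [["occlusion"], []]

    def tags(self, index=None):
        if index is None:
            # All distinct tags used anywhere in the sequence.
            return sorted({tag for frame in self._tags for tag in frame})
        return list(self._tags[index])


class FrameView:
    """Hypothetical wrapper: remembers (sequence, index) and delegates."""

    def __init__(self, sequence, index):
        self._sequence = sequence
        self._index = index

    @property
    def index(self):
        return self._index

    @property
    def sequence(self):
        return self._sequence

    def tags(self):
        # The frame fills in its own index when asking the sequence.
        return self._sequence.tags(self._index)


sequence = TaggedSequence()
frame = FrameView(sequence, 0)
```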

channel(channel: Optional[str] = None)

Returns the channel object for the given channel name.

Parameters

channel (Optional[str], optional) – Name of the channel. Defaults to None.

channels()

Returns the list of channels in the sequence.

Returns

List of channels

Return type

List[str]

filename(channel: Optional[str] = None)

Returns the filename for the given channel name and frame index.

Parameters

channel (Optional[str], optional) – Name of the channel. Defaults to None.

Returns

Filename of the frame

Return type

str

groundtruth() Region

Returns the groundtruth region for the frame.

Returns

Groundtruth region

Return type

Region

Raises

DatasetException – If groundtruth is not available

image(channel: Optional[str] = None) ndarray

Returns the image for the given channel name and frame index.

Parameters

channel (Optional[str], optional) – Name of the channel. Defaults to None.

Returns

Image object

Return type

np.ndarray

property index: int

Returns the index of the frame.

Returns

Index of the frame

Return type

int

object(id: str) Region

Returns the object region for the given object id and frame index.

Parameters

id (str) – Id of the object

Returns

Object region

Return type

Region

objects() List[str]

Returns the list of objects in the frame.

Returns

List of object ids

Return type

List[str]

property sequence: Sequence

Returns the sequence object of the frame object.

Returns

Sequence object

Return type

Sequence

tags() List[str]

Returns the tags for the frame.

Returns

List of tags

Return type

List[str]

values() Mapping[str, float]

Returns the values for the frame.

Returns

Mapping of values

Return type

Mapping[str, float]

class vot.dataset.FrameList

Abstract base for all sequences, just a list of frame objects.

frame(index: int) Frame

Returns the frame at the specified index.

Parameters

index (int) – Frame index

class vot.dataset.InMemoryChannel

In-memory channel represents a sequence of images with uniform dimensions.

It is used to represent a sequence of images in memory.

append(image)

Appends an image to the channel.

Parameters

image (np.ndarray) – Image object

filename(index)

Throws an exception, as the sequence is held in memory and not in files.

frame(index)

Returns the frame object for the given index in the sequence channel.

Parameters

index (int) – Index of the frame

Returns

Frame object

Return type

Frame

property size

Returns the size of the channel in the format (width, height)

Returns

Size of the channel

Return type

Tuple[int, int]

class vot.dataset.InMemorySequence(name, channels)

An in-memory sequence that can be used to construct a sequence programmatically and store it to disk. Used mainly for testing and debugging.

Only single object sequences are supported at the moment.

append(images: dict, region: Region, tags: list = None, values: dict = None)

Appends a new frame to the sequence. The frame is specified by a dictionary of images, a region and optional tags and values.

Parameters
  • images (dict) – Dictionary of images

  • region (Region) – Region

  • tags (list, optional) – List of tags

  • values (dict, optional) – Dictionary of values

channel(channel: str) Channel

Returns the specified channel object.

Parameters

channel (str) – Channel name

Returns

Channel object

Return type

Channel

channels() List[str]

Returns a list of channel names.

Returns

List of channel names

Return type

List[str]

frame(index: int) Frame

Returns the specified frame. The frame is returned as a Frame object.

Parameters

index (int) – Frame index

Returns

Frame object

Return type

Frame

groundtruth(index: int = None) Region

Returns the groundtruth object as a Region object, for the frame at the given index if one is specified. If the sequence contains more than one object, an exception is raised.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

Groundtruth object

Return type

Region

property height: int

Returns the sequence height.

Returns

Sequence height

Return type

int

metadata(name=None, default=None)

Returns the value of the specified metadata field. If the field does not exist, the default value is returned.

Parameters
  • name (str) – Name of the metadata field, if None, returns the entire metadata dictionary

  • default (object, optional) – Default value

Returns

Value of the metadata field

Return type

object
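The lookup rules above (whole dictionary when no name is given, default value for unknown fields) can be sketched with a plain dictionary. The helper `lookup_metadata` and the sample fields are hypothetical; whether the toolkit returns the dictionary itself or a copy is not stated here.

```python
def lookup_metadata(metadata, name=None, default=None):
    """Hypothetical helper mirroring the documented lookup rules."""
    if name is None:
        # No field requested: hand back the whole dictionary.
        return metadata
    # Unknown fields fall back to the supplied default.
    return metadata.get(name, default)


meta = {"fps": 30, "width": 640}
known = lookup_metadata(meta, "fps")
fallback = lookup_metadata(meta, "missing", 5)
everything = lookup_metadata(meta)
```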

object(oid: str, index: int = None) Region

Returns the specified object as a Region object, for the frame at the given index if one is specified. If the sequence contains more than one object, an exception is raised.

Parameters
  • oid (str) – Object id

  • index (int, optional) – Frame index. Defaults to None.

Returns

Object

Return type

Region

objects(index: str = None) List[str]

Returns a list of object ids. If the index is specified, only the objects that are present in the frame with the specified index are returned.

Since only single object sequences are supported, the only object id that is returned is “object”.

Parameters

index (int, optional) – Frame index. Defaults to None.

property size: tuple

Returns the sequence size as a tuple (width, height)

Returns

Sequence size

Return type

tuple

tags(index=None)

Returns a list of tags in the sequence. If the index is specified, only the tags that are present in the frame with the specified index are returned.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

List of tags

Return type

List[str]

values(index=None)

Returns a list of values in the sequence. If the index is specified, only the values that are present in the frame with the specified index are returned.

Parameters

index (int, optional) – Frame index. Defaults to None.

Returns

List of values

Return type

List[str]

property width: int

Returns the sequence width.

Returns

Sequence width

Return type

int

class vot.dataset.PatternFileListChannel(path, start=1, step=1, end=None, check_files=True)

Sequence channel implementation where each frame is stored in a file and all file names follow a specific pattern.
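One way to picture such a channel is as a pattern that is expanded into a list of file names. The sketch below is only an illustration: the helper `pattern_filenames`, the printf-style pattern `%08d.jpg`, the inclusive treatment of `end` and the rule of stopping at the first missing file are all assumptions, not the toolkit's confirmed behaviour.

```python
import os


def pattern_filenames(base, pattern, start=1, step=1, end=None,
                      exists=os.path.isfile):
    """Hypothetical helper: expand a printf-style pattern into file names."""
    files = []
    index = start
    while end is None or index <= end:
        candidate = os.path.join(base, pattern % index)
        if not exists(candidate):
            # Stop at the first file that is not available.
            break
        files.append(candidate)
        index += step
    return files


# A set of names stands in for files on disk so the example needs no real data.
available = {os.path.join("color", "%08d.jpg" % i) for i in range(1, 4)}
is_available = available.__contains__

all_files = pattern_filenames("color", "%08d.jpg", exists=is_available)
every_second = pattern_filenames("color", "%08d.jpg", step=2, exists=is_available)
first_two = pattern_filenames("color", "%08d.jpg", end=2, exists=is_available)
```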

property base: str

Returns the base path of the sequence.

Returns

Base path

Return type

str

filename(index) str

Returns the filename of the frame at the specified index.

Parameters

index (int) – Frame index

Returns

Filename

Return type

str

frame(index: int) ndarray

Returns the frame at the specified index as a numpy array. The image is loaded using OpenCV and converted to RGB color space if necessary.

Parameters

index (int) – Frame index

Returns

Frame

Return type

np.ndarray

Raises

DatasetException – If the index is out of bounds

property height: int

Returns the height of the frames in the sequence.

Returns

Height of the frames

Return type

int

property pattern

Returns the pattern of the sequence.

Returns

Pattern

Return type

str

property size: tuple

Returns the size of the frames in the sequence as a tuple (width, height)

Returns

Size of the frames

Return type

tuple

property width: int

Returns the width of the frames in the sequence.

Returns

Width of the frames

Return type

int

class vot.dataset.Sequence(name: str)

A sequence is a list of frames (multiple channels) and a list of one or more annotated objects.

It also contains additional metadata and per-frame information, such as tags and values.

abstract channel(channel=None) Channel

Returns the channel with the specified name or the default channel if no name is specified.

Parameters

channel (str, optional) – Name of the channel

Returns

Channel

Return type

Channel

abstract channels() Set[str]

Returns the names of all channels in the sequence.

Returns

Names of all channels

Return type

set

describe()

Returns a dictionary with information about the sequence.

Returns

Dictionary with information

Return type

dict

abstract groundtruth(index: int) Region

Returns the ground truth region for the specified frame index or None if no ground truth is available for the frame or the frame index is out of bounds. This is a legacy method for compatibility with single-object datasets and should not be used in new code.

Parameters

index (int) – Frame index

Returns

Ground truth region

Return type

Region

abstract property height: int

Returns the height of the frames in the sequence in pixels.

Returns

Height of the frames

Return type

int

property identifier: str

Returns the identifier of the sequence. The identifier is a string that uniquely identifies the sequence in the dataset. The identifier is usually the same as the name, but may be different if the name is not unique.

Returns

Identifier

Return type

str

abstract metadata(name=None, default=None)

Returns the value of the specified metadata field. If the field does not exist, the default value is returned.

Parameters
  • name (str) – Name of the metadata field, if None, returns the entire metadata dictionary

  • default (object, optional) – Default value

Returns

Value of the metadata field

Return type

object

property name: str

Returns the name of the sequence.

Returns

Name

Return type

str

abstract object(oid, index=None)

Returns the object with the specified name or identifier. If the index is specified, the object is returned only if it is visible in the frame at the specified index.

Parameters
  • oid (str) – Name or identifier of the object

  • index (int, optional) – Frame index

Returns

Object

Return type

Region

abstract objects() Set[str]

Returns the names of all objects in the sequence.

Returns

Names of all objects

Return type

set

property size: Tuple[int, int]

Returns the size of the frames in the sequence in pixels as a tuple (width, height)

Returns

Size of the frames

Return type

tuple

abstract tags(index=None) List[str]

Returns the tags for the specified frame index or None if no tags are available for the frame or the frame index is out of bounds.

Parameters

index (int, optional) – Frame index

Returns

List of tags

Return type

list

abstract values(index=None) Mapping[str, Number]

Returns the values for the specified frame index or None if no values are available for the frame or the frame index is out of bounds.

Parameters

index (int, optional) – Frame index

Returns

Dictionary of values

Return type

dict

abstract property width: int

Returns the width of the frames in the sequence in pixels.

Returns

Width of the frames

Return type

int

class vot.dataset.SequenceData(channels, objects, tags, values, length)

channels

Alias for field number 0

length

Alias for field number 4

objects

Alias for field number 1

tags

Alias for field number 2

values

Alias for field number 3
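The field descriptions above have the form that Python generates for named tuples, so the record can presumably be handled like one. The sketch below rebuilds an equivalent tuple locally with `collections.namedtuple` instead of importing the toolkit; the field contents are placeholders chosen for the example.

```python
from collections import namedtuple

# Field order follows the documented field numbers 0 to 4.
SequenceData = namedtuple(
    "SequenceData", ["channels", "objects", "tags", "values", "length"]
)

# Placeholder contents; real instances would hold channel and object data.
data = SequenceData(channels={"color": None}, objects={}, tags={}, values={}, length=0)

# Fields can be read by name or by position.
by_name = data.length
by_position = data[4]
```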

class vot.dataset.SequenceIterator(sequence: Sequence)

Sequence iterator provides an iterator interface for the sequence object.
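A minimal sketch of such an iterator is given below. `FrameIterator` and `CountingSequence` are hypothetical; the sketch assumes only that the wrapped object reports its length and returns a frame for an index.

```python
class FrameIterator:
    """Hypothetical iterator over any object with __len__ and frame(index)."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._sequence):
            raise StopIteration
        frame = self._sequence.frame(self._position)
        self._position += 1
        return frame


class CountingSequence:
    """Hypothetical three-frame sequence returning strings as frames."""

    def __len__(self):
        return 3

    def frame(self, index):
        return "frame-%d" % index


frames = list(FrameIterator(CountingSequence()))
```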

vot.dataset.download_bundle(url: str, path: str = '.')

Downloads a dataset bundle as a ZIP file and decompresses it.

Parameters
  • url (str) – Source bundle URL

  • path (str, optional) – Destination directory. Defaults to “.”.

Raises

DatasetException – If the bundle cannot be downloaded or is not supported.
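The decompression step can be sketched with the standard `zipfile` module. The helper `extract_bundle` is hypothetical, it raises `ValueError` rather than `DatasetException`, and an archive built in memory stands in for a downloaded bundle, so no network access is involved.

```python
import io
import os
import tempfile
import zipfile


def extract_bundle(fileobj, path="."):
    """Hypothetical helper: unpack a ZIP bundle from a file-like object."""
    if not zipfile.is_zipfile(fileobj):
        raise ValueError("Unsupported bundle format")
    fileobj.seek(0)
    with zipfile.ZipFile(fileobj) as archive:
        archive.extractall(path)


# Build a small archive in memory to stand in for a downloaded bundle.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("list.txt", "ball\ncar\n")
buffer.seek(0)

with tempfile.TemporaryDirectory() as target:
    extract_bundle(buffer, target)
    extracted = sorted(os.listdir(target))
```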

vot.dataset.download_dataset(url: str, path: str)

Downloads a dataset from a given url or an alias.

Parameters
  • url (str) – URL to the data bundle or metadata description file

  • path (str) – Destination directory

Raises

DatasetException – If the dataset is not found or a network error occurred

vot.dataset.load_dataset(path: str) Dataset

Loads a dataset from a local directory.

Parameters

path (str) – The path to the local dataset data

Raises

DatasetException – When a folder does not exist or the format is not recognized.

Returns

Dataset object

Return type

Dataset

vot.dataset.load_sequence(path: str) Sequence

Loads a sequence from a given path (directory) and tries to guess the format of the sequence.

Parameters

path (str) – The path to the local sequence data

Raises

DatasetException – If a loading error occurs, the format is unsupported, or another issue is encountered.

Returns

Sequence object

Return type

Sequence

vot.dataset.read_legacy_sequence(path: str) Sequence

Wrapper around the legacy sequence reader.

This module contains functionality for reading sequences from the storage using VOT compatible format.

vot.dataset.common.convert_int(value: str) int

Converts the given value to an integer. If the value is not a valid integer, None is returned.

Parameters

value (str) – The value to convert.

Returns

The converted value or None if the value is not a valid integer.

Return type

int
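The documented behaviour corresponds to a common conversion idiom, sketched below. The function name `to_int_or_none` is hypothetical, and which exception types the toolkit catches is an assumption.

```python
def to_int_or_none(value):
    """Hypothetical equivalent: integer on success, None otherwise."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Not a valid integer representation.
        return None


valid = to_int_or_none("42")
invalid = to_int_or_none("abc")
empty = to_int_or_none("")
```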

vot.dataset.common.download_dataset_meta(url: str, path: str) None

Downloads the metadata of a dataset from a given URL and stores it in the given path.

Parameters
  • url (str) – The URL to download the metadata from.

  • path (str) – The path to store the metadata in.

vot.dataset.common.list_sequences(path: str) None

Indexes the sequences in the given path. Only works if there is a list.txt file in the given path or the path is a list file.

Parameters

path (str) – The path to index sequences in.
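The sketch below shows one plausible way of reading such a list file. The helper `read_sequence_list` is hypothetical, and the format of `list.txt` (one sequence name per line, blank lines ignored) is an assumption made for the example.

```python
import os
import tempfile


def read_sequence_list(path):
    """Hypothetical helper: read sequence names from a list file."""
    # Accept either a directory containing list.txt or the list file itself.
    list_file = os.path.join(path, "list.txt") if os.path.isdir(path) else path
    if not os.path.isfile(list_file):
        return None
    with open(list_file, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "list.txt"), "w") as handle:
        handle.write("ball\ncar\n\n")
    from_directory = read_sequence_list(root)
    from_file = read_sequence_list(os.path.join(root, "list.txt"))
    missing = read_sequence_list(os.path.join(root, "absent"))
```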

vot.dataset.common.read_sequence(path)

Reads a sequence from the given path.

Parameters

path (str) – The path to read the sequence from.

Returns

The sequence read from the given path.

Return type

Sequence

vot.dataset.common.read_sequence_legacy(path)

Reads a sequence from the given path.

Parameters

path (str) – The path to read the sequence from.

Returns

The sequence read from the given path.

Return type

Sequence

vot.dataset.common.write_sequence(directory: str, sequence: Sequence)

Writes a sequence to a directory. The sequence is written as a set of images in a directory structure corresponding to the channel names. The sequence metadata is written to a file called sequence in the root directory.

Parameters
  • directory (str) – The directory to write the sequence to.

  • sequence (Sequence) – The sequence to write.
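The resulting directory layout can be sketched without writing any files. The helper `plan_layout` and the image file name pattern `%08d.jpg` are hypothetical; only the per-channel directories and the metadata file named sequence come from the description above.

```python
import os


def plan_layout(directory, channels, length, pattern="%08d.jpg"):
    """Hypothetical helper: list the paths a written sequence might occupy."""
    # The metadata file sits in the root of the sequence directory.
    paths = [os.path.join(directory, "sequence")]
    for channel in channels:
        # One sub-directory per channel, one image per frame.
        for index in range(1, length + 1):
            paths.append(os.path.join(directory, channel, pattern % index))
    return paths


layout = plan_layout("out", ["color"], 2)
```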

Extended dataset support

Many datasets are supported by the toolkit using special adapters.

### OTB

OTB dataset adapter module.

OTB is one of the earliest tracking benchmarks. It is a collection of 50/100 sequences with ground truth annotations. The dataset is available at http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html.

vot.dataset.otb.download_otb100(path: str)

Downloads OTB100 dataset to the given path.

Parameters

path (str) – Path to the dataset folder.

vot.dataset.otb.download_otb50(path: str)

Downloads OTB50 dataset to the given path.

Parameters

path (str) – Path to the dataset folder.

vot.dataset.otb.read_sequence(path: str)

Reads a sequence from OTB dataset. The sequence is identified by the name of the folder and the groundtruth_rect.txt file is expected to be present in the folder.

Parameters

path (str) – Path to the sequence folder.

Returns

The sequence object.

Return type

Sequence
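Parsing an annotation file of this kind can be sketched as follows. The helper `parse_rectangles` is hypothetical. It assumes one rectangle per line given as x, y, width and height, separated by commas or whitespace, which is how OTB annotation files are commonly laid out.

```python
import re


def parse_rectangles(text):
    """Hypothetical parser for rectangle annotations, one per line."""
    rectangles = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Accept commas, tabs and spaces as separators.
        x, y, w, h = (float(value) for value in re.split(r"[,\s]+", line))
        rectangles.append((x, y, w, h))
    return rectangles


sample = "10,20,30,40\n11\t21\t30\t40\n"
rectangles = parse_rectangles(sample)
```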

### GOT10k

GOT-10k dataset adapter module.

The format of GOT-10k dataset is very similar to a subset of VOT, so there is a lot of code duplication.

vot.dataset.got10k.load_channel(source)

Load channel from the given source.

Parameters

source (str) – Path to the source. If the source is a directory, it is assumed to be a pattern file list. If the source is a file, it is assumed to be a video file.

Returns

Channel object.

Return type

Channel
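The dispatch rule for the source can be sketched with the standard library alone. The helper `classify_source` is hypothetical and only reports which kind of channel would be chosen; it does not construct one.

```python
import os
import tempfile


def classify_source(source):
    """Hypothetical helper: decide how a channel source would be interpreted."""
    if os.path.isdir(source):
        return "pattern file list"
    if os.path.isfile(source):
        return "video file"
    raise FileNotFoundError(source)


with tempfile.TemporaryDirectory() as root:
    # An empty file is enough to exercise the file branch.
    video = os.path.join(root, "clip.avi")
    with open(video, "wb") as handle:
        handle.write(b"")
    directory_kind = classify_source(root)
    file_kind = classify_source(video)
```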

vot.dataset.got10k.read_sequence(path)

Read GOT-10k sequence from the given path.

Parameters

path (str) – Path to the sequence.

### TrackingNet

Dataset adapter for the TrackingNet dataset.

Note that the dataset is organized differently from the VOT datasets: annotated frames are stored in a separate directory. The dataset also contains train and test splits. The loader assumes that only one of the splits is used at a time and that the path points to that split of the dataset.

vot.dataset.trackingnet.list_sequences(path)

List sequences in the given path. The path is expected to be the root of the TrackingNet dataset split.

Parameters

path (str) – Path to the dataset root.

Returns

List of sequences.

Return type

list

vot.dataset.trackingnet.load_channel(source)

Load channel from the given source.

Parameters

source (str) – Path to the source. If the source is a directory, it is assumed to be a pattern file list. If the source is a file, it is assumed to be a video file.

Returns

Channel object.

Return type

Channel

vot.dataset.trackingnet.read_sequence(path)

Read sequence from the given path. Unlike in VOT datasets, the sequence is not a directory but a file. The sequence name is extracted from the file name, and the path to the image frames is inferred from the standard TrackingNet directory structure.

Parameters

path (str) – Path to the sequence groundtruth.

Returns

Sequence object.

Return type

Sequence
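The path inference can be sketched as below. The helper `infer_frames_directory` is hypothetical, and the layout it assumes (annotation files in an `anno` directory, frames in a sibling `frames/<name>` directory) is an assumption about the dataset structure rather than something confirmed here.

```python
import os


def infer_frames_directory(groundtruth_path):
    """Hypothetical helper: derive the name and frame directory of a sequence."""
    # The sequence name is the annotation file name without its extension.
    name = os.path.splitext(os.path.basename(groundtruth_path))[0]
    # Assumed layout: <split>/anno/<name>.txt and <split>/frames/<name>/
    split_root = os.path.dirname(os.path.dirname(groundtruth_path))
    return name, os.path.join(split_root, "frames", name)


annotation = os.path.join("TRAIN_0", "anno", "video_001.txt")
name, frames = infer_frames_directory(annotation)
```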