pyFTS.data package

Module contents

Module for pyFTS standard datasets facilities

Submodules

pyFTS.data.common module

pyFTS.data.common.get_dataframe(filename: str, url: str, sep: str = ';', compression: str = 'infer') → pandas.core.frame.DataFrame

This method checks whether filename already exists locally; if it does, the file is read and its data returned. If the file does not exist yet, it is downloaded and decompressed first.

Parameters
  • filename – dataset local filename

  • url – dataset internet URL

  • sep – CSV field separator

  • compression – type of compression

Returns

Pandas DataFrame
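The download-if-missing behavior described above can be sketched with pandas and the standard library alone. This is a minimal sketch, not the pyFTS code: the name cached_read_csv is hypothetical, and decompression is delegated to pandas.read_csv, whose compression='infer' mode picks the codec from the file extension.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def cached_read_csv(filename: str, url: str, sep: str = ";",
                    compression: str = "infer") -> pd.DataFrame:
    """Read `filename`, downloading it from `url` first if it is missing."""
    local = Path(filename)
    if not local.is_file():
        # Only touch the network when there is no local copy yet.
        urlretrieve(url, str(local))
    # With compression="infer", pandas decompresses .gz/.zip/.bz2/.xz files
    # transparently based on the file extension.
    return pd.read_csv(local, sep=sep, compression=compression)
```

Once the file exists locally, later calls never use the URL, which keeps repeated experiments offline and fast.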

Datasets

Artificial and synthetic data generators

Facilities to generate synthetic stochastic processes

class pyFTS.data.artificial.SignalEmulator(**kwargs)

Bases: object

Emulate a complex signal built from several additive and non-additive components

blip(**kwargs)

Creates an outlier greater than the maximum or lower than the minimum of the previous values of the signal, and inserts it at a random location in the signal.

Returns

the current SignalEmulator instance, for method chaining
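The blip idea can be illustrated with a standalone function. This is a hypothetical sketch, not the SignalEmulator code: the outlier distance (10% to 50% of the signal range beyond the current extreme) and the choice to insert a new sample rather than overwrite one are assumptions.

```python
import numpy as np


def insert_blip(signal, rng=None):
    """Return a copy of `signal` with one outlier inserted at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    values = list(signal)
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                  # flat signal: fall back to a unit range
    offset = span * rng.uniform(0.1, 0.5)    # assumed outlier distance
    # Go above the maximum or below the minimum with equal probability.
    outlier = hi + offset if rng.random() < 0.5 else lo - offset
    position = int(rng.integers(0, len(values) + 1))
    values.insert(position, outlier)         # the signal grows by one sample
    return values
```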

components

Components of the signal

incremental_gaussian(mu: float, sigma: float, **kwargs)

Creates an additive Gaussian interference on the previous signal.

Parameters
  • mu – increment on mean

  • sigma – increment on variance

  • start – lag index to start this signal, the default value is 0

  • it – Number of iterations, the default value is 1

  • length – Number of samples generated on each iteration, the default value is 100

  • vmin – Lower bound value of generated data, the default value is None

  • vmax – Upper bound value of generated data, the default value is None

Returns

the current SignalEmulator instance, for method chaining

periodic_gaussian(type: str, period: int, mu_min: float, sigma_min: float, mu_max: float, sigma_max: float, **kwargs)

Creates an additive periodic Gaussian interference on the previous signal.

Parameters
  • type – ‘linear’ or ‘sinoidal’

  • period – the period of recurrence

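  • mu_min – initial (and minimum) mean of each period

  • sigma_min – initial (and minimum) variance of each period

  • mu_max – final (and maximum) mean of each period

  • sigma_max – final (and maximum) variance of each period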

  • start – lag index to start this signal, the default value is 0

  • it – Number of iterations, the default value is 1

  • length – Number of samples generated on each iteration, the default value is 100

  • vmin – Lower bound value of generated data, the default value is None

  • vmax – Upper bound value of generated data, the default value is None

Returns

the current SignalEmulator instance, for method chaining

run()

Render the signal

Returns

a list of float values

stationary_gaussian(mu: float, sigma: float, **kwargs)

Creates a continuous Gaussian signal with mean mu and variance sigma.

Parameters
  • mu – mean

  • sigma – variance

  • additive – If False, the previous signal is discarded and this one starts in its place; if True, this signal is added to the previous one

  • start – lag index to start this signal, the default value is 0

  • it – Number of iterations, the default value is 1

  • length – Number of samples generated on each iteration, the default value is 100

  • vmin – Lower bound value of generated data, the default value is None

  • vmax – Upper bound value of generated data, the default value is None

Returns

the current SignalEmulator instance, for method chaining
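The method-chaining pattern above (each builder method records a component and returns the instance, and run() renders everything at the end) can be sketched with a toy class. MiniEmulator is hypothetical and implements only a simplified stationary_gaussian that mirrors the documented additive, start and length parameters; it is not the pyFTS implementation. It also passes sigma to NumPy as a standard deviation, whereas the parameter docs above describe it as a variance.

```python
import numpy as np


class MiniEmulator:
    """Toy illustration of builder-style chaining; not the pyFTS class."""

    def __init__(self, seed=None):
        self.components = []                       # recorded now, rendered in run()
        self.rng = np.random.default_rng(seed)

    def stationary_gaussian(self, mu, sigma, additive=True, start=0, length=100):
        # additive=True is a default chosen for this sketch only
        self.components.append(dict(mu=mu, sigma=sigma, additive=additive,
                                    start=start, length=length))
        return self                                # enables method chaining

    def run(self):
        signal = np.zeros(0)
        for c in self.components:
            if not c["additive"]:
                signal = np.zeros(0)               # cancel the previous signal
            end = c["start"] + c["length"]
            if end > len(signal):                  # grow with zeros if needed
                signal = np.concatenate([signal, np.zeros(end - len(signal))])
            # sigma is passed to NumPy as a standard deviation (assumption)
            signal[c["start"]:end] += self.rng.normal(c["mu"], c["sigma"], c["length"])
        return signal.tolist()


# Usage: a noise floor with a level shift added from sample 100 onwards.
signal = (MiniEmulator(seed=1)
          .stationary_gaussian(0.0, 1.0, length=200)
          .stationary_gaussian(5.0, 0.5, start=100, length=100)
          .run())
```

Deferring the rendering to run() lets components be declared in any order and combined additively afterwards.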

pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)

Generates data sampled from a Gaussian distribution with constant or linearly changing parameters.

Parameters
  • mu_ini – Initial mean

  • sigma_ini – Initial variance

  • mu_inc – Mean increment after ‘num’ samples

  • sigma_inc – Variance increment after ‘num’ samples

  • it – Number of iterations

  • num – Number of samples generated on each iteration

  • vmin – Lower bound value of generated data

  • vmax – Upper bound value of generated data

Returns

A list of it*num float values
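The block-wise scheme above (draw num samples with fixed parameters, then shift the mean and spread before the next block) can be sketched as follows. gaussian_linear is a hypothetical name, and the sketch treats sigma as a standard deviation because that is what numpy.random.Generator.normal expects; the docs above call it a variance.

```python
import numpy as np


def gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10,
                    vmin=None, vmax=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mu, sigma = mu_ini, sigma_ini
    out = []
    for _ in range(it):
        block = rng.normal(mu, sigma, num)       # sigma used as NumPy's scale
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)   # optional hard bounds
        out.extend(block.tolist())
        mu += mu_inc                             # parameters change between blocks
        sigma += sigma_inc
    return out
```

Setting both increments to zero gives a stationary series; non-zero increments produce a drifting mean or spread.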

pyFTS.data.artificial.generate_linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)

Generates a periodic linear variation on mean and variance

Parameters
  • period – the period of recurrence

  • mu_min – initial (and minimum) mean of each period

  • sigma_min – initial (and minimum) variance of each period

  • mu_max – final (and maximum) mean of each period

  • sigma_max – final (and maximum) variance of each period

  • it – Number of iterations

  • num – Number of samples generated on each iteration

  • vmin – Lower bound value of generated data

  • vmax – Upper bound value of generated data

Returns

A list of it*num float values
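One reading of a periodic linear variation is a sawtooth: within each period the mean and spread ramp linearly from their minimum to their maximum, then reset. The sketch below assumes that shape (pyFTS may instead ramp up and back down), counts the period in iterations, and omits vmin/vmax for brevity; linear_periodic_gaussian is a hypothetical name.

```python
import numpy as np


def linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max,
                             it=100, num=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for k in range(it):
        # Position inside the current period, scaled to [0, 1].
        frac = (k % period) / max(period - 1, 1)
        mu = mu_min + frac * (mu_max - mu_min)
        sigma = sigma_min + frac * (sigma_max - sigma_min)
        out.extend(rng.normal(mu, sigma, num).tolist())
    return out
```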

pyFTS.data.artificial.generate_sinoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)

Generates a periodic sinusoidal variation on mean and variance.

Parameters
  • period – the period of recurrence

  • mu_min – initial (and minimum) mean of each period

  • sigma_min – initial (and minimum) variance of each period

  • mu_max – final (and maximum) mean of each period

  • sigma_max – final (and maximum) variance of each period

  • it – Number of iterations

  • num – Number of samples generated on each iteration

  • vmin – Lower bound value of generated data

  • vmax – Upper bound value of generated data

Returns

A list of it*num float values
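For a sinusoidal variation, a raised cosine gives a smooth cycle that sits at the minimum at the start of each period and peaks at mid-period. This waveform choice, like the name sinusoidal_periodic_gaussian, is an assumption made for illustration and is not taken from pyFTS.

```python
import numpy as np


def sinusoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max,
                                 it=100, num=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for k in range(it):
        # Raised cosine weight: 0 at the start of each period, 1 at mid-period.
        w = (1 - np.cos(2 * np.pi * k / period)) / 2
        mu = mu_min + w * (mu_max - mu_min)
        sigma = sigma_min + w * (sigma_max - sigma_min)
        out.extend(rng.normal(mu, sigma, num).tolist())
    return out
```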

pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)

Generates data sampled from a uniform distribution with constant or linearly changing bounds.

Parameters
  • min_ini – Initial lower bound

  • max_ini – Initial upper bound

  • min_inc – Lower bound increment after ‘num’ samples

  • max_inc – Upper bound increment after ‘num’ samples

  • it – Number of iterations

  • num – Number of samples generated on each iteration

  • vmin – Lower bound value of generated data

  • vmax – Upper bound value of generated data

Returns

A list of it*num float values
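A sketch of the uniform counterpart, assuming each block of num samples is drawn from [low, high) and both bounds then shift by their increments; uniform_linear is a hypothetical name, not a pyFTS function.

```python
import numpy as np


def uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10,
                   vmin=None, vmax=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    low, high = min_ini, max_ini
    out = []
    for _ in range(it):
        block = rng.uniform(low, high, num)      # samples in [low, high)
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)   # optional hard bounds
        out.extend(block.tolist())
        low += min_inc                           # bounds move between blocks
        high += max_inc
    return out
```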

pyFTS.data.artificial.random_walk(n=500, type='gaussian')

Simple random walk

Parameters
  • n – number of samples

  • type – ‘gaussian’ or ‘uniform’

Returns

pyFTS.data.artificial.white_noise(n=500)

Simple Gaussian noise signal.

Parameters

n – number of samples
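Both noise helpers are simple enough to sketch in a few lines of NumPy. These versions are assumptions, not the pyFTS code: the noise and the Gaussian steps are standard normal, the uniform steps lie in [-1, 1), and the walk starts from its first step rather than from zero.

```python
import numpy as np


def white_noise(n=500, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, 1.0, n)                # independent N(0, 1) samples


def random_walk(n=500, type="gaussian", rng=None):  # `type` mirrors the documented name
    rng = np.random.default_rng() if rng is None else rng
    if type == "gaussian":
        steps = rng.normal(0.0, 1.0, n)
    elif type == "uniform":
        steps = rng.uniform(-1.0, 1.0, n)
    else:
        raise ValueError("type must be 'gaussian' or 'uniform'")
    return np.cumsum(steps)                       # each value = previous value + one step
```

The walk is the running sum of white-noise-like steps, which is why its variance grows with time while the noise itself stays stationary.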

AirPassengers dataset

Monthly totals of airline passengers in the USA, from January 1949 through December 1960.

Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.

pyFTS.data.AirPassengers.get_data() → numpy.ndarray

Get a simple univariate time series.

Returns

numpy array

pyFTS.data.AirPassengers.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

Bitcoin dataset

Bitcoin to USD quotations

Daily averaged index, by business day, from 2010 to 2018.

Source: https://finance.yahoo.com/quote/BTC-USD?p=BTC-USD

pyFTS.data.Bitcoin.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.Bitcoin.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

DowJones dataset

DJI - Dow Jones

Daily averaged index, by business day, from 1985 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.DowJones.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.DowJones.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

Enrollments dataset

Yearly University of Alabama enrollments from 1971 to 1992.

pyFTS.data.Enrollments.get_data() → numpy.ndarray

Get a simple univariate time series.

Returns

numpy array

pyFTS.data.Enrollments.get_dataframe() → pandas.core.frame.DataFrame

Get the complete time series data.

Returns

Pandas DataFrame

Ethereum dataset

Ethereum to USD quotations

Daily averaged index, by business day, from 2016 to 2018.

Source: https://finance.yahoo.com/quote/ETH-USD?p=ETH-USD

pyFTS.data.Ethereum.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.Ethereum.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

EUR-GBP dataset

FOREX market EUR-GBP pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURGBP.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.EURGBP.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

EUR-USD dataset

FOREX market EUR-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURUSD.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.EURUSD.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

GBP-USD dataset

FOREX market GBP-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.GBPUSD.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.GBPUSD.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

INMET dataset

INMET - Instituto Nacional de Meteorologia (Brazilian National Institute of Meteorology)

Belo Horizonte station, from 2000-01-01 to 2012-12-31.

Source: http://www.inmet.gov.br

pyFTS.data.INMET.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

Malaysia dataset

Hourly electric load and temperature in Malaysia.

pyFTS.data.Malaysia.get_data(field: str = 'load') → numpy.ndarray

Get the univariate time series data.

Parameters

field – dataset field to load

Returns

numpy array

pyFTS.data.Malaysia.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

NASDAQ dataset

National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)

Daily averaged index by business day, from 2000 to 2016.

Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC

pyFTS.data.NASDAQ.get_data(field: str = 'avg') → numpy.ndarray

Get a simple univariate time series.

Parameters

field – the dataset field name to extract

Returns

numpy array

pyFTS.data.NASDAQ.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

SONDA dataset

SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais (Brazilian National Institute for Space Research), Brazil.

Brasilia station

Source: http://sonda.ccst.inpe.br/

pyFTS.data.SONDA.get_data(field: str) → numpy.ndarray

Get a simple univariate time series.

Parameters

field – the dataset field name to extract

Returns

numpy array

pyFTS.data.SONDA.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

S&P 500 dataset

S&P500 - Standard & Poor’s 500

Daily averaged index, by business day, from 1950 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.SP500.get_data() → numpy.ndarray

Get the univariate time series data.

Returns

numpy array

pyFTS.data.SP500.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

TAIEX dataset

The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)

Daily averaged index by business day, from 1995 to 2014.

Source: http://www.twse.com.tw/en/products/indices/Index_Series.php

pyFTS.data.TAIEX.get_data() → numpy.ndarray

Get the univariate time series data.

Returns

numpy array

pyFTS.data.TAIEX.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame

Henon chaotic time series

M. Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)

$$x(t) = a + b\,y(t-1) - x(t-1)^2, \qquad y(t) = x(t-1)$$
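The Hénon map is a discrete recurrence, so it can be iterated directly, reading it as x(t) = a + b·y(t-1) - x(t-1)², y(t) = x(t-1). The sketch below is hypothetical (henon is not a pyFTS name) and assumes the returned DataFrame includes the initial values as its first row.

```python
import pandas as pd


def henon(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    x, y = [initial_values[0]], [initial_values[1]]
    for t in range(iterations):
        x.append(a + b * y[t] - x[t] ** 2)   # x(t+1) = a + b*y(t) - x(t)^2
        y.append(x[t])                       # y(t+1) = x(t)
    return pd.DataFrame({"x": x, "y": y})
```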

pyFTS.data.henon.get_data(var: str, a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame

Get a simple univariate time series.

Parameters

var – the dataset field name to extract

Returns

numpy array

pyFTS.data.henon.get_dataframe(a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame

Return a dataframe with the bivariate Henon Map time series (x, y).

Parameters
  • a – Equation coefficient

  • b – Equation coefficient

  • initial_values – numpy array with the initial values of x and y. Default: [1, 1]

  • iterations – number of iterations. Default: 1000

Returns

Pandas DataFrame with the x and y values

Logistic map chaotic time series

May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.

$$x(t) = r\,x(t-1)\,\bigl(1 - x(t-1)\bigr)$$
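Iterating the recurrence takes one line per step. In this hypothetical sketch (logistic_map is not a pyFTS name), the returned list starts with initial_value and therefore holds iterations + 1 values; whether pyFTS includes the initial value is not stated here.

```python
def logistic_map(r=4.0, initial_value=0.3, iterations=100):
    x = [initial_value]
    for _ in range(iterations):
        x.append(r * x[-1] * (1 - x[-1]))   # x(t) = r * x(t-1) * (1 - x(t-1))
    return x
```

With r = 4 the map is fully chaotic: two starting values that differ in the tenth decimal place diverge after a few dozen steps.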

pyFTS.data.logistic_map.get_data(r: float = 4, initial_value: float = 0.3, iterations: int = 100) → list

Return a list with the logistic map chaotic time series.

Parameters
  • r – Equation coefficient

  • initial_value – Initial value of x. Default: 0.3

  • iterations – number of iterations. Default: 100

Returns

a list with the logistic map chaotic time series

Lorenz chaotic time series

Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

$$\frac{dx}{dt} = a(y - x), \qquad \frac{dy}{dt} = x(b - z) - y, \qquad \frac{dz}{dt} = xy - cz$$
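Because the Lorenz system is continuous, generating a series requires a numerical integrator. The sketch below assumes forward Euler with step dt, the simplest scheme consistent with the dt parameter documented for this module, but pyFTS may integrate differently; lorenz is a hypothetical name and the initial values form the first row.

```python
import pandas as pd


def lorenz(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
           initial_values=(0.1, 0.0, 0.0), iterations=1000):
    x, y, z = [initial_values[0]], [initial_values[1]], [initial_values[2]]
    for t in range(iterations):
        dx = a * (y[t] - x[t])
        dy = x[t] * (b - z[t]) - y[t]
        dz = x[t] * y[t] - c * z[t]
        # Forward Euler step of size dt
        x.append(x[t] + dt * dx)
        y.append(y[t] + dt * dy)
        z.append(z[t] + dt * dz)
    return pd.DataFrame({"x": x, "y": y, "z": z})
```

Forward Euler is only first-order accurate, so a smaller dt (or a Runge-Kutta scheme) gives trajectories that track the true flow for longer.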

pyFTS.data.lorentz.get_data(var: str, a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame

Get a simple univariate time series.

Parameters

var – the dataset field name to extract

Returns

numpy array

pyFTS.data.lorentz.get_dataframe(a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame

Return a dataframe with the multivariate Lorenz Map time series (x, y, z).

Parameters
  • a – Equation coefficient. Default value: 10

  • b – Equation coefficient. Default value: 28

  • c – Equation coefficient. Default value: 8.0/3.0

  • dt – Time differential for continuous time integration. Default value: 0.01

  • initial_values – numpy array with the initial values of x, y and z. Default: [0.1, 0, 0]

  • iterations – number of iterations. Default: 1000

Returns

Pandas DataFrame with the x, y and z values

Mackey-Glass chaotic time series

Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.

$$\frac{dy}{dt} = -b\,y(t) + \frac{c\,y(t-\tau)}{1 + y(t-\tau)^{10}}$$
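A delay differential equation needs a history of past values before it can be stepped. This sketch assumes a unit-step Euler discretization, which is why at least tau + 1 initial values are required (the documented default supplies 18 values for tau = 17); mackey_glass is a hypothetical name, and the returned list keeps the initial history.

```python
import numpy as np


def mackey_glass(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    lag = int(tau)                     # used as a list offset, so it must be integral
    if initial_values is None:
        # lag + 1 history values; for tau = 17 this equals np.linspace(0.5, 1.5, 18)
        initial_values = np.linspace(0.5, 1.5, lag + 1)
    y = [float(v) for v in initial_values]
    if len(y) <= lag:
        raise ValueError("need at least tau + 1 initial values")
    for _ in range(iterations):
        n = len(y) - 1
        delayed = y[n - lag]           # y(t - tau)
        # Unit-step Euler update of dy/dt = -b*y(t) + c*y(t-tau)/(1 + y(t-tau)**10)
        y.append(y[n] + c * delayed / (1 + delayed ** 10) - b * y[n])
    return y
```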

pyFTS.data.mackey_glass.get_data(b: float = 0.1, c: float = 0.2, tau: float = 17, initial_values: numpy.ndarray = array([0.5, 0.55882353, 0.61764706, 0.67647059, 0.73529412, 0.79411765, 0.85294118, 0.91176471, 0.97058824, 1.02941176, 1.08823529, 1.14705882, 1.20588235, 1.26470588, 1.32352941, 1.38235294, 1.44117647, 1.5]), iterations: int = 1000) → list

Return a list with the Mackey-Glass chaotic time series.

Parameters
  • b – Equation coefficient

  • c – Equation coefficient

  • tau – Lag parameter, default: 17

  • initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)

  • iterations – number of iterations. Default: 1000

Returns

a list with the Mackey-Glass chaotic time series

Rossler chaotic time series

O. E. Rössler, Phys. Lett. 57A, 397 (1976).

$$\frac{dx}{dt} = -z - y, \qquad \frac{dy}{dt} = x + a\,y, \qquad \frac{dz}{dt} = b + z(x - c)$$
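Like any continuous system, the Rössler equations need a numerical integrator to produce a series. This hypothetical sketch (rossler is not a pyFTS name) assumes forward Euler with step dt and keeps the initial values as the first row of the returned DataFrame.

```python
import pandas as pd


def rossler(a=0.2, b=0.2, c=5.7, dt=0.01,
            initial_values=(0.001, 0.001, 0.001), iterations=5000):
    x, y, z = [initial_values[0]], [initial_values[1]], [initial_values[2]]
    for t in range(iterations):
        dx = -z[t] - y[t]
        dy = x[t] + a * y[t]
        dz = b + z[t] * (x[t] - c)
        # Forward Euler step of size dt
        x.append(x[t] + dt * dx)
        y.append(y[t] + dt * dy)
        z.append(z[t] + dt * dz)
    return pd.DataFrame({"x": x, "y": y, "z": z})
```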

pyFTS.data.rossler.get_data(var: str, a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → numpy.ndarray

Get a simple univariate time series.

Parameters

var – the dataset field name to extract

Returns

numpy array

pyFTS.data.rossler.get_dataframe(a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → pandas.core.frame.DataFrame

Return a dataframe with the multivariate Rössler Map time series (x, y, z).

Parameters
  • a – Equation coefficient. Default value: 0.2

  • b – Equation coefficient. Default value: 0.2

  • c – Equation coefficient. Default value: 5.7

  • dt – Time differential for continuous time integration. Default value: 0.01

  • initial_values – numpy array with the initial values of x, y and z. Default: [0.001, 0.001, 0.001]

  • iterations – number of iterations. Default: 5000

Returns

Pandas DataFrame with the x, y and z values

Sunspots dataset

Monthly sunspot numbers from 1749 to May 2016

Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/

pyFTS.data.sunspots.get_data() → numpy.ndarray

Get a simple univariate time series.

Returns

numpy array

pyFTS.data.sunspots.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns

Pandas DataFrame