Time series Auto Regression and Error correction models

Vector auto regression, Volatility, Granger causality & Error Correction

Image by author

Integrated time series (TS) are generally modeled by fitting a vector autoregression (VAR) to their first differences. Differencing, however, may eliminate valuable information about the relationship among the variables; when the variables are cointegrated, a vector error correction model (VECM) is applicable.

Granger causality:

A VAR involves multiple variables, each of which may be important for predicting the future state of the others. Granger causality (GC) lets us test whether past values of one variable improve the forecast of another, and it is only relevant for TS variables. We will use a VAR to investigate GC here.
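
To make the idea concrete, here is a minimal sketch of a Granger-style F-test written with NumPy only. It is not the article's R workflow: the function name `granger_f_stat`, the simulated series and the coefficients 0.5 and 0.8 are all assumptions chosen for illustration. The test compares a regression of y on its own lags (restricted) with one that also includes lags of x (unrestricted).

```python
import numpy as np

def granger_f_stat(y, x, lags=1):
    """F statistic for H0: lagged x adds nothing to a regression of y on its own lags."""
    n = len(y)
    # Column k holds the series lagged by k + 1 steps, aligned with y[lags:].
    y_lags = np.column_stack([y[lags - k - 1 : n - k - 1] for k in range(lags)])
    x_lags = np.column_stack([x[lags - k - 1 : n - k - 1] for k in range(lags)])
    target = y[lags:]
    const = np.ones((n - lags, 1))

    def rss(design):
        # Residual sum of squares of an OLS fit of target on design.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted = rss(np.hstack([const, y_lags]))
    rss_unrestricted = rss(np.hstack([const, y_lags, x_lags]))
    dof = (n - lags) - (2 * lags + 1)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)

# Simulated data (an assumption): x drives y with a one-step delay, not the reverse.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(y, x, lags=1)
f_y_to_x = granger_f_stat(x, y, lags=1)
print(f"x -> y: F = {f_x_to_y:.1f}")
print(f"y -> x: F = {f_y_to_x:.1f}")
```

A large F statistic in one direction and a small one in the other suggests that x helps forecast y but not the reverse. In practice the statistic would be compared against an F distribution to obtain a p-value.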

Our use case: we have price data for West Texas Intermediate (WTI), Brent crude oil and Henry Hub spot prices, and we will forecast the next 15 time steps of each. We will use R to solve this. …

Machine learning & classification algorithms

Classification algorithms to determine application outcome

Image by author

Loans are an important part of the banking business. Loan applications are screened on certain inputs to validate the applicant's eligibility. Our use case here is to automate the loan eligibility process in real time, based on the customer details obtained during the loan application. This should lead to improved service and customer satisfaction.

Let us load the available data to check the information it contains.

Loading training data:


Here the dependent (target) variable is Loan_Status, and we need to develop a model that uses the remaining features to predict it. …

Stochastic time-series modeling

Time-Series modeling using Natural Gas data

Image by Author

ARIMA stands for autoregressive integrated moving average and is popular for time-series prediction. We have univariate daily time-series data, and our use case is to forecast future time steps from it. The series is a stochastic (random-walk) price series.
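
As a small illustration of the "integrated" part of ARIMA, the sketch below simulates a random walk and shows that first differencing recovers the underlying shocks. The simulated series, the seed and the sample size are assumptions; the article's own data is the natural gas price series.

```python
import numpy as np

rng = np.random.default_rng(42)
steps = rng.normal(loc=0.0, scale=1.0, size=1000)  # white-noise shocks
walk = np.cumsum(steps)                            # random walk: previous value plus a shock

# First differencing (the d=1 step in ARIMA) turns the walk back into the shocks.
diffed = np.diff(walk)

print("variance of the walk, first vs second half:", walk[:500].var(), walk[500:].var())
print("variance of the differences, first vs second half:", diffed[:500].var(), diffed[500:].var())
```

Comparing the two halves gives a rough feel for whether the level series or the differenced series behaves more consistently over time.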

Let us load and check the data we have, then resample it to monthly frequency for ease of computation.

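import pandas_datareader.data as web
from pandas import DataFrame
print("....Data Loading....\n")
print('\033[4mHenry Hub Natural Gas Price\033[0m')
# Download daily prices for the contract from Yahoo Finance
data = web.DataReader('NNJ24.NYM', data_source='yahoo', start='2000-01-01')
data.rename(columns={'Close': 'price'}, inplace=True)
# Resample to month-end frequency, keeping the last observed price
df = DataFrame(data.resample('M').last().price.copy())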
window = 12
# 12-month rolling mean and standard deviation of the price
df['rolling_mean'] = df.price.rolling(window=window).mean()
df['rolling_std'] = df.price.rolling(window=window).std()
df.plot(title='Natural …

Regression & Classification to generate buy/sell signals

An example with Crude Oil daily data

Image by Author

Buy and sell signals can be generated from two moving averages: a long-period and a short-period average. When the short moving average rises above or falls below the long moving average, a buy or sell signal is generated, depending on the parameters set.
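
The crossover rule can be sketched in a few lines of pandas. This is a minimal illustration, not the article's implementation: the helper name `crossover_signals`, the window lengths (2 and 4) and the toy price list are assumptions. A signal of +1 marks the bar where the short average moves above the long one (buy), and -1 marks the bar where it drops back below (sell).

```python
import pandas as pd

def crossover_signals(prices, short_window, long_window):
    """Return prices, both moving averages and a +1 / -1 crossover signal."""
    short_ma = prices.rolling(window=short_window).mean()
    long_ma = prices.rolling(window=long_window).mean()
    # 1 while the short average is above the long average, else 0
    # (comparisons involving NaN during the warm-up period evaluate to False).
    position = (short_ma > long_ma).astype(int)
    # A change in position marks a crossover: +1 is a buy, -1 is a sell.
    signal = position.diff().fillna(0).astype(int)
    return pd.DataFrame({"price": prices, "short_ma": short_ma,
                         "long_ma": long_ma, "signal": signal})

prices = pd.Series([10, 9, 8, 7, 8, 9, 10, 11, 10, 9, 8, 7], dtype=float)
report = crossover_signals(prices, short_window=2, long_window=4)
print(report[report.signal != 0])
```

Filtering on a non-zero `signal` column gives a compact buy/sell report that can be overlaid on a price chart.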

Here, with a simple example, we show how to generate a report of buy/sell signals and visualize them on a chart.

import pandas_datareader.data as web
print("....Data Loading....\n")
print('\033[4mCrude Oil Spot Price\033[0m')
# Download daily crude oil prices (ticker CL=F) from Yahoo Finance
data = web.DataReader('CL=F', data_source='yahoo', start='2000-01-01')

Let’s focus on the daily closing price.

import matplotlib.pyplot as plt
df = data[['Close']]
# Plot the closing price
df.Close.plot(figsize=(10, 5))
plt.ylabel("Prices (USD)"); plt.title("Crude …

Artificial Intelligence and Anomaly Detection

Henry Hub Spot price time series data & anomaly detection

Image by Author

Anomaly detection here means detecting where actual results differ from predicted results in price prediction. Real-life data is often streaming or time-series data, where anomalies carry significant information in critical situations.
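
One simple way to turn "actual differs from predicted" into a rule is to flag points whose prediction error is unusually large relative to the other errors. The sketch below does this with a z-score threshold; the function name `flag_anomalies`, the threshold of 2.5 and the toy numbers are assumptions, and the article's own model (a neural network) may use a different criterion.

```python
import numpy as np

def flag_anomalies(actual, predicted, threshold=2.5):
    """Return indices where the prediction error is unusually large."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    # Standardize the errors so the threshold is expressed in standard deviations.
    z_scores = (residuals - residuals.mean()) / residuals.std()
    return np.flatnonzero(np.abs(z_scores) > threshold)

# Toy data (an assumption): a flat prediction and one actual value far off it.
predicted = np.full(10, 1.0)
actual = np.array([1.1, 0.9, 1.0, 1.05, 0.95, 3.0, 1.0, 1.1, 0.9, 1.0])
anomalies = flag_anomalies(actual, predicted)
print(anomalies)
```

With a real model, `predicted` would be the model's output on the same dates as `actual`, and the threshold would be tuned on historical errors.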

Here we will develop an anomaly detection model for time-series data using a neural network. Let us load the Henry Hub spot price data from the EIA.

import eia
import pandas as pd
from pandas import DataFrame
print("....Data loading....\n")
print('\033[4mHenry Hub Natural Gas Spot Price, Daily (Dollars per Million Btu)\033[0m')
def retrieve_time_series(api, series_ID):
    # Query the EIA API for one series and return it as a DataFrame
    series_search = api.data_by_series(series=series_ID)
    spot_price = DataFrame(series_search)
    return spot_price
def main():
    # Return an empty DataFrame if the request fails
    try:
        api_key = "....API KEY..."
        api = eia.API(api_key)
        series_ID = 'xxxxxx'
        spot_price = retrieve_time_series(api, series_ID)
        return spot_price
    except Exception as e:
        print("error", e)
        return DataFrame(columns=None)
spot_price = main()
spot_price = spot_price.rename({'Henry Hub Natural Gas Spot Price, Daily (Dollars per Million Btu)': 'price'}, axis='columns')
# Parse the date strings in the index and use them as a DatetimeIndex
spot_price = spot_price.reset_index()
spot_price['index'] = pd.to_datetime(spot_price['index'].str[:-3], format='%Y %m%d')
spot_price['Date'] = pd.to_datetime(spot_price['index'])
spot_price.set_index('Date', inplace=True)
# Keep only prices from 2000 onwards
spot_price = spot_price.loc['2000-01-01':, ['price']]
spot_price = spot_price.astype(float) …

Cryptocurrency and Neural Network

Neural Network & Time-series price prediction using hourly data

Image by Author

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Predicting stock prices is quite challenging because of the highly volatile nature of the time series, combined with stochastic, non-linear movement. Here, the problem at hand is price prediction: we are trying to predict a numerical value within a range (approximately 9000 to 12500). This problem fits the regression analysis framework. …

Predictive analytics and time series data

Simple steps to multi-step future prediction

Image by author

Vector autoregression (VAR) has the advantage of being easy to implement. Because every equation in the VAR has the same variables on the right-hand side, the coefficients {α1, α2, …, β11, β21, …, γ11, γ21, …} of the overall system can be estimated by applying ordinary least squares (OLS) regression to each equation individually. Since the OLS estimator has standard asymptotic properties, it is possible to test any linear restriction, either within one equation or across equations, with the standard t and F statistics. …
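
The equation-by-equation OLS idea can be sketched with NumPy alone. The example below simulates a two-variable VAR(1) and recovers its coefficients by fitting one least-squares regression per equation. The coefficient matrix `A_true`, the sample size and the seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.2],
                   [0.1, 0.4]])  # hypothetical VAR(1) coefficient matrix
n = 2000
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A_true @ Y[t - 1] + rng.normal(size=2)

# Every equation shares the same regressors: a constant and all variables lagged once.
X = np.hstack([np.ones((n - 1, 1)), Y[:-1]])

# Estimate each equation separately with OLS.
A_hat = np.zeros_like(A_true)
for i in range(2):
    coef, *_ = np.linalg.lstsq(X, Y[1:, i], rcond=None)
    A_hat[i] = coef[1:]  # drop the intercept, keep the lag coefficients

print(np.round(A_hat, 2))
```

Row i of `A_hat` holds the estimated lag coefficients of equation i, and with a sample this long they should land close to the corresponding row of `A_true`.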

Machine Learning & Linear Regression

Predictive model using a machine learning algorithm

Image by author

The trick in predictive modeling with machine learning is to generalize to new cases rather than merely memorize past ones. To achieve that, the ML algorithm must look through many rows of data and through the features that correlate significantly with the target variable. In designing a predictive model, the key is to identify price trends without the uncertainty and bias of our own mental model. One approach that can work is linear regression, where the stock's price and the time period determine the parameters of the model.
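
As a minimal sketch of regressing price on time and extending the fitted line to future steps, the example below uses `np.polyfit` on a time index. The helper name `linear_trend_forecast` and the synthetic, perfectly linear prices are assumptions; real prices would of course not sit exactly on a line.

```python
import numpy as np

def linear_trend_forecast(prices, n_future):
    """Fit price = slope * t + intercept and extrapolate n_future steps ahead."""
    t = np.arange(len(prices))
    slope, intercept = np.polyfit(t, prices, deg=1)
    future_t = np.arange(len(prices), len(prices) + n_future)
    return slope * future_t + intercept

prices = 2.0 * np.arange(10) + 5.0  # synthetic prices: 5, 7, 9, ...
forecast = linear_trend_forecast(prices, n_future=3)
print(forecast)
```

Pairing `forecast` with the corresponding future dates gives a simple prediction report that goes beyond the test set.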

In most of the online resources available, the prediction problem ends with validation on a test set. Very few resources clearly show an actual prediction report with future dates and forecasted prices. …

Real-time experience of a practitioner

Analytics Project | not only what we do, but also how we do it

Photo by Jo Szczepanska on Unsplash

Analytics projects often come with uncertainty and high implementation risk, and thus demand a different approach to project management. A typical analytics project carries comparatively more uncertainty, so it takes special skills to execute and deliver one. Analytics projects fail to achieve the desired outcome more often than we would like to admit. Enough time must be spent on understanding the exact business problem and then converting it into an analytics problem that can be solved with data.

Most key stakeholders within an organization will have at least an elementary understanding of the project management life cycle, but they probably do not have much exposure to the typical analytics project life cycle. Here I will cover some of the finer points of analytics projects that have brought me success. …

Performance with RandomForest Classifier

Machine Learning to Solve Multi-class Classification Problem

Image by author

Machine learning algorithms normally assume that the classes contain roughly similar numbers of objects. In real-life scenarios, however, the data distribution is mostly skewed, and some classes appear much more frequently than others. When facing such imbalance, we must design a system that is able to overcome the resulting bias.
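
One common way to counter class imbalance is to weight each class inversely to its frequency. The sketch below computes such weights with NumPy, using the formula n_samples / (n_classes * class_count), which is the heuristic scikit-learn applies for `class_weight='balanced'`. The helper name `balanced_class_weights` and the toy labels are assumptions, and the article itself may handle the imbalance differently.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels (an assumption): class 0 is six times as frequent as class 2.
labels = np.array([0] * 6 + [1] * 3 + [2] * 1)
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes receive larger weights, so misclassifying them costs the model more during training.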

Here, we will work with a multi-class problem, with data taken from the UCI Machine Learning Repository as shown below.

import pandas as pd
# The rest of this URL is cut off in the source; complete it before running
url = "https://archive.ics.uci.edu/ml/machine-learning-"
df = pd.read_csv(url, header=None)
df.columns = ['Id', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'type']
df.set_index('Id', inplace=True)
print('Data loading:')

Here, the features are chemical compositions and the classes are different types of glass. The problem presents the chemical compositions of various types of glass, and the objective is to determine the use of the glass. …


Sarit Maitra

Data Science Practice Lead at KSG Analytics Pvt. Ltd.
