Linear Programming in data science

Optimization modeling :: Cost Minimization

Photo by Antoine Dautry on Unsplash

Linear programming is widely used for optimization, and applications can be found in almost every industry operating under conflicting constraints. Here we work through a simple and quite common use case: a cost optimization problem. It can be formulated as a standard linear optimization problem whose objective function minimizes the transportation cost, subject to supply and demand constraints expressed as equalities and inequalities.
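To make the formulation concrete, here is a minimal sketch of a transportation problem solved with `scipy.optimize.linprog`. The cost matrix, supplies, and demands are hypothetical numbers chosen for illustration and are not the article's data; supply is modeled as an inequality and demand as an equality, which is one possible reading of the constraints described above.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 2 warehouses (rows) shipping to 3 customers (columns).
cost = np.array([[4.0, 6.0, 9.0],
                 [5.0, 3.0, 7.0]])       # cost per unit shipped
supply = np.array([60.0, 50.0])          # units available at each warehouse
demand = np.array([30.0, 40.0, 30.0])    # units required by each customer

n_src, n_dst = cost.shape
c = cost.ravel()  # decision variables x[i, j], flattened row by row

# Supply (inequality): sum over j of x[i, j] <= supply[i]
A_ub = np.zeros((n_src, n_src * n_dst))
for i in range(n_src):
    A_ub[i, i * n_dst:(i + 1) * n_dst] = 1.0

# Demand (equality): sum over i of x[i, j] == demand[j]
A_eq = np.zeros((n_dst, n_src * n_dst))
for j in range(n_dst):
    A_eq[j, j::n_dst] = 1.0

# Shipments cannot be negative.
result = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand,
                 bounds=(0, None))

plan = result.x.reshape(n_src, n_dst)    # shipping plan, warehouses x customers
print("Total cost:", result.fun)
print(plan)
```

Flattening the shipment matrix into one vector is what lets a general-purpose LP solver handle the problem; each row of `A_ub` and `A_eq` simply picks out the variables that belong to one warehouse or one customer.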

Let us create some synthetic data. For easy understanding and computational ease, the relevant information is in tabular format as shown below:


Mathematical Optimization for Portfolio Management

Photo by Jeswin Thomas on Unsplash

Optimization methods are widely applied in industry across diverse fields such as machine learning, finance, aviation, and logistics, to name a few. Once we have zeroed in on the problem statement, the next step is to solve the problem with the best available options.

To simplify, the idea is to find the best available solution, one that is at least as good as any other possible solution. If we want to quantify the problem and express it mathematically, we need to come up with an objective for solving it, which in mathematics is the objective function. …
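In symbols, a generic statement of this idea could look as follows. The names $f$, $g_i$, $h_j$, and $x^*$ are notation chosen here for illustration and are not taken from the article.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \dots, m, \\
                        & h_j(x) = 0, \quad j = 1, \dots, p.
\end{aligned}
\qquad
x^* \text{ is optimal if it is feasible and } f(x^*) \le f(x) \text{ for every feasible } x.
```

Here $f$ is the objective function, and the condition on $x^*$ is the formal version of "at least as good as any other possible solution".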


Predicting Trading Signals of Stock Markets

Technical analysis to build signals

Image by author

The profitability of stock market trading is directly related to the prediction of trading signals. Here, we discuss some popular technical analysis methods, from basic to advanced, for building trading signals. Our focus is on signal generation and visualization. A long list of technical indicators is available, covering principal domains such as trend, momentum, volume, volatility, and support and resistance. We will cover a few of these here.
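As an illustration of a trend indicator turned into a signal, the sketch below builds a simple moving-average crossover. The price series is a synthetic random walk, and the window lengths (10 and 50) are arbitrary assumptions, not values from the article.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (a seeded random walk), used only for illustration.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum(), name="close")

fast = close.rolling(window=10).mean()   # short-term trend
slow = close.rolling(window=50).mean()   # long-term trend

# Position: 1 while the fast average is above the slow one, else 0.
# Comparisons against the initial NaN values evaluate to False, so the position starts at 0.
position = (fast > slow).astype(int)

# Signal: +1 on the bar where the position switches on (buy),
# -1 on the bar where it switches off (sell), 0 otherwise.
signal = position.diff().fillna(0).astype(int)

print(signal[signal != 0])
```

Keeping `position` (the state) separate from `signal` (the change of state) makes it easy to plot buy and sell markers on top of the price series.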

However, once a signal is generated and a strategy is defined, the next most important task is performance testing, which is outside the scope of this article.

We will use free cryptocurrency data…


Brent Crude Oil Futures price movements prediction

ML to generate buy/sell signals

Image by author

Prediction and classification are important and of great interest because successful prediction of stock prices leads to attractive benefits. However, there is no universal set of rules; such prediction involves a series of highly complicated and difficult tasks.

Here, we show a simple use case of how a classification rule can be applied to obtain a trading strategy, and conclude by testing the performance of the strategy with a simple script.
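The sketch below shows one way a classification rule can be turned into a strategy and checked. It is not the article's model: the prices are synthetic, and the rule (label a day "up" when its return is positive, then hold a long position the next day) is a deliberately naive assumption standing in for a trained classifier.

```python
import numpy as np
import pandas as pd

# Synthetic price series, used only for illustration.
rng = np.random.default_rng(1)
close = pd.Series(50 * np.exp(rng.normal(0, 0.02, 300).cumsum()))

returns = close.pct_change()

# Hypothetical classification rule: 1 ("buy") if the day's return was positive, else 0.
predicted_up = (returns > 0).astype(int)

# Trade on the next day to avoid look-ahead bias:
# today's label sets tomorrow's position.
position = predicted_up.shift(1).fillna(0)

# Strategy return is the market return on days when a position is held.
strategy_returns = (position * returns).fillna(0)
cumulative = (1 + strategy_returns).cumprod()

# Share of held days on which the market actually went up.
hit_rate = (returns[position == 1] > 0).mean()

print("Final equity multiple:", cumulative.iloc[-1])
print("Hit rate:", hit_rate)
```

The `shift(1)` is the important design choice: without it the strategy would trade on information that is only known at the end of the same day.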

Let us load the data from Quandl.

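# BC is assumed to already hold the Brent Crude DataFrame loaded from Quandl
BC = BC.sort_index(ascending=True)  # sort by date before label-based slicing
BC = BC.loc['2010-01-01':]          # keep observations from 2010 onward
BC.tail()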


Test harness is important to evaluate trading strategy

Designing a financial trading strategy

Image by author

Backtesting is an important step for obtaining the statistics needed to ensure an effective trading strategy. Key outputs include profit and loss, net profit and loss, invested capital, number of trades/orders, return, Sharpe ratio, etc. Here, we discuss how to design a financial trading strategy using open source Python tools, and we'll review the results of the backtest by going through some plots generated by pyfolio.
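Two of the statistics listed above can be computed by hand, which helps when reading a backtest report. The following is a minimal sketch, assuming daily simple returns, 252 trading periods per year, and a hypothetical return series and capital amount; the function name `sharpe_ratio` is introduced here for illustration.

```python
import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)
    if std == 0:
        return float("nan")  # undefined when returns do not vary
    return float(np.sqrt(periods_per_year) * excess.mean() / std)

# Hypothetical daily strategy returns and starting capital.
daily_returns = [0.01, -0.005, 0.002, 0.007, -0.003]
invested_capital = 10_000.0

# Net profit and loss from compounding the daily returns.
net_pnl = invested_capital * (np.prod(1 + np.asarray(daily_returns)) - 1)

print("Sharpe ratio:", sharpe_ratio(daily_returns))
print("Net P&L:", net_pnl)
```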

Let us load the data as shown below.

# assumed import: `web` is taken to be pandas_datareader.data
import pandas_datareader.data as web

dataset = web.DataReader('^IXIC', data_source='yahoo', start='2000-01-01')
dataset = dataset.sort_index(ascending=True)
# display
print(dataset.head())
print(dataset.shape)
# Plot the…


Short-term and long-term dynamics model

Relational econometric model for time-series data

Image by author

The error correction model (ECM) is important in time-series analysis for better understanding long-run dynamics. An ECM can be derived from an autoregressive distributed lag model as long as there is a cointegration relationship between the variables. In that context, each equation in the vector autoregressive (VAR) model is an autoregressive distributed lag model; therefore, the vector error correction model (VECM) can be considered a VAR model with cointegration constraints.

Cointegration relations are built into the specification so that it restricts the long-run behavior of the endogenous variables to converge to their cointegrating relationships while allowing for short-run adjustment dynamics. …
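The relationship between the VAR and the VECM described above can be written out as follows. This is the standard textbook form; the symbols ($y_t$, $A_i$, $\Pi$, $\Gamma_i$, $\alpha$, $\beta$, $\varepsilon_t$, lag order $p$) are notation chosen here, not taken from the article.

```latex
\begin{aligned}
\text{VAR}(p):\quad & y_t = \sum_{i=1}^{p} A_i\, y_{t-i} + \varepsilon_t \\[4pt]
\text{VECM}:\quad & \Delta y_t = \Pi\, y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad \Pi = \sum_{i=1}^{p} A_i - I, \quad \Gamma_i = -\sum_{j=i+1}^{p} A_j \\[4pt]
\text{Cointegration constraint}:\quad & \Pi = \alpha \beta^{\top}
\end{aligned}
```

The term $\beta^{\top} y_{t-1}$ holds the cointegrating (long-run) relations, $\alpha$ holds the speeds at which each variable adjusts back toward them, and the $\Gamma_i$ capture the short-run dynamics.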


Effective dimensionality reduction techniques

How to improve learning performance and increase computational efficiency

Image by author

Feature selection is a data pre-processing step used in conjunction with machine learning for classification or regression. The main motivation for reducing the dimensionality of the data and keeping the number of features as low as possible is to reduce training time and improve the accuracy of the algorithms we use; moreover, reduced dimensions provide more robust generalization and a faster response on unseen data. Unlike feature extraction, feature selection does not alter the data.

There are three main groups of feature selection in general: (1) wrapper, (2) embedded and (3) filter methods. Each group has…
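As a minimal sketch of the filter group, the example below ranks features by their absolute Pearson correlation with the target and keeps the top `k`. The helper name `select_top_k_by_correlation`, the synthetic data, and the choice of correlation as the scoring criterion are all assumptions made for illustration.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Return (sorted indices of the k best features, all scores).

    Score = absolute Pearson correlation between each column of X and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; score them 0 instead of dividing by zero.
    scores = np.divide(np.abs(Xc.T @ yc), denom,
                       out=np.zeros(X.shape[1]), where=denom > 0)
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top), scores

# Synthetic data: only features 1 and 3 drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected, scores = select_top_k_by_correlation(X, y, k=2)
X_reduced = X[:, selected]  # selected columns are kept unchanged

print("Selected feature indices:", selected)
```

Note that `X_reduced` contains the original columns as they were, which is the sense in which feature selection, unlike feature extraction, does not alter the data.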


Time series Auto Regression and Error correction models

Vector auto regression, Volatility, Granger causality & Error Correction

Image by author

Integrated time series (TS) are generally modeled by fitting a vector autoregression (VAR) to their first differences. However, differencing may discard valuable information about the long-run relationship among the variables; this is where the vector error correction model (VECM) is applicable.

Granger causality:

A VAR involves multiple variables whose lags may be important for predicting the future state of a given endogenous variable. Using Granger causality (GC), we can determine which of these variables carry predictive information; GC is only relevant for TS variables. We will use VAR to investigate GC here.
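The idea behind a Granger causality test can be sketched with plain least squares: fit one regression of a variable on its own lags, fit another that also includes lags of a second variable, and compare the two with an F statistic. The code below is an illustrative implementation on synthetic, stationary data; the helper names `lagged` and `granger_f_stat` and the lag order of 2 are assumptions, and this is not the article's VAR-based procedure.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])

def granger_f_stat(y, x, lags=2):
    """F statistic for H0: lags of x do not help predict y beyond y's own lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, lags)])        # own lags only
    unrestricted = np.hstack([restricted, lagged(x, lags)])  # plus lags of x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic example: x drives y with a one-step delay, but not the other way round.
rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_x_to_y = granger_f_stat(y, x)   # expected to be large
f_y_to_x = granger_f_stat(x, y)   # expected to be small

print("F (x -> y):", f_x_to_y)
print("F (y -> x):", f_y_to_x)
```

A large F statistic is evidence against the null hypothesis that the extra lags add nothing. In practice a library routine such as `grangercausalitytests` in `statsmodels.tsa.stattools` would also report p-values.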

Our use case here: we have data on West Texas Intermediate, Brent crude oil, and Henry Hub spot prices, and we shall…


Machine learning & classification algorithms

Classification algorithms to determine application outcome

Image by author

Loans, in terms of financial payouts, are an important aspect of the banking business. Loan applications are screened based on certain inputs to validate eligibility for a loan. Our use case here is to automate the loan eligibility process (in real time) based on customer details obtained during the loan application. This will lead to improved service and customer satisfaction.

Let us load the available data to check the information it contains.

Loading training data:


Stochastic time-series modeling

Time-Series modeling using Natural Gas data

Image by author

ARIMA stands for autoregressive integrated moving average and is popular for time-series prediction. We have univariate daily time-series data, and our use case is to forecast future time steps from it. The time series is a stochastic (random walk) price series. Here, we discuss basic time-series analysis, the concepts of stationary and non-stationary time series, and how we can model financial data displaying such behavior.

We will introduce and implement advanced mathematical approaches, Autoregressive (AR), Moving Average (MA), Differencing (D), AutoCorrelation Function (ACF), and Partial Autocorrelation Function (PACF), for dealing with non-stationary time series…
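To see why differencing matters for a random walk, the sketch below compares the sample autocorrelation of a synthetic random walk with that of its first difference. The data and the helper name `acf` are assumptions made for illustration; the natural gas series from the article is not used.

```python
import numpy as np

def acf(series, nlags):
    """Sample autocorrelation function for lags 0..nlags."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[:len(x) - k] @ x[k:]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(3)
prices = 100 + rng.normal(size=1000).cumsum()  # synthetic random walk (non-stationary)
diffed = np.diff(prices)                       # first difference, i.e. d = 1 in ARIMA(p, d, q)

acf_prices = acf(prices, nlags=5)
acf_diffed = acf(diffed, nlags=5)

print("ACF of prices:     ", np.round(acf_prices, 3))
print("ACF of differences:", np.round(acf_diffed, 3))
```

The autocorrelation of the price level decays very slowly, a typical sign of non-stationarity, while the differenced series shows autocorrelations close to zero beyond lag 0, which is what one expects from the increments of a random walk.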

Sarit Maitra

Data Science Practice Lead at KSG Analytics Pvt. Ltd.
