Text mining and unsupervised machine learning

Text mining, Entity Recognition and LDA topic Modeling

Image by author

Text analytics is a proven way to extract relevant information and knowledge hidden in unstructured content. Applied effectively to a corpus, it helps surface patterns, trends and other insights from unstructured data.

Here, I will experiment with news articles, focusing on the named entities they contain and using the Natural Language Toolkit (NLTK), which is quite useful for statistical natural language processing (NLP). Furthermore, the unsupervised Latent Dirichlet Allocation (LDA) algorithm will be used for modeling. LDA is a generative probabilistic approach to topic modeling.

Let us collect data and explore automatic text processing to identify potential…


Algorithmic trading simplified

A case study with stochastic & price oscillator and back testing

Image by author

The price oscillator is a technical indicator that calculates the percentage difference between two moving averages of price. Its absolute form, the absolute price oscillator (APO), is essentially the MACD line. A crossover of the two moving averages corresponds to the oscillator crossing the zero line around which it oscillates.

To understand the price oscillator, we need to know the concept of the exponential moving average (EMA). In simple terms, an EMA is the average price over a certain number of days, with more recent days weighted more heavily, i.e. with weights that decay exponentially with age. The EMA needs to be calculated over a period of appropriate length to maximize meaningful data while minimizing…
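The EMA and the percentage price oscillator described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the smoothing factor 2 / (span + 1), the seeding of the EMA with the first price, the function names and the sample closing prices are all choices made here, not taken from the article.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    values = [prices[0]]  # assumption: seed the average with the first price
    for price in prices[1:]:
        # Recent prices get weight alpha; older history decays by (1 - alpha)
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values


def price_oscillator(prices, fast=12, slow=26):
    """Percentage difference between a fast EMA and a slow EMA."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    return [100.0 * (f - s) / s for f, s in zip(fast_ema, slow_ema)]


# Hypothetical closing prices, for illustration only
closes = [100, 101, 103, 102, 105, 107, 106, 108, 110, 109]
ppo = price_oscillator(closes, fast=3, slow=5)
print([round(v, 2) for v in ppo])
```

A positive reading means the fast average sits above the slow one; the sign change at zero is the moving-average crossover.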


Back-testing is a key component of algorithmic trading

Predictive Model & Back-testing with Natural Gas data

Image by Author

Back-testing, a form of performance testing, is one of the most complicated tasks in an algorithmic trading research system. Several factors, such as software latency, network latency, slippage and fees, have to be considered when building an algorithmic trading framework. The basic idea is this: given historical data, what would the performance of the trading strategy have been?

Here, we will explore how we can use a machine learning algorithm to predict future direction and define a strategy for trading. …
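The basic idea can be illustrated with a very small sketch: replay a series of positions over historical prices and compound the resulting returns. Everything here is an assumption made for illustration, including the function name, the convention that a signal decided on one day is held over the following day's move, the proportional fee parameter and the sample data. Latency and slippage are deliberately ignored.

```python
def backtest(prices, signals, fee=0.0):
    """Return the equity curve of a strategy, starting from 1.0.

    signals[t] is the position (+1 long, 0 flat, -1 short) decided at the
    close of day t and held over the move from day t to day t + 1.
    fee is a hypothetical proportional cost charged on each position change.
    """
    equity = [1.0]
    position = 0
    for t in range(len(prices) - 1):
        cost = fee * abs(signals[t] - position)  # pay only when the position changes
        position = signals[t]
        daily_return = prices[t + 1] / prices[t] - 1.0
        equity.append(equity[-1] * (1.0 + position * daily_return - cost))
    return equity


# Hypothetical prices and positions, for illustration only
prices = [100.0, 102.0, 101.0, 104.0, 103.0]
signals = [1, 0, 1, 0, 0]
curve = backtest(prices, signals)
print(round(curve[-1], 4))
```

Using the signal from day t only for the move into day t + 1 keeps the sketch free of look-ahead bias, which is one of the easiest mistakes to make in a back-test.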


Linear Programming in data science

Optimization modeling :: Cost Minimization

Photo by Antoine Dautry on Unsplash

Linear programming is widely used for optimization, and applications can be found in almost every industry operating under conflicting constraints. Here we will work with a simple and quite common use case: a cost optimization problem. It can be formulated as a standard linear optimization problem whose objective function is to minimize the transportation cost, subject to supply and demand constraints expressed as equalities and inequalities.
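Written out, such a transportation problem might look like the sketch below. The symbols are assumptions introduced here: m supply points with capacities s_i, n demand points with requirements d_j, unit shipping costs c_ij and shipped quantities x_ij. Which constraints are equalities and which are inequalities is also an assumption; here demand must be met exactly while supply is only an upper limit.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\,x_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{n} x_{ij} \le s_i, && i = 1,\dots,m \\
& \sum_{i=1}^{m} x_{ij} = d_j, && j = 1,\dots,n \\
& x_{ij} \ge 0, && \text{for all } i, j
\end{aligned}
```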

Let us create some synthetic data. For easy understanding and computational ease, the relevant information is in tabular format as shown below:


Mathematical Optimization for Portfolio Management

Photo by Jeswin Thomas on Unsplash

Optimization methods are widely applied in industry, in fields as diverse as machine learning, finance, aviation and logistics, to name a few. Once we have zeroed in on the problem statement, the next step is to solve the problem with the best available options.

To simplify, the idea is to find the best available solution, one that is at least as good as any other possible solution. If we want to quantify the problem and express it mathematically, we need to come up with an objective for solving the problem, which is the objective function in mathematics. …


Intuition based trading strategy

Fundamentals of signal generation using technical analysis

The profitability of stock market trading is directly related to the prediction of trading signals. Here, we will discuss some popular technical analysis techniques, from basic to advanced, for building trading signals. Our focus will be on signal generation and visualization. A long list of technical indicators is available, covering principal domains such as trend, momentum, volume, volatility, and support and resistance. We will cover a few of these here.

However, once a signal is generated and a strategy is defined, the next most important task is performance testing, which is outside the scope of this article. However, it’s not only the strategy that decides…
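As a minimal illustration of signal generation from a trend indicator, the sketch below marks the days on which a fast simple moving average crosses a slow one. The choice of indicator, the window lengths, the function names and the sample prices are all assumptions made here; the article's own indicators may differ.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def crossover_signals(prices, fast=2, slow=4):
    """+1 when the fast SMA crosses above the slow SMA, -1 when it crosses below, else 0."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    signals = [0] * len(prices)
    for i in range(1, len(prices)):
        if slow_ma[i - 1] is None:
            continue  # not enough history yet to compare two consecutive days
        prev_diff = fast_ma[i - 1] - slow_ma[i - 1]
        diff = fast_ma[i] - slow_ma[i]
        if prev_diff <= 0 < diff:
            signals[i] = 1   # bullish crossover
        elif prev_diff >= 0 > diff:
            signals[i] = -1  # bearish crossover
    return signals


# Hypothetical closing prices, for illustration only
closes = [10, 9, 8, 7, 8, 10, 12, 11, 9, 7]
print(crossover_signals(closes))
```

The output is a list aligned with the price series, which makes it straightforward to overlay the buy and sell markers on a price chart.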


Brent Crude Oil Futures price movements prediction

Algorithmic trading strategy evaluation based on crude oil data set

Image by author

Prediction and classification are important and of great interest for the simple fact that successful prediction of stock prices leads to rewarding benefits. However, there is no universal set of rules; such prediction involves a series of highly complicated and quite difficult tasks.

Here, we will show a simple use case to showcase how a classification rule can be applied to obtain a trading strategy, and conclude by performance-testing the strategy with a simple script.

Let us load the data from Quandl.

# Sort chronologically first, then keep observations from 2010 onward
BC = BC.sort_index(ascending=True)
BC = BC.loc['2010-01-01':]
BC.tail()


Test harness is important to evaluate trading strategy

Designing a financial trading strategy

Image by author

Back-testing is an important step for obtaining the statistics needed to verify that a trading strategy is effective. It reports key figures such as profit and loss, net profit and loss, invested capital, number of trades/orders, return, Sharpe ratio etc. Here, we will discuss how to design a financial trading strategy using open-source Python tools, and we’ll review the results of the back-test by going through some plots generated by pyfolio.
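Two of the statistics a back-test typically reports can be computed by hand, which helps when reading the generated plots. The sketch below is not pyfolio's implementation; it uses only the standard library, and the function names, the 252-period annualization, the zero risk-free rate and the sample returns are assumptions made for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = [r - risk_free for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation
    if stdev == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Hypothetical daily strategy returns, for illustration only
daily = [0.01, -0.005, 0.007, -0.012, 0.004, 0.009]
print(round(sharpe_ratio(daily), 2), round(max_drawdown(daily), 4))
```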

Let us load the data as shown below.

Function to extract data:

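import pandas_datareader.data as web  # assumption: `web` refers to pandas-datareader

def dataExtraction():
    # Download NASDAQ Composite (^IXIC) daily data from Yahoo, starting in 2010
    dataset = web.DataReader('^IXIC', data_source = 'yahoo', start = '2010-01-01')
    # Make sure the rows are in chronological order
    dataset = dataset.sort_index(ascending=True)
    # Plot…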


Short term and long term dynamics model

Relational econometric model for time-series data

Image by author

The error correction model (ECM) is important in time-series analysis for better understanding long-run dynamics. An ECM can be derived from an autoregressive distributed lag (ARDL) model as long as there is a cointegration relationship between the variables. In that context, each equation in the vector autoregressive (VAR) model is an autoregressive distributed lag model; the vector error correction model (VECM) can therefore be regarded as a VAR model with cointegration constraints.

The cointegration relations are built into the specification so that it restricts the long-run behavior of the endogenous variables to converge to their cointegrating relationships while allowing for short-run adjustment dynamics. …
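For the simplest two-variable case, the step from an autoregressive distributed lag model to an ECM can be written out as follows. The notation is an assumption introduced here for illustration (a one-lag model in y_t and x_t with coefficients a_0, a_1, b_0, b_1 and error e_t, with |a_1| < 1); the article's own specification may differ.

```latex
\begin{aligned}
y_t &= a_0 + a_1\,y_{t-1} + b_0\,x_t + b_1\,x_{t-1} + e_t \\
\Delta y_t &= a_0 + b_0\,\Delta x_t - (1 - a_1)\bigl(y_{t-1} - \theta\,x_{t-1}\bigr) + e_t,
\qquad \theta = \frac{b_0 + b_1}{1 - a_1}
\end{aligned}
```

The second line follows by subtracting y_{t-1} from both sides and writing b_0 x_t as b_0 Δx_t + b_0 x_{t-1}. The bracketed term is the deviation from the long-run relationship y = θx, and (1 - a_1) is the speed at which that deviation is corrected.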


Effective dimensionality reduction techniques

How to improve learning performance and increase computational efficiency

Image by author

Feature selection is a data pre-processing step used in conjunction with machine learning for classification or regression. The main motivation for reducing the dimensionality of the data and keeping the number of features as low as possible is to reduce the training time and enhance the classification accuracy of the algorithms we use; moreover, reduced dimensions provide a more robust generalization and a faster response with unseen data. Unlike feature extraction, feature selection does not alter the data.

In general, there are three main groups of feature selection methods: (1) wrapper, (2) embedded and (3) filter methods. Each group has…
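To make the filter group concrete, here is a minimal sketch that scores each feature by the absolute Pearson correlation with the target and keeps the top k, without fitting any model. The scoring criterion, the function names, the feature names and the toy data are assumptions chosen for illustration; other filter criteria exist.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Rank features by |correlation with target| and return the best k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data, for illustration only
features = {
    "f_signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f_noise": [2.0, -1.0, 0.5, 3.0, -2.0],
    "f_constant": [1.0, 1.0, 1.0, 1.0, 1.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
selected, scores = select_top_k(features, target, k=1)
print(selected)
```

The selected columns are passed on unchanged, which reflects the point that feature selection, unlike feature extraction, does not alter the data.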

Sarit Maitra

Data Science Practice Lead at KSG Analytics Pvt. Ltd.
