Available courses

Business Analytics is the practice of iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis. It is used by companies committed to data-driven decision-making, and it is about using your data to derive information, insights, knowledge, and recommendations. Businesses use business analytics to improve the effectiveness and efficiency of their solutions.

In this module, I will talk about how analytics has progressed from simple descriptive analytics to predictive and prescriptive analytics. I will work through multiple examples to make these ideas concrete, and discuss various industry use cases. I will also introduce multiple components of big data analysis, including data mining, machine learning, web mining, natural language processing, social network analysis, and visualization. Lastly, I will provide some tips to help learners of data science succeed in learning it and applying it to their projects.

Python and R are the two most popular programming languages for data scientists as of now. Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. Python is open source, has strong community support, and is easy to learn. It is good for quick scripting as well as for production deployments and web development.

In this module, I will start with the basics of the Python language. We will intermix theory with hands-on exercises, and I will use Jupyter notebooks for the hands-on parts. I will also discuss in detail topics like control flow, input and output, data structures, functions, regular expressions, and object orientation in Python. Closer to data science, I will discuss popular Python libraries like NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn, and NLTK.

While Python has been used by many programmers even before they were introduced to data science, R has its main focus on statistics, data analysis, and graphical models. R is meant mainly for data science. Just like Python, R also has very good community support. Python is often a good starting point for beginners, while R tends to suit experienced data scientists. R offers one of the most comprehensive collections of statistical analysis packages.

In this module, I will again combine theory with hands-on work on various aspects of R. I will use RStudio for the hands-on parts. I will discuss basic programming aspects of R as well as visualization using R. Then, I will talk about how to use R for exploratory data analysis, for data wrangling, and for building models on labeled data. Overall, I will cover what you need to do good data science using R.

Probability and statistics help us understand whether data is meaningful. The field covers inference, testing, and other methods for analyzing patterns in data and using them to predict, understand, and improve results.

We live in an uncertain and complex world, yet we continually have to make decisions in the present with uncertain future outcomes. To study, or not to study? To invest, or not to invest? To marry, or not to marry? This kind of uncertainty is captured mathematically by the notion of probability. Statistics, on the other hand, helps us analyze data sets and correctly interpret results to make solid, evidence-based decisions.

In this module, I will discuss some very fundamental terms and concepts related to probability and statistics that often come up in literature on Machine Learning and AI. Key topics include quantifying uncertainty with probability, descriptive statistics, point and interval estimation of means, the central limit theorem, and the basics of hypothesis testing.
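As a small illustration of point and interval estimation of a mean, the sketch below computes a sample mean and an approximate confidence interval using only the Python standard library. It assumes a normal approximation (the kind of approximation the central limit theorem motivates), which is rough for small samples where a t-distribution would usually be preferred. The function name mean_confidence_interval and the scores list are hypothetical and exist only for this example.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    The point estimate is the sample mean; the interval is
    mean +/- z * standard error, where z is the normal quantile
    for the requested confidence level.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided quantile, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return mean, mean - margin, mean + margin


# Hypothetical exam scores, made up for illustration only.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77]
mean, low, high = mean_confidence_interval(scores)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Asking for a higher confidence level widens the interval, which is the usual trade-off between certainty and precision.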

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine Learning is a first-class ticket to the most exciting careers in data science. As data sources proliferate along with the computing power to process them, automated predictions have become much more accurate and dependable. Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.

In this module, I will broadly talk about supervised as well as unsupervised learning. We will talk about multiple types of classifiers like Naïve Bayes, KNN, decision trees, SVMs, artificial neural networks, logistic regression, and ensemble learning. Further, we will also talk about linear regression analysis and sequence labeling using HMMs. As part of unsupervised learning, I will discuss clustering as well as dimensionality reduction. Finally, we will also briefly discuss semi-supervised learning, multi-task learning, architecting ML solutions, and a few ML case studies.

Project 1: Learning various classifiers on Iris dataset

Project 2: MLP for hand-written digit recognition

Project 3: Logistic regression on the Titanic dataset

Project 4: Use CoNLL 2002 data to build an NER system

Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.

Text Analytics, also known as text mining, is the process of examining large collections of written resources to generate new information, and to transform the unstructured text into structured data for use in further analysis. Text mining identifies facts, relationships and assertions that would otherwise remain buried in the mass of textual big data. These facts are extracted and turned into structured data for analysis, visualization (e.g. via HTML tables, mind maps, charts), integration with structured data in databases or warehouses, and further refinement using machine learning (ML) systems.

Web mining is the process of using data mining techniques and algorithms to extract information directly from the Web, drawing on Web documents and services, Web content, hyperlinks and server logs. The goal of Web mining is to look for patterns in Web data by collecting and analyzing information in order to gain insight into trends, the industry and users in general.

Data scientist has been called the sexiest job of the 21st century. When performing data science, a lot of time is spent collecting useful data and pre-processing it. If the collected data is of poor quality, it can lead to poor-quality models. Hence, it is very important to understand how to collect good-quality data, and to understand the various ways in which data can be collected.

In this module, I will discuss different aspects of data collection. I will begin with the decisions to make while doing data collection, data collection rules and approaches, and ways of performing data collection. Data can also be collected from the web by scraping, so we will learn how to perform basic scraping. Lastly, we will briefly discuss collecting graph data as well as data collection using IoT sensors.
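To give a flavor of basic scraping, the sketch below extracts hyperlinks from an HTML page using only Python's standard html.parser module. To stay self-contained it parses an inline string rather than fetching a live page; the SAMPLE_PAGE content and the names LinkExtractor and extract_links are hypothetical and exist only for this example. Real scraping would also need to fetch pages over the network and respect each site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in the given HTML text."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical page content, made up for illustration only.
SAMPLE_PAGE = """
<html><body>
<h1>Courses</h1>
<a href="/courses/python">Python</a>
<a href="/courses/r">R</a>
<a name="anchor-without-href">skip me</a>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```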

Deep learning, a rapidly growing area of machine learning, has gained great momentum in the last few years, and research in the field is progressing very fast. Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. To address this, researchers have developed deep learning algorithms that automatically learn a good representation for the input. These algorithms are today enabling many groups to achieve ground-breaking results in vision, speech, language, robotics, and other areas.

I already discuss the basics of artificial neural networks in the machine learning module. In this module, I will focus on other popular deep learning architectures like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks.

For any good data science story, it is very important to visualize it nicely. Visualizations help us understand data and insights much better.

I cover the basics of visualization in R and Python in their respective modules. In this module, I will talk about innovative ways of visualizing complex and large data.

A big international bank is investigating a very high rate of customers leaving the bank. The dataset contains 10,000 records, and we use it to investigate and predict which customers are more likely to leave the bank soon. The approach here is supervised classification: a classification model is built on historical data and then used to predict the classes for current customers in order to identify churn. The dataset contains 13 features plus the label column (Exited or not). The best accuracy was obtained with the Naïve Bayes model (83.29%). Such churn prediction models could be very useful in other settings too, for example in the telecom sector to identify customers who are switching away from their current network, or in subscription services.

Analyzing public data from online expressions and opinions can yield interesting results and insights into public sentiment about any product, service or personality. The explosion of Web 2.0 has led to increased activity in podcasting, blogging, tagging, contributing to RSS, social bookmarking, and social networking. Consequently, there has been a sudden increase in interest in mining these vast information resources for opinions. Sentiment analysis, or opinion mining, is the mining of sentiment polarities from online social media. In this project, we describe a procedure for using and interpreting Twitter data for sentiment analysis. We perform several steps of text pre-processing, and then experiment with multiple classification mechanisms. Using a dataset of 50,000 tweets and TF-IDF features, we compare the accuracy obtained using various classifiers for this task. We find that linear SVMs provide the best accuracy among the classifiers tried. A sentiment analysis classifier could be useful for many applications, such as market analysis of different features of a new product, or gauging public opinion about a new movie or a speech by a political candidate.
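To make the TF-IDF features mentioned above more concrete, the sketch below computes TF-IDF weights for a handful of short texts using only the Python standard library. It uses the plain textbook definition (term frequency times log of inverse document frequency) with naive whitespace tokenization; this is an assumption for illustration and not necessarily the variant or pre-processing used in the project. Libraries such as Scikit-Learn apply smoothing and normalization, so their numbers will differ. The function names and the example tweets are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length)
             * log(number of docs / number of docs containing term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tweets, made up for illustration only.
tweets = [
    "great movie loved it",
    "terrible movie hated it",
    "great acting great story",
]

for tweet, vector in zip(tweets, tfidf_vectors(tweets)):
    print(tweet, "->", vector)
```

Terms that occur in every document get a weight of zero under this definition, which is why very common words contribute little to the feature vector. The resulting vectors are the kind of input a classifier such as a linear SVM would then be trained on.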

Around 285 million people globally suffer from impaired vision, and 70% of these cases are avoidable with early detection. Diabetic Retinopathy (DR) and Glaucoma are two rapidly increasing causes of blindness, and account for 10% of cases involving vision loss. Such diseases can be diagnosed using color retinal fundus images. However, diagnosis is challenging due to a lack of skilled ophthalmologists and trained device operators, and the quality of the images taken is often unsatisfactory. In this project, we use a dataset of 500 fundus images to learn a classification model which can classify a fundus image as DR versus not. We use various deep learning techniques for this classification task and achieve an accuracy of 76%. Such a solution could be used to quickly identify high-probability DR cases from a large collection of fundus images obtained from, say, mass screening camps.

Hate speech detection on Twitter is critical for applications such as controversial data extraction, building AI bots, sentiment classification, and content recommendation. We define this task as being able to classify a tweet as sexist/racist or neither. The complexity of the NLP constructs makes this task challenging. We perform extensive experiments with multiple machine learning algorithms to learn semantic embeddings that handle this complexity. We experiment on a benchmark dataset of ~32K annotated tweets. We find that racist and homophobic tweets are more likely to be classified as hate speech, but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.

•    Machine learning plays an important role in the current world, and it helps educational institutions and MOOC providers make predictions and decisions related to students' academic status.

•    Students dropping out affects not only the students' careers but also the reputation of the institute.

•    The existing system maintains student information in the form of numerical values; it simply stores the information and retrieves it when required. The system therefore has no intelligence to analyze the data.

•    The proposed system is a machine learning application which makes use of classifier algorithms to predict the probability of student dropout, and also to predict student performance.