What is a feature?

A feature is an individual measurable property of a sample. I like to think of it as the smallest unit of a dataset that helps us analyze it and put it to use. For instance, to determine whether a patient’s tumor is benign or malignant, features of each cell nucleus, such as radius, perimeter, texture, and area, are considered.

This blog will cover:

  • Feature Scaling
  • Data Imputation
  • Outliers
  • Encoding techniques

Feature scaling applies to numerical variables. Often, many features in our dataset contribute to the final prediction. A tremendous difference between the magnitudes of…
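To make the idea of differing magnitudes concrete, here is a minimal sketch of min-max scaling, one common scaling technique. It is an assumption on my part that it matches the technique the post goes on to describe; the function name and the radius and area values are hypothetical and only for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature columns with very different magnitudes.
radius = [6.0, 14.0, 28.0]
area = [150.0, 650.0, 2500.0]

scaled_radius = min_max_scale(radius)
scaled_area = min_max_scale(area)
```

After scaling, both columns lie between 0 and 1, so neither dominates purely because of its units.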


Introduction

Machine learning workflow (image by author)

Before building our machine learning model, we split the data into training and test sets. Our model learns on the training data and its performance is evaluated on the test data. Let’s look at a couple of scenarios to understand how incorrect use of train and test sets could negatively impact the accuracy of our entire model.

  • Using up all the data to train the algorithm would leave us with no new data to test the model on. …
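The split described above can be sketched without any libraries. This is a minimal illustration, not the post's own code: the function name, the 20% test fraction, and the seed are assumptions. In practice, scikit-learn's train_test_split does the same job.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx); no index appears in both lists."""
    indices = list(range(n_samples))
    # Shuffle so the split does not depend on the original row order.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(10, test_fraction=0.2)
```

Because the two index lists never overlap, the model is always evaluated on rows it has not seen during training.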


Hypothesis testing is a way of deriving insights from the plethora of data available to us. It involves evaluating a statement about a given population using sample data. Imagine a court case where a lawyer needs to prove his client is innocent. The client's innocence is what we call the null hypothesis, an assumption made right at the beginning. Now we work from there. In the next step, the lawyer will collect data such as the client's whereabouts at the time of the crime, an alibi, and more, and use it as evidence to prove his client’s innocence. Say the defendant is…


In classification problems, we often come across an imbalance in our dataset. This means that there is a significant difference in how frequently each class occurs. For a binary classification problem, this could mean that there are 10,000 samples or rows with class label 1 and only 10 rows with label 0. In credit fraud detection, for example, there may be numerous non-fraudulent cases and only a handful of fraudulent ones. …
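A small sketch shows why such an imbalance deserves attention. It uses the 10,000 versus 10 counts from the example above. The "always predict the majority class" baseline is my own illustration, not something taken from the truncated text.

```python
from collections import Counter

# Label counts from the example: 10,000 rows of class 1 and 10 rows of class 0.
labels = [1] * 10_000 + [0] * 10

counts = Counter(labels)
imbalance_ratio = counts[1] / counts[0]  # majority count divided by minority count

# A trivial baseline that always predicts the most frequent class.
majority_class = counts.most_common(1)[0][0]
predictions = [majority_class] * len(labels)

# Fraction of all rows predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# Fraction of class-0 rows that the baseline actually identifies.
minority_recall = sum(p == y == 0 for p, y in zip(predictions, labels)) / counts[0]
```

The baseline gets almost every row right while never identifying a single minority-class row, which is why accuracy alone can mislead on imbalanced data.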


It is essential to choose a performance metric carefully when selecting a machine learning model. Not doing so would lead to compromised results once the model is deployed. Evaluation metrics give a quantitative measure of how good our machine learning model is and how well our algorithm is performing.

Classification Metrics

1. Accuracy: the number of correct predictions divided by the total number of predictions. After splitting the data and fitting the required model, we can find the accuracy by calling the model's score method in sklearn, passing the test features and the test labels. It ranges from 0 to 1.

model_name.score(X_test, y_test)
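For a scikit-learn classifier, score returns the mean accuracy on the data it is given. The same formula can be written out by hand. The sketch below is only an illustration; the function name and the label lists are made up.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)  # 3 of the 5 predictions match
```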

The Accuracy measure merely tells how…


We have often heard of the various uses of Artificial Intelligence in our daily lives. Whether it is providing economic benefits to organizations, devising health care models, or predicting future trends, the technology has taken over almost all domains of our life. Introduced in the 1950s, the concept of AI gathered steam for a considerable amount of time before it gained popularity in the last two decades. Our view of AI has now shifted from a mere advancement in technology to something that takes center stage in helping humanity. However, there might be some areas in which one would never…


Hadoop is a Java-based big data analytics tool that addresses the gaps and pitfalls of the traditional approach when data is voluminous. It is an open-source framework for storing data and running applications on clusters of commodity hardware. It offers massive storage and enormous processing power. It is based on the assumption that hardware failure is possible and must be handled by the framework. Hadoop splits large files of data into blocks or fragments, distributes these across nodes in a cluster, and transfers code to the data to allow parallel processing. The data is locally…


Whether it is an online network such as LinkedIn or Twitter, or an offline, more traditional network such as people living in a neighborhood, analysis of these social networks, which connect people and information, leads to new realizations in a myriad of domains. It gives us a means to quantify connections between individual data points and present them graphically.

Let's have a look at SNA in R along with some examples.

OUTPUT

Here we are creating a simple graph and storing it in g. By default, it is a directed graph (i.e., the edges have direction). To make it…


With more than 3.7 billion humans using the internet, over 40,000 Google searches every second, and 16 million text messages sent every minute, the amount of data being generated is increasing exponentially. Big data refers to the massive data sets that are collected from a variety of sources and analyzed to reveal insights for better business decisions. For instance, social media data related to human behaviour and interactions is used for Sentiment Analysis to help drive businesses, make predictions in politics, and more.

Ever wondered what led to Big Data Analytics? Here are the computing trends behind it.

Social…


In real life, it is not always possible to determine the state of the environment as it might not be clear. Due to partially observable or non-deterministic environments, agents may need to handle uncertainty and deal with:

Uncertain data: Data that is missing, unreliable, inconsistent or noisy

Uncertain knowledge: Knowledge in which multiple causes lead to multiple effects, or in which the causality in the domain is incompletely understood

Uncertain knowledge representation: Representations that provide a restricted model of the real system or have limited expressiveness

Inference: In case of incomplete or default reasoning methods, conclusions drawn might not be…

Heena Rijhwani

Final Year Information Technology engineer with a focus in Data Science, Machine Learning, Deep Learning and Natural Language Processing.
