
In classification problems, we often come across an imbalance in the dataset: the frequency of outcomes varies significantly across classes. For a binary classification problem, this could mean that there are 10,000 samples (rows) with class label 1 and only 10 rows with label 0. In credit fraud detection, for example, there are typically numerous non-fraudulent cases and very few fraudulent ones. …

It is essential to choose a performance metric carefully when selecting a machine learning model. Failing to do so can lead to compromised results once the model is deployed. Evaluation metrics give a quantitative measure of how good our machine learning model is and how well our algorithm is performing.

Classification Metrics

1. Accuracy- It is equal to the number of correct predictions / total number of predictions. After splitting the data and fitting the required model, we can find the accuracy by using the score method of an sklearn classifier. The value ranges from 0 to 1.

Model_name.score(X_test, y_test)
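
To make the formula concrete, here is a minimal sketch in plain Python that computes accuracy by hand on the imbalanced example mentioned earlier (10,000 rows with label 1 and 10 rows with label 0). The `accuracy` helper and the "always predict 1" baseline are illustrative assumptions, not part of sklearn.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Imbalanced labels: 10,000 rows of class 1 and only 10 rows of class 0.
y_true = [1] * 10_000 + [0] * 10

# A hypothetical baseline "model" that always predicts the majority class.
y_pred = [1] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
print(f"Baseline accuracy: {baseline_accuracy:.4f}")
```

The baseline scores very close to 1 even though it never identifies a single row of class 0, which is why accuracy on its own can be misleading on imbalanced data.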

The Accuracy measure merely tells how…


We have often heard of the various uses of Artificial Intelligence in our daily lives. Whether it is providing economic benefits to organizations, devising health care models or predicting future trends, the technology has entered almost every domain of our lives. Introduced in the 1950s, the concept of AI gathered steam slowly for a long time before gaining popularity in the last two decades. Our view of AI has now shifted from a mere advancement in technology to one that takes center stage in helping humanity. However, there might be some areas in which one would never…

Hadoop is a Java-based big data analytics tool that addresses the shortcomings of traditional approaches when the data is voluminous. It is an open-source framework for storing data and running applications on clusters of commodity hardware. It offers massive storage and enormous processing power. It is based on the assumption that hardware failure is possible and must be handled by the framework. Hadoop splits large files of data into blocks or fragments, distributes these across nodes in a cluster and transfers code to the data to allow parallel processing. The data is locally…

Whether it is an online network such as LinkedIn and Twitter or an offline, more traditional network such as people living in a neighborhood, analysis of these social networks, which connect people and information, leads to new realizations in a myriad of domains. It gives us a means to quantify connections between individual data points and present them in a graphical manner.

Let's have a look at SNA in R along with some examples.



Here we are creating a simple graph and storing it in g. By default, it is a directed graph (i.e. the edges have direction). To make it…


With more than 3.7 billion humans using the internet, over 40,000 Google searches every second, and 16 million text messages sent every minute, the amount of data being generated is increasing exponentially. Big data refers to the massive data sets that are collected from a variety of sources to reveal insights for business needs and optimized decision making. For instance, social media data related to human behaviour and interactions is used for Sentiment Analysis to help drive businesses, make predictions in politics, and more.

Ever wondered what led to Big Data Analytics? Here are the computing trends behind it.



In real life, it is not always possible to determine the state of the environment as it might not be clear. Due to partially observable or non-deterministic environments, agents may need to handle uncertainty and deal with:

Uncertain data: Data that is missing, unreliable, inconsistent or noisy

Uncertain knowledge: When the available knowledge has multiple causes leading to multiple effects or incomplete knowledge of causality in the domain

Uncertain knowledge representation: Representations that provide a restricted model of the real system, or have limited expressiveness

Inference: In case of incomplete or default reasoning methods, conclusions drawn might not be…

A Knowledge Based Agent in Artificial Intelligence has two levels: Knowledge Base (KB) and Inference Engine.

1. Knowledge Base- It is the base level of an agent, which consists of domain-specific content. At this level, the agent has facts or information about the surrounding environment in which it is working. It does not consider the actual implementation.

2. Implementation level- It consists of domain independent algorithms. At this level, agents can recognize the data structures used in the knowledge base and algorithms which use them. For example, propositional logic and resolution. Knowledge based agents are crucial to use in partially…


In Artificial Intelligence, there are agents which perceive the environment via sensors and act upon the environment through actuators or effectors. This is similar to how humans have sensors through which we sense our surroundings (eyes, ears, nose, tongue, and skin) and actuators (limbs) with which we perform actions on these surroundings. The agent starts from the initial state and performs a series of actions in order to reach the goal state. For instance, a vacuum cleaner agent will perform actions of moving right and left, and sucking in dirt to reach the goal of successfully cleaning the environment.
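
The vacuum cleaner example can be sketched as a small simulation. This is a minimal illustration under assumed details: a world with only two locations, "A" (left) and "B" (right), and a simple reflex rule. The names `vacuum_agent` and `run` are hypothetical and chosen just for this sketch.

```python
def vacuum_agent(location, is_dirty):
    """Choose an action from the current percept (location, dirt status)."""
    if is_dirty:
        return "Suck"
    # Nothing to clean here, so move to the other location.
    return "Right" if location == "A" else "Left"


def run(world, location, max_steps=10):
    """Apply the agent's actions until every location is clean (goal state)."""
    actions = []
    for _ in range(max_steps):
        if not any(world.values()):  # goal state: no dirt left anywhere
            break
        action = vacuum_agent(location, world[location])
        actions.append(action)
        if action == "Suck":
            world[location] = False
        elif action == "Right":
            location = "B"
        else:
            location = "A"
    return actions


# Initial state: both locations are dirty and the agent starts at A.
world = {"A": True, "B": True}
actions = run(world, "A")
print(actions)
```

Here the sensors are modelled by the `(location, is_dirty)` percept and the actuators by the three actions; the loop stops once the goal state (a clean world) is reached.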

Planning and its types

This activity of coming up with…

Data Augmentation is a technique that aims to expand existing data by making slight modifications to it. In NLP, it is often used to increase the size of the training data and improve the performance of the model. Here we will look at Data Augmentation using:

  • Word Embeddings
  • BERT
  • Back Translation
  • T5

Data Augmentation using Word Embeddings

Let’s first look at augmentation of data using Word Embeddings. Using word embeddings, we can represent words as vectors in a high-dimensional space in a meaningful way. For instance, we have two sentences:

Winter is Coming.

Any man who must say “I am the king” is no true king.
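
As a rough sketch of the idea, the snippet below replaces a word with its nearest neighbour in embedding space, measured by cosine similarity. The three-dimensional vectors in `toy_vectors` are made up purely for illustration (real embeddings are learned and have far more dimensions), and the helper names are hypothetical.

```python
import math

# Hypothetical toy embeddings; the values are invented for this example.
toy_vectors = {
    "king": [0.90, 0.80, 0.10],
    "monarch": [0.85, 0.75, 0.15],
    "man": [0.60, 0.30, 0.20],
    "winter": [0.10, 0.20, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor(word):
    """Return the vocabulary word most similar to `word`, excluding itself."""
    candidates = [w for w in toy_vectors if w != word]
    return max(
        candidates,
        key=lambda w: cosine_similarity(toy_vectors[word], toy_vectors[w]),
    )


def replace_with_neighbor(sentence, target):
    """Create an augmented sentence by swapping `target` for its neighbour."""
    substitute = nearest_neighbor(target)
    return " ".join(substitute if tok == target else tok for tok in sentence.split())


augmented = replace_with_neighbor("I am the king", "king")
print(augmented)
```

With a pretrained embedding model in place of the toy dictionary, the same nearest-neighbour lookup would supply the replacement candidates.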

Heena Rijhwani

Final Year Information Technology engineer with a focus in Data Science, Machine Learning, Deep Learning and Natural Language Processing.
