
Big Data: A Big Introduction

Muhammad Raza

The digital universe is continuously expanding, just like the physical universe. The difference is that the digital world has already generated more bytes of data than there are stars in the entire observable physical universe.

44 zettabytes! That’s 44 followed by 21 zeros (44 × 10^21 bytes). That’s 40 times more bytes than the number of stars in the observable universe.
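To make the scale concrete, here is a minimal Python sketch of the arithmetic. It assumes the decimal (SI) definition of a zettabyte, 10^21 bytes, and simply back-calculates the star count implied by the "40 times" comparison above. The variable and function names are illustrative only.

```python
# Assumption: decimal (SI) prefix, so 1 zettabyte = 10**21 bytes.
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_bytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to bytes."""
    return zettabytes * BYTES_PER_ZETTABYTE


total_bytes = zettabytes_to_bytes(44)

# The article states this is 40 times the number of stars in the
# observable universe, so the implied star count is total_bytes / 40.
implied_star_count = total_bytes // 40

print(f"Total bytes:        {total_bytes:.2e}")
print(f"Implied star count: {implied_star_count:.2e}")
```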


By 2025, the global datasphere is projected to hold 175 zettabytes of data. The growth in data volume is exponential.
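As a rough illustration of what exponential growth means here, the sketch below estimates the compound annual growth rate between the two figures. The article does not say which year the 44-zettabyte figure refers to, so the baseline year of 2020 is purely an assumption. Change BASELINE_YEAR and the result changes with it.

```python
import math

# Figures from the article.
baseline_zb = 44
projected_zb = 175
PROJECTION_YEAR = 2025

# Assumption (not stated in the article): the 44 ZB figure is for 2020.
BASELINE_YEAR = 2020

years = PROJECTION_YEAR - BASELINE_YEAR

# Compound annual growth rate: (end / start) ** (1 / years) - 1
annual_growth_rate = (projected_zb / baseline_zb) ** (1 / years) - 1

# Doubling time under constant exponential growth: ln(2) / ln(1 + rate)
doubling_time_years = math.log(2) / math.log(1 + annual_growth_rate)

print(f"Annual growth rate: {annual_growth_rate:.1%}")
print(f"Doubling time:      {doubling_time_years:.1f} years")
```

Under that assumed baseline, the data volume grows by roughly a third each year.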

All of this data is aptly called Big Data.

What is Big Data?

Big data is the term for information assets (data) characterized by high volume, velocity, and variety, which are systematically extracted, analyzed, and processed for decision making or control actions.

The characteristics of Big Data make it virtually impossible to analyze using traditional data analysis methods.

The importance of big data lies in the patterns and insights, hidden in large information assets, that can drive business decisions. When extracted using advanced analytics technologies, these insights help organizations understand how their users, markets, society, and the world behave.

3 Vs of Big Data

For an information asset to be considered Big Data, it must meet the 3-V criteria:

  • Volume. The size of data. High-volume data is likely to contain useful insights. The threshold for data to be considered big usually starts in the terabyte-to-petabyte range. The large volume of Big Data requires hyperscale computing environments with large storage and high IOPS (Input/Output Operations per Second) for fast analytics processing.
  • Velocity. The speed at which data is produced and processed. Big Data is typically produced in streams and is available in real-time. The continuous nature of data generation makes it relevant for real-time decision-making.
  • Variety. The type and nature of information assets. Raw big data is often unstructured or multi-structured, generated with a variety of attributes, standards, and file formats. For example, datasets collected from sensors, log files, and social media networks are unstructured. So, they must be processed into structured databases for data analytics and decision-making.
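As a minimal sketch of the processing step described under Variety, the following Python snippet turns raw log lines into structured records as they arrive. The log format, the field names (timestamp, level, message), and the sample lines are all assumptions made for illustration. Real log sources vary widely and usually need their own parsing rules.

```python
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log format, assumed for illustration only:
#   "<ISO timestamp> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def structure_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily convert a stream of raw lines into structured records."""
    for line in lines:
        record = parse_line(line)
        if record is not None:  # skip lines that do not fit the format
            yield record


# Invented sample input; the second line is deliberately malformed.
raw_lines = [
    "2021-03-01T12:00:00Z INFO user login succeeded",
    "not a log line",
    "2021-03-01T12:00:05Z ERROR payment service timeout",
]

records = list(structure_stream(raw_lines))
for record in records:
    print(record)
```

Because structure_stream is a generator, it handles one line at a time and never loads the full dataset into memory, which suits data that arrives continuously.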

More recently, two additional Vs help characterize Big Data:

  • Veracity. The reliability or truthfulness of data. The extent to which the output of big data analysis is pertinent to the associated business goals is determined by the quality of data, the processing technology, and the mechanism used to analyze the information assets.
  • Value. The usefulness of Big Data assets. The worthiness of the output of big data analysis can be subjective and is evaluated based on unique business objectives.


Big data vs small data vs thick data

In contrast to Big Data, there are two other forms of data: small data and thick data.

Small Data

Small Data refers to manageable data assets, usually in numerical or structured form, that can be analyzed using simple technologies such as Microsoft Excel or an open source alternative.
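For a sense of how little tooling small data needs, here is a short sketch that uses only the Python standard library in place of a spreadsheet. The monthly sales figures are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import statistics

# Hypothetical monthly sales figures, invented for illustration.
SMALL_CSV = """month,units_sold
Jan,120
Feb,135
Mar,150
Apr,110
"""

# Read the structured data into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(SMALL_CSV)))
units = [int(row["units_sold"]) for row in rows]

# Simple summary statistics, the kind a spreadsheet would provide.
mean_units = statistics.mean(units)
best_month = max(rows, key=lambda row: int(row["units_sold"]))["month"]

print(f"Average units sold: {mean_units}")
print(f"Best month: {best_month}")
```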

Thick Data

Thick Data refers to text or qualitative data that can be analyzed using manageable manual processes. Examples include:

  • Interview questions
  • Surveys
  • Video transcripts

When you use qualitative data in conjunction with quantitative big data, you can better understand the sentiment and behavioral aspects that individuals can easily communicate. Thick Data is particularly useful in the domains of medicine and scientific research, where responses from individual humans hold sufficient value and insights on their own, in contrast to large big data streams.

Big Data trends in 2021-2022

Big Data technologies are continuously improving. Indeed, data itself is fast becoming the most important asset for a business organization.

The prevalence of the Internet of Things (IoT), cloud computing, and Artificial Intelligence (AI) is making it easier for organizations to transform raw data into actionable knowledge.

Here are three of the most popular big data technology trends to look out for in 2021:

  • Augmented Analytics. The Big Data industry will be worth nearly $274 billion by the end of 2021. Technologies such as Augmented Analytics, which help organizations with the data management process, are projected to grow rapidly and reach $18.4 billion by the year 2023.
  • Continuous Intelligence. Integrating real-time analytics into business operations is helping organizations leapfrog the competition with proactive and actionable insights delivered in real time.
  • Blockchain. Stringent regulations such as the GDPR and HIPAA are encouraging organizations to make data secure, accessible, and reliable. Blockchain and similar technologies are making their way into the financial industry as a data governance and security instrument that is highly resilient and robust against privacy risks. An EU resource discusses how blockchain complements some key GDPR objectives.

Big Data best practices for businesses

Certainly, the world of data is growing exponentially. Are your data and data processes up to the tasks that you’re asking of them?

BMC Blogs has many resources for understanding and working with Big Data. Browse the BMC Machine Learning & Big Data Blog or dive deeper into these areas:

  • Data basics
  • Data storage
  • Data management
  • Data security
  • Data analysis & analytics
  • Machine learning, data science & AI
  • Data theory & thought leadership

Learning Big Data

Big data tutorial series

These tutorial series are part of BMC Guides.

  • Amazon Redshift
  • Apache Cassandra
  • Apache Spark
  • AWS
  • Data Visualization
  • Docker
  • DynamoDB
  • ElasticSearch
  • Hadoop
  • Kubernetes
  • Microsoft Power BI
  • MongoDB
  • Pandas
  • Redis
  • scikit-learn
  • Snowflake
  • Tableau Online



These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.


About the author

Muhammad Raza

Muhammad Raza is a Stockholm-based technology consultant working with leading startups and Fortune 500 firms on thought leadership branding projects across DevOps, Cloud, Security and IoT.