BIG DATA ANALYTICS

BIG DATA

Big Data is data of very large size. The term describes a collection of data that is huge in volume and keeps growing exponentially with time. Such data is so large and complex that traditional data management tools cannot store or process it efficiently. Example: Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
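To get a feel for the scale of the Facebook example, the daily figure can be converted into an average sustained rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal terabytes (10^12 bytes) and a perfectly even flow over the day, and the function name is made up for illustration.

```python
# Convert a daily ingest volume into an average sustained rate.
TB = 10**12                      # decimal terabyte in bytes (assumption)
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def average_rate_gb_per_s(terabytes_per_day: float) -> float:
    """Return the average ingest rate in gigabytes per second."""
    bytes_per_second = terabytes_per_day * TB / SECONDS_PER_DAY
    return bytes_per_second / 10**9


rate = average_rate_gb_per_s(500)
print(f"{rate:.2f} GB/s")  # prints "5.79 GB/s"
```

So 500 terabytes per day corresponds to nearly 6 gigabytes arriving every second on average, around the clock.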

TYPES OF BIG DATA:

1. Structured 2. Unstructured 3. Semi-structured

Structured: Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Example: An 'Employee' table in a database.

Unstructured: Any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it is processed to derive value from it. Example: The output returned by 'Google Search'.

Semi-structured: Semi-structured data can contain both forms of data. It looks structured in form, but it is not defined by a fixed schema the way a table in a relational DBMS is. Example: Data represented in an XML file.
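The three types can be contrasted with a small sketch using only the Python standard library. All records, names and field values here are invented for illustration; they are not taken from any real dataset.

```python
import xml.etree.ElementTree as ET

# Structured: every record has the same fixed fields,
# like rows in an 'Employee' table.
employee_rows = [
    {"id": 1, "name": "Asha", "dept": "Finance"},
    {"id": 2, "name": "Ravi", "dept": "Admin"},
]

# Semi-structured: tags describe the data, but records
# are not forced to share the same set of fields.
xml_text = """
<employees>
  <employee><name>Asha</name><dept>Finance</dept></employee>
  <employee><name>Ravi</name><phone>555-0100</phone></employee>
</employees>
""".strip()

root = ET.fromstring(xml_text)
xml_records = [
    {child.tag: child.text for child in employee}
    for employee in root.findall("employee")
]

# Unstructured: free text with no predefined fields; any
# structure has to be derived by processing the content.
review = "Great service, but delivery was late."
word_count = len(review.split())

print(employee_rows[0]["dept"])  # direct access by a known column
print(xml_records)               # fields differ from record to record
print(word_count)                # only simple derived facts are available
```

The structured rows can be queried by column name straight away, the XML records must be inspected to learn which fields each one carries, and the free text offers no fields at all until it is analysed.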

CHARACTERISTICS OF BIG DATA:

1. Volume: The size of data plays a very crucial role in determining its value. Whether a particular data set can actually be considered Big Data or not also depends on the volume of data. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big Data.

2. Variety: Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. Spreadsheets, databases, emails, photos, videos, monitoring devices, PDFs, audio, etc. are all considered in analysis applications.

3. Velocity: The term 'velocity' refers to the speed of generation of data. Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

4. Variability: This refers to the inconsistency that the data can show at times, which hampers the ability to handle and manage the data effectively.

BENEFITS OF BIG DATA:

1. Businesses can utilize outside intelligence when making decisions

2. Improved customer service

3. Early identification of risk to the product, if any

4. Better operational efficiency

Organizations that want to deploy on-premises big data systems commonly use Apache open source technologies: Hadoop and Spark, along with YARN (Hadoop's built-in resource manager, short for Yet Another Resource Negotiator), MapReduce (a programming framework), Kafka (an application-to-application messaging and data streaming platform), the HBase database, and SQL-on-Hadoop query engines like Drill, Hive, Impala and Presto.
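The idea behind the MapReduce programming framework can be sketched with the classic word count. This is a plain, single-machine Python simulation of the map, shuffle and reduce steps, not the Hadoop API; the function names are chosen for illustration only, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is growing"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'growing': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the model suitable for very large data sets.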