Part 1: Introduction to Big Data

Data!

Big data is a generic term given to datasets that are so large or complicated that they are difficult to store, manipulate and analyze.

We live in the data age. It’s very difficult to measure the total size of data stored electronically, but an IDC estimate put the size of the “digital universe” at 4.4 zettabytes in 2013 and is forecasting a tenfold growth by 2020 to 44 zettabytes.
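A tenfold increase over seven years is easier to picture as a yearly rate. The sketch below assumes smooth compound growth between the two IDC figures (real growth need not be smooth) and works out the implied annual growth rate and doubling time. Under that assumption the data volume grows by roughly 39% per year, which means it doubles about every two years.

```python
import math

# IDC figures quoted above: 4.4 ZB in 2013, forecast of 44 ZB in 2020.
start_zb = 4.4
end_zb = 44.0
years = 2020 - 2013  # 7 years of growth

# Compound annual growth rate, assuming smooth growth: (end / start) ** (1 / years) - 1
cagr = (end_zb / start_zb) ** (1 / years) - 1

# Years needed for the volume to double at that constant rate.
doubling_years = math.log(2) / math.log(1 + cagr)

print(f"Implied annual growth: {cagr:.1%}")
print(f"Doubling time: {doubling_years:.1f} years")
```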

Big data is, in essence, a problem in today's market, and it has two parts: the first is storage and the second is processing.


For example, when we use social media apps (WhatsApp, Facebook, etc.) or e-commerce apps, they generate logs behind the scenes. Why are these logs important? From them, companies can derive our browsing patterns, and from those patterns they produce outcomes such as which product sold the most in the current month or which product is not selling in a particular region.
They can only extract this kind of information if they have the data. So the main problem in today's market is storing this data and then processing it for analytics.
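To make the idea concrete, here is a minimal sketch of the two example questions above, run over a handful of log records. The log format (date, region, product, event) and the helper names `parse`, `top_seller` and `unsold_in_region` are hypothetical and chosen only for illustration; real applications use their own formats, and at big data scale the same logic would run on a distributed system rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical log lines: date, region, product, event.
LOG_LINES = [
    "2015-06-01,north,phone,view",
    "2015-06-01,north,phone,purchase",
    "2015-06-02,south,laptop,purchase",
    "2015-06-03,north,laptop,view",
    "2015-06-04,south,phone,purchase",
    "2015-06-05,north,phone,purchase",
    "2015-06-05,south,camera,view",
]

def parse(line):
    """Split one log line into a dict; the month is the YYYY-MM prefix of the date."""
    date, region, product, event = line.split(",")
    return {"month": date[:7], "region": region, "product": product, "event": event}

records = [parse(line) for line in LOG_LINES]

def top_seller(records, month):
    """Product with the most purchase events in the given month, or None if no sales."""
    sales = Counter(r["product"] for r in records
                    if r["event"] == "purchase" and r["month"] == month)
    return sales.most_common(1)[0][0] if sales else None

def unsold_in_region(records, region):
    """Products that appear in the logs but were never purchased in `region`."""
    all_products = {r["product"] for r in records}
    sold_here = {r["product"] for r in records
                 if r["event"] == "purchase" and r["region"] == region}
    return sorted(all_products - sold_here)

print(top_seller(records, "2015-06"))
print(unsold_in_region(records, "north"))
```

With these sample records, the phone is the top seller for June 2015, and the camera and laptop have no purchases in the north region.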

This happens because the volume of data keeps growing and its nature (structure) is so unpredictable that traditional database systems can't store and process it.
Traditional database systems (RDBMS) that we use in day-to-day life, such as Oracle and MySQL, are not capable of storing this data and processing it quickly enough. In short, an RDBMS cannot store and process big data.

Big data: any dataset that is difficult for an RDBMS to handle (store or process) is known as big data.

Since big data is a problem in today's market, it needs a solution; we can't simply ignore it.
In 2006, Doug Cutting (together with Mike Cafarella) created a solution to this problem known as Hadoop. So big data is the problem, and Hadoop is a solution to it.

We know that data is increasing, but some everyday numbers show how fast. Five to seven years ago, a 500 GB hard drive was enough to store our data; nowadays 1 TB is the minimum requirement, and many of us keep an additional external drive as well.
This simply means that our data keeps growing and individual files are getting bigger too. For example, photos captured on a mobile phone five to seven years ago were measured in KB, but nowadays each one is 5-7 MB.
In other words, as quality improves day by day, data size automatically increases as well.

Some statistics from 2015:
- The New York Stock Exchange generates about 4-5 terabytes of data per day.
- Facebook hosts more than 240 billion photos, growing at 7 petabytes per month.
- Ancestry.com, the genealogy site, stores around 10 petabytes of data.
- The Internet Archive stores around 18.5 petabytes of data.
- The Large Hadron Collider near Geneva, Switzerland, produces about 30 petabytes of data per year.
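The growth figures above use different time units (per day, per month, per year), so they are hard to compare at a glance. The sketch below converts the three rates to petabytes per year, assuming decimal units (1 PB = 1000 TB), 365 days and 12 months per year. Ancestry.com and the Internet Archive are quoted as stored totals rather than rates, so they are left out.

```python
# Decimal (SI) units assumed: 1 PB = 1000 TB.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12

# New York Stock Exchange: 4-5 TB per day, converted to PB per year.
nyse_low_pb = 4 * DAYS_PER_YEAR / TB_PER_PB
nyse_high_pb = 5 * DAYS_PER_YEAR / TB_PER_PB

# Facebook: photo storage growing by 7 PB per month.
facebook_pb = 7 * MONTHS_PER_YEAR

# Large Hadron Collider: already quoted per year.
lhc_pb = 30

print(f"NYSE:     {nyse_low_pb:.1f}-{nyse_high_pb:.1f} PB/year")
print(f"Facebook: {facebook_pb} PB/year of new photo storage")
print(f"LHC:      {lhc_pb} PB/year")
```

On these assumptions, Facebook's photo storage alone grows by more each year than the LHC produces.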
