Data Storage perception

Bala Madhusoodhanan - Jan 19 '23 - Dev Community

Intro:
Data storage refers to the technology and methods used to retain digital information on a computer or other electronic device. This post explores four data storage concepts (database, datamart, data warehouse, and data lake) and compares them.

A database is like a big library where all the books are organized on shelves and you can find the book you're looking for by looking at the title, author, or subject. A datamart is like a smaller library inside the big library, with books that are only about a certain topic, like animals or sports. A data warehouse is like a super big library that has all the books from many different smaller libraries, so you can find all the information you need in one place. A data lake is like a giant swimming pool filled with all kinds of information, like books, videos, and pictures. But instead of swimming in it, we can use special tools to find the information we need.
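To make the library analogy a little more concrete, here is a minimal sketch of a datamart as a topic-specific subset of a larger store. It assumes an in-memory SQLite database standing in for a real warehouse, and the table, view, and column names (`warehouse_sales`, `sports_mart`, and so on) are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")

# "Warehouse": one central table holding records from several departments.
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("sports", "football", 25.0),
        ("sports", "racket", 60.0),
        ("pets", "dog food", 18.5),
    ],
)

# "Datamart": a subset for one department, exposed here as a view.
conn.execute(
    "CREATE VIEW sports_mart AS "
    "SELECT item, amount FROM warehouse_sales WHERE department = 'sports'"
)

warehouse_rows = conn.execute(
    "SELECT COUNT(*) FROM warehouse_sales"
).fetchone()[0]
mart_rows = conn.execute(
    "SELECT item, amount FROM sports_mart ORDER BY item"
).fetchall()

print(warehouse_rows)
print(mart_rows)
```

In practice a datamart is often a separately curated store rather than a view; the view is used here only because it keeps the "smaller library inside the big library" idea visible in a few lines.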

| Themes | Database | Datamart | Data Warehouse | Data Lake |
| --- | --- | --- | --- | --- |
| Definition | Collection of data that is organized in a specific way, typically using tables and relationships between those tables | Subset of a larger data warehouse, focused on a specific business function or department | Large, centralized repository of data that is used for reporting and analysis | A storage repository that holds a vast amount of raw data in its native format, including structured and unstructured data |
| Usage | Storing and maintaining specific data objects | Operational reporting for a specific department | Performance analytics and operational reporting across departments at the organisation level | Predictive and advanced analytics, machine learning |
| Size | Limited storage for specific data objects | Limited volume of structured data that is curated / processed | Larger volumes of structured data | Huge volume of data; raw, structured and unstructured |
| Processing | Schema on write | Schema on write | Schema on write | Schema on read |
| Agility | Not flexible | Not flexible | Less agile | Agile and reconfigurable |
| Security | Robust and optimised | Robust and optimised | Robust and optimised | Maturing |
| Ease of Navigation | Hard | Hard | Moderate | Easy |
| Cost | $ | $$$ | $$$$ | $$ |
| Setup Time | 🕑 | 🕑🕑 | 🕑🕑 | 🕑🕑🕑🕑🕑🕑 |
| User personas | IT | IT / Business users | IT / Business users | Data scientist / IT / Citizen developers |
| Infra Scaling | Hard | Hard | Medium | Easy |
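The "Processing" row contrasts schema on write with schema on read. The sketch below illustrates the difference under some assumptions: SQLite stands in for a schema-on-write store, a plain list of JSON strings stands in for raw files in a data lake, and the `customers` table and `read_with_schema` helper are hypothetical names introduced for this example.

```python
import json
import sqlite3

# Schema on write: the structure is enforced at the moment data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'asha@example.com')")

try:
    # The missing email violates the NOT NULL constraint, so the write fails.
    conn.execute("INSERT INTO customers (id, name) VALUES (2, 'Ben')")
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True

# Schema on read: raw records are kept as-is and a structure is applied
# only when someone reads them.
raw_lake = [
    '{"id": 1, "name": "Asha", "email": "asha@example.com"}',
    '{"id": 2, "name": "Ben"}',
    '{"id": 3, "name": "Chen", "tags": ["vip"]}',
]


def read_with_schema(lines, fields):
    """Project each raw JSON record onto the fields the reader asks for."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


names_and_emails = list(read_with_schema(raw_lake, ["name", "email"]))

print(write_rejected)
print(names_and_emails)
```

The incomplete record is rejected by the table but accepted by the raw store, where the missing field only surfaces (as `None`) when a reader asks for it. That trade-off is one way to read the "Agility" row as well.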

Considerations when designing data storage

1. Data types: Factor in the type of data that will be persisted in the data store.

  • Structured: e.g. sales records or customer contacts
  • Unstructured: e.g. images, videos, PDFs, etc.
  • Semi-structured: e.g. metadata of media files

2. Scaling: Understand whether the infrastructure can scale without significant effort and cost.

3. Performance: Consider how fast your queries run and how you maintain that speed during periods of high demand.

4. Hosting: Understand the hosting strategy and the compatibility and support offered by the cloud service provider.

5. Cost: Account for both storage and compute costs.
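The three data types from the first consideration can be sketched in a few lines. This is only an illustration: the sample records, file names, and field names below are made up, and the four bytes are the start of a JPEG header used as a stand-in for a real image.

```python
import csv
import io
import json

# Structured: every record has the same columns (hypothetical sales records).
structured_csv = "order_id,customer,amount\n1,Asha,25.00\n2,Ben,60.00\n"
sales = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing records whose keys can vary from one
# record to the next (hypothetical metadata of media files).
metadata = [
    json.loads('{"file": "clip.mp4", "duration_s": 12, "codec": "h264"}'),
    json.loads('{"file": "photo.jpg", "width": 800, "height": 600}'),
]

# Unstructured: opaque bytes with no fields the data store can query directly.
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])

same_columns = all(row.keys() == sales[0].keys() for row in sales)
shared_keys = set(metadata[0]) & set(metadata[1])

print(same_columns)
print(shared_keys)
print(len(image_bytes))
```

The structured rows share one fixed set of columns, the metadata records overlap only on `file`, and the image is just bytes. Which of these dominates your workload is a reasonable first input when choosing among the four storage options compared above.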
