Intro:
Data storage refers to the technology and methods used to retain digital information on a computer or other electronic device. In this blog, we explore the main data storage concepts and compare them.
A database is like a big library where all the books are organized on shelves and you can find the book you're looking for by looking at the title, author, or subject. A datamart is like a smaller library inside the big library, with books that are only about a certain topic, like animals or sports. A data warehouse is like a super big library that has all the books from many different smaller libraries, so you can find all the information you need in one place. A data lake is like a giant swimming pool filled with all kinds of information, like books, videos, and pictures. But instead of swimming in it, we can use special tools to find the information we need.
Themes | Database | Datamart | Data Warehouse | Data Lake |
---|---|---|---|---|
Definition | Collection of data that is organized in a specific way, typically using tables and relationships between those tables | Subset of a larger data warehouse, focused on a specific business function or department | Large, centralized repository of data that is used for reporting and analysis | A storage repository that holds a vast amount of raw data in its native format, including structured and unstructured data |
Usage | Storing and maintaining specific data objects | Operational reporting for a specific department | Performance analytics and operational reporting across departments at the organisation level | Predictive and advanced analytics, machine learning |
Size | Limited storage for specific data objects | Limited volume of structured, curated / processed data | Larger volumes of structured data | Huge volumes of data: raw, structured and unstructured |
Processing | Schema on write | Schema on write | Schema on write | Schema on read |
Agility | Not flexible | Not flexible | Less agile | Agile and reconfigurable |
Security | Robust and optimised | Robust and optimised | Robust and optimised | Maturing |
Ease of Navigation | Hard | Hard | Moderate | Easy |
Cost | $ | $$$ | $$$$ | $$ |
Setup Time | π | ππ | ππ | ππππππ |
User personas | IT | IT / Business users | IT / Business users | Data scientist / IT / Citizen developers |
Infra Scaling | Hard | Hard | Medium | Easy |
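
The "Processing" row in the table contrasts schema on write with schema on read. The following is a minimal sketch of that difference, not a reference implementation: it uses Python's built-in `sqlite3` module as a stand-in for a database or warehouse, and a plain list of JSON strings as a stand-in for a data lake. The table name `sales`, the sample records and the helper `read_sales` are all hypothetical and only serve the illustration.

```python
import json
import sqlite3

# Schema on write: the table structure is fixed before any data is stored,
# so a record that violates it is rejected at insert time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("INSERT INTO sales (id, amount) VALUES (?, ?)", (1, 19.99))

try:
    # Missing amount breaks the NOT NULL rule, so the write itself fails.
    conn.execute("INSERT INTO sales (id, amount) VALUES (?, ?)", (2, None))
    rejected_on_write = False
except sqlite3.IntegrityError:
    rejected_on_write = True

# Schema on read: raw records are stored as-is (hypothetical sample data),
# and structure is only imposed when somebody reads them.
raw_lake = [
    '{"id": 1, "amount": 19.99}',
    '{"id": 2, "note": "amount missing"}',
    '{"id": 3, "amount": "7.50", "channel": "web"}',
]


def read_sales(lines):
    """Apply a schema while reading: keep only records with a usable amount."""
    rows = []
    for line in lines:
        record = json.loads(line)
        try:
            rows.append((int(record["id"]), float(record["amount"])))
        except (KeyError, TypeError, ValueError):
            # The record stays in the lake; this particular reader skips it.
            continue
    return rows


stored_rows = conn.execute("SELECT id, amount FROM sales").fetchall()
lake_rows = read_sales(raw_lake)
```

In this sketch the incomplete record never reaches the `sales` table, while the lake keeps every raw record and leaves it to each reader to decide which fields it needs. That is one way to read the agility difference shown in the table.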
Considerations when designing data storage
1. Data types: Consider the types of data that will be persisted in the data store.
- Structured: e.g. sales records or customer contacts
- Unstructured: e.g. images, videos, PDFs
- Semi-structured: e.g. metadata of media files
2. Scaling: Understand how the infrastructure scales without significant effort and cost.
3. Performance: Consider how fast your queries run and how you maintain that speed in times of high demand.
4. Hosting: Understand the hosting strategy and the compatibility / support provided by the cloud service provider.
5. Cost: Account for both storage and compute costs.
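
To make the three data types from the first consideration more concrete, here is a small, hedged sketch of how each might look in code. The class `SalesRecord`, the sample values and the helper `classify` are hypothetical and exist only for illustration; real systems classify data by its source and format, not by Python type.

```python
import json
from dataclasses import dataclass, is_dataclass


@dataclass
class SalesRecord:
    """Structured data: every record has the same, known fields."""
    order_id: int
    customer: str
    amount: float


# Structured: fits neatly into rows and columns.
structured = SalesRecord(order_id=1001, customer="Acme", amount=250.0)

# Semi-structured: self-describing keys, but fields can vary per record.
semi_structured = json.loads(
    '{"file": "clip.mp4", "duration_s": 12, "tags": ["demo"]}'
)

# Unstructured: an opaque blob (here, the first bytes of a PNG image).
unstructured = b"\x89PNG\r\n\x1a\n"


def classify(item):
    """Toy classifier for the three examples above (illustration only)."""
    if isinstance(item, (bytes, bytearray)):
        return "unstructured"
    if isinstance(item, dict):
        return "semi-structured"
    if is_dataclass(item):
        return "structured"
    return "unknown"


labels = [classify(x) for x in (structured, semi_structured, unstructured)]
```

A store that only needs to hold data like `SalesRecord` can enforce a fixed schema, while a store that must also hold the other two kinds needs to accept data in its native format.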