DFS (Distributed File Systems)
A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server.
In the figure above, several physical machines in different locations share one common file system, so together they behave as a single logical machine.
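The fetch, cache, and return cycle described above can be sketched as a toy, single-process model. The `FileServer` and `Client` classes and their methods are hypothetical and exist only for illustration. They are not the API of NFS or any real DFS, and they ignore networking, locking, and concurrent writers.

```python
class FileServer:
    """Hypothetical server that holds the authoritative copy of each file."""

    def __init__(self):
        self._files = {}  # path -> bytes

    def fetch(self, path: str) -> bytes:
        # Send the client a copy of the file's contents (empty if the file is new).
        return self._files.get(path, b"")

    def store(self, path: str, data: bytes) -> None:
        # Accept the copy a client returns when it has finished processing.
        self._files[path] = data


class Client:
    """Hypothetical client that caches files locally while working on them."""

    def __init__(self, server: FileServer):
        self.server = server
        self.cache = {}  # path -> bytes, the locally cached copies

    def open(self, path: str) -> bytes:
        # Download a copy on first access; later reads are served from the cache.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path)
        return self.cache[path]

    def write(self, path: str, data: bytes) -> None:
        # Changes touch only the local cached copy, not the server.
        self.cache[path] = data

    def close(self, path: str) -> None:
        # Return the (possibly modified) copy to the server and drop it locally.
        self.server.store(path, self.cache.pop(path))


server = FileServer()
server.store("/reports/q1.txt", b"draft")
alice = Client(server)
alice.open("/reports/q1.txt")
alice.write("/reports/q1.txt", b"final")
alice.close("/reports/q1.txt")
print(server.fetch("/reports/q1.txt"))  # b'final'
```

Until `close` runs, the server still holds the old contents, which is why real distributed file systems need a consistency policy when several clients cache the same file.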
- A system that stores data permanently
- Data is divided into logical units (files, shards, chunks, blocks, etc.)
- A file path joins file and directory names into a relative or absolute address that identifies a file
- Supports access to files on remote servers
- Supports concurrency
- Supports distribution
- Supports replication
- Examples: NFS, GPFS, Hadoop DFS (HDFS), GlusterFS, MogileFS…
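Two of the properties listed above, division into blocks and replication, can be illustrated with a minimal sketch. The function names are hypothetical, the tiny block size is chosen only for readability (HDFS, for example, defaults to 128 MB blocks and three replicas), and the simple rotating placement stands in for the smarter, rack-aware policies real systems use.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Divide a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes, rotating the starting node."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Consecutive nodes after `start` (wrapping around) are all distinct.
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


data = b"hello distributed world"
blocks = split_into_blocks(data, block_size=8)
print(blocks)  # [b'hello di', b'stribute', b'd world']

placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
print(placement[2])  # ['node3', 'node4', 'node1']
```

Because every block lives on three different nodes, losing any single node still leaves at least two readable copies of each block, and joining the blocks in order restores the original file.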
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using simple programming models.
It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Apache Hadoop is simply a framework: a library built in Java with the objective of managing huge amounts of data.
It provides components that understand the data, supply the right storage for it, and supply the right algorithms to analyze it.
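The "simple programming model" Hadoop is known for is MapReduce: a map step turns each input record into key/value pairs, the framework groups the pairs by key (the shuffle), and a reduce step combines the values for each key. The sketch below simulates those three phases for a word count inside one Python process. It is only an illustration of the model under that assumption; the function names are hypothetical and it does not use Hadoop's actual Java API or run on a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple]) -> dict:
    # Group all intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    # Combine all counts emitted for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

On a real cluster each map call would run on the machine that already stores the relevant block, which is how Hadoop pairs local computation with local storage.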
Open-source software + commodity hardware = reduced IT costs
What is Hadoop used for?
- Log Processing
- Recommendation systems
- Video and Image Analysis
- Data Retention
Companies using Hadoop:
- and many more