- Imagine a big text file.
- The file is broken up into several blocks (chunks) of data.
- The blocks are stored on different nodes in the cluster.
- Advantages of doing this:
- Every block has the same fixed size (only the last block of a file may be smaller), so HDFS can handle bigger files in the same way as small ones.
- Storage is simpler: the system manages fixed-size blocks rather than files of arbitrary size.
- Replication is done per block: multiple copies of each block, not of the whole file, are kept on different nodes.
- Each task always deals with roughly the same amount of data, which is good for parallel processing and gives roughly equal processing times.
- The default block size is 128 MB (Hadoop 2.x and later).
- The NameNode keeps the mapping of each block to the DataNodes that store it.
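The splitting, per-block replication, and NameNode block map described above can be illustrated with a toy model in Python. This is not HDFS code. `split_into_blocks`, `place_replicas`, and the round-robin placement are hypothetical helpers made up for illustration. The replication factor of 3 is an assumption, chosen to match the usual HDFS default. Real HDFS places replicas with a rack-aware policy rather than round-robin.

```python
from __future__ import annotations

from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in Hadoop 2.x+
REPLICATION = 3                 # assumption: copies kept of each block


@dataclass
class Block:
    block_id: int
    offset: int   # byte offset of the block within the file
    length: int   # bytes in this block; only the last block may be shorter


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[Block]:
    """Hypothetical helper: cut a file of file_size bytes into fixed-size blocks."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append(Block(len(blocks), offset, length))
        offset += length
    return blocks


def place_replicas(blocks: list[Block], datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Hypothetical toy placement: put each block's replicas on distinct nodes,
    starting at a different node per block (round-robin). Returns a
    NameNode-style map of block_id -> DataNodes holding a copy."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    block_map = {}
    for b in blocks:
        start = b.block_id % len(datanodes)
        block_map[b.block_id] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return block_map


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([b.length // MB for b in blocks])  # [128, 128, 44]
    print(place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"]))
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

A 300 MB file becomes two full 128 MB blocks plus a 44 MB tail. Losing any single node still leaves every block available on two other nodes, because copies are kept per block rather than per file.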