The Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. A newer version of the Google File System, code-named Colossus, was released in 2010.
Design
GFS is tailored to Google's core data storage and usage needs (primarily the search engine), which can generate enormous amounts of data that must be retained. It grew out of an earlier Google effort, "BigFiles", developed by Larry Page and Sergey Brin in the early days of Google, while it was still based at Stanford. Files are divided into fixed-size chunks of 64 megabytes, similar to clusters or sectors in a regular file system. Chunks are only very rarely overwritten or shrunk; files are usually appended to or read. GFS is also designed and optimized to run on Google's computing clusters, dense groups of nodes built from cheap "commodity" computers, which means precautions must be taken against the high failure rate of individual nodes and the resulting data loss. Other design decisions favor high data throughput, even when it comes at the cost of latency.
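The fixed chunk size makes it simple arithmetic to work out which chunk holds a given byte of a file. The following is a minimal sketch, not GFS code: the function names are hypothetical, and it assumes "64 megabytes" means 64 × 2^20 bytes.

```python
# Assumption: 64 megabytes is taken as 64 * 2**20 bytes.
CHUNK_SIZE = 64 * 1024 * 1024


def locate(byte_offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_spanned(byte_offset: int, length: int) -> range:
    """Return the chunk indices touched by reading `length` bytes at `byte_offset`."""
    if length <= 0:
        return range(0)
    first = byte_offset // CHUNK_SIZE
    last = (byte_offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)
```

A read that starts on the last byte of chunk 0 and covers two bytes spans chunks 0 and 1, so a client would need the locations of both.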
A GFS cluster consists of multiple nodes, divided into two types: one Master node and a large number of chunkservers. Each file is divided into fixed-size chunks, and the chunkservers store these chunks. Each chunk is assigned a unique 64-bit label by the Master node at the time of creation, and the logical mapping of files to their constituent chunks is maintained. Each chunk is replicated several times across the network. By default it is replicated three times, but this is configurable. Files that are in high demand may have a higher replication factor, while files for which the application client uses strict storage optimizations may be replicated fewer than three times, in order to cope with quick garbage-cleaning policies.
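The bookkeeping described above can be sketched as two tables: one from file path to an ordered list of 64-bit chunk labels, and one from each label to the chunkservers holding a replica. This is an illustrative sketch only. The class name, the random label generation, and the round-robin placement are assumptions made for the example, not the policy GFS uses.

```python
import secrets
from dataclasses import dataclass, field

DEFAULT_REPLICATION = 3  # default replication factor stated in the text


@dataclass
class ChunkTable:
    """Hypothetical in-memory stand-in for the Master's chunk bookkeeping."""
    chunkservers: list[str]
    files: dict[str, list[int]] = field(default_factory=dict)      # path -> ordered labels
    locations: dict[int, list[str]] = field(default_factory=dict)  # label -> replica holders
    _next: int = 0  # round-robin cursor (placement policy is an assumption)

    def add_chunk(self, path: str, replication: int = DEFAULT_REPLICATION) -> int:
        """Create a new chunk for `path` and place `replication` copies of it."""
        n = len(self.chunkservers)
        if not 1 <= replication <= n:
            raise ValueError("replication must be between 1 and the number of chunkservers")
        # Draw a 64-bit label, retrying in the unlikely event of a collision.
        label = secrets.randbits(64)
        while label in self.locations:
            label = secrets.randbits(64)
        # Pick `replication` distinct servers, continuing from the cursor.
        replicas = [self.chunkservers[(self._next + i) % n] for i in range(replication)]
        self._next = (self._next + replication) % n
        self.locations[label] = replicas
        self.files.setdefault(path, []).append(label)
        return label
```

Calling `add_chunk("/logs/a")` on a table with four chunkservers places three replicas on distinct servers; passing `replication=2` models a file that trades redundancy for storage.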
The Master server does not usually store the actual chunks, but rather all the metadata associated with them: the tables mapping the 64-bit labels to chunk locations and to the files they make up, the locations of the copies of each chunk, and which processes are reading from or writing to a particular chunk or taking a "snapshot" of it in order to replicate it (usually at the instigation of the Master server when, due to node failures, the number of copies of a chunk has fallen below the set number). All this metadata is kept current by the Master server periodically receiving updates from each chunkserver ("heartbeat messages").
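One way to picture how heartbeat reports drive re-replication: if each chunkserver reports the set of chunk labels it currently holds, the Master can count live copies per chunk and flag any that have fallen below the target. The sketch below is hypothetical (the function name and the report format are assumptions), and it can only see chunks that at least one live server still reports.

```python
from collections import defaultdict


def under_replicated(reports: dict[str, set[int]], target: int = 3) -> dict[int, int]:
    """Map each reported chunk label with fewer than `target` copies to its deficit.

    `reports` maps a chunkserver name to the chunk labels in its latest heartbeat.
    """
    live = defaultdict(int)
    for labels in reports.values():
        for label in labels:
            live[label] += 1
    return {label: target - count for label, count in live.items() if count < target}
```

For example, if chunk 2 appears in only two of three heartbeats, the result maps it to a deficit of 1, meaning one more copy should be created.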
Permission to modify a chunk is handled by a system of time-limited, expiring "leases": the Master server grants permission to a process for a finite period of time, during which no other process will be granted permission by the Master server to modify the chunk. The modifying chunkserver, which is always the primary chunk holder, then propagates the changes to the chunkservers holding the backup copies. The changes are not saved until all chunkservers acknowledge them, which guarantees the completion and atomicity of the operation.
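The lease rule can be sketched as a small table keyed by chunk label. This is a simplified illustration, not the GFS protocol: the class name is hypothetical, the 60-second duration is an assumed value, and the caller passes in the current time so the behavior is easy to follow.

```python
from dataclasses import dataclass, field

LEASE_SECONDS = 60.0  # assumed lease duration, chosen for illustration


@dataclass
class LeaseTable:
    """Hypothetical Master-side record of who may modify each chunk."""
    leases: dict[int, tuple[str, float]] = field(default_factory=dict)  # label -> (holder, expiry)

    def grant(self, label: int, holder: str, now: float) -> bool:
        """Grant or renew a lease unless another holder has an unexpired one."""
        current = self.leases.get(label)
        if current is not None:
            owner, expires = current
            if now < expires and owner != holder:
                return False  # someone else may still be modifying the chunk
        self.leases[label] = (holder, now + LEASE_SECONDS)
        return True
```

With this sketch, a second requester is refused while the first lease is still valid and succeeds once the expiry time has passed.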
Programs access chunks by first querying the Master server for the locations of the desired chunks; if the chunks are not being operated on (i.e. no outstanding leases exist), the Master replies with the locations, and the program then contacts the chunkserver and receives the data from it directly (similar to Kazaa and its supernodes).
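The read path described above separates the metadata lookup from the data transfer. The following toy model shows that split with in-memory objects. All class and function names are hypothetical, the lease check is omitted, and the read is assumed to fall within a single chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumption: 64 megabytes taken as 64 * 2**20 bytes


class ToyMaster:
    """Holds metadata only: which chunks make up a file and where they live."""

    def __init__(self, files: dict[str, list[int]], locations: dict[int, list[str]]):
        self.files = files
        self.locations = locations

    def lookup(self, path: str, chunk_index: int) -> tuple[int, list[str]]:
        label = self.files[path][chunk_index]
        return label, self.locations[label]


class ToyChunkServer:
    """Holds chunk contents keyed by label."""

    def __init__(self) -> None:
        self.chunks: dict[int, bytes] = {}

    def read(self, label: int, offset: int, length: int) -> bytes:
        return self.chunks[label][offset:offset + length]


def read_file(master: ToyMaster, servers: dict[str, ToyChunkServer],
              path: str, offset: int, length: int) -> bytes:
    """Ask the Master where the chunk is, then fetch the bytes from a chunkserver."""
    index, within = divmod(offset, CHUNK_SIZE)
    label, replicas = master.lookup(path, index)   # metadata request only
    return servers[replicas[0]].read(label, within, length)  # data bypasses the Master
```

The Master never touches file contents in this sketch, which is the property that lets read throughput grow with the number of chunkservers.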
Unlike most other file systems, GFS is not implemented in the operating system kernel, but is provided as a userspace library.
Performance
Judging from benchmark results, when used with a relatively small number of servers (15), the file system achieves read performance comparable to that of a single disk (80-100 MB/s), but has lower write performance (30 MB/s) and is relatively slow (5 MB/s) at appending data to existing files. The authors present no results on random seek time. Because the Master node is not directly involved in reading data (the data is passed from the chunkserver directly to the reading client), the read rate increases significantly with the number of chunkservers, reaching 583 MB/s for 342 nodes. Aggregating a large number of servers also allows a large total capacity, although usable capacity is reduced by storing the data in three independent locations (to provide redundancy).
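Two back-of-the-envelope calculations follow from the figures above: the average read rate per node at 342 nodes, and the effect of three-way replication on usable capacity. The helper names and the 300 TB raw capacity are illustrative assumptions, not figures from the benchmark.

```python
def per_node_rate(aggregate_mb_s: float, nodes: int) -> float:
    """Average throughput contributed by each node, in MB/s."""
    return aggregate_mb_s / nodes


def usable_capacity(raw: float, replication: int = 3) -> float:
    """Capacity left for distinct data when every chunk is stored `replication` times."""
    return raw / replication


# 583 MB/s aggregate over 342 nodes, as quoted in the text.
print(round(per_node_rate(583, 342), 2))

# Hypothetical cluster with 300 TB of raw disk and the default three replicas.
print(usable_capacity(300))
```

The per-node figure is an average over the whole cluster, so it says nothing about what a single client reading from a single chunkserver would observe.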
See also
- Bigtable
- Cloud storage
- CloudStore
- Fossil, the original file system of Plan 9
- IBM GPFS General Parallel File System
- GFS2 Red Hat Global Filesystem 2
- Hadoop and "Hadoop Distributed File System" (HDFS), an open source Java product similar to GFS
- List of Google products
- MapReduce
External links
- "GFS: Evolution on Fast-forward", Queue, ACM.
- "Google File System Eval: Part I", StorageMojo.
Source of the article : Wikipedia