Welcome to the SeaweedFS wiki!
SeaweedFS is a versatile and efficient storage system designed for sysadmins who manage a mix of blob, object, file, and data warehouse storage. Its architecture keeps access times fast: reading a file takes a constant number of disk seeks, O(1), regardless of the size of the dataset. This makes it an excellent choice for environments where speed and efficiency are critical.
SeaweedFS is organized into several layers, each serving a different storage need:
- Blob Storage is the foundation, comprising master servers, volume servers, and a cloud tier that extends capacity beyond local disks.
- File Storage builds on Blob Storage by adding filer servers for managing filesystem-like operations.
- Object Storage extends File Storage with S3-compatible servers, making it a breeze to integrate with existing S3 workflows.
- Data Warehouse capabilities are integrated into File Storage, offering compatibility with big data frameworks like Hadoop, Spark, and Flink, through Hadoop-compatible libraries.
- FUSE Mount allows File Storage to be mounted directly in user space on clients, supporting common use cases like local filesystem mounts and Kubernetes persistent volumes.
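To make the Blob Storage layer more concrete, here is a minimal sketch of the two-step write flow between a client, a master server, and a volume server. It assumes the master exposes a `/dir/assign` endpoint on port 9333 and replies with JSON containing `fid` and `url` fields; the sample reply is illustrative, and the helper names `assign_url` and `upload_url` are hypothetical. No network calls are made here, the code only builds the URLs a client would use.

```python
import json
from urllib.parse import urlencode


def assign_url(master: str, replication: str = "") -> str:
    """Build the master URL that asks for a new file id (assumed endpoint: /dir/assign)."""
    base = f"http://{master}/dir/assign"
    if replication:
        return f"{base}?{urlencode({'replication': replication})}"
    return base


def upload_url(assign_response: str) -> str:
    """Turn the master's JSON reply into the volume server URL the blob is POSTed to."""
    reply = json.loads(assign_response)
    return f"http://{reply['url']}/{reply['fid']}"


# Illustrative reply, not captured from a real cluster.
sample_reply = '{"count": 1, "fid": "3,01637037d6", "url": "127.0.0.1:8080"}'

print(assign_url("localhost:9333"))
print(upload_url(sample_reply))
```

In this sketch the master only hands out a file id and a location; the file content itself goes straight to the volume server, which keeps the master out of the data path.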
SeaweedFS stands out for its high performance, scalability, and flexibility. It features:
- Fast key-to-file mapping with minimal disk seek time.
- Customizable tiered storage that intelligently places data based on activity, moving less active data to cheaper cloud storage.
- Elastic scalability, easily expanding capacity by adding volume servers.
- A robust, high-performance, S3-compatible object store that can serve as an in-house alternative to HDFS.
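The key-to-file mapping mentioned in the list above can be illustrated by taking a file id apart. This sketch assumes the commonly documented file id layout: a decimal volume id, a comma, then a hex string whose last 8 digits are a cookie and whose leading digits are the file key. The `FileId` type and `parse_fid` helper are hypothetical names, and chunk suffixes or other variants are not handled.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    volume_id: int  # which volume holds the file
    file_key: int   # key looked up in that volume's index
    cookie: int     # random value that guards against guessed ids


def parse_fid(fid: str) -> FileId:
    """Split a file id such as '3,01637037d6' into its three parts."""
    volume_part, _, rest = fid.partition(",")
    if len(rest) <= 8:
        raise ValueError(f"malformed file id: {fid!r}")
    # Assumption: the last 8 hex digits are the cookie, the rest is the file key.
    return FileId(int(volume_part), int(rest[:-8], 16), int(rest[-8:], 16))


fid = parse_fid("3,01637037d6")
print(fid.volume_id, hex(fid.file_key), hex(fid.cookie))
```

Under this layout, the volume id tells a client which volume server to ask, and the file key is all that server needs to locate the file's offset in its volume, which is what keeps the disk seek count constant.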
The system is designed for high availability and durability, with features like:
- No single point of failure (SPOF), supporting active-active asynchronous replication and erasure coding for data protection.
- Support for file checksums to ensure data integrity.
- Rack and data center aware replication to enhance reliability.
- Flexible metadata management, compatible with a variety of popular databases and storage systems.
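Rack and data center aware replication, from the list above, is typically configured with a three-digit code. The sketch below assumes the usual reading of that code: the first digit counts copies in other data centers, the second counts copies on other racks in the same data center, and the third counts copies on other servers in the same rack. `ReplicaPlacement` and `parse_replication` are hypothetical helper names used only for illustration.

```python
from typing import NamedTuple


class ReplicaPlacement(NamedTuple):
    other_data_centers: int  # extra copies in different data centers
    other_racks: int         # extra copies on different racks, same data center
    same_rack_servers: int   # extra copies on different servers, same rack

    @property
    def total_copies(self) -> int:
        # The original copy plus every requested replica.
        return 1 + sum(self)


def parse_replication(code: str) -> ReplicaPlacement:
    """Parse a three-digit replication code such as '001' or '110'."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"replication code must be three digits: {code!r}")
    return ReplicaPlacement(*(int(c) for c in code))


for code in ("000", "001", "110"):
    placement = parse_replication(code)
    print(code, placement, placement.total_copies)
```

Read this way, `000` keeps a single copy, `001` adds one replica on another server in the same rack, and `110` adds one replica on another rack plus one in another data center.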
For sysadmins, SeaweedFS simplifies operations significantly. Adding capacity is as straightforward as integrating more volume servers. The system's architecture allows for easy data migration and backup, supporting a wide array of backend stores for metadata. This makes SeaweedFS an adaptable and reliable choice for managing diverse and demanding storage environments.
The architecture is described in more detail in the white paper, SeaweedFS Architecture.pdf.
Roadmap
- Getting Started: for users who want to try out SeaweedFS.
- Production Setup: a more serious configuration designed for large volumes of traffic and high reliability.
- Components: How the services fit together.
- Benchmarks: the measured performance of SeaweedFS.
- FAQ: things we should work to include in the main documentation.
- Applications, Use-Cases and Actual-Users: inspiration and ideas for how you might use SeaweedFS.
Introduction
API
Configuration
- Replication
- Store file with a Time To Live
- Failover Master Server
- Erasure coding for warm storage
- Server Startup Setup
- Environment Variables
Filer
- Filer Setup
- Directories and Files
- Data Structure for Large Files
- Filer Data Encryption
- Filer Commands and Operations
- Filer JWT Use
Filer Stores
- Filer Cassandra Setup
- Filer Redis Setup
- Super Large Directories
- Path-Specific Filer Store
- Choosing a Filer Store
- Customize Filer Store
Advanced Filer Configurations
- Migrate to Filer Store
- Add New Filer Store
- Filer Store Replication
- Filer Active Active cross cluster continuous synchronization
- Filer as a Key-Large-Value Store
- Path Specific Configuration
- Filer Change Data Capture
FUSE Mount
WebDAV
Cloud Drive
- Cloud Drive Benefits
- Cloud Drive Architecture
- Configure Remote Storage
- Mount Remote Storage
- Cache Remote Storage
- Cloud Drive Quick Setup
- Gateway to Remote Object Storage
AWS S3 API
- Amazon S3 API
- AWS CLI with SeaweedFS
- s3cmd with SeaweedFS
- rclone with SeaweedFS
- restic with SeaweedFS
- nodejs with Seaweed S3
- S3 API Benchmark
- S3 API FAQ
- S3 Bucket Quota
- S3 API Audit log
- S3 Nginx Proxy
- Docker Compose for S3
AWS IAM
Machine Learning
HDFS
- Hadoop Compatible File System
- run Spark on SeaweedFS
- run HBase on SeaweedFS
- run Presto on SeaweedFS
- Hadoop Benchmark
- HDFS via S3 connector
Replication and Backup
- Async Replication to another Filer [Deprecated]
- Async Backup
- Async Filer Metadata Backup
- Async Replication to Cloud [Deprecated]
- Kubernetes Backups and Recovery with K8up
Messaging
Use Cases
Operations
Advanced
- Large File Handling
- Optimization
- Volume Management
- Tiered Storage
- Cloud Tier
- Cloud Monitoring
- Load Command Line Options from a file
- SRV Service Discovery
- Volume Files Structure