Motivation
NOTE: SeaweedFS provides two mechanisms to use cloud storage:
- SeaweedFS Cloud Drive
  - In this case, you can mount an S3 bucket into the SeaweedFS file system (in the filer) and access the remote files through SeaweedFS. Effectively, SeaweedFS caches the files from the cloud storage.
  - In this mode, the file structure in the cloud store exactly matches the SeaweedFS structure, so every file in SeaweedFS also becomes a file in the cloud storage provider.
  - This is useful if you want to use the files inside the cloud provider's infrastructure.
  - However, this does not support file encryption in any way, since the files are put into the cloud storage as-is.
- Tiered Storage with Cloud Tier (<== You are here)
  - In this mode, SeaweedFS moves whole volume files to the cloud storage provider, i.e. files that are 1 GB (in our case) in size.
  - This mode supports Filer Data Encryption transparently.
  - The chunk files uploaded to the cloud provider are not usable outside of SeaweedFS.
Cloud storage is an ideal place to back up warm data. It is scalable, and its cost is usually low compared to on-premise storage servers. Uploading to the cloud is usually free; however, accessing the cloud storage is usually neither free nor fast.
SeaweedFS is fast. However, its capacity is limited by the number of available volume servers.
A good approach is to combine SeaweedFS' fast local access speed with the cloud storage's elastic capacity.
Assuming hot data is 20% and warm data is 80%, we can move the warm data to the cloud storage. Access to the warm data will be slower, but this frees up 80% of the servers, or lets them be repurposed for faster local access instead of just storing warm data that sees little access. All of this is transparent to SeaweedFS users.
With a fixed number of servers, this transparent cloud integration effectively gives SeaweedFS unlimited capacity in addition to its fast speed. Just add more local SeaweedFS volume servers to increase throughput.
Design
If one volume is tiered to the cloud:
- The volume is marked as read-only.
- The index file is still local.
- The `.dat` file is moved to the cloud.
- The same O(1) disk read is applied to the remote file. When requesting a file entry, a single range request retrieves the entry's content (sketched in Go below).
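The sketch below illustrates that single range read in Go. It is not SeaweedFS's internal code: the URL, the offsets, and the `readRemoteNeedle` helper are made up for illustration, with the needle's offset and size assumed to come from the local index file.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// readRemoteNeedle fetches one file entry from a tiered .dat copy with a
// single HTTP range request, mirroring the O(1) local disk read. The offset
// and size would come from the local index file.
func readRemoteNeedle(datURL string, offset, size int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, datURL, nil)
	if err != nil {
		return nil, err
	}
	// One Range header per lookup, so each read is a single round trip.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+size-1))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Hypothetical example: read 1 KB at offset 4096 from a tiered volume copy.
	data, err := readRemoteNeedle("https://example-bucket.s3.amazonaws.com/37.dat", 4096, 1024)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```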
Benefits
Compared to direct S3 storage, this is both faster and cheaper!
- Cheaper!
  - Reduce the API access cost to almost nothing. This cost is often forgotten at first, but will hit your wallet unexpectedly once you go all-in with S3.
  - Use lower-cost storage tiers.
  - Compression.
- Faster!
  - Local access for recent data, skipping one internet round trip.
  - Access old data from S3 with an O(1) HTTP call.
Usage
- Use `weed scaffold -config=master` to generate `master.toml`, tweak it, and start the master server with the `master.toml` (see the sketch after this list).
- Use `volume.tier.upload` in `weed shell` to move volumes to the cloud.
- Use `volume.tier.download` in `weed shell` to move volumes back to the local cluster.
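One possible flow for the first step is sketched below; it assumes the master is started from the directory that holds the edited `master.toml` (SeaweedFS also looks in `~/.seaweedfs/` and `/etc/seaweedfs/`).

```sh
# Generate the default master.toml, then edit the [storage.backend] section.
weed scaffold -config=master > master.toml

# Start the master from the same directory so it picks up master.toml.
weed master
```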
Configuring Storage Backend
The following storage backends are currently supported:
- S3
- Rclone
S3
Multiple S3 buckets are supported. Usually you just need to configure one backend.
# The storage backends are configured on the master, inside master.toml
[storage.backend]
[storage.backend.s3.default]
enabled = true
aws_access_key_id = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
region = "us-west-1"
bucket = "one_bucket" # an existing bucket
storage_class = "STANDARD_IA"
[storage.backend.s3.name2]
enabled = true
aws_access_key_id = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
region = "us-west-2"
bucket = "one_bucket_two" # an existing bucket
storage_class = "STANDARD_IA"
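If `aws_access_key_id` and `aws_secret_access_key` are left empty, the credentials are loaded from the standard AWS shared credentials file. A minimal `~/.aws/credentials` looks like this (the keys below are AWS's documentation placeholders):

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```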
Rclone
Here's an example of a configuration for an Rclone storage backend:
[storage.backend]
[storage.backend.rclone.default]
enabled = true
remote_name = "my-onedrive"
key_template = "SeaweedFS/{{ slice . 0 2 }}/{{ slice . 2 4 }}/{{ slice . 4 }}" # optional
Where `my-onedrive` must correspond to what's configured in `~/.config/rclone/rclone.conf`:
[my-onedrive]
type = onedrive
...
The optional `key_template` property makes it possible to choose how the individual volume files are eventually named in the storage backend. The template has to follow the Go text/template syntax, where `.` is a UUID string representing the volume ID.
In this example, files would be named as such in OneDrive:
SeaweedFS/
  5f/
    e5/
      03d8-0aba-4e42-b1c8-6116bf862f71
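To preview how a custom `key_template` renders before pointing a backend at it, the template can be evaluated with Go's standard `text/template` package. This is just a sketch; the UUID is reassembled from the OneDrive example path above.

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// The same key_template as in the Rclone example above.
	const keyTemplate = "SeaweedFS/{{ slice . 0 2 }}/{{ slice . 2 4 }}/{{ slice . 4 }}"

	// A sample volume UUID (reconstructed from the example path above).
	uuid := "5fe503d8-0aba-4e42-b1c8-6116bf862f71"

	tmpl := template.Must(template.New("key").Parse(keyTemplate))

	// Prints: SeaweedFS/5f/e5/03d8-0aba-4e42-b1c8-6116bf862f71
	if err := tmpl.Execute(os.Stdout, uuid); err != nil {
		panic(err)
	}
}
```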
Configuration
After changing the master config, you need to restart the volume servers so that they pull the updated configuration from the master. If you forget to do this, you'll get an error message about no storage backends being configured.
After this is configured, you can use the following commands in `weed shell` to upload the `.dat` file content to the cloud.
// move the volume 37.dat to the s3 cloud
volume.tier.upload -dest=s3 -collection=benchmark -volumeId=37
// or
volume.tier.upload -dest=s3.default -collection=benchmark -volumeId=37
// if for any reason you want to move the volume to a different bucket
volume.tier.upload -dest=s3.name2 -collection=benchmark -volumeId=37
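To bring a tiered volume back onto local disks, `volume.tier.download` is used in the same way; the flags below are assumed to mirror the upload command.

```
// move volume 37 back from the cloud to the local cluster
volume.tier.download -collection=benchmark -volumeId=37
```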
Automated Tiering
In the `master.toml` generated by `weed scaffold -config=master`, you can adjust the `master.maintenance` script section:
[master.maintenance]
# periodically run these scripts are the same as running them from 'weed shell'
scripts = """
lock
volume.tier.upload -dest s3 -fullPercent=95 -quietFor=1h
...
unlock
"""
Or you can run `weed shell` periodically in a cron job:
echo "lock;volume.tier.upload -dest s3 -fullPercent=95 -quietFor=1h;unlock" | weed shell