If your metadata size keeps growing and one filer store cannot handle it, you can still scale your system with path-specific filer stores.
Why is this needed?
In most cases, you only need to set up one filer store.
However, there are cases where one filer store is not enough:
- A portion of the data is critical. It needs strong consistency and can tolerate higher latency.
- With too many updates in a directory, Cassandra's LSM tree slows down as tombstones accumulate. You may want to use Redis for that directory.
- One filer store cannot handle the total number of files. You can add more filer stores for big directories to ensure linear scalability.
- Operations on one directory are hot. A separate store can physically isolate them and avoid impacting other directories.
How to add path-specific filer stores?
Run weed scaffold -config=filer, and the generated configuration contains an example:
##########################
##########################
# To add path-specific filer store:
#
# 1. Add a name following the store type separated by a dot ".". E.g., cassandra.tmp
# 2. Add a location configuration. E.g., location = "/tmp/"
# 3. Copy and customize all other configurations.
# Make sure they are not the same if using the same store type!
# 4. Set enabled to true
#
# The following is just using redis as an example
##########################
[redis2.tmp]
enabled = false
location = "/tmp/"
address = "localhost:6379"
password = ""
database = 0
You can add multiple path-specific filer stores.
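For example, here is a minimal sketch of a filer.toml that keeps a default store and adds two path-specific Redis stores. The section labels (redis2.hot, redis2.logs), directories, and database numbers are only illustrative assumptions, and leveldb2 is assumed as the default store:
# Default store for everything not matched by a path-specific store
[leveldb2]
enabled = true
dir = "./filerldb2"
# Hot directory with frequent updates, served by Redis
[redis2.hot]
enabled = true
location = "/data/hot/"
address = "localhost:6379"
password = ""
database = 1
# A second path-specific store of the same type; note the different database,
# since configurations must not be identical when the store type is the same
[redis2.logs]
enabled = true
location = "/data/logs/"
address = "localhost:6379"
password = ""
database = 2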
How does it work?
When any request comes in, the directory is matched against all locations configured with customized filer stores. The matching is efficient, and there is no limit on the number of path-specific filer stores. The matched filer store handles the metadata reads and writes.
This only works for new data or new updates. Existing data for old directories will become lost or invisible, so only apply this to new directories.
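To make the routing concrete, assuming the redis2.tmp example above is enabled with location = "/tmp/", requests would be dispatched roughly like this (the file paths are hypothetical):
[redis2.tmp]
enabled = true
location = "/tmp/"
# /tmp/jobs/a.txt   -> matches the configured location "/tmp/", handled by redis2.tmp
# /tmp/b.log        -> matches "/tmp/", handled by redis2.tmp
# /home/user/c.txt  -> no path-specific match, handled by the default filer store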
Data Storage
The directory path stored in the path-specific filer store has the location prefix trimmed before it is persisted to the store. For example, if location = "/my/home/", the file /my/home/tmp/file.txt is stored as "/tmp/file.txt". When reading back, the prefix is transparently added back.
This trimming saves some storage, but that is not the purpose. It makes the path-specific filer store portable: the location can later be changed to another directory, or the store can even be moved to another filer.
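For example, here is a sketch of relocating a path-specific store. The section name redis2.home and the paths are hypothetical; only the location value changes, while the stored keys stay the same:
[redis2.home]
enabled = true
location = "/my/home/"       # /my/home/tmp/file.txt is persisted as "/tmp/file.txt"
address = "localhost:6379"
password = ""
database = 1
# Later, the same store can be pointed at a different directory:
# location = "/new/home/"    # the stored key "/tmp/file.txt" now reads back as /new/home/tmp/file.txt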
What still works?
This cannot be applied to existing directories. Aside from this restriction, all other metadata operations are almost transparent to this configuration change. For example:
- The renaming works across filer stores.
- The metadata exports and imports still work.
- The cross-cluster replication still works.