Currently there are two Redis implementations, i.e., redis2 and redis3. What is the difference?
Storing directory list
Since Redis is a key-value store, the directory list cannot be efficiently iterated by scanning keys. So in redis2 the directory list is stored as a sorted set:
<directory_path>: <sorted set of names>
The Redis sorted set is fairly efficient: storing 100K to 1 million entries is still fast. In most cases, it works well even with large directories.
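The sketch below illustrates this layout (it is not SeaweedFS's actual code; the key layout, placeholder paths, and go-redis usage are assumptions for illustration): each entry lives under its full path, while the parent directory's key holds one sorted set of child names used for listing.

```go
// A minimal sketch of the redis2 idea (illustrative only, not SeaweedFS code):
// the entry itself is stored under its full path, and the parent directory's
// key holds a sorted set of child names for listing.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // assumed local Redis

	dir := "/buckets/photos" // hypothetical directory path

	// Create entries: store each entry under its full path, and add its name
	// to the parent directory's sorted set (score 0 keeps members in
	// lexicographic order).
	for _, name := range []string{"a.jpg", "b.jpg", "c.jpg"} {
		fullPath := dir + "/" + name
		if err := rdb.Set(ctx, fullPath, "serialized entry metadata", 0).Err(); err != nil {
			panic(err)
		}
		if err := rdb.ZAdd(ctx, dir, redis.Z{Score: 0, Member: name}).Err(); err != nil {
			panic(err)
		}
	}

	// List the directory: a lexicographic range over the parent's sorted set.
	names, err := rdb.ZRangeByLex(ctx, dir, &redis.ZRangeBy{Min: "-", Max: "+"}).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [a.jpg b.jpg c.jpg]

	// Read one entry: a single GET on the full path; the directory list is
	// never touched.
	meta, _ := rdb.Get(ctx, dir+"/a.jpg").Result()
	fmt.Println(meta)
}
```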
Super Large Directories
In some cases, a directory may have tens of millions or even billions of files or subdirectories. The directory list would then be too big to fit into one Redis entry on one Redis instance. For redis2, you can specify which folders are super large and avoid storing this fat list of names, at the cost of giving up the ability to list those directories.
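As a rough sketch of what that looks like in filer.toml (the example paths are placeholders, and option names can vary between releases; run `weed scaffold -config=filer` to see the exact template for your version):

```toml
[redis2]
enabled = true
address = "localhost:6379"
password = ""
database = 0
# Folders listed here are treated as super large: the fat list of their child
# names is not stored, so these folders cannot be listed.
superLargeDirectories = ["/bigdir1", "/bigdir2"]
```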
This is where redis3 can help. The internal data structure is:
<directory_path>: <skip list>
<skip list item 1>: <sorted set of 1 million names>
<skip list item 2>: <sorted set of 1 million names>
...
The directory list is stored as a skip list, and the child names are spread across the list items. This keeps each directory entry from growing too large and becoming slow to access. The skip list has O(log(N)) access time. With each sorted set storing about 1 million names, this should scale well to billions of files in one directory.
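To make the layering concrete, here is a much-simplified two-level sketch of the same idea (this is not redis3's actual skip-list layout or code; the bucket-routing scheme, key names, and go-redis usage are assumptions for illustration): an index sorted set under the directory key points at bucket keys, and each bucket key holds a smaller sorted set of child names, so the whole directory never sits in a single Redis value.

```go
// A much-simplified two-level illustration of the redis3 idea (not the real
// redis3 skip-list layout): the directory key holds an "index" sorted set of
// bucket keys, and each bucket key holds a smaller sorted set of child names.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// bucketKeyFor routes a child name to a bucket. Here we route by the first
// byte of the name; the real skip list instead splits by item count (about
// 1 million names per item) and keeps the items ordered by name ranges.
func bucketKeyFor(dir, name string) string {
	return fmt.Sprintf("%s#bucket/%02x", dir, name[0])
}

func createEntry(ctx context.Context, rdb *redis.Client, dir, name string) error {
	bucket := bucketKeyFor(dir, name)
	// Register the bucket in the directory's index set (idempotent).
	if err := rdb.ZAdd(ctx, dir, redis.Z{Score: 0, Member: bucket}).Err(); err != nil {
		return err
	}
	// Add the child name to the bucket's own, smaller sorted set.
	return rdb.ZAdd(ctx, bucket, redis.Z{Score: 0, Member: name}).Err()
}

func listDir(ctx context.Context, rdb *redis.Client, dir string) ([]string, error) {
	// Walk the index set in order, then read each bucket in order; because the
	// bucket keys sort by the first byte of the names, the result stays sorted.
	buckets, err := rdb.ZRangeByLex(ctx, dir, &redis.ZRangeBy{Min: "-", Max: "+"}).Result()
	if err != nil {
		return nil, err
	}
	var names []string
	for _, bucket := range buckets {
		part, err := rdb.ZRangeByLex(ctx, bucket, &redis.ZRangeBy{Min: "-", Max: "+"}).Result()
		if err != nil {
			return nil, err
		}
		names = append(names, part...)
	}
	return names, nil
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // assumed local Redis

	dir := "/logs/huge" // hypothetical super large directory
	for _, name := range []string{"a.log", "b.log", "z.log"} {
		if err := createEntry(ctx, rdb, dir, name); err != nil {
			panic(err)
		}
	}
	names, _ := listDir(ctx, rdb, dir)
	fmt.Println(names) // [a.log b.log z.log]
}
```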
Compared to redis2, there are extra Redis operations to maintain this list:
- Adding or deleting an entry needs one additional lock operation.
- Updating an entry takes O(log(N)) time to locate the skip list item first.
One Redis operation costs about 25 microseconds. The extra maintenance adds about 4 Redis round trips when running with 1 million items, i.e., roughly 4 × 25 = 100 microseconds. This is relatively small compared to the whole file creation/update/deletion process.
Note: the file read operation is still just one Redis operation, since it does not need to read the list of other directory items.