Creating a hobbyist-scale storage network with heterogeneous commodity hardware is an exciting project that allows you to repurpose existing hardware resources, learn about distributed systems, and gain hands-on experience with networked storage solutions. This tutorial will guide you through the motivation behind building such a network, choosing appropriate hardware, configuring your environment, and operating the network using Ubuntu and SeaweedFS, a simple and highly scalable distributed file system.
Motivation
- Cost Efficiency: Utilizing heterogeneous commodity hardware allows for cost savings, as you can use existing or second-hand equipment rather than investing in expensive, purpose-built storage solutions.
- Learning and Experimentation: Building and managing a storage network provides invaluable hands-on experience with networked storage, distributed systems, and software-defined storage solutions.
- Scalability and Flexibility: A distributed file system like SeaweedFS enables you to scale out your storage capacity and performance by simply adding more nodes to the network.
- Data Redundancy and Availability: Implementing a storage network can improve data redundancy, fault tolerance, and availability, protecting against data loss and ensuring that data is accessible when needed.
Hardware Choices
- Computers: You can use any mix of desktops, laptops, or even single-board computers like Raspberry Pis. The key is network connectivity and sufficient storage (HDD or SSD) capacity. A reasonable choice for a storage node is a desktop using last-generation technology. I have found the Ryzen 5600G to be a nice mixture of cheap, fast and low power. Power consumption is an important factor, as your storage will be on all the time. A $600 system I built from new parts measures just 20W at the wall socket when idle, yet can easily saturate a typical network.
- Storage Media: Depending on your needs, you can use HDDs for larger, but slower, storage or SSDs for faster access. Ensure there's enough storage capacity for your needs. It's helpful to have some idea of what access patterns you expect, as well as how much risk of data loss you are willing to accept for the money saved. A mixture of drive brands and ages is a good hedge against cascading failures. The append-only nature of SeaweedFS is well aligned with the write characteristics of modern shingled and stacked HDDs.
- Networking Equipment: A reliable gigabit router and switches, plus enough Ethernet cables, are essential for connecting your hardware and ensuring good network performance. A flaky network can lead to hard-to-debug errors as well as data loss.
Configuration
Preparing Ubuntu
- Install Ubuntu: Install Ubuntu Server on each of your hardware nodes. Ubuntu Server is preferred for its stability, large community support and lack of graphical interface, saving system resources for storage services. SeaweedFS has been well tested on Ubuntu LTS releases.
- Network Configuration: Configure static IP addresses for each node to ensure consistent network communication. You can do this by editing the /etc/netplan/01-netcfg.yaml file.
- Update and Upgrade: Run sudo apt update && sudo apt upgrade to ensure all packages are up to date.
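As a rough illustration of the static IP step, the sketch below writes a draft netplan file to a scratch location so it can be reviewed before being copied into /etc/netplan. The interface name (enp3s0), the addresses, the gateway, and the DNS server are all placeholders, not values from this guide, and the `to: default` route form assumes a reasonably recent netplan.

```shell
# Sketch only: write a static-IP netplan draft to a scratch file for review.
# Assumptions: enp3s0, 192.168.1.11/24 and 192.168.1.1 are placeholders;
# substitute the interface name and addresses of your own node.
netplan_draft="${TMPDIR:-/tmp}/01-netcfg.yaml"

cat > "$netplan_draft" <<'EOF'
network:
  version: 2
  ethernets:
    enp3s0:
      dhcp4: false
      addresses:
        - 192.168.1.11/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
EOF

echo "Draft written to $netplan_draft"
# After reviewing the draft, something like the following would apply it:
#   sudo cp "$netplan_draft" /etc/netplan/01-netcfg.yaml
#   sudo netplan try
```

Using netplan try rather than netplan apply gives you a chance to roll back if the new address cuts off your SSH session.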
Installing SeaweedFS
- Download SeaweedFS: Visit the SeaweedFS GitHub releases page and download the latest version of SeaweedFS for Linux.
- Extract and Install: Extract the downloaded file and move the SeaweedFS binaries to a location in your system's PATH, such as /usr/local/bin.
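The extract-and-install step might look like the helper below. It is a minimal sketch: install_weed is a hypothetical function name, and it assumes the release archive is a gzipped tarball with the weed binary at its top level, which you should confirm against the file you actually downloaded.

```shell
# Hypothetical helper: unpack a downloaded SeaweedFS archive and install weed.
# Assumption: the archive is a .tar.gz containing a top-level "weed" binary.
install_weed() {
  archive="$1"                 # path to the downloaded archive
  dest="${2:-/usr/local/bin}"  # target directory, should be on PATH
  tmp="$(mktemp -d)" || return 1
  status=0
  # Extract into a scratch directory, then copy the binary with exec permissions.
  tar -xzf "$archive" -C "$tmp" && install -m 0755 "$tmp/weed" "$dest/weed" || status=$?
  rm -rf "$tmp"
  return "$status"
}

# Example (run as root, or pick a destination you can write to):
#   install_weed ./linux_amd64.tar.gz /usr/local/bin
#   weed version
```

The archive name in the example is a placeholder; use whichever build matches your node's architecture.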
Operations
Starting the Master Server
- Choose one node to act as the master server. This node will manage the cluster metadata: which volumes exist and which volume servers hold them.
- Start the master server with:
weed master -ip=master_node_IP -mdir=/var/lib/seaweedfs/master -defaultReplication=001
Replace master_node_IP with the IP address of your master node. This will require every write to land on two drives by default: replication 001 keeps one extra copy on another volume server in the same rack. We will look at using erasure codes to increase reliability, but 2x writes is a reasonable starting point.
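To make the replication setting less opaque, here is a small sketch of how the three-digit code can be read. The digits count extra copies in other data centers, on other racks, and on other servers in the same rack, so the total number of copies is one plus the sum of the digits. replica_copies is a hypothetical helper written for this illustration, not part of the weed command.

```shell
# Hypothetical helper: total copies implied by a SeaweedFS replication code.
# Digits, left to right: other data centers, other racks, other servers (same rack).
replica_copies() {
  code="$1"
  case "$code" in
    [0-9][0-9][0-9]) ;;
    *) echo "expected three digits, got: $code" >&2; return 1 ;;
  esac
  dc=${code%??}       # first digit
  rest=${code#?}      # last two digits
  rack=${rest%?}      # second digit
  server=${code#??}   # third digit
  echo $((1 + dc + rack + server))
}

replica_copies 001   # prints 2
replica_copies 000   # prints 1
```

On a single-rack hobby network, only the last digit is likely to be useful, and it needs at least that many additional volume servers to be satisfied.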
Starting Volume Servers
- On each storage node, start a volume server that connects to the master:
weed volume -mserver=master_node_IP:9333 -port=8080 -dir=/data1/seaweedfs/volume
Adjust the directory as needed for your setup. I typically have data drives mounted with a consistent naming strategy and separate from the operating system storage.
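With a consistent mount naming scheme, a node holding several drives can serve them all from one volume server, since -dir accepts a comma-separated list of directories. The sketch below only prints the command so it can be inspected first. volume_cmd is a hypothetical helper, and the /data1 and /data2 mount points are assumptions following the naming used above.

```shell
# Hypothetical helper: build a weed volume command covering several data drives.
# Usage: volume_cmd MASTER_HOST:PORT MOUNT_POINT...
volume_cmd() {
  master="$1"
  shift
  dirs=""
  for mount in "$@"; do
    # Join the per-drive directories with commas, as -dir expects.
    dirs="${dirs:+$dirs,}$mount/seaweedfs/volume"
  done
  echo "weed volume -mserver=$master -port=8080 -dir=$dirs"
}

volume_cmd master_node_IP:9333 /data1 /data2
# prints: weed volume -mserver=master_node_IP:9333 -port=8080 -dir=/data1/seaweedfs/volume,/data2/seaweedfs/volume
```

Printing rather than executing keeps the sketch safe to run on a machine where the drives are not yet mounted.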
Storing and Accessing Data
- Storing Data: Use the weed command to store files. For example:
weed upload -master=master_node_IP:9333 file_to_upload
- Accessing Data: Access files through the HTTP interface provided by SeaweedFS or through the file system mount.
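For the HTTP interface, the usual flow is to ask the master for a file id, post the file to the volume server it names, and read it back from the same URL. The sketch below wraps that flow in two hypothetical helpers, json_field and store_file. It assumes the master answers /dir/assign with compact one-line JSON containing fid and url fields; if jq is installed, it is a sturdier way to parse the response. The sample response is illustrative only.

```shell
# Hypothetical helper: pull a string field out of compact one-line JSON.
json_field() {
  printf '%s\n' "$2" | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical helper: store one file and print the URL it can be read from.
# Usage: store_file MASTER_HOST:PORT PATH
store_file() {
  master="$1"
  path="$2"
  # Step 1: ask the master to assign a file id and a volume server.
  assign="$(curl -fsS "http://$master/dir/assign")" || return 1
  fid="$(json_field fid "$assign")"
  url="$(json_field url "$assign")"
  # Step 2: upload the file content to that volume server.
  curl -fsS -F "file=@$path" "http://$url/$fid" >/dev/null || return 1
  echo "http://$url/$fid"
}

# Parsing demo on an illustrative response (no cluster needed):
sample='{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}'
json_field fid "$sample"   # prints 3,01637037d6

# Against a running cluster:
#   file_url="$(store_file master_node_IP:9333 file_to_upload)"
#   curl -fsS -o copy_of_file "$file_url"
```

Keep the printed URL or file id somewhere safe, since this direct interface does not remember file names for you.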
Conclusion
By following these steps, you've set up a scalable, flexible, and cost-efficient storage network using heterogeneous commodity hardware, Ubuntu, and SeaweedFS. This setup allows for learning and experimentation with distributed storage systems, providing a foundation that can be expanded or modified to suit your evolving needs. Remember, the configuration and operations can vary based on your specific hardware and network setup, so consider this guide a starting point for your customized storage solution.