Commit Graph

866 Commits

Author SHA1 Message Date
Lisandro Pin
9b48ce0613
Parallelize EC shards balancing within racks (#6354)
Parallelize EC shards balancing within racks.
2024-12-15 13:36:23 -08:00
Lisandro Pin
926cfea3dc
Parallelize EC shards balancing across racks. (#6352) 2024-12-13 06:05:32 -08:00
Lisandro Pin
b81def5e5c
Parallelize EC balancing for racks. (#6351) 2024-12-13 05:33:53 -08:00
Konstantin Lebedev
0a4b1909a2
[shell] only apply the balancing for writable volumes (#6346) 2024-12-13 01:10:00 -08:00
Lisandro Pin
b0210df081
Begin implementing EC balancing parallelization support. (#6342)
* Begin implementing EC balancing parallelization support.

Impacts both `ec.encode` and `ec.balance`,

* Nit: improve type naming.

* Make the goroutine workgroup handler for `EcBalance()` a bit smarter/error-proof.

* Nit: unify naming for `ecBalancer` wait group methods with the rest of the module.

* Fix concurrency bug.

* Fix whitespace after Gitlab automerge.

* Delete stray TODO.
2024-12-12 09:14:44 -08:00
Lisandro Pin
23ffbb083c
Limit EC re-balancing for ec.encode to relevant collections when a volume ID argument is provided. (#6347)
Limit EC re-balancing for `ec.encode` to relevant collections when a volume ID is provided.
2024-12-12 08:41:33 -08:00
Lisandro Pin
6320036c56
Delete legacy balancing code for ec.encode. (#6344) 2024-12-12 07:42:03 -08:00
Konstantin Lebedev
700b95304b
[shell] volume.list show only writable volumes (#6338)
* show only writable volumes

* fix import
2024-12-11 09:06:15 -08:00
Konstantin Lebedev
c37281735e
volume.list avoid output empty data center and rack and disk info (#6341) 2024-12-11 09:03:16 -08:00
Lisandro Pin
8c82c037b9
Unify the re-balancing logic for ec.encode with ec.balance. (#6339)
Among others, this enables recent changes related to topology aware
re-balancing at EC encoding time.
2024-12-10 13:30:13 -08:00
Konstantin Lebedev
ff1392f7f4
[shell] use constant for hdd of type (#6337)
use constant for hdd of type
2024-12-10 08:43:59 -08:00
Lisandro Pin
522a25790a
Remove average constraints when selecting nodes/racks to balance EC shards into. (#6325) 2024-12-06 09:00:06 -08:00
Lisandro Pin
34cdbdd279
Share common parameters for EC re-balancing functions under a single struct. (#6319)
TODO cleanup for https://github.com/seaweedfs/seaweedfs/discussions/6179.
2024-12-05 09:00:46 -08:00
Lisandro Pin
edef485333
Account for replication placement settings when balancing EC shards within the same rack. (#6317)
* Account for replication placement settings when balancing EC shards within racks.

* Update help contents for `ec.balance`.

* Add a few more representative test cases for `pickEcNodeToBalanceShardsInto()`.
2024-12-04 10:47:51 -08:00
Lisandro Pin
351efa134d
Account for replication placement settings when balancing EC shards across racks. (#6316) 2024-12-04 09:00:55 -08:00
Lisandro Pin
b2ba7d7408
Resolve replica placement for EC volumes from master server defaults. (#6303) 2024-12-02 08:44:07 -08:00
Lisandro Pin
9a741a61b1
Display details upon failures to re-balance EC shards racks. (#6299) 2024-11-28 08:42:41 -08:00
Lisandro Pin
559a1fd0f4
Improve EC shards rebalancing logic across nodes (#6297)
* Improve EC shards rebalancing logic across nodes.

- Favor target nodes with less preexisting shards, to ensure a fair distribution.
- Randomize selection when multiple possible target nodes are available.
- Add logic to account for replication settings when selecting target nodes (currently disabled).

* Fix minor test typo.

* Clarify internal error messages for `pickEcNodeToBalanceShardsInto()`.
2024-11-27 11:51:57 -08:00
Trim21
d43fa07f06
use readable bytes size string in shell output (#6288) 2024-11-25 17:25:17 -08:00
chrislu
04081128a9 use math rand v2 2024-11-21 08:54:03 -08:00
Lisandro Pin
ca499de1cb
Improve EC shards rebalancing logic across racks (#6270)
Improve EC shards rebalancing logic across racks.

  - Favor target shards with less preexisting shards, to ensure a fair distribution.
  - Randomize selection when multiple possible target shards are available.
  - Add logic to account for replication settings when selecting target shards (currently disabled).
2024-11-21 08:46:24 -08:00
Konstantin Lebedev
a143c888e5
[shell] don't require lock when there are no changes for volume.fix.replication (#6266)
* don't require lock when there are no changes

* revert takeAction
2024-11-21 08:17:25 -08:00
Konstantin Lebedev
254ed8897e
[shell] add noLock param for volume.move (#6261) 2024-11-20 08:35:24 -08:00
chrislu
e28f55eb08 typo 2024-11-19 08:32:31 -08:00
chrislu
98f03862aa rename 2024-11-19 08:31:54 -08:00
chrislu
07cf8cf22d minor 2024-11-19 08:31:33 -08:00
Lisandro Pin
0d5393641e
Unify usage of shell.EcNode.dc as DataCenterId. (#6258) 2024-11-19 06:33:18 -08:00
chrislu
bc7650bd61 adds more info on skipped volumes 2024-11-18 18:25:44 -08:00
Lisandro Pin
f2db746690
Introduce logic to resolve volume replica placement within EC rebalancing. (#6254)
* Rename `command_ec_encode_test.go` to `command_ec_common_test.go`.

All tests defined in this file are now for `command_ec_common.go`.

* Minor code cleanups.

- Fix broken `ec.balance` test.
- Rework integer ceiling division to not use floats, which can introduce precision errors.

* Introduce logic to resolve volume replica placement within EC rebalancing.

This will be used to make rebalancing logic topology-aware.

* Give shell.EcNode.dc a dedicated DataCenterId type.
2024-11-18 18:05:06 -08:00
Chris Lu
72b14a451e
delete aborted ec shards from both source and target servers (#6221)
fix https://github.com/seaweedfs/seaweedfs/issues/6205#issuecomment-2465004586
2024-11-09 11:32:08 -08:00
Konstantin Lebedev
9a5d3e7b31
[shell] add admin noLock for balance (#6209)
add admin noLock for balance
2024-11-05 19:09:22 -08:00
Lisandro Pin
efdebf712e
Refactor ec.balance logic into a weeed/shell/command_ec_common.go… (#6195)
* Refactor `ec.balance` logic into a `weeed/shell/command_ec_common.go` standalone function.

This is a prerequisite to unify the balance logic for `ec.balance` and `ec.encode'.

* s/Balance()/EcBalance()/g
2024-11-04 17:56:20 -08:00
Chris Lu
dc784bf217
merge current message queue code changes (#6201)
* listing files to convert to parquet

* write parquet files

* save logs into parquet files

* pass by value

* compact logs into parquet format

* can skip existing files

* refactor

* refactor

* fix compilation

* when no partition found

* refactor

* add untested parquet file read

* rename package

* refactor

* rename files

* remove unused

* add merged log read func

* parquet wants to know the file size

* rewind by time

* pass in stop ts

* add stop ts

* adjust log

* minor

* adjust log

* skip .parquet files when reading message logs

* skip non message files

* Update subscriber_record.go

* send messages

* skip message data with only ts

* skip non log files

* update parquet-go package

* ensure a valid record type

* add new field to a record type

* Update read_parquet_to_log.go

* fix parquet file name generation

* separating reading parquet and logs

* add key field

* add skipped logs

* use in memory cache

* refactor

* refactor

* refactor

* refactor, and change compact log

* refactor

* rename

* refactor

* fix format

* prefix v to version directory
2024-11-04 12:08:25 -08:00
Konstantin Lebedev
5bddf0c085
[shell] volume.balance collect volume servers by dc rack node (#6191)
* chore: balance by rack

* fix: rm check lock

* fix: selected racks

* fix: selected nodes

* fix: containts

* fix: one collectVolumeServersByDcRackNode

* fix: revert lock and add lock

* fix: panic test

* revert noLock
2024-11-03 11:08:45 -08:00
wyang
c29c912bdc
fix format (#6185)
unitest weed/shell fail
2024-10-31 08:30:35 -07:00
chrislu
9105c6bdd1 fix format 2024-10-28 11:29:08 -07:00
chrislu
089d4316ef ensure 2 volume space since actual need 1.4x volume size empty space 2024-10-24 22:44:53 -07:00
chrislu
6e388e29c9 correcting free volume count, factor it during ec encoding to ensure enough disk space available
fix https://github.com/seaweedfs/seaweedfs/issues/6163
2024-10-24 22:42:38 -07:00
chrislu
8d6189bcc5 adjust output format 2024-10-24 21:41:39 -07:00
chrislu
ae5bd0667a rename proto field from DestroyTime to expire_at_sec
For TTL volume converted into EC volume, this change may leave the volumes staying.
2024-10-24 21:35:11 -07:00
chrislu
73921ce4f6 adjust help message 2024-10-22 08:51:02 -07:00
chrislu
cdd7fa81ab fix help message 2024-09-29 21:53:07 -07:00
chrislu
20929f2a57 adjust resource heavy for volume.fix.replication 2024-09-29 11:32:18 -07:00
chrislu
6564ceda91 skip resource heavy commands from running on master nodes 2024-09-29 10:51:17 -07:00
chrislu
ec30a504ba refactor 2024-09-29 10:38:22 -07:00
chrislu
9cd263b2ce refactor 2024-09-29 10:35:53 -07:00
chrislu
701abbb9df add IsResourceHeavy() to command interface 2024-09-28 20:23:01 -07:00
Max Denushev
f1e700ce2f
Fix/copy before delete replication (#6064)
* fix(shell): volume.fix.replication misplaced volumes unsatisfying replication factor

* fix(shell): simplify replication check

* fix(shell): add test for satisfyReplicaCurrentLocation
2024-09-26 08:34:13 -07:00
Max Denushev
d056c0ddf2
fix(volume): don't persist RO state in specific cases (#6058)
* fix(volume): don't persist RO state in specific cases

* fix(volume): writable always persist
2024-09-24 16:15:54 -07:00
dsd
a3572999bb
Vol check disk bug (#6044)
* fix volume.check.disk

* ensure multiple replica sync

* add comment

---------

Co-authored-by: 邓书东 <shudong_deng@hhnb2024010108.intsig.com>
2024-09-19 08:32:22 -07:00