Fix graph fusion with commutative ops #24577
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/24568
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1125
TODO:
- [x] replace recursive function to sequential
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn test: move layer norm tests into conformance tests #24544
Merge with https://github.com/opencv/opencv_extra/pull/1122
## Motivation
Some ONNX operators, such as `LayerNormalization`, `BatchNormalization` and so on, produce outputs for training (mean, stdev). So they have reference outputs of conformance tests for those training outputs as well. However, when it comes to inference, we do not need and produce those outputs for training here in dnn. Hence, output size does not match if we use dnn to infer those conformance models. This has become the barrier if we want to test these operators using their conformance tests.
<!--
| Operator | Inference needed | Outputs (required - total) | Optional outputs for training? |
| ----------------------- | ----------------------------------- | -------------------------- | ------------------------------ |
| BatchNormalization | Yes | 1 - 3 | Yes |
| Dropout | Maybe, can be eliminated via fusion | 1 - 2 | Yes |
| GRU | Yes | 0 - 2 | No |
| LSTM | Yes | 0 - 3 | No |
| LayerNormalization | Yes | 1 - 3 | Yes |
| MaxPool | Yes | 1 - 2 | Yes |
| RNN | Yes | 0 - 2 | No |
| SoftmaxCrossEntropyLoss | No | 1 - 2 | -- |
-->
**I checked all ONNX operators with optional outputs. Turns out there are only `BatchNormalization`, `Dropout`, `LayerNormalization` and `MaxPool` has optional outputs for training. All except `LayerNormalization` have models set for training mode and eval mode. Blame ONNX for that.**
## Solution
In this pull request, we remove graph outputs if the graph looks like the following:
```
[X] [Scale] [Bias] [X] [Scale] [Bias]
\ | / this patch \ | /
LayerNormalization -----------> LayerNormalization
/ | \ |
[Y] [Mean] [Stdev] [Y]
```
We can update conformance tests and turn on some cases as well if extending to more layers.
Notes:
1. This workaround does not solve expanded function operators if they are fused into a single operator, such as `$onnx/onnx/backend/test/data/node/test_layer_normalization_2d_axis1_expanded`, but they can be run without fusion. Note that either dnn or onnxruntime does not fuse those expanded function operators.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn (onnx): add subgraph fusion tests #24500
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake