@@ -23,11 +23,6 @@ the resulting `memref` IR has no memory leaks.

## Deprecated Passes

- The old dialect conversion-based bufferization passes have been deprecated and
- should not be used anymore. Most of those passes have already been removed from
- MLIR. One-Shot Bufferize produces better bufferization results with fewer
- memory allocations and buffer copies.
-
The buffer deallocation pass has been deprecated in favor of the ownership-based
buffer deallocation pipeline. The deprecated pass has some limitations that may
cause memory leaks in the resulting IR.
@@ -276,18 +271,13 @@ semantics (i.e., tensor result or tensor operand) that is not bufferizable
`to_memref`/`to_tensor` ops around the bufferization boundary.

One-Shot Bufferize can be configured to bufferize only ops from a set of
- dialects with `dialect-filter`. This can be useful for gradually migrating from
- dialect conversion-based bufferization to One-Shot Bufferize. One-Shot Bufferize
- must run first in such a case, because dialect conversion-based bufferization
- generates `to_tensor` ops without the `restrict` unit attribute, which One-Shot
- Bufferize cannot analyze.
+ dialects with `dialect-filter`.

One-Shot Bufferize can also be called programmatically with
[`bufferization::runOneShotBufferize`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L167).
Alternatively,
[`bufferization::bufferizeOp`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/Bufferize.h#L78)
- skips the analysis and inserts a copy on every buffer write, just like the
- dialect conversion-based bufferization.
+ skips the analysis and inserts a copy on every buffer write.
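
The cost of skipping the analysis can be pictured with a small toy model. The sketch below is not MLIR code and does not call any MLIR API: `ToyWrite`, `ToyResult`, and `toyBufferize` are hypothetical names, and the `oldValueReadLater` flag stands in, very roughly, for what an in-place analysis would work out. It only counts how many buffer copies each strategy makes.

```c++
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only; none of these names exist in MLIR.
struct ToyWrite {
  std::size_t index;      // Element to overwrite.
  int value;              // New element value.
  bool oldValueReadLater; // True if the pre-write tensor value is read again.
};

struct ToyResult {
  std::vector<int> buffer; // Buffer holding the final tensor value.
  int copies;              // Number of buffer copies that were made.
};

// Applies `writes` in order and counts copies. The toy does not keep the old
// buffer around after copying; it only tracks how often a copy is needed.
ToyResult toyBufferize(std::vector<int> buffer,
                       const std::vector<ToyWrite> &writes, bool useAnalysis) {
  int copies = 0;
  for (const ToyWrite &w : writes) {
    assert(w.index < buffer.size() && "write out of bounds");
    // Without analysis, copy unconditionally. With analysis, copy only if the
    // old value must stay intact for a later read.
    if (!useAnalysis || w.oldValueReadLater) {
      std::vector<int> fresh(buffer); // Allocate a new buffer and copy.
      buffer = std::move(fresh);
      ++copies;
    }
    buffer[w.index] = w.value;
  }
  return {std::move(buffer), copies};
}
```

Under these assumptions, both strategies compute the same final contents for a given list of writes; the run without analysis makes one copy per write, while the run with analysis copies only for writes whose old value is still read.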

By default, function boundaries are not bufferized. This is because there are
currently limitations around function graph bufferization: recursive
@@ -484,259 +474,3 @@ conflict detection algorithm, interested users may want to refer to:
* [Original design document](https://discourse.llvm.org/uploads/short-url/5kckJ3DftYwQokG252teFgw3sYa.pdf)
* [ODM talk](https://youtu.be/TXEo59CYS9A), ([slides](https://mlir.llvm.org/OpenMeetings/2022-01-13-One-Shot-Bufferization.pdf)).
* [LLVM Dev Meeting 2023 tutorial slides](https://m-sp.org/downloads/llvm_dev_2023.pdf)
-
- ## Migrating from Dialect Conversion-based Bufferization
-
- Both dialect conversion-based bufferization and One-Shot Bufferize generate
- `to_tensor`/`to_memref` ops at the bufferization boundary (when run with
- `allow-unknown-ops`). They can be combined and run in sequence. However,
- One-Shot Bufferize must run first because it cannot analyze those boundary ops.
- To update existing code step-by-step, it may be useful to specify a dialect
- filter for One-Shot Bufferize, so that dialects can be switched over one-by-one.
-
- ## Dialect Conversion-based Bufferization
-
- Disclaimer: Most dialect conversion-based bufferization has been migrated to
- One-Shot Bufferize. New users should use One-Shot Bufferize (with or without
- analysis). The following documentation is only for existing users of dialect
- conversion-based bufferization.
-
- This system is a simple application of MLIR's dialect conversion infrastructure.
- The bulk of the code related to bufferization is a set of ordinary
- `ConversionPattern`'s that dialect authors write for converting ops that operate
- on `tensor`'s to ops that operate on `memref`'s. A set of conventions and best
- practices are followed that allow these patterns to be run across multiple
- independent passes (rather than requiring a single huge atomic conversion pass),
- which makes the compilation pipelines scalable, robust, and easy to debug.
-
- This document is targeted at people looking to utilize MLIR's bufferization
- functionality, along with people who want to extend it to cover their own ops.
-
- <a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
- talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
- Infrastructure"
- ([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
- [recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
- That talk gives a high-level overview of the bufferization infrastructure and
- important conceptual details related to using the MLIR dialect conversion
- infrastructure.
-
- ### Bufferization's place in a compilation pipeline
-
- Bufferization itself does not free any of the buffers that have been allocated,
- nor does it do anything particularly intelligent with the placement of buffers
- w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
- of:
-
- 1.  Bufferization
- 1.  Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
-     `promote-buffers-to-stack`, which do optimizations that are only exposed
-     after bufferization.
- 1.  Finally, running the [ownership-based buffer deallocation](OwnershipBasedBufferDeallocation.md)
-     pass.
-
- After buffer deallocation has been completed, the program will be quite
- difficult to transform due to the presence of the deallocation ops. Thus, other
- optimizations such as linalg fusion on memrefs should be done before that stage.
-
- ### General structure of the bufferization process
-
- Bufferization consists of running multiple *partial* bufferization passes,
- followed by one *finalizing* bufferization pass.
-
- There is typically one partial bufferization pass per dialect (though other
- subdivisions are possible). For example, for a dialect `X` there will typically
- be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
- By running pass `X-bufferize` for each dialect `X` in the program, all the ops
- in the program are incrementally bufferized.
-
- Partial bufferization passes create programs where only some ops have been
- bufferized. These passes will create *materializations* (also sometimes called
- "casts") that convert between the `tensor` and `memref` type, which allows
- bridging between ops that have been bufferized and ops that have not yet been
- bufferized.
-
- Finalizing bufferizations complete the bufferization process, and guarantee that
- there are no tensors remaining in the program. This involves eliminating the
- materializations. The pass `finalizing-bufferize` provides a minimal pass that
- only eliminates materializations and issues an error if any unbufferized ops
- exist in the program.
-
- However, it is possible for a finalizing bufferization to do more than just
- eliminate materializations. By adding patterns (just as a partial bufferization
- would), it is possible for a finalizing bufferization pass to simultaneously
- bufferize ops and eliminate materializations. This has a number of disadvantages
- discussed in the talk and should generally be avoided.
-
- ### Example
-
- As a concrete example, we will look at the bufferization pipeline from the
- `mlir-npcomp` reference backend
- ([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
- The code, slightly simplified and annotated, is reproduced here:
-
- ```c++
-   // Partial bufferization passes.
-   pm.addPass(createTensorConstantBufferizePass());
-   pm.addNestedPass<func::FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
-   pm.addNestedPass<func::FuncOp>(createLinalgBufferizePass());
-   pm.addNestedPass<func::FuncOp>(createTensorBufferizePass());
-   pm.addPass(createFuncBufferizePass());
-
-   // Finalizing bufferization pass.
-   pm.addNestedPass<func::FuncOp>(createFinalizingBufferizePass());
- ```
-
- Looking first at the partial bufferization passes, we see that there is a
- sequence of `FuncOp` passes (which run in parallel on functions). These function
- passes are bracketed by `arith-bufferize` and `func-bufferize`, which are module
- passes (and thus serialize the parallel compilation process). These two passes
- must be module passes because they make changes to the top-level module.
-
- The bulk of the bufferization work is done by the function passes. Most of these
- passes are provided as part of the upstream MLIR distribution and bufferize
- their respective dialects (e.g. `abc-bufferize` bufferizes the `abc` dialect).
- The `tcp-bufferize` pass is an exception -- it is a partial bufferization pass
- used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
- the other passes provided upstream.
-
- The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
- backend has arranged that all ops are bufferized by partial bufferizations, so
- that the upstream `finalizing-bufferize` pass can be used as the finalizing
- bufferization pass. This gives excellent diagnostics when something goes wrong
- with the bufferization process, such as due to an op that wasn't handled by any
- pattern.
-
- ### How to write a partial bufferization pass
-
- The contract of a partial bufferization pass is that a subset of ops (or kinds
- of ops, customizable by a ConversionTarget) get bufferized.
-
- A partial bufferization pass is just a pass that uses the
- [dialect conversion](DialectConversion.md) framework to apply
- `ConversionPattern`s with a `tensor` to `memref` type conversion.
-
- To describe how to write such a pass, we will walk through an example, the
- `tensor-bufferize` pass
- ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
- [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
- that bufferizes the `tensor` dialect. Note that these passes have been replaced
- with a `BufferizableOpInterface`-based implementation in the meantime, so we
- have to take a look at an older version of the code.
-
- The bulk of the code in the pass will be a set of conversion patterns, with a
- simple example being
- [BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).
-
- ```
- class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
- public:
-   using OpConversionPattern::OpConversionPattern;
-   LogicalResult
-   matchAndRewrite(tensor::CastOp op, OpAdaptor adaptor,
-                   ConversionPatternRewriter &rewriter) const override {
-     auto resultType = getTypeConverter()->convertType(op.getType());
-     rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, adaptor.source());
-     return success();
-   }
- };
- ```
-
- See [the talk](#the-talk) for more details on how to write these patterns.
-
- The
- [pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
- is very small, and follows the basic pattern of any dialect conversion pass.
-
- ```
- void mlir::populateTensorBufferizePatterns(
-     const BufferizeTypeConverter &typeConverter, RewritePatternSet &patterns) {
-   patterns.add<BufferizeCastOp, BufferizeExtractOp>(typeConverter,
-                                                     patterns.getContext());
- }
-
- struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
-   void runOnOperation() override {
-     auto *context = &getContext();
-     BufferizeTypeConverter typeConverter;
-     RewritePatternSet patterns(context);
-     ConversionTarget target(*context);
-
-     populateTensorBufferizePatterns(typeConverter, patterns);
-     target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
-     target.addLegalDialect<func::FuncDialect>();
-
-     if (failed(
-             applyPartialConversion(getOperation(), target, std::move(patterns))))
-       signalPassFailure();
-   }
- };
- ```
-
- The pass has all the hallmarks of a dialect conversion pass that does type
- conversions: a `TypeConverter`, a `RewritePatternSet`, and a `ConversionTarget`,
- and a call to `applyPartialConversion`. Note that a function
- `populateTensorBufferizePatterns` is separated, so that power users can use the
- patterns independently, if necessary (such as to combine multiple sets of
- conversion patterns into a single conversion call, for performance).
-
- One convenient utility provided by the MLIR bufferization infrastructure is the
- `BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
- and materializations between `tensor` and `memref`.
-
- In this case, the `BufferizationOpsDialect` is marked as legal, so the
- `bufferization.to_tensor` and `bufferization.to_memref` ops, which are inserted
- automatically by the dialect conversion framework as materializations, are
- legal. There is a helper `populateBufferizeMaterializationLegality`
- ([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
- which helps with this in general.
-
- ### Other partial bufferization examples
-
- -   `func-bufferize`
-     ([code](https://github.com/llvm/llvm-project/blob/2f5715dc78328215d51d5664c72c632a6dac1046/mlir/lib/Dialect/Func/Transforms/FuncBufferize.cpp#L1),
-     [test](https://github.com/llvm/llvm-project/blob/2f5715dc78328215d51d5664c72c632a6dac1046/mlir/test/Dialect/Func/func-bufferize.mlir#L1))
-
-     -   Bufferizes `func`, `call`, and `BranchOpInterface` ops.
-     -   This is an example of how to bufferize ops that have multi-block
-         regions.
-     -   This is an example of a pass that is not split along dialect
-         subdivisions.
-
- ### How to write a finalizing bufferization pass
-
- The contract of a finalizing bufferization pass is that all tensors are gone
- from the program.
-
- The easiest way to write a finalizing bufferize pass is to not write one at all!
- MLIR provides a pass `finalizing-bufferize` which eliminates the
- `bufferization.to_tensor` / `bufferization.to_memref` materialization ops
- inserted by partial bufferization passes and emits an error if that is not
- sufficient to remove all tensors from the program.
-
- This pass is sufficient when partial bufferization passes have bufferized all
- the ops in the program, leaving behind only the materializations. When possible,
- it is recommended to structure your pass pipeline this way, as this has the
- significant advantage that if an op does not get bufferized (due to a missing
- pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
- error, and the IR seen by `finalizing-bufferize` will contain only one
- unbufferized op.
-
- However, before the current bufferization infrastructure was put in place,
- bufferization could only be done as a single finalizing bufferization mega-pass
- that used the `populate*BufferizePatterns` functions from multiple dialects to
- simultaneously bufferize everything at once. Thus, one might see code in
- downstream projects structured this way. This structure is not recommended in
- new code. A helper, `populateEliminateBufferizeMaterializationsPatterns`
- ([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
- is available for such passes to provide patterns that eliminate
- `bufferization.to_tensor` and `bufferization.to_memref`.
-
- ### Changes since [the talk](#the-talk)
-
- -   `func-bufferize` was changed to be a partial conversion pass, and there is a
-     new `finalizing-bufferize` which serves as a general finalizing
-     bufferization pass.
- -   Most partial bufferization passes have been reimplemented in terms of
-     `BufferizableOpInterface`. New users should use One-Shot Bufferize instead
-     of dialect conversion-based bufferization.