2 parents 05e94ae + c3a7af2 commit 3f878a0
src/MMLLM/庖丁解牛BLIP2.md
@@ -21,3 +21,8 @@ author:
> Paper: [https://arxiv.org/abs/2301.12597](https://arxiv.org/abs/2301.12597)
> Code: [https://github.com/salesforce/LAVIS/tree/main/projects/blip2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2)
+## Background
+
+For a long stretch of multimodal model development, the trend was to pursue ever larger network architectures (image encoders and text encoders/decoders) and ever larger datasets, at the price of ever higher training cost. CLIP, for example, was trained on 400M image-text pairs and required hundreds of GPUs running for tens of days. This raises the question: how can training cost be reduced while still achieving strong performance?
+This question is what motivated BLIP-2. Looking back at earlier multimodal network designs, they consist of three modules (an image branch, a text branch, and a fusion module):