Commit 3f878a0

Merge pull request #1 from BinaryOracle/master
Master
2 parents 05e94ae + c3a7af2 commit 3f878a0

1 file changed: 5 additions, 0 deletions

src/MMLLM/庖丁解牛BLIP2.md

Lines changed: 5 additions & 0 deletions
@@ -21,3 +21,8 @@ author:
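 > Paper: [https://arxiv.org/abs/2301.12597](https://arxiv.org/abs/2301.12597)
 > Code: [https://github.com/salesforce/LAVIS/tree/main/projects/blip2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2)

+## Background
+
+For a period in their development, multimodal models kept pursuing ever-larger network architectures (image encoder and text encoder/decoder) and ever-larger datasets, which drove up training costs. CLIP, for example, was trained on 400M image-text pairs and needed hundreds of GPUs for tens of days. How can the training cost be reduced while still achieving strong performance?
+
+This is the motivation behind BLIP-2. Recall that earlier multimodal network designs consist of three modules (an image branch, a text branch, and a fusion module):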
