
Commit b43d930 (parent 875b796): updates

File tree

4 files changed: +13 −1 lines changed

src/MMLLM/庖丁解牛BLIP2.md

Lines changed: 13 additions & 1 deletion
@@ -140,7 +140,7 @@ class Blip2Qformer(Blip2Base):
 
 To train the Q-Former well, the first pre-training stage uses three training objectives, as follows:
 
-1、Image-Text Contrastive Learning (ITC Loss, CLIP-like)
+#### 1、Image-Text Contrastive Learning (ITC Loss, CLIP-like)
 
 > Goal: align the image representation with the text representation so as to maximize their mutual information
 >
@@ -193,7 +193,19 @@ each image_feat in image_feats computes a similarity score against text_feat,
     F.cross_entropy(sim_i2t, targets, label_smoothing=0.1) + F.cross_entropy(sim_t2i, targets, label_smoothing=0.1)
 ) / 2
 ```
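The context lines above show only the tail of the ITC computation. As a self-contained sketch of the whole ITC loss under assumed shapes (B image-text pairs, 32 query tokens, hypothetical feature dimension; temperature scaling omitted for brevity), it might look like this:

```python
import torch
import torch.nn.functional as F

# Assumed shapes (not from the original): B image-text pairs,
# Q = 32 learnable query tokens, D-dim projected features.
B, Q, D = 4, 32, 16
torch.manual_seed(0)
image_feats = F.normalize(torch.randn(B, Q, D), dim=-1)  # one feature per query token
text_feat = F.normalize(torch.randn(B, D), dim=-1)       # one [CLS] feature per caption

# Every image against every text: keep the max similarity over the Q query
# tokens, as the surrounding text describes.
sim = torch.einsum("iqd,td->itq", image_feats, text_feat)  # [B, B, Q]
sim_i2t = sim.max(dim=-1).values                           # [B, B]
sim_t2i = sim_i2t.t()                                      # text-to-image direction

targets = torch.arange(B)  # matching pairs sit on the diagonal
loss_itc = (
    F.cross_entropy(sim_i2t, targets, label_smoothing=0.1)
    + F.cross_entropy(sim_t2i, targets, label_smoothing=0.1)
) / 2
```

Taking the max over the 32 query features keeps the single query that best matches the caption, which is what makes the image-side similarity query-aware rather than a pooled average.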
+
+#### 2、Image-Text Matching (ITM Loss, a binary classification task)
 
+> Goal: learn whether an image-text pair matches, so as to align the image representation and the text representation at a fine-grained level
+>
+> Self-attention mask strategy: Bi-directional Self-attention Mask
+>
+> Queries and text can both attend to all tokens
+> ![Bi-directional Self-attention Mask](庖丁解牛BLIP2/7.png)
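As an illustration of the bi-directional mask (a sketch with assumed sizes, not the repository's code): because queries and text may all attend to every token, the mask contains no zeros, in contrast to a causal mask:

```python
import torch

# Assumed sizes: Q query tokens and T text tokens concatenated into one sequence.
Q, T = 32, 8
L = Q + T
# Bi-directional self-attention mask: every position may attend to every other.
bi_mask = torch.ones(L, L, dtype=torch.long)
# A causal (uni-directional) mask, for contrast, zeroes out the upper triangle.
causal_mask = torch.tril(torch.ones(L, L, dtype=torch.long))
```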
+
+
+Each output query embedding is fed into a binary classifier to obtain one logit; the average of all the logits is used as the final matching score:
+
+![matching score](庖丁解牛BLIP2/8.png)
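The scoring step described above can be sketched as follows (the shapes and the `itm_head` name are assumptions for illustration, not taken from the repository):

```python
import torch
import torch.nn as nn

# Assumed shapes: B pairs, Q = 32 output query embeddings of dimension D.
B, Q, D = 4, 32, 16
torch.manual_seed(0)
query_output = torch.randn(B, Q, D)   # Q-Former output query embeddings (ITM pass)

itm_head = nn.Linear(D, 2)            # binary classifier: match / no-match
logits = itm_head(query_output)       # [B, Q, 2] -- one logit pair per query
itm_score = logits.mean(dim=1)        # average over the Q queries -> matching score
```

Averaging over the queries lets every query embedding vote on whether the pair matches, rather than relying on a single pooled token.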
 
 
 The core BertLayer code implementation is as follows:

src/MMLLM/庖丁解牛BLIP2/4.png (27.8 KB)

src/MMLLM/庖丁解牛BLIP2/7.png (132 KB)

src/MMLLM/庖丁解牛BLIP2/8.png (207 KB)
