1 parent 21bbfb3 commit 671bd96
src/MMLLM/庖丁解牛BLIP2.md
@@ -335,7 +335,22 @@ class BertEmbeddings(nn.Module):
```
The figure below shows the complete computation flow of Image-Text Matching; a detailed walkthrough of the BertModel code follows later in this document:

-
+

+#### 3、Image-Grounded Text Generation (ITG Loss, GPT-like)

+> - Goal: teach the Q-Former to generate text from an image, i.e. given an Input Image, generate the matching Text
+>
+> - Self-attention mask strategy: Multimodal Causal Self-attention Mask
+>
+>   - Queries can attend to all query tokens
+>
+>   - Text tokens can attend to all query tokens and to the text tokens preceding the current token
+>
+> 
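To make the mask strategy concrete, here is a minimal PyTorch sketch of the multimodal causal self-attention mask described above. The function name, argument names, and the boolean-matrix representation are illustrative assumptions for this walkthrough, not code from the BLIP-2 repository:

```python
import torch

def itg_attention_mask(num_query: int, num_text: int) -> torch.Tensor:
    """Multimodal causal self-attention mask for ITG (illustrative sketch).

    Returns a (num_query + num_text) x (num_query + num_text) boolean matrix
    where entry [i, j] == True means token i may attend to token j.
    Token order: query tokens first, then text tokens.
    """
    n = num_query + num_text
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Queries attend to all query tokens (but not to any text tokens).
    mask[:num_query, :num_query] = True
    # Text tokens attend to all query tokens ...
    mask[num_query:, :num_query] = True
    # ... and causally to text tokens up to and including the current one.
    mask[num_query:, num_query:] = torch.tril(
        torch.ones(num_text, num_text, dtype=torch.bool)
    )
    return mask

# Example: 2 query tokens, 3 text tokens.
print(itg_attention_mask(2, 3).int())
```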
src/MMLLM/庖丁解牛BLIP2/10.png (126 KB)