
Commit 40380f0

Support GPT-4V model (#260)
## Purpose

Fixes #257
Fixes #265
Fixes #283

This PR adds a toggle to enable GPT-4V and its related resource deployment, and serves as a predecessor to the ongoing GPT-4V support work.

To enable the GPT-4V-related resources during provisioning:

```
azd env set USE_GPT4V true # defaults to false
```

## Does this introduce a breaking change?

```
[ ] Yes
[X] No
```

## Pull Request Type

What kind of change does this Pull Request introduce?

```
[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:
```

## How to Test

* Get the code

```
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install
```

* Test the code

```
```

## What to Check

Verify that the following are valid

* ...

## Other Information
1 parent d6019d3 commit 40380f0

File tree

17 files changed: +366 −102 lines changed

README.md

Lines changed: 30 additions & 0 deletions
````diff
@@ -45,6 +45,7 @@ description: A csharp sample app that chats with your data using OpenAI and AI S
 - [Enabling optional features](#enabling-optional-features)
   - [Enabling Application Insights](#enabling-optional-features)
   - [Enabling authentication](#enabling-authentication)
+  - [Enable GPT-4V support](#enable-gpt-4v-support)
 - [Productionizing](#productionizing)
 - [Resources](#resources)
 - [FAQ](#faq)
@@ -306,6 +307,35 @@ By default, the deployed Azure container app will have no authentication or acce
 
 To then limit access to a specific set of users or groups, you can follow the steps from [Restrict your Azure AD app to a set of users](https://learn.microsoft.com/azure/active-directory/develop/howto-restrict-your-app-to-a-set-of-users) by changing "Assignment Required?" option under the Enterprise Application, and then assigning users/groups access. Users not granted explicit access will receive the error message -AADSTS50105: Your administrator has configured the application <app_name> to block users unless they are specifically granted ('assigned') access to the application.-
 
+### Enable GPT-4V support
+
+With GPT-4-vision-preview (GPT-4V), it is possible to support enriched retrieval-augmented generation by providing both text and images as source content. To enable GPT-4V support, set `USE_VISION` and use a GPT-4V model when provisioning.
+
+> [!NOTE]
+> If you have already deployed the application, you will need to re-index the supporting material and redeploy the application after enabling GPT-4V support. This is because enabling GPT-4V support requires new fields to be added to the search index.
+
+To enable GPT-4V support with Azure OpenAI Service, run the following commands:
+```bash
+azd env set USE_VISION true
+azd env set USE_AOAI true
+azd env set AZURE_OPENAI_CHATGPT_MODEL_NAME gpt-4
+azd env set AZURE_OPENAI_RESOURCE_LOCATION westus # gpt-4-vision-preview is only available in a few regions. Please check the model availability for more details.
+azd up
+```
+
+To enable GPT-4V support with OpenAI, run the following commands:
+```bash
+azd env set USE_VISION true
+azd env set USE_AOAI false
+azd env set OPENAI_CHATGPT_DEPLOYMENT gpt-4-vision-preview
+azd up
+```
+
+To clean up previously deployed resources, run the following command:
+```bash
+azd down --purge
+azd env set AZD_PREPDOCS_RAN false # This is to ensure that the documents are re-indexed with the new fields.
+```
 ## Productionizing
 
 This sample is designed to be a starting point for your own production application,
````
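The toggles above are plain azd environment values, so they can be sanity-checked before provisioning with `azd env get-values`, which prints `KEY="value"` lines. Below is a minimal, hypothetical shell helper for checking such a dump; the sample dump is illustrative, not output from a real environment:

```shell
# Check an `azd env get-values`-style dump for an expected KEY="value" line.
# Hypothetical helper for illustration only; a real check would pipe the
# output of `azd env get-values` into it.
has_value() {
  printf '%s\n' "$1" | grep -q "^$2=\"$3\"\$" && echo "set" || echo "unset"
}

# Illustrative dump mirroring the Azure OpenAI settings above:
dump='USE_VISION="true"
USE_AOAI="true"
AZURE_OPENAI_CHATGPT_MODEL_NAME="gpt-4"'

has_value "$dump" USE_VISION true
has_value "$dump" USE_AOAI false
```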

app/Dockerfile

Lines changed: 2 additions & 0 deletions
```diff
@@ -9,9 +9,11 @@ FROM --platform=$BUILDPLATFORM mcr.microsoft.com/dotnet/sdk:8.0 AS build
 WORKDIR /src
 COPY ["Directory.Build.props", "."]
 COPY ["Directory.Packages.props", "."]
+COPY ["NuGet.config", "."]
 COPY ["backend/", "backend/"]
 COPY ["frontend/", "frontend/"]
 COPY ["shared/", "shared/"]
+COPY ["SharedWebComponents", "SharedWebComponents/"]
 RUN dotnet restore "backend/MinimalApi.csproj"
 
 WORKDIR "/src/backend"
```

app/NuGet.config

Lines changed: 14 additions & 0 deletions
```diff
@@ -0,0 +1,14 @@
+<?xml version="1.0" encoding="utf-8"?>
+<configuration>
+  <packageSources>
+    <clear />
+    <add key="dotnet-public" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json" />
+    <add key="dotnet-tools" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-tools/nuget/v3/index.json" />
+    <add key="nuget" value="https://api.nuget.org/v3/index.json" />
+  </packageSources>
+  <packageSourceMapping>
+    <packageSource key="nuget">
+      <package pattern="*" />
+    </packageSource>
+  </packageSourceMapping>
+</configuration>
```

app/SharedWebComponents/Components/SettingsPanel.razor

Lines changed: 4 additions & 7 deletions
```diff
@@ -40,13 +40,10 @@
         <MudCheckBox @bind-Checked="@Settings.Overrides.SemanticCaptions" Size="Size.Large"
                      Color="Color.Primary"
                      Label="Use query-contextual summaries instead of whole documents" />
-
-        @if (_supportedSettings is not SupportedSettings.Chat)
-        {
-            <MudCheckBox @bind-Checked="@Settings.Overrides.SuggestFollowupQuestions" Size="Size.Large"
-                         Color="Color.Primary" Label="Suggest follow-up questions"
-                         aria-label="Suggest follow-up questions checkbox." />
-        }
+
+        <MudCheckBox @bind-Checked="@Settings.Overrides.SuggestFollowupQuestions" Size="Size.Large"
+                     Color="Color.Primary" Label="Suggest follow-up questions"
+                     aria-label="Suggest follow-up questions checkbox." />
     </div>
     <div class="d-flex align-content-end flex-wrap flex-grow-1 pa-6">
         <MudButton Variant="Variant.Filled" Color="Color.Secondary"
```

app/backend/Extensions/KeyVaultConfigurationBuilderExtensions.cs

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@ internal static class KeyVaultConfigurationBuilderExtensions
 {
     internal static IConfigurationBuilder ConfigureAzureKeyVault(this IConfigurationBuilder builder)
     {
-        var azureKeyVaultEndpoint = Environment.GetEnvironmentVariable("AZURE_KEY_VAULT_ENDPOINT");
+        var azureKeyVaultEndpoint = Environment.GetEnvironmentVariable("AZURE_KEY_VAULT_ENDPOINT") ?? throw new InvalidOperationException("Azure Key Vault endpoint is not set.");
         ArgumentNullException.ThrowIfNullOrEmpty(azureKeyVaultEndpoint);
 
         builder.AddAzureKeyVault(
```

app/backend/Extensions/ServiceCollectionExtensions.cs

Lines changed: 2 additions & 2 deletions
```diff
@@ -79,10 +79,10 @@ internal static IServiceCollection AddAzureServices(this IServiceCollection serv
     services.AddSingleton<ReadRetrieveReadChatService>(sp =>
     {
         var config = sp.GetRequiredService<IConfiguration>();
-        var useGPT4V = config["UseGPT4V"] == "true";
+        var useVision = config["UseVision"] == "true";
         var openAIClient = sp.GetRequiredService<OpenAIClient>();
         var searchClient = sp.GetRequiredService<ISearchService>();
-        if (useGPT4V)
+        if (useVision)
         {
             var azureComputerVisionServiceEndpoint = config["AzureComputerVisionServiceEndpoint"];
             ArgumentNullException.ThrowIfNullOrEmpty(azureComputerVisionServiceEndpoint);
```

app/backend/Services/AzureBlobStorageService.cs

Lines changed: 41 additions & 20 deletions
```diff
@@ -17,43 +17,64 @@ internal async Task<UploadDocumentsResponse> UploadFilesAsync(IEnumerable<IFormF
 
             await using var stream = file.OpenReadStream();
 
-            using var documents = PdfReader.Open(stream, PdfDocumentOpenMode.Import);
-            for (int i = 0; i < documents.PageCount; i++)
+            // If the file is an image (ends with .png, .jpg, .jpeg, .gif), upload it to blob storage.
+            if (Path.GetExtension(fileName).ToLower() is ".png" or ".jpg" or ".jpeg" or ".gif")
             {
-                var documentName = BlobNameFromFilePage(fileName, i);
-                var blobClient = container.GetBlobClient(documentName);
+                var blobName = BlobNameFromFilePage(fileName);
+                var blobClient = container.GetBlobClient(blobName);
                 if (await blobClient.ExistsAsync(cancellationToken))
                 {
                     continue;
                 }
 
-                var tempFileName = Path.GetTempFileName();
-
-                try
+                var url = blobClient.Uri.AbsoluteUri;
+                await using var fileStream = file.OpenReadStream();
+                await blobClient.UploadAsync(fileStream, new BlobHttpHeaders
                 {
-                    using var document = new PdfDocument();
-                    document.AddPage(documents.Pages[i]);
-                    document.Save(tempFileName);
+                    ContentType = "image"
+                }, cancellationToken: cancellationToken);
+                uploadedFiles.Add(blobName);
+            }
+            else if (Path.GetExtension(fileName).ToLower() is ".pdf")
+            {
+                using var documents = PdfReader.Open(stream, PdfDocumentOpenMode.Import);
+                for (int i = 0; i < documents.PageCount; i++)
+                {
+                    var documentName = BlobNameFromFilePage(fileName, i);
+                    var blobClient = container.GetBlobClient(documentName);
+                    if (await blobClient.ExistsAsync(cancellationToken))
+                    {
+                        continue;
+                    }
 
-                    await using var tempStream = File.OpenRead(tempFileName);
-                    await blobClient.UploadAsync(tempStream, new BlobHttpHeaders
+                    var tempFileName = Path.GetTempFileName();
+
+                    try
                     {
-                        ContentType = "application/pdf"
-                    }, cancellationToken: cancellationToken);
+                        using var document = new PdfDocument();
+                        document.AddPage(documents.Pages[i]);
+                        document.Save(tempFileName);
 
-                    uploadedFiles.Add(documentName);
-                }
-                finally
-                {
-                    File.Delete(tempFileName);
+                        await using var tempStream = File.OpenRead(tempFileName);
+                        await blobClient.UploadAsync(tempStream, new BlobHttpHeaders
+                        {
+                            ContentType = "application/pdf"
+                        }, cancellationToken: cancellationToken);
+
+                        uploadedFiles.Add(documentName);
+                    }
+                    finally
+                    {
+                        File.Delete(tempFileName);
+                    }
                 }
             }
         }
 
         if (uploadedFiles.Count is 0)
         {
             return UploadDocumentsResponse.FromError("""
-                No files were uploaded. Either the files already exist or the files are not PDFs.
+                No files were uploaded. Either the files already exist or the files are not PDFs or images.
             """);
         }
 
```
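The upload path above branches purely on the lower-cased file extension. As an illustrative sketch (not code from the PR), the same routing can be expressed in shell:

```shell
# Mirror of the extension routing in UploadFilesAsync above (illustrative):
# extensions are lower-cased first, so "PHOTO.JPG" counts as an image.
route_upload() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    png|jpg|jpeg|gif) echo "image" ;;
    pdf)              echo "pdf" ;;
    *)                echo "skipped" ;;  # such files end up in the error message
  esac
}

route_upload "deck.pdf"   # pdf
route_upload "PHOTO.JPG"  # image
route_upload "notes.txt"  # skipped
```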

app/backend/Services/ReadRetrieveReadChatService.cs

Lines changed: 3 additions & 1 deletion
````diff
@@ -177,6 +177,7 @@ You answer needs to be a json object with the following format.
 {
     MaxTokens = 1024,
     Temperature = overrides?.Temperature ?? 0.7,
+    StopSequences = [],
 };
 
 // get answer
@@ -199,7 +200,7 @@ You answer needs to be a json object with the following format.
 {ans}
 
 # Format of the response
-Return the follow-up question as a json string list.
+Return the follow-up question as a json string list. Don't put your answer between ```json and ```, return the json string directly.
 e.g.
 [
     ""What is the deductible?"",
@@ -209,6 +210,7 @@ Return the follow-up question as a json string list.
 
 var followUpQuestions = await chat.GetChatMessageContentAsync(
     followUpQuestionChat,
+    promptExecutingSetting,
     cancellationToken: cancellationToken);
 
 var followUpQuestionsJson = followUpQuestions.Content ?? throw new InvalidOperationException("Failed to get search query");
````

app/functions/EmbedFunctions/Program.cs

Lines changed: 48 additions & 7 deletions
```diff
@@ -48,34 +48,75 @@ uri is not null
         return containerClient;
     });
 
+    services.AddSingleton<BlobServiceClient>(_ =>
+    {
+        return new BlobServiceClient(
+            GetUriFromEnvironment("AZURE_STORAGE_BLOB_ENDPOINT"), credential);
+    });
+
     services.AddSingleton<EmbedServiceFactory>();
     services.AddSingleton<EmbeddingAggregateService>();
 
     services.AddSingleton<IEmbedService, AzureSearchEmbedService>(provider =>
     {
         var searchIndexName = Environment.GetEnvironmentVariable("AZURE_SEARCH_INDEX") ?? throw new ArgumentNullException("AZURE_SEARCH_INDEX is null");
-        var embeddingModelName = Environment.GetEnvironmentVariable("AZURE_OPENAI_EMBEDDING_DEPLOYMENT") ?? throw new ArgumentNullException("AZURE_OPENAI_EMBEDDING_DEPLOYMENT is null");
-        var openaiEndPoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT") ?? throw new ArgumentNullException("AZURE_OPENAI_ENDPOINT is null");
-
-        var openAIClient = new OpenAIClient(new Uri(openaiEndPoint), new DefaultAzureCredential());
+        var useAOAI = Environment.GetEnvironmentVariable("USE_AOAI")?.ToLower() == "true";
+        var useVision = Environment.GetEnvironmentVariable("USE_VISION")?.ToLower() == "true";
+
+        OpenAIClient? openAIClient = null;
+        string? embeddingModelName = null;
+
+        if (useAOAI)
+        {
+            var openaiEndPoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT") ?? throw new ArgumentNullException("AZURE_OPENAI_ENDPOINT is null");
+            embeddingModelName = Environment.GetEnvironmentVariable("AZURE_OPENAI_EMBEDDING_DEPLOYMENT") ?? throw new ArgumentNullException("AZURE_OPENAI_EMBEDDING_DEPLOYMENT is null");
+            openAIClient = new OpenAIClient(new Uri(openaiEndPoint), new DefaultAzureCredential());
+        }
+        else
+        {
+            embeddingModelName = Environment.GetEnvironmentVariable("OPENAI_EMBEDDING_DEPLOYMENT") ?? throw new ArgumentNullException("OPENAI_EMBEDDING_DEPLOYMENT is null");
+            var openaiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? throw new ArgumentNullException("OPENAI_API_KEY is null");
+            openAIClient = new OpenAIClient(openaiKey);
+        }
 
         var searchClient = provider.GetRequiredService<SearchClient>();
         var searchIndexClient = provider.GetRequiredService<SearchIndexClient>();
-        var blobContainerClient = provider.GetRequiredService<BlobContainerClient>();
+        var corpusContainer = provider.GetRequiredService<BlobContainerClient>();
        var documentClient = provider.GetRequiredService<DocumentAnalysisClient>();
        var logger = provider.GetRequiredService<ILogger<AzureSearchEmbedService>>();
 
-        return new AzureSearchEmbedService(
+        if (useVision)
+        {
+            var visionEndpoint = Environment.GetEnvironmentVariable("AZURE_COMPUTER_VISION_ENDPOINT") ?? throw new ArgumentNullException("AZURE_COMPUTER_VISION_ENDPOINT is null");
+            var httpClient = new HttpClient();
+            var visionClient = new AzureComputerVisionService(httpClient, visionEndpoint, new DefaultAzureCredential());
+
+            return new AzureSearchEmbedService(
+                openAIClient: openAIClient,
+                embeddingModelName: embeddingModelName,
+                searchClient: searchClient,
+                searchIndexName: searchIndexName,
+                searchIndexClient: searchIndexClient,
+                documentAnalysisClient: documentClient,
+                corpusContainerClient: corpusContainer,
+                computerVisionService: visionClient,
+                includeImageEmbeddingsField: true,
+                logger: logger);
+        }
+        else
+        {
+            return new AzureSearchEmbedService(
                 openAIClient: openAIClient,
                 embeddingModelName: embeddingModelName,
                 searchClient: searchClient,
                 searchIndexName: searchIndexName,
                 searchIndexClient: searchIndexClient,
                 documentAnalysisClient: documentClient,
-                corpusContainerClient: blobContainerClient,
+                corpusContainerClient: corpusContainer,
                 computerVisionService: null,
                 includeImageEmbeddingsField: false,
                 logger: logger);
+        }
     });
 })
 .ConfigureFunctionsWorkerDefaults()
```
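The registration above is driven by two environment toggles: `USE_AOAI` selects Azure OpenAI versus OpenAI, and `USE_VISION` selects whether a computer vision client is wired in. A small shell sketch of that decision table (illustrative only; the real selection happens in the C# DI code above):

```shell
# Decision table for the two toggles read in Program.cs (illustrative).
# Both values are compared lower-cased, mirroring ToLower() == "true".
select_embed_service() {
  use_aoai=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  use_vision=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$use_aoai" = "true" ] && client="AzureOpenAIClient" || client="OpenAIClient"
  [ "$use_vision" = "true" ] && vision="with-vision" || vision="no-vision"
  echo "$client/$vision"
}

select_embed_service TRUE true   # AzureOpenAIClient/with-vision
select_embed_service false true  # OpenAIClient/with-vision
select_embed_service true x      # AzureOpenAIClient/no-vision
```

Note that any value other than `true` (after lower-casing) falls back to the OpenAI / no-vision path, which matches the `?.ToLower() == "true"` comparisons in the diff.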

app/functions/EmbedFunctions/Services/EmbeddingAggregateService.cs

Lines changed: 39 additions & 10 deletions
```diff
@@ -4,7 +4,8 @@ namespace EmbedFunctions.Services;
 
 public sealed class EmbeddingAggregateService(
     EmbedServiceFactory embedServiceFactory,
-    BlobContainerClient client,
+    BlobServiceClient blobServiceClient,
+    BlobContainerClient corpusClient,
     ILogger<EmbeddingAggregateService> logger)
 {
     internal async Task EmbedBlobAsync(Stream blobStream, string blobName)
@@ -14,23 +15,51 @@ internal async Task EmbedBlobAsync(Stream blobStream, string blobName)
         var embeddingType = GetEmbeddingType();
         var embedService = embedServiceFactory.GetEmbedService(embeddingType);
 
-        var result = await embedService.EmbedPDFBlobAsync(blobStream, blobName);
+        if (Path.GetExtension(blobName) is ".png" or ".jpg" or ".jpeg" or ".gif")
+        {
+            logger.LogInformation("Embedding image: {Name}", blobName);
+            var contentContainer = blobServiceClient.GetBlobContainerClient("content");
+            var blobClient = contentContainer.GetBlobClient(blobName);
+            var uri = blobClient.Uri.AbsoluteUri ?? throw new InvalidOperationException("Blob URI is null.");
+            var result = await embedService.EmbedImageBlobAsync(blobStream, uri, blobName);
+            var status = result switch
+            {
+                true => DocumentProcessingStatus.Succeeded,
+                _ => DocumentProcessingStatus.Failed
+            };
 
-        var status = result switch
+            await corpusClient.SetMetadataAsync(new Dictionary<string, string>
+            {
+                [nameof(DocumentProcessingStatus)] = status.ToString(),
+                [nameof(EmbeddingType)] = embeddingType.ToString()
+            });
+        }
+        else if (Path.GetExtension(blobName) is ".pdf")
         {
-            true => DocumentProcessingStatus.Succeeded,
-            _ => DocumentProcessingStatus.Failed
-        };
+            logger.LogInformation("Embedding pdf: {Name}", blobName);
+            var result = await embedService.EmbedPDFBlobAsync(blobStream, blobName);
+
+            var status = result switch
+            {
+                true => DocumentProcessingStatus.Succeeded,
+                _ => DocumentProcessingStatus.Failed
+            };
 
-        await client.SetMetadataAsync(new Dictionary<string, string>
+            await corpusClient.SetMetadataAsync(new Dictionary<string, string>
+            {
+                [nameof(DocumentProcessingStatus)] = status.ToString(),
+                [nameof(EmbeddingType)] = embeddingType.ToString()
+            });
+        }
+        else
         {
-            [nameof(DocumentProcessingStatus)] = status.ToString(),
-            [nameof(EmbeddingType)] = embeddingType.ToString()
-        });
+            throw new NotSupportedException("Unsupported file type.");
+        }
     }
     catch (Exception ex)
     {
         logger.LogError(ex, "Failed to embed: {Name}, error: {Message}", blobName, ex.Message);
+        throw;
     }
 }
 
```
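One subtlety worth noting: unlike the upload path in `AzureBlobStorageService`, `EmbedBlobAsync` matches `Path.GetExtension(blobName)` without lower-casing, so an upper-case extension such as `.PNG` falls through to the `NotSupportedException` branch. A shell sketch of that behavior (illustrative, not code from the PR):

```shell
# Mirrors the case-sensitive extension match in EmbedBlobAsync (illustrative):
# no lower-casing here, unlike the upload path.
embed_route() {
  case "${1##*.}" in
    png|jpg|jpeg|gif) echo "image" ;;
    pdf)              echo "pdf" ;;
    *)                echo "unsupported" ;;  # the C# code throws NotSupportedException
  esac
}

embed_route "scan.png"  # image
embed_route "scan.PNG"  # unsupported
embed_route "doc.pdf"   # pdf
```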
