Description
Can someone explain what happens behind the scenes when the chain calls the OpenAI API?
I understand how embedding works (#1). But how much of the embedded text is included in the subsequent requests? And why are there, for example, 2 requests for one question, or even 5 when using `ChatOpenAI`?
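For context, here is my rough mental model of what a single question turns into (a sketch only: the search helper is hypothetical, the prompt wording is my guess, and the model names assume ada-002 embeddings with text-davinci-003 completions):

```python
# My rough mental model; my_vector_store_search and the prompt format are
# illustrative guesses, not LangChain's actual internals.
import openai

question = "How old must the camp leader be at least?"

# 1) Embed the question (this part, #1, I understand).
emb = openai.Embedding.create(model="text-embedding-ada-002", input=question)
query_vector = emb["data"][0]["embedding"]

# 2) Similarity search against the local vector store returns the original
#    TEXT of the k most similar chunks (hypothetical helper).
chunks = my_vector_store_search(query_vector, k=4)

# 3) The chunk texts (not the vectors) are pasted into the prompt, which is
#    presumably what drives the token count up.
prompt = "\n\n".join(chunks) + f"\n\nQuestion: {question}\nAnswer:"
answer = openai.Completion.create(model="text-davinci-003", prompt=prompt)
```

Is that roughly right, and where do the extra requests come from?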
Example:
I tried a simple question (in Czech, because my embeddings are in Czech): "How old must the camp leader be at least?". The chain made two API calls with 5,565 tokens in total, and the response was "The minimum age for the camp leader is 18 according to Junák – český skaut." It's not very cost-effective with `text-davinci`: one simple question costs me around 0.11 USD.
I then simply tried replacing `OpenAI()` with `ChatOpenAI()`, which uses `gpt-3.5-turbo-0301`. The chain made 5 requests (4,643 prompt + 278 completion = 4,921 tokens). The price is 10x lower and fewer tokens are used.
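For what it's worth, the numbers line up with the list prices as I understand them (my assumption: a flat $0.02 per 1K tokens for text-davinci-003 and $0.002 per 1K for gpt-3.5-turbo, per the March 2023 pricing page):

```python
# Back-of-the-envelope cost check using the token counts reported above.
davinci_cost = 5565 / 1000 * 0.02   # ~= $0.111, matches the ~0.11 USD above
turbo_cost = 4921 / 1000 * 0.002    # ~= $0.0098, roughly 10x cheaper
```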
```python
chain = VectorDBQAWithSourcesChain.from_llm(llm=ChatOpenAI(temperature=0), vectorstore=store)
```
Is it possible to control how much of the embedded source text is included in each request?
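Concretely, is something like this the intended way to do it? (`k`, `reduce_k_below_max_tokens`, and `max_tokens_limit` are parameters I found on the chain class; I'm not sure this is the recommended approach.)

```python
from langchain.chains import VectorDBQAWithSourcesChain
from langchain.chat_models import ChatOpenAI

chain = VectorDBQAWithSourcesChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    vectorstore=store,               # same store as above
    k=2,                             # fetch fewer chunks (the default seems to be 4)
    reduce_k_below_max_tokens=True,  # drop chunks until they fit under the cap
    max_tokens_limit=2000,           # cap on tokens taken up by retrieved chunks
)
```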
Thanks for any information.