
Understanding the question/answering process and its costs #24

Open
@bbscout

Description


Can someone explain what the process is behind the scenes when calling the OpenAI API?

I understand how embedding works (#1). But how much text from the embeddings is included in subsequent requests? And why are there, for example, 2 requests for one question, or even 5 when using ChatOpenAI?
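To illustrate where the extra requests may come from: a minimal sketch (not the actual LangChain source) of a map-reduce style QA chain, which is one common design for QA-with-sources. It sends one "map" request per retrieved chunk and one final "reduce" request to combine the partial answers, so 4 retrieved chunks would mean 5 requests. `fake_llm` is a hypothetical stand-in for the real OpenAI call.

```python
# Hedged sketch: why one question can trigger several LLM requests.
# A map-reduce QA chain makes one "map" call per retrieved chunk,
# then one "reduce" call to merge the partial answers.

def fake_llm(prompt: str) -> str:
    """Stand-in for the OpenAI call; just records that it was invoked."""
    fake_llm.calls += 1
    return f"partial answer ({len(prompt)} prompt chars)"

fake_llm.calls = 0

def map_reduce_qa(question: str, chunks: list[str]) -> str:
    # Map step: ask the model about each retrieved chunk separately.
    partials = [fake_llm(f"Context: {c}\nQuestion: {question}") for c in chunks]
    # Reduce step: one more request to combine the partial answers.
    return fake_llm("Combine these answers:\n" + "\n".join(partials))

chunks = ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]  # e.g. 4 retrieved docs
answer = map_reduce_qa("How old must the camp leader be?", chunks)
print(fake_llm.calls)  # 5 requests: 4 map + 1 reduce
```

If the chain instead "stuffs" all chunks into a single prompt, you get fewer requests but a longer (and possibly more expensive) one.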

Example:

I tried a simple question (in Czech, because my embeddings are in Czech): "How old must the camp leader be at least?" The chain made two API calls with 5565 tokens in total, and the response was "The minimum age for the camp leader is 18 according to Junák – český skaut." It's not very cost-effective when using text-davinci: for one simple question I pay around 0.11 USD.
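The reported cost is consistent with text-davinci pricing of $0.02 per 1K tokens (the rate at the time; verify against current pricing):

```python
# Back-of-envelope check of the reported cost,
# assuming text-davinci-003 at $0.02 per 1K tokens.
tokens = 5565
cost = tokens / 1000 * 0.02
print(f"${cost:.2f}")  # ≈ $0.11, matching the observed ~0.11 USD
```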

[screenshots: API request log and cost breakdown]

I simply tried replacing OpenAI() with ChatOpenAI(), which uses gpt-3.5-turbo-0301. The chain made 5 requests (4,643 prompt + 278 completion = 4,921 tokens). The price is 10x lower, and fewer tokens are used as well.

chain = VectorDBQAWithSourcesChain.from_llm(llm=ChatOpenAI(temperature=0), vectorstore=store)
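The roughly 10x difference follows from the per-token pricing at the time (text-davinci-003 at $0.02 / 1K tokens, gpt-3.5-turbo at $0.002 / 1K; check current rates), on top of the slightly lower token count:

```python
# Rough comparison of the two runs, assuming pricing at the time:
# text-davinci-003: $0.02 / 1K tokens; gpt-3.5-turbo: $0.002 / 1K tokens.
davinci_cost = 5565 / 1000 * 0.02   # ≈ $0.111
turbo_cost = 4921 / 1000 * 0.002    # ≈ $0.0098
print(f"{davinci_cost / turbo_cost:.1f}x cheaper")  # ≈ 11x
```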

Is it possible to control how much of the embedded text is included in the request?
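One lever is retrieving fewer documents (depending on the LangChain version, the chain or vector store may accept a `k` argument for that). Another, sketched below under stated assumptions, is to cap the retrieved context by a token budget before building the prompt. `select_context` is a hypothetical helper, and the whitespace word count is a crude token estimate (a real implementation would use a tokenizer such as tiktoken):

```python
# Hedged sketch: bound how much retrieved text goes into the prompt by
# adding chunks (most relevant first) until a token budget is reached.

def select_context(chunks: list[str], budget: int) -> list[str]:
    selected, used = [], 0
    for chunk in chunks:            # assumed sorted by similarity, best first
        n = len(chunk.split())      # crude token estimate
        if used + n > budget:
            break
        selected.append(chunk)
        used += n
    return selected

chunks = [("word " * 400).strip() for _ in range(3)]  # ~400 "tokens" each
kept = select_context(chunks, budget=1000)
print(len(kept))  # 2 chunks fit in the 1000-token budget
```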

Thanks for any information.
