2 min read · from Machine Learning

Production AI very different from the demos [D]

Our take

Moving an AI feature into production can look very different from the demos, especially on cost. Small-scale tests with short prompts kept spending manageable, but once real traffic arrived, longer and vaguer user questions drove up token usage, and the context retrieval added to handle them inflated input lengths further. The team started on GPT-4o, whose response quality was good enough, but at volume the bill came in high, and with no per-feature cost breakdown, manually reconciling token counts against usage has become unsustainable, leaving real doubt about whether spending is being tracked accurately.

We moved an AI feature into production a few months ago, and the cost profile has been a constant surprise since. The demos and early prototypes ran cheap because volume was tiny and the prompts were short, but once the feature hit real traffic, token usage scaled far faster than expected. Part of it is that customers ask longer, less clear questions than our test set did, and part of it is that we ended up adding context retrieval, which roughly doubled the input length on every call.
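To see why retrieval doubles the bill and not just the prompt, here is a minimal back-of-envelope sketch. It uses the common ~4 characters/token heuristic rather than a real tokenizer, and the per-1K price is an illustrative placeholder, not a current OpenAI rate:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)

def estimated_input_cost(prompt: str, context: str = "",
                         price_per_1k_input: float = 0.0025) -> float:
    """Estimated input-side cost for one call; price is a placeholder."""
    tokens = estimate_tokens(prompt + context)
    return tokens / 1000 * price_per_1k_input

prompt = "Why was my invoice higher this month?" * 3
context = "...retrieved documents..." * 40  # retrieval padding added per call

base = estimated_input_cost(prompt)
with_context = estimated_input_cost(prompt, context)
```

Because input cost scales linearly with tokens, doubling the input length per call roughly doubles input spend across the board, before any change in traffic.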

We started on GPT-4o for the early version, and the response quality was good enough that nobody pushed back. But after a few weeks of volume the bill came in much higher, and finance had no way to break out which feature or which model was driving it. Right now I am pulling exports from the OpenAI dashboard and mapping them back to features by hand, which is not sustainable.
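One common way out of manual mapping is to tag every model call with a feature label at the point of the call. This is a minimal sketch: the record shape mirrors the `usage` object (`prompt_tokens`, `completion_tokens`) that the OpenAI chat completions API returns, but the feature names and the in-memory log store are hypothetical stand-ins for whatever logging the team actually uses:

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    feature: str           # e.g. "support_answers" -- hypothetical label
    model: str
    prompt_tokens: int
    completion_tokens: int

usage_log: list[UsageRecord] = []

def record_usage(feature: str, model: str, usage: dict) -> None:
    """Append one call's token usage, tagged with the feature that made it."""
    usage_log.append(UsageRecord(
        feature=feature,
        model=model,
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
    ))

# Example: the dict mimics what you'd read off a completion response's usage
record_usage("support_answers", "gpt-4o",
             {"prompt_tokens": 1200, "completion_tokens": 300})
```

With this in place, cost attribution becomes a query over your own logs instead of a reconciliation against dashboard exports.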
I shipped the feature, so I am now the de facto owner of the cost question. The OpenAI dashboard tells me the total, but not what I actually need to answer. I spend half a day every week reconciling token counts against feature usage, and I am still not confident in the numbers I hand off.
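Once calls are tagged, the weekly reconciliation collapses to a rollup like the sketch below. The per-model prices here are illustrative placeholders (not real rates), and the tuples stand in for whatever tagged usage records you have logged:

```python
from collections import defaultdict

# (feature, model, prompt_tokens, completion_tokens) -- one tuple per call
calls = [
    ("support_answers", "gpt-4o", 1200, 300),
    ("support_answers", "gpt-4o", 2400, 250),
    ("summaries", "gpt-4o-mini", 800, 150),
]

# $ per 1K tokens as (input, output) -- placeholder numbers, not real pricing
PRICES = {
    "gpt-4o": (0.0025, 0.010),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def cost_by_feature(calls: list[tuple]) -> dict[str, float]:
    """Sum estimated dollar cost per feature from tagged call records."""
    totals: dict[str, float] = defaultdict(float)
    for feature, model, p_tok, c_tok in calls:
        in_price, out_price = PRICES[model]
        totals[feature] += p_tok / 1000 * in_price + c_tok / 1000 * out_price
    return dict(totals)

report = cost_by_feature(calls)
```

The same rollup keyed on `model` instead of `feature` answers the other question finance is asking, which model is driving the bill.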

submitted by /u/Far-Football3763

