Can an in-browser AI runtime make your trial users cost you less?
Free users are great, but they're also a financial burden. Maybe ONNX Runtime Web is the solution for you?

Not sponsored… yet?
During the NVIDIA + AutoHDR hackathon last weekend (Dec 17th), I had the absolute pleasure of working alongside some of the greatest minds in Austin, TX.
I was tasked with developing a concept for optimizing real estate photos, and my teammate and I decided to pursue a complex adventure over the weekend.
The idea was simple: build a better real estate photo editing tool than what's currently available, with the winner receiving a nice cash prize and a salaried development role. Although my teammate and I didn't ultimately win, we found great pleasure and fulfillment in what we built (see the application here): an AI implementation running inside the client's web browser. It was complex, beautiful, elegant even, but most importantly, it was a nightmare worth pursuing!
The ONNX Discovery
We found ONNX Runtime Web to be an exceptionally useful tool. Paired with an export step that converts an AI model (.pth) into the Open Neural Network Exchange format (.onnx), it lets the model run directly in the user's web browser. That opens an opportunity for an AIaaS business to reduce its API/AI image generation costs for free-tier clients: instead of major enterprises eating the generation costs, inference is offloaded straight into the user's browser.
We ran some numbers. For an application like the one we were building, with around 50k trial users a month generating 5 images each, API calls alone would cost roughly $2,500 a month. Then we had a simply brilliant idea: offload the costs directly to the users. The .onnx file downloads in the background without them even knowing or caring. Although it's a bit more energy-expensive on their end, it removes our inference costs entirely.
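The back-of-the-envelope math looks like this (the per-image price is implied by the $2,500 figure; every number here is an assumption, not a quoted rate):

```python
# Back-of-the-envelope cost model for free-tier image generation.
TRIAL_USERS_PER_MONTH = 50_000   # assumed trial volume
IMAGES_PER_USER = 5              # assumed usage per trial user
COST_PER_IMAGE_USD = 0.01       # implied by the ~$2,500/month estimate

def monthly_api_cost(users: int, images_each: int, per_image: float) -> float:
    """Total monthly API spend if every image is generated server-side."""
    return users * images_each * per_image

def cost_after_offload(base_cost: float, offload_fraction: float) -> float:
    """Remaining spend when a fraction of generations run in-browser."""
    return base_cost * (1.0 - offload_fraction)

base = monthly_api_cost(TRIAL_USERS_PER_MONTH, IMAGES_PER_USER, COST_PER_IMAGE_USD)
print(base)                            # → 2500.0
print(cost_after_offload(base, 0.85))  # → 375.0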
Environmental Impact & Scalability
Then we realized there are a few amazing scalability benefits to this architecture. With major AI/media companies in a frenzy of buying up farm and community land and draining its resources, there can be a collective mission to offset these environmental issues by distributing AI processing directly to consumers' devices.
There is risk, of course: the limited processing power of most user devices makes LLMs impractical in the browser. However, if you can offload the API calls of 80–90% of trial users, you can justify building a device benchmark test; if a user's device can't run the model, send them down the traditional server-side route.
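A minimal sketch of that routing decision, with hypothetical names and thresholds (in production this check would run as JavaScript in the browser; Python here just illustrates the logic):

```python
# Hypothetical routing logic: benchmark the device, then pick a path.
# The threshold is an assumption, not a measured number from our project.

BENCHMARK_BUDGET_MS = 2_000  # assumed ceiling for a warm-up inference

def route_inference(benchmark_ms: float, supports_runtime: bool) -> str:
    """Return 'local' to run the .onnx model in-browser, 'api' otherwise."""
    if not supports_runtime:
        return "api"    # no WebAssembly/WebGPU support: traditional route
    if benchmark_ms > BENCHMARK_BUDGET_MS:
        return "api"    # device too slow: don't punish the user
    return "local"      # fast enough: offload the cost to the browser

print(route_inference(850.0, True))    # → local
print(route_inference(4200.0, True))   # → api
print(route_inference(500.0, False))   # → api
```

The design choice here is to fail toward the server: any device that can't prove it's fast enough gets the traditional API path, so the user experience never degrades below the status quo.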
The Bigger Picture
Anyway, there's something solid in this concept. Imagine your users interacting with their own chatbot whenever they visit your website; you could save 80–95% of your costs by offloading customer support AI agents, photo editing AI, data structuring, and general-purpose models directly to your user base.
Hope this was a helpful or insightful read. I thought it was interesting :)
Have a good one!

Reach out to Arthur Labs if you have any questions regarding implementation, or if you're looking for a contract developer who can make this possible.