AI as a Service — In the context of LLM & Generative AI

sourajit roy chowdhury
13 min read · Apr 7, 2024

This article analyzes how AIaaS (AI as a Service) is evolving and driving business expansion through AI-powered innovation and development.

Image generated with DALL·E 3

What is AI as a Service

AIaaS (AI as a Service) refers to the provision of artificial intelligence (AI) capabilities as cloud-based services to users on a subscription basis. Just like other “as a Service” offerings such as Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS), AIaaS provides a convenient and scalable way for individuals and businesses to access and utilize AI technologies without the need for heavy investment in infrastructure, expertise, or development.

Origins of the Trend

In November 2022, OpenAI unveiled ChatGPT, a specialized instance of their Generative Pre-Trained Transformer (GPT) model tailored for conversational interactions. This launch sparked tremendous excitement within the data science community, marking the official onset of the Large Language Model (LLM) wave.

Following this milestone, rapid advancements in both commercial and open-source communities unfolded around LLM and Generative AI technologies. Among the verticals that surged alongside Generative AI and LLM, AI as a Service (AIaaS) emerged as a prominent force, contributing to the wave of innovation.

Before this, AI as a Service (AIaaS) was in a relatively immature phase. Only a few major companies offered such services through their cloud platforms, focusing mainly on specific applications like OCR, language translation, or speech-related services. Even before ChatGPT, OpenAI had made an earlier Large Language Model (LLM), GPT-3, accessible as a service through API calls in 2020. However, it did not get much attention at the time, and most models and associated development remained open source.

OpenAI laid the cornerstone by introducing GPT-3.5-Turbo and its variants, accessible through API calls. This move not only accelerated innovation, development, and business expansion built on these models but also paved the way for a distinct avenue, enabling enterprises of all sizes to create AI-related products and offer them through API services.
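To make the "accessible through API calls" part concrete, here is a minimal sketch of the request shape used by chat-style completion APIs such as OpenAI's. The endpoint, model name, and parameter values below are illustrative rather than authoritative, so check the provider's current documentation; the sketch only builds the JSON payload and does not send it.

```python
import json

# Illustrative endpoint for a chat-style completion API (OpenAI-like).
API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo") -> dict:
    """Assemble the JSON body for a single-turn chat completion call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # lower values give more deterministic output
    }

payload = build_chat_request("Summarize AI as a Service in one sentence.")
body = json.dumps(payload)
# A real call would POST `body` to API_URL with an
# "Authorization: Bearer <API_KEY>" header, then read the assistant's
# reply from the returned JSON.
```

The same request/response pattern, with different model names and minor payload differences, applies to most of the commercial providers discussed below.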

Timeline

Following the release of ChatGPT, within a mere four months a whirlwind of dramatic and rapid change unfolded, giving rise to a myriad of related Large Language Models (LLMs).

Source: McKinsey research

As illustrated in the timeline above, within a mere 4-month span, numerous organizations such as Cohere, Amazon, Meta, Google, Microsoft, Salesforce, Anthropic, and Bloomberg swiftly introduced their own models. These models, whether open-source or commercial, became readily accessible through API services. Notably, March 2023 alone witnessed significant strides, marked by six major advancements. These included the introduction of new customer relationship management solutions and enhanced support tailored for the financial services sector.

Generative AI, the new leg of AI

We’re all aware of how generative AI is driving business growth through innovation. New use cases are continuously envisioned thanks to generative AI and Large Language Models (LLMs). However, one major challenge with these expansive generative models is the difficulty of creating one from scratch. It demands significant cost and time, and there’s no turning back if the model fails to deliver the expected results.

In short, not every organization can afford to develop their own LLM tailored to their needs. This challenge has led to the rise of AI models developed in-house and offered as subscription services via APIs. This shift is fueling significant growth and innovation in the field of AI as a Service (AIaaS).

Current trend of AIaaS

I aim to discuss several valuable and trending AI as a Service (AIaaS) offerings currently prevalent, particularly within the Generative AI and LLM realm. Additionally, I’ll categorize these services for clearer comprehension, outlining their names and how they can be advantageous in developing use cases.

Let us try to understand and categorize the technical segments in which different organizations are creating AI services.

  1. Large Generative Models as a Service: This stands as a pioneering service within the realm of Generative AI and LLM. OpenAI led the way, introducing subscription-based access to its first proprietary LLM, GPT-3, in 2020. At that juncture, all transformer-based models, whether discriminative or generative, resided within the open-source community.
    Although GPT-3 initially didn’t get the attention it deserved, subsequent developments, particularly with the introduction of ChatGPT, saw OpenAI unveil their GPT-3.5-Turbo models and their variants as subscription-based API services. This marked the onset of a revolution in Large Generative Models as a Service, prompting enterprises of all sizes to embark on a similar trajectory.
    OpenAI: OpenAI’s large models need little introduction. A list of all the models, including language, speech, and vision models, is available in the documentation.
    Cohere: Cohere is one of the early adopters providing large generative models as a service. A list of all its models, including the latest Command R models, is available in the documentation.
    Anthropic: Anthropic is also one of the early adopters, attracting massive attention by introducing large models with huge context lengths. Its recent Claude 3 family is one of the big competitors in the market. You can find the model details in the documentation.
    Google: Google started with PaLM and its variants, which leaned towards the open-source community, but after Bard (similar to ChatGPT) it introduced Gemini. According to Google, Gemini is highly advanced, on par with GPT-4, and even exceeds it in certain scenarios.
    Mistral: Similar to Google, Mistral also started with its own open models to capture market sentiment and then released its commercial models as part of its API services.
    These are the top commercial services for large models, though a few names we can’t overlook include HuggingChat, Stability AI, MosaicML, Cerebras, Aleph Alpha, AI21 Labs, and others.
  2. Embedding Models as a Service: This is similar to the above, but for a specific kind of model that translates data (text, images) into numerical vectors, commonly known as embeddings. With the rise of Generative AI, Retrieval Augmented Generation (RAG) has garnered significant attention (more on RAG to come), with embeddings serving as a crucial component. High-quality embeddings notably enhance overall performance, prompting enterprises to develop large-scale embedding models and offer them through API services.
    OpenAI: text-embedding-ada-002 was one of the first proprietary embedding models OpenAI released. A list of all its embedding models can be found in the documentation.
    Cohere: Cohere is also embracing the trend of embedding models as a service. Although its input context length is relatively smaller compared to OpenAI’s embedding models, Cohere stands out by offering a wider range of variants, including various versions and robust multilingual support. A list of all its embedding models can be found in the documentation.
    Anthropic & VoyageAI: Anthropic does not provide its own embedding model. However, one notable embeddings provider that excels in offering a diverse range of options and capabilities is Voyage AI. Voyage AI specializes in state-of-the-art embedding models and provides tailored solutions for specific industry domains such as finance and healthcare. Additionally, they offer bespoke fine-tuned models catering to individual customer needs. Anthropic’s documentation describes how to use Voyage AI embedding models alongside its own models.
    Mistral: Mistral has launched commercial embedding models as well and may introduce more variations in this space.
  3. Vector DB as a Service: The vector database is one of the key components in RAG and has gained heavy traction as RAG and Generative AI have evolved. A vector database, also known as a vector data store or vector-based database, is a type of database optimized for storing and querying vector data. In this context, a vector is an ordered collection of values, often representing attributes or features of objects in a space. The embeddings we covered in the previous section are what get stored as vectors in this kind of DB.
    Pinecone: One of the early adopters of vector DB as a service. Pinecone is quite popular and easy to use for LLM-powered applications. If you are looking for a high-performance, scalable, and flexible vector database, Pinecone is a good option to consider.
    Qdrant: Qdrant is one of the popular vector DBs in the market. It is open source and also provides an “as a service” sandbox for development. It supports high-performance similarity search using advanced indexing techniques like HNSW and PQ. With scalability for distributed deployments, it can handle complex, multi-dimensional vectors and offers flexible querying via a RESTful API. Users can define custom distance metrics and seamlessly integrate with popular ML libraries. Overall, Qdrant provides a powerful solution for applications requiring fast and scalable vector similarity search.
    Weaviate: Weaviate is an open-source vector database designed for real-time storage and querying of vector-based data. It enables semantic search by capturing semantic relationships between entities using vector embeddings. With a schema-based data model, users can define custom classes, properties, and relationships. Weaviate provides a GraphQL API for efficient interaction with the database and supports geospatial data for spatial queries and analysis. It offers real-time updates, multi-tenancy, and authentication mechanisms for secure access control. Integration with machine learning models enriches data with vector embeddings and enables advanced analytics. Weaviate has an active community and ecosystem, with plugins and extensions available for extending its functionality.
    Activeloop: Deep Lake by Activeloop is a one-of-a-kind “as a service” offering. It is not only a vector DB; it also provides end-to-end data storage and data-lake functionality, with out-of-the-box features such as version control for data, in-browser visualization, rapid queries with Tensor Query Language (TQL), streaming, deep retrieval, and integrations with data stacks, MLOps platforms, and LLM orchestrators.
    Other than these, traditional SQL and NoSQL databases are also adding vector capabilities, such as Postgres with the pgvector extension and MongoDB Atlas Vector Search.
  4. Associate services: RAG stands as a pivotal concept for Large Language Models (LLMs), built around data and autonomous agents. Undoubtedly, in any context involving retrieval agents, data serves as both the fuel and the performance metric for applications powered by Generative AI (GenAI). Parsing unstructured data emerges as a primary challenge, and building robust agents is an equally formidable task. To address these challenges, a plethora of service offerings is emerging, potentially bringing new momentum to the LLM and GenAI landscape.
    LlamaCloud: LlamaParse is a state-of-the-art parser designed to unlock RAG over complex PDFs with embedded tables and charts. Traditional methods often fall short, but LlamaParse pioneers a novel recursive retrieval technique, enabling hierarchical indexing and querying. With its proprietary parsing service, PDFs are seamlessly transformed into structured markdown format, integrating smoothly with advanced retrieval algorithms. Currently available in public preview, LlamaParse offers universal accessibility with a usage limit of 1k pages per day. Complementing this is the Managed Ingestion and Retrieval API, streamlining data pipelines for context-augmented LLM applications.
    OpenAI: The Assistants API empowers you to construct AI assistants directly within your applications. These assistants are equipped with instructions and can utilize various models, tools, and knowledge to address user queries effectively. Presently, the Assistants API supports three distinct types of tools: Code Interpreter, Retrieval, and Function Calling.
  5. GenAI Platform as a Service: One of the major developments is the provision of a platform that can serve all generative AI needs in a single place.
    Amazon Bedrock: Amazon Bedrock offers a swift pathway for embracing and leveraging the newest generative AI breakthroughs. With seamless access to a selection of top-performing foundation models (FMs) from renowned AI entities such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, adapting to cutting-edge innovations becomes effortless. The single API interface provided by Amazon Bedrock ensures consistent accessibility, irrespective of the chosen models. This grants users the freedom to utilize various FM options and seamlessly transition to the latest model iterations with minimal adjustments to their codebase.
    Groq: Groq is dedicated to establishing the benchmark for GenAI inference speed, empowering real-time AI applications to thrive in the present. At the core of Groq’s innovation lies the Groq LPU (Language Processing Unit), a revolutionary Inference Engine designed to address the unique challenges posed by computationally intensive applications, particularly those with a sequential component such as AI language applications (LLMs).
    Nvidia: Nvidia needs no introduction. Its new cloud-native AI platform can serve community models, Nvidia’s own models, and custom models as well, and it also powers enterprise-ready RAG platforms.
    Langsmith: LangSmith is a unified DevOps platform tailored for developing, testing, deploying, and monitoring Large Language Model (LLM) applications. It addresses the unique challenges posed by LLMs, such as non-determinism and unpredictable inputs, by providing features like easy sharing of chain traces, collaborative prompt crafting through LangSmith Hub, and AI-assisted evaluation to ensure application quality.
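The embedding models and vector databases described above fit together in a RAG pipeline as embed, index, then nearest-neighbour search. The toy sketch below imitates that flow in plain Python; the character-sum "embedding" is only a deterministic stand-in for a real embedding model (e.g. an OpenAI, Cohere, or Voyage AI API call), and the linear scan stands in for a vector DB's ANN index such as HNSW.

```python
import math

def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Stand-in for an embedding model: bucket words by character-sum.

    A real pipeline would call an embeddings API and get back a dense
    learned vector capturing semantics, not word overlap.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the metric vector DBs commonly default to."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Indexing": a vector DB like Pinecone or Qdrant would persist these
# pairs and build an ANN index instead of keeping a plain list.
docs = [
    "vector databases store embeddings",
    "llms generate text",
    "pinecone offers a managed vector database service",
]
index = [(doc, toy_embed(doc)) for doc in docs]

# "Retrieval": embed the query, then take the nearest neighbour.
query = "vector databases"
best_doc, _ = max(index, key=lambda pair: cosine(toy_embed(query), pair[1]))
```

A production setup swaps `toy_embed` for an embeddings API call and the list scan for a vector DB query, but the shape of the flow, and why embedding quality drives RAG quality, is the same.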

Benefits of AIaaS

So far we have seen the different “as a Service” offerings that are popular and useful; as the field develops, I will try to keep this list updated. But, all in all, what are the benefits?

  1. Cloud-based: AI services are hosted on cloud platforms, allowing users to access them remotely over the internet. This eliminates the need for users to manage and maintain their own AI infrastructure.
  2. Scalability: AIaaS providers typically offer scalable solutions, enabling users to easily adjust the usage based on their needs. This scalability ensures that resources can be allocated efficiently, whether the demand for AI services fluctuates or grows over time.
  3. Pay-as-you-go: AIaaS often operates on a subscription or usage-based pricing model, where users pay only for the resources they consume. This pay-as-you-go approach makes AI technologies more accessible to a wider range of users, including small businesses and startups.
  4. Pre-built models and APIs: AIaaS providers offer a range of pre-built AI models, algorithms, and APIs (Application Programming Interfaces) that cover various use cases such as natural language processing, computer vision, machine learning, and more. Users can integrate these models into their applications with minimal effort.
  5. Customization and flexibility: While AIaaS providers offer pre-built solutions, they also allow users to customize and fine-tune the models to better fit their specific requirements. This flexibility enables users to adapt AI technologies to their unique use cases and business needs.
  6. No need for AI experts: With user-friendly interfaces and pay-as-you-go pricing, AIaaS platforms democratize AI access, enabling non-experts to leverage AI capabilities without specialized knowledge. This democratization lowers the barrier to entry, allowing organizations to harness AI’s benefits for various applications without relying heavily on in-house AI expertise.

Future of AIaaS

The trajectory of AIaaS, particularly in the context of Generative AI and LLM, is swiftly evolving, promising diverse “as a Service” dimensions in the near future. Major organizations are vigorously developing products slated for service-based offerings. Nvidia, for instance, is committed to pioneering stronger and more powerful AI chips and supercomputers. At the recent Nvidia GTC, groundbreaking products were unveiled, poised to revolutionize the AI landscape, particularly in Generative AI.

It’s evident that these AI hardware advancements will be embraced by both startups like OpenAI, Mistral, and others, as well as tech giants such as Microsoft, Google, Nvidia, among others. They aim to democratize AI by offering more AI products as services to the broader AI community. This signals a clear shift in the trajectory of AIaaS, potentially fueling business growth powered by AI.

However, this evolution prompts questions within the open-source AI community and among developers who specialize in building AI solutions from scratch. Is this trend concerning for those who champion open-source principles and believe in the craftsmanship of creating AI solutions from the ground up?

In my perspective, NO! While there may be various arguments, analogies, and contrarian viewpoints on this matter, I’d like to highlight three key points I believe are pivotal.

  1. Business Use cases: While AI has great potential, it’s not a fix for every problem. Similarly, Generative AI isn’t a one-size-fits-all solution for all AI-related tasks. How well Generative AI works depends on the specific business needs. Each situation needs its own features, solutions, and user experience. Just using AI as a Service (AIaaS) might not solve the real problem. Knowing what you’re doing is important. Solving complex issues needs careful thought and being proactive. There’s no easy answer for everything. So, AIaaS isn’t a solution for every part of an AI problem. To use AIaaS well, you need to plan carefully, not rush into it. To keep up with changes, it’s crucial to stay updated and try things out.
  2. Overwhelming Services: As mentioned earlier, I’ve listed a few services in the realm of Generative AI and LLM. However, there are numerous such services out there. It can be challenging to pick the right ones for our specific needs. What works well today might become outdated tomorrow as this field is evolving rapidly. The best way to handle this array of services is to become knowledgeable and create custom solutions using open-source frameworks and starting from scratch. This approach offers more control and customization. Of course, we may still need a few AIaaS solutions for certain needs and integrations.
  3. Security & Privacy Skepticism: As we focus more on Generative AI apps, worries about security and privacy pop up fast. People will become more skeptical as new AI as a Service (AIaaS) grows. Even if AIaaS gets super powerful, many users, customers, and clients don’t like using external APIs unless there’s a solid partnership. They worry because their data goes outside their own development setup. To deal with these worries about security and privacy, the best solution seems to be using open-source models and frameworks as much as possible. We can’t ignore AIaaS completely, but relying less on it can help ease these concerns.

Takeaway

As AI as a Service (AIaaS) continues to evolve, particularly in Generative AI and Large Language Models (LLMs), it offers promising opportunities for innovation and growth. However, concerns persist within the open-source AI community regarding overreliance on AIaaS. Balancing AIaaS with open-source frameworks, prioritizing security and privacy, and carefully considering business needs are crucial for navigating this evolving landscape effectively. By adopting a nuanced approach, organizations can harness the benefits of AIaaS while driving impactful and responsible AI-driven solutions.

I hope this article offered valuable insights into the survey of AIaaS in the context of Generative AI and LLM space. If you found the content informative and think it could be beneficial to others, I’d be grateful if you could like 👍, follow 👉, and share✔️ this piece.
