Hey Everyone,
The intersection of AI and Big Tech is heating up, with renewed focus at Amazon, Apple and Google, and the arrival of Nvidia as a major AI hub.
At GTC 2024, Nvidia announced NIM, but I don’t feel it’s been given enough attention. NIM makes it smoother to deploy AI models into production, and with the renewed focus on Enterprise AI in 2024, that’s becoming more important to AI’s development as a whole.
Defining NIM
NIM stands for Nvidia Inference Microservice. This new Nvidia AI Enterprise component bundles everything a user needs, including AI models and integration code, all running in a preconfigured Kubernetes Helm chart that can be deployed anywhere. Each NIM includes the following (a sketch of calling a deployed NIM follows the list):
Prebuilt container and Helm chart
Industry standard APIs
Domain specific code
Optimized inference engines
Support for custom models
Nvidia AI Enterprise
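Because each NIM exposes industry-standard APIs, a deployed microservice can be called like any OpenAI-compatible endpoint. Here is a minimal sketch in Python; the base URL, placeholder API key and model identifier are my own illustrative assumptions rather than values from Nvidia’s documentation, so substitute whatever your deployment actually exposes.

```python
# Minimal sketch: calling a locally deployed NIM through an
# OpenAI-compatible REST API. Endpoint URL, key and model name are
# illustrative assumptions -- replace with your deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # placeholder; a local NIM may not check keys
)

response = client.chat.completions.create(
    model="meta/llama2-70b",  # illustrative model identifier
    messages=[{"role": "user", "content": "Summarize our Q3 sales notes."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The point is the shape of the call: if an application already speaks a standard inference API, pointing it at a NIM should be little more than a configuration change.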
Meanwhile, OpenAI has more recently been making its API more Enterprise AI friendly too. OpenAI’s enterprise customers who want to use generative AI but prize accuracy may be able to get that with new customization options for the GPT-4 API. Essentially, OpenAI is adding new features that give developers more control over fine-tuning, and announcing new ways to build custom models with OpenAI.
The OpenAI API has come a long way since 2020.
The new features include the ability to connect to third-party platforms to share information on fine-tuning, checkpoints that save fine-tuned models during additional training without retraining the entire model, and a new user interface for comparing model performance and quality.
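As a rough sketch of what that workflow could look like with the openai Python SDK (the training-file ID and the Weights & Biases project name are placeholders, and I’m assuming the third-party connection runs through the SDK’s integrations parameter):

```python
# Hedged sketch of the new fine-tuning workflow: placeholder IDs throughout.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Kick off a fine-tuning job, streaming metrics to a third-party
# platform (here Weights & Biases) via the integrations parameter.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file="file-abc123",  # placeholder: an already-uploaded JSONL file
    integrations=[
        {"type": "wandb", "wandb": {"project": "my-finetune-project"}}
    ],
)

# Once the job has run, checkpoints saved during training can be listed
# and reused without retraining the entire model.
for ckpt in client.fine_tuning.jobs.checkpoints.list(job.id):
    print(ckpt.fine_tuned_model_checkpoint, ckpt.step_number)
```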
Others like Anthropic, Cohere and AI21 Labs have to get a lot better at Enterprise AI if they want a chance of generating serious revenue in the future. This comes at a time when Cohere’s Enterprise AI models have finally arrived on Azure AI as well.
NIM currently includes support for models from NVIDIA, AI21, Adept, Cohere, Getty Images and Shutterstock, as well as open models from Google, Hugging Face, Meta, Microsoft, Mistral AI and Stability AI.
Nvidia says it is already working with Amazon, Google and Microsoft to make these NIM microservices available on SageMaker, Google Kubernetes Engine and Azure AI, respectively. They’ll also be integrated into frameworks like Deepset, LangChain and LlamaIndex.
NIM’s PR at GTC was somewhat drowned out among the flashier announcements.
NIM Facilitates Integration and Deployment
As companies move from testing large language models (LLMs) to actually deploying them in production, they’re running into a host of challenges: sizing hardware for inference workloads, integrating outside data as part of a retrieval augmented generation (RAG) workflow, and performing prompt engineering with a tool like LlamaIndex or LangChain.
The goal with NIM is to reduce the amount of integration and development work companies must perform to bring all of these moving parts together into a deployable entity. This will let companies move their GenAI applications from the proof of concept (POC) stage to production, or “zero to inference in just a few minutes,” said Manuvir Das, the vice president of enterprise computing at Nvidia.
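To make the RAG piece of that concrete, here is a deliberately minimal, framework-free sketch of the pattern: embed documents, retrieve the closest match to a query, and feed it into the prompt. A real deployment would use a vector database and a framework like LlamaIndex or LangChain; the documents, model names and prompt format below are illustrative assumptions only.

```python
# Bare-bones RAG sketch: embed, retrieve by cosine similarity, augment the prompt.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Our returns policy allows refunds within 30 days.",
    "Enterprise support tickets are answered within 4 hours.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
query = "How fast is enterprise support?"
q_vec = embed([query])[0]

# Retrieve the best-matching document by cosine similarity.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(scores))]

# Augment the prompt with the retrieved context before generation.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```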
On the inference engine side, Nvidia will use the Triton Inference Server, TensorRT and TensorRT-LLM. Some of the Nvidia microservices available through NIM will include Riva for customizing speech and translation models, cuOpt for routing optimizations and the Earth-2 model for weather and climate simulations.
Nvidia is showing an evolution in its AI software that makes its partner network much more valuable and efficient.
“NIM leverages optimized inference engines for each model and hardware setup, providing the best possible latency and throughput on accelerated infrastructure,” Nvidia says in a blog post. “In addition to supporting optimized community models, developers can achieve even more accuracy and performance by aligning and fine-tuning models with proprietary data sources that never leave the boundaries of their data center.”
Current Customers of NIM
Among NIM’s current users are the likes of Box, Cloudera, Cohesity, Datastax, Dropbox and NetApp.
NVIDIA NIM is designed to bridge the gap between the complex world of AI development and the operational needs of enterprise environments, enabling 10-100X more enterprise application developers to contribute to AI transformations of their companies.
NIM’s core efficiencies:
Deploy anywhere
Develop with industry-standard APIs
Leverage domain-specific models
Run on optimized inference engines
Support for enterprise-grade AI
I have high hopes for NIM facilitating Nvidia’s growing ecosystem of integrations. NIM leverages industry-standard APIs and microservices, both to integrate the various components that make up a GenAI app (model, RAG, data, etc.) and to expose the final GenAI application to a company’s business applications. When teams are ready to deploy a NIM, they have a choice of platforms to deploy to automatically.
I believe Nvidia is building its own version of the AI Cloud, a bit like the early days of AWS, but specifically for emerging technologies.
Customization and optimization mean companies like OpenAI have a distinct advantage over other foundation LLM builders in diversifying their revenue. Among its recent announcements, OpenAI also officially introduced assisted fine-tuning, where OpenAI employees help customers fine-tune GPT-4 as part of its Custom Models program.
If consumer AI products are difficult to scale and difficult to sustain, Enterprise AI is key for Generative AI startups to develop sustainable revenue catalysts and diversified revenue, to remain viable and not simply burn cash. Even such quality firms as Cohere, Anthropic, Aleph Alpha and AI21 Labs appear to be struggling with revenue generation in proportion to the amount they have raised, along with open-source leaders like Mistral and dozens of their peers.
Nvidia’s announcement of NIM really impressed me at GTC. It’s all about eliminating as much of the tedious integration and deployment work as possible, so customers can get their GenAI into operation quickly. Making all aspects of Enterprise adoption more seamless, integrated, easy and convenient is also why I’m so bullish on firms like Databricks and Snowflake, which I believe will continue to acquire AI startups to do just that.
Industry-standard APIs, domain-specific code, efficient inference engines and an enterprise runtime are all included in NVIDIA NIM, a containerized inference microservice. While NIM provides prebuilt models, it also allows organizations to bring their own proprietary data, and it will support and help accelerate RAG deployment.
As for Microsoft, it’s doing a lot on the Enterprise AI side, of course. The Azure AI model catalog now offers more than 1,600 foundation models (LLMs and SLMs) from Databricks, Deci AI, Hugging Face, Meta, Microsoft Research, Mistral AI, NVIDIA, OpenAI, Stability AI, and now Cohere, enabling Azure customers to choose the best model for their use case.
The winners among the facilitators of Enterprise AI for Generative AI will be really important companies. This is why I’ve had my eye on startups like Anyscale.
Anyscale's core mission centers on simplifying and accelerating the development of distributed applications, enabling developers to build and scale their software with ease. It’s an exciting time for Generative AI: less so on the consumer side in 2024, but more so in how actual, real businesses slowly pilot Generative AI’s capabilities, building on their own proprietary data with custom solutions for their business needs and specific requirements.
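Anyscale is the company behind the open-source Ray framework, and the appeal is how little code it takes to go from one machine to a cluster. A minimal illustrative sketch (the workload is a stand-in, not a real Anyscale example):

```python
# Minimal Ray sketch: the same code runs on a laptop or a cluster unchanged.
import ray

ray.init()  # starts a local Ray runtime, or connects to an existing cluster

@ray.remote
def score_document(doc: str) -> int:
    # Stand-in for real work, e.g. running inference over a document.
    return len(doc.split())

docs = ["first document", "a somewhat longer second document"]
futures = [score_document.remote(d) for d in docs]
print(ray.get(futures))  # -> [2, 5]
```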