Release Notes
TOC
- AI 2.1.0
  - New and Optimized Features
    - Image-Based Model Support
    - Model Compression Toolkit
    - Event-Driven Autoscaling
    - Notebook Base Image Library
    - TrustyAI Drift Detection
    - Safety Guardrails
    - Language Model Evaluation Harness
  - Deprecated Features
  - Fixed Issues
  - Known Issues

AI 2.1.0
New and Optimized Features
Image-Based Model Support
The platform now supports deploying models using container images. By leveraging the ModelCar capability in KServe, users can package models as OCI container images and create model inference services directly from these images without downloading model artifacts at runtime.
Using OCI containers for model storage and distribution provides several benefits:
- Reduced startup time – Model artifacts are packaged within the container image, avoiding repeated downloads when deploying or scaling inference services.
- Lower disk space usage – Container image layer reuse reduces redundant storage of identical model files across nodes.
- Improved inference performance stability – Images can be pre-fetched and cached on nodes, enabling faster and more predictable service startup.
This capability standardizes the model deployment workflow and leverages the container image ecosystem for efficient model versioning, distribution, and lifecycle management.
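As a sketch of what an image-based deployment can look like, the following KServe InferenceService references a model packaged as an OCI image via its `storageUri` (the service name, registry, and model format here are hypothetical):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-model                  # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn               # assumed model format
      # ModelCar: pull model artifacts from an OCI image instead of
      # downloading them from object storage at startup
      storageUri: oci://registry.example.com/models/demo-model:v1
```

Because the image tag pins an exact model version, rollbacks and promotions follow the same workflow as any other container image.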
Model Compression Toolkit
A Model Compression Toolkit has been introduced, integrating the llm-compressor library to provide compression capabilities for large language models.
The toolkit supports advanced optimization techniques such as weight quantization, activation quantization, and model sparsification. These techniques enable users to reduce the computational and memory requirements of large models while maintaining model quality. Compression jobs can be executed within Notebook environments or automated pipelines, helping organizations reduce hardware costs and improve inference performance.
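The core idea behind weight quantization can be illustrated with a minimal sketch (plain Python for illustration only, not the llm-compressor API): floating-point weights are mapped onto a small integer range and a per-tensor scale, trading a little precision for a large reduction in storage and bandwidth.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.03, 0.88]
quantized, scale = quantize_int8(weights)   # quantized == [52, -127, 3, 88]
restored = dequantize(quantized, scale)     # close to the original weights
```

Each weight now occupies one byte instead of four, which is why quantization reduces both memory footprint and inference cost.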
Event-Driven Autoscaling
Event-driven autoscaling capabilities have been introduced through integration with KEDA, enabling model inference services to automatically scale based on real-time workload signals.
Unlike traditional autoscaling strategies that rely solely on CPU or GPU utilization, event-driven autoscaling can react to metrics such as request rate, queue length, or message events. This enables more responsive scaling of inference services and improves overall resource efficiency and system stability.
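A KEDA ScaledObject targeting an inference deployment might look like the following sketch; the target name, Prometheus address, and metric query are assumptions and would need to match your environment:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler            # hypothetical name
spec:
  scaleTargetRef:
    name: demo-model-predictor      # deployment backing the inference service (assumed)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed endpoint
        query: sum(rate(http_requests_total{service="demo-model"}[1m]))
        threshold: "20"             # add replicas when request rate exceeds 20 req/s
```

Here scaling reacts to request rate rather than CPU or GPU utilization, so replicas are added before compute saturation causes latency to degrade.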
Notebook Base Image Library
A new Notebook base image library has been added to provide prebuilt development environments for data science and AI workloads.
These images include commonly used machine learning frameworks, deep learning libraries, and data processing tools, allowing users to quickly start Notebook environments for experimentation and model development while reducing environment setup overhead.
TrustyAI Drift Detection
The platform introduces model drift detection capabilities powered by TrustyAI.
This feature continuously monitors inference data distributions and model behavior to detect potential data drift or prediction drift in production environments. It helps teams identify model performance degradation early and maintain the reliability of deployed AI systems.
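One common way to quantify data drift is to compare the empirical distributions of reference (training) data and live inference data; the sketch below uses a two-sample Kolmogorov–Smirnov statistic in plain Python as an illustration of the idea, not TrustyAI's actual implementation.

```python
def ks_statistic(reference, production):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, prod = sorted(reference), sorted(production)
    points = sorted(set(ref) | set(prod))
    cdf = lambda data, x: sum(1 for v in data if v <= x) / len(data)
    return max(abs(cdf(ref, x) - cdf(prod, x)) for x in points)

reference = [0.1, 0.2, 0.3, 0.4, 0.5]
drifted   = [0.6, 0.7, 0.8, 0.9, 1.0]
ks_statistic(reference, reference)   # 0.0 — identical distributions
ks_statistic(reference, drifted)     # 1.0 — completely separated distributions
```

In production, the statistic would be computed over sliding windows of inference inputs, with an alert raised when it crosses a configured threshold.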
Safety Guardrails
Safety guardrails for generative AI applications have been introduced through TrustyAI.
This feature enables policy-based monitoring and filtering of model outputs, allowing organizations to detect and restrict unsafe or non-compliant content generated by AI models. It helps improve the safety, governance, and compliance of generative AI services.
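The shape of policy-based output filtering can be sketched as follows; the deny-list patterns and function are hypothetical illustrations, and real guardrails typically combine configurable detectors rather than simple regular expressions.

```python
import re

# Hypothetical deny-list policy; patterns here are illustrative only.
POLICY = [r"(?i)\bpassword\b", r"(?i)\bcredit card\b"]

def apply_guardrail(output_text):
    """Return (allowed, reason); block model output matching any policy pattern."""
    for pattern in POLICY:
        if re.search(pattern, output_text):
            return False, f"blocked by policy: {pattern}"
    return True, "ok"

allowed, reason = apply_guardrail("Sure, here is my password: hunter2")
# allowed is False; the service would suppress or redact this response
```

The same hook point can be applied on the input side to filter unsafe prompts before they reach the model.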
Language Model Evaluation Harness
A language model evaluation harness has been introduced to support standardized evaluation of large language models.
The evaluation framework supports multiple benchmark tasks and datasets, enabling users to systematically measure model performance and make data-driven decisions when selecting or optimizing models.
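At its core, a benchmark harness scores a model over labeled examples per task; the tiny loop below is an illustrative stand-in, not the actual evaluation harness API, and the model and task here are made up for demonstration.

```python
def evaluate(model, tasks):
    """Score a model callable on each task's (prompt, expected) pairs."""
    results = {}
    for name, examples in tasks.items():
        correct = sum(1 for prompt, expected in examples if model(prompt) == expected)
        results[name] = correct / len(examples)
    return results

# Stand-in "model" and benchmark task for illustration.
echo_model = lambda prompt: prompt.upper()
tasks = {"uppercase": [("ab", "AB"), ("cd", "CD"), ("ef", "fe")]}
scores = evaluate(echo_model, tasks)   # {"uppercase": 0.666...}
```

A real harness adds standardized prompting, answer extraction, and many curated tasks on top of this loop, which is what makes cross-model comparisons reproducible.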
Deprecated Features
None.
Fixed Issues
- Fixed an issue where updating the inference service resource YAML through the page dropped the volumeMount field, which could cause the inference service to fail to start properly.
- Fixed an issue where GraphQL queries (sent as POST by default) were incorrectly intercepted by the gateway layer and checked for create permission. Requests to the /api/graphql interface are now treated as read operations by the RBAC interceptor, so users with read-only roles can access page content that loads data through GraphQL.
Known Issues
- After deleting a model, the list page does not immediately reflect the deletion, and the deleted model may briefly remain in the list.
Temporary solution: manually refresh the page.
- Modifying library_name by directly editing the README file in GitLab does not synchronize the model type change on the page.
Temporary solution: modify library_name through the UI instead of editing it directly in GitLab.