Model Storage
You must store your model before you can deploy it. You can store a model in an S3 bucket, a Persistent Volume Claim (PVC), or Open Container Initiative (OCI) containers.
In cloud-native inference scenarios, model storage determines the startup speed, version management granularity, and scalability of inference services. KServe loads models through two main mechanisms:
- Storage Initializer (Init Container): For S3 and PVC, downloads/mounts data before the main container starts.
- Sidecar: For OCI images, loads models within seconds by leveraging the container runtime's layered image caching.
TOC
- Using S3 Object Storage for model storage
  - Authentication Configuration
  - S3 Key Configuration Parameters
  - Deploy Inference Service
- Using OCI containers for model storage
  - Model Image Packaging
    - Option 1: Using Busybox Base Image (Alauda AI Recommendation)
    - Option 2: Using UBI Micro Base Image (Red Hat Recommendation)
  - Building and Pushing the Model Image
  - Deploy Inference Service
    - Prerequisites
- Using PVC for model storage
  - Uploading model files to a PVC
    - Prerequisites
    - Procedure
    - Verification
    - Next Steps
  - Deploy Inference Service

Using S3 Object Storage for model storage
This is the most commonly used mode. Credentials are managed through a Secret carrying KServe-specific annotations.
Authentication Configuration
It is recommended to create separate ServiceAccount and Secret for each project.
S3 Key Configuration Parameters
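The parameters below apply to a Secret that carries the S3 credentials and a ServiceAccount that references it. A minimal sketch (the names `s3-credentials` and `models-sa` are hypothetical placeholders; substitute your own):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials                  # hypothetical name
  annotations:
    serving.kserve.io/s3-endpoint: your_s3_service_ip:your_s3_port
    serving.kserve.io/s3-usehttps: "0"  # "1" for HTTPS, "0" for HTTP
type: Opaque
data:
  AWS_ACCESS_KEY_ID: YOUR_BASE64_ENCODED_ACCESS_KEY
  AWS_SECRET_ACCESS_KEY: YOUR_BASE64_ENCODED_SECRET_KEY
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: models-sa                       # hypothetical name; referenced by the InferenceService
secrets:
  - name: s3-credentials
```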
- Replace `YOUR_BASE64_ENCODED_ACCESS_KEY` with your actual Base64-encoded AWS access key ID.
- Replace `YOUR_BASE64_ENCODED_SECRET_KEY` with your actual Base64-encoded AWS secret access key.
- Replace `your_s3_service_ip:your_s3_port` with the actual IP address and port of your S3 service.
- Set `serving.kserve.io/s3-usehttps` to `"1"` if your S3 service uses HTTPS, or `"0"` if it uses HTTP.
Deploy Inference Service
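An InferenceService manifest for this mode might look like the following sketch (the service name, the `models-sa` ServiceAccount, and the `huggingface` model format are assumptions; substitute the values for your environment):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-0-5b-instruct           # hypothetical service name
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    serviceAccountName: models-sa       # hypothetical; must reference your S3 credential Secret
    model:
      runtime: aml-vllm-0.11.2-cpu
      modelFormat:
        name: huggingface               # assumption: adjust to the format your runtime serves
      storageUri: s3://models/Qwen2.5-0.5B-Instruct
```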
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- `aml.cpaas.io/runtime-type: vllm` specifies the runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with the runtime name already installed in your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: s3://models/Qwen2.5-0.5B-Instruct` specifies the S3 bucket URI where the model is stored.
Using OCI containers for model storage
As an alternative to storing a model in an S3 bucket or PVC, you can store models in Open Container Initiative (OCI) containers. Deploying models from OCI containers is also known as modelcars in KServe. This approach is ideal for offline environments and enterprise internal registries such as Quay or Harbor.
Using OCI containers for model storage can help you:
- Reduce startup times by avoiding downloading the same model multiple times.
- Reduce disk space usage by reducing the number of models downloaded locally.
- Improve model performance by allowing pre-fetched images.
Model Image Packaging
Create a Containerfile to build the model image.
Option 1: Using Busybox Base Image (Alauda AI Recommendation)
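A minimal Containerfile for this option might look like the following sketch (it assumes your model files sit in a local `models/` directory next to the Containerfile):

```dockerfile
FROM busybox
# Copy the local models/ directory into the image; KServe serves the model from /models.
COPY models /models
```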
Option 2: Using UBI Micro Base Image (Red Hat Recommendation)
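A sketch for the UBI Micro variant, following the permission guidance in the note below (same assumption about a local `models/` directory):

```dockerfile
FROM registry.access.redhat.com/ubi9/ubi-micro:latest
# Own the files as root (UID 0) and make them readable by the root group,
# so a container running with a random UID and the root GID can still read them.
COPY --chown=0:0 models /models
RUN chmod -R a=rX /models
```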
Note
- Specify a base image that provides a shell (for example, `ubi9-micro`). You cannot specify an empty image that does not provide a shell, such as `scratch`, because KServe uses the shell to ensure the model files are accessible to the model server.
- Change the ownership of the copied model files and grant read permissions to the root group. This ensures that the model server can access the files, since containers may run with a random user ID and the root group ID.
Building and Pushing the Model Image
After creating the Containerfile, build and push the image to your registry:
1. Create a temporary directory and copy your model files into a `models/` subfolder.
2. Build the OCI container image.
3. Push the image to your container registry.
Note If your repository is private, ensure that you are authenticated to the registry before pushing the image.
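Putting the three steps together, a sketch of the build-and-push flow (the staging path and the image reference `build-harbor.alauda.cn/test/qwen-oci:v1.0.0` are examples; `podman` can be swapped for `docker`):

```shell
# 1. Stage the model files in a temporary build context.
mkdir -p /tmp/modelcar/models
cp -r /path/to/your/model/* /tmp/modelcar/models/

# 2. Build the OCI image from the Containerfile created earlier.
cd /tmp/modelcar
podman build -t build-harbor.alauda.cn/test/qwen-oci:v1.0.0 -f Containerfile .

# 3. Push the image (authenticate with `podman login` first if the registry is private).
podman push build-harbor.alauda.cn/test/qwen-oci:v1.0.0
```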
Deploy Inference Service
Prerequisites
- The namespace where the inference service is located must have PSA (Pod Security Admission) Enforce set to Privileged.
KServe supports the OCI protocol natively:
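A sketch of the corresponding InferenceService (the service name and the `huggingface` model format are assumptions; substitute the values for your environment):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-0-5b-instruct           # hypothetical service name
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    model:
      runtime: aml-vllm-0.11.2-cpu
      modelFormat:
        name: huggingface               # assumption: adjust to the format your runtime serves
      storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0
```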
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- `aml.cpaas.io/runtime-type: vllm` specifies the runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with the runtime name already installed in your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0` specifies the OCI image URI with tag where the model is stored.
Using PVC for model storage
Uploading model files to a PVC
When deploying a model, you can serve it from a preexisting Persistent Volume Claim (PVC) where your model files are stored. You can upload your local model files to a PVC in the IDE that you access from a running workbench.
Prerequisites
- You have access to the Alauda AI dashboard.
- You have access to a project that has a running workbench.
- You have created a persistent volume claim (PVC).
- The workbench is attached to the PVC. For instructions on creating a workbench and attaching a PVC, see Create Workbench.
- You have the model files saved on your local machine.
Procedure
Follow these steps to upload your model files to the PVC within your workbench:
1. From the Alauda AI dashboard, click Workbench to enter the workbench list page.
2. Find your running workbench instance and click the Connect button to enter the workbench.
3. In your workbench IDE, navigate to the file browser:
   - In JupyterLab, this is the Files tab in the left sidebar.
   - In code-server, this is the Explorer view in the left sidebar.
4. In the file browser, navigate to the home directory. This directory represents the root of your attached PVC.
   Note: Any files or folders that you create or upload to this folder persist in the PVC.
5. Optional: Create a new folder to organize your models:
   - In the file browser, right-click within the home directory and select New Folder.
   - Name the folder (for example, models).
   - Double-click the new models folder to enter it.
6. Upload your model files to the current folder:
   - Using JupyterLab:
     - Click the Upload button in the file browser toolbar.
     - In the file selection dialog, navigate to and select the model files from your local computer. Click Open.
     - Wait for the upload to complete.
   - Using code-server:
     - Drag the model files directly from your local file explorer and drop them into the file browser pane in the target folder within code-server.
     - Wait for the upload process to complete.
Verification
Confirm that your files appear in the file browser at the path where you uploaded them.
Next Steps
When deploying a model from a PVC, set the `storageUri` in the format `pvc://<pvc-name>/<optional-path>`. For example:

- `pvc://model-pvc` loads from the root of the PVC.
- `pvc://model-pvc/models/Qwen2.5-0.5B-Instruct` loads from a specific subdirectory.
Deploy Inference Service
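An InferenceService manifest for the PVC mode might look like the following sketch (the service name and the `huggingface` model format are assumptions; substitute the values for your environment):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-0-5b-instruct           # hypothetical service name
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    model:
      runtime: aml-vllm-0.11.2-cpu
      modelFormat:
        name: huggingface               # assumption: adjust to the format your runtime serves
      storageUri: pvc://model-pvc
```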
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- `aml.cpaas.io/runtime-type: vllm` specifies the runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with the runtime name already installed in your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: pvc://model-pvc` specifies the PVC name where the model is stored.