Model Registry Error Catalog
A searchable catalog of all Model Registry errors, with troubleshooting steps and, where available, auto-fix functions.
Last Updated: November 11, 2025
Quick Search
| Error Code | Category | Severity | Description |
|---|---|---|---|
| S3_UPLOAD_FAILED | Storage | Critical | S3 upload operation failed |
| S3_DOWNLOAD_FAILED | Storage | High | S3 download operation failed |
| S3_DELETE_FAILED | Storage | Medium | S3 delete operation failed |
| S3_ERROR | Storage | High | Generic S3 service error |
| S3_UNAVAILABLE | Network | Critical | S3 unavailable (circuit breaker open) |
| INVALID_S3_PATH | Validation | Medium | S3 path format is invalid |
| MODEL_VALIDATION_FAILED | Validation | Medium | Model metadata validation failed |
| DEPLOYMENT_FAILED | Deployment | High | Model deployment failed |
| CONCURRENT_DEPLOYMENT | Deployment | Medium | Race condition during deployment |
| DUPLICATE_MODEL | Versioning | Low | Model version already exists |
| CIRCUIT_BREAKER_OPEN | Network | Critical | Circuit breaker is open |
| DATABASE_ERROR | Database | Critical | Database operation failed |
| MODEL_NOT_FOUND | Database | Medium | Model not found in database |
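The table maps each error code to a category and a severity. One way to back such a catalog in code is a shared base class that every specific error extends. The sketch below is a minimal, hypothetical version: it assumes a `ModelRegistryError` base carrying `code`, `category`, `severity`, and `context`, plus the `attemptAutoFix()`, `getTroubleshootingGuide()`, and `toJSON()` methods that the examples in this catalog call. The real module's constructors and fields may differ.

```typescript
// Hypothetical sketch: field names and constructor shapes are assumptions,
// not the actual "@/lib/ai/training/modelRegistryErrors" module.
type ErrorCategory = "Storage" | "Network" | "Validation" | "Deployment" | "Versioning" | "Database";
type ErrorSeverity = "Low" | "Medium" | "High" | "Critical";

class ModelRegistryError extends Error {
  readonly code: string;
  readonly category: ErrorCategory;
  readonly severity: ErrorSeverity;
  readonly context: Record<string, unknown>;

  constructor(
    message: string,
    code: string,
    category: ErrorCategory,
    severity: ErrorSeverity,
    context: Record<string, unknown> = {},
  ) {
    super(message);
    this.name = new.target.name; // e.g. "S3UploadError" for subclasses
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
    this.code = code;
    this.category = category;
    this.severity = severity;
    this.context = context;
  }

  // Default: no automatic remediation. Subclasses override when a fix exists.
  async attemptAutoFix(): Promise<boolean> {
    return false;
  }

  getTroubleshootingGuide(): string {
    return `[${this.code}] ${this.message} (category: ${this.category}, severity: ${this.severity})`;
  }

  // Called by JSON.stringify and handy for structured loggers.
  toJSON() {
    return {
      name: this.name,
      code: this.code,
      category: this.category,
      severity: this.severity,
      message: this.message,
      context: this.context,
    };
  }
}

class S3UploadError extends ModelRegistryError {
  constructor(message: string, context: Record<string, unknown> = {}) {
    super(message, "S3_UPLOAD_FAILED", "Storage", "Critical", context);
  }
}

class DuplicateModelError extends ModelRegistryError {
  constructor(modelId: string, version: string) {
    super(`Model ${modelId}@${version} already exists`, "DUPLICATE_MODEL", "Versioning", "Low", {
      modelId,
      version,
    });
  }
}
```

With this shape, a single `instanceof ModelRegistryError` check catches every registry error, while subclasses such as `S3UploadError` still allow targeted handling.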
Storage Errors
S3_UPLOAD_FAILED
Category: Storage
Severity: Critical
Class: S3UploadError
Description: Failed to upload model artifacts to S3 storage.
Common Causes:
- Network timeout or connectivity issues
- Bucket access denied (IAM permissions)
- Invalid AWS credentials
- Model size exceeds upload limit (100MB standard, 5GB multipart)
- Circuit breaker is open due to previous failures
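The size limit in the causes above can be checked before any upload starts. A minimal sketch, assuming the 100MB and 5GB figures are the registry's configured thresholds and interpreting them as binary units; the constant names and the `chooseUploadStrategy` helper are hypothetical:

```typescript
// Hypothetical pre-flight check; limits mirror the catalog text
// (100MB standard, 5GB multipart), interpreted here as binary units.
const MB = 1024 * 1024;
const STANDARD_UPLOAD_LIMIT = 100 * MB;
const MULTIPART_UPLOAD_LIMIT = 5 * 1024 * MB;

type UploadPlan =
  | { kind: "standard" }
  | { kind: "multipart"; partCount: number }
  | { kind: "rejected"; reason: string };

// S3 requires multipart parts of at least 5 MiB (except the last one)
// and allows at most 10,000 parts; 10 MiB parts stay well inside both.
function chooseUploadStrategy(sizeBytes: number, partSize = 10 * MB): UploadPlan {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    return { kind: "rejected", reason: `invalid size: ${sizeBytes}` };
  }
  if (sizeBytes <= STANDARD_UPLOAD_LIMIT) {
    return { kind: "standard" };
  }
  if (sizeBytes <= MULTIPART_UPLOAD_LIMIT) {
    return { kind: "multipart", partCount: Math.ceil(sizeBytes / partSize) };
  }
  return {
    kind: "rejected",
    reason: `model is ${(sizeBytes / MB).toFixed(1)}MB; the multipart limit is ${MULTIPART_UPLOAD_LIMIT / MB}MB`,
  };
}
```

Rejecting oversized models up front turns an opaque mid-upload failure into a clear error before any bytes are sent.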
Troubleshooting Steps:
```typescript
import { S3UploadError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UploadError) {
    console.error(error.getTroubleshootingGuide());

    // Step 1: Auto-fix (verifies bucket access)
    const fixed = await error.attemptAutoFix();
    if (fixed) {
      console.log("✅ Bucket access verified, retrying...");
      await registry.saveModel(model, metadata);
    } else {
      // Manual intervention required
      console.error("Manual steps:");
      console.error("1. Check AWS service health: https://status.aws.amazon.com/");
      console.error("2. Verify IAM role has s3:PutObject permission");
      console.error("3. Check model size is under limit");
      console.error("4. Review circuit breaker status");
    }
  } else {
    throw error; // Not an upload error: don't swallow it
  }
}
```
Related Documentation:
Runbook: /docs/runbooks/s3-upload-failure.md
S3_DOWNLOAD_FAILED
Category: Storage
Severity: High
Class: S3DownloadError
Description: Failed to download model artifacts from S3 storage.
Common Causes:
- Object not found at specified S3 path
- Network timeout during download
- Corrupted data or incomplete upload
- Access denied (IAM permissions)
- Invalid S3 path format
Troubleshooting Steps:
```typescript
import { S3DownloadError } from "@/lib/ai/training/modelRegistryErrors";

try {
  const metadata = await registry.getModel("my-model", "1.0.0");
  // Download triggered automatically during validation
} catch (error) {
  if (error instanceof S3DownloadError) {
    console.error(error.getTroubleshootingGuide());

    // Manual steps:
    // 1. Verify the object exists
    const s3Path = error.context.s3Path;
    console.log("Check if object exists:", s3Path);
    // 2. Check permissions
    console.log("Verify IAM role has s3:GetObject permission");
    // 3. Validate path format
    console.log("S3 path format: s3://bucket-name/path/to/object");
  } else {
    throw error;
  }
}
```
Related Documentation:
Runbook: /docs/runbooks/s3-download-failure.md
S3_DELETE_FAILED
Category: Storage
Severity: Medium
Class: S3DeleteError
Description: Failed to delete model artifacts from S3 storage.
Common Causes:
- Insufficient IAM permissions (s3:DeleteObject)
- Object protected by S3 Object Lock (retention period or legal hold), or MFA Delete enabled on a versioned bucket
- Bucket policy denies delete operations
Troubleshooting Steps:
```typescript
import { S3DeleteError } from "@/lib/ai/training/modelRegistryErrors";

// Manual steps only (no auto-fix available)
try {
  await s3Storage.deleteModel({ bucket, key });
} catch (error) {
  if (error instanceof S3DeleteError) {
    console.error("1. Verify IAM permissions include s3:DeleteObject");
    console.error("2. Check whether the object is under Object Lock or MFA Delete");
    console.error("3. Review bucket policy for delete restrictions");
  } else {
    throw error;
  }
}
```
S3_ERROR
Category: Storage
Severity: High
Class: S3Error
Description: Generic S3 service error (catch-all for S3 issues).
Troubleshooting Steps:
```typescript
import { S3Error } from "@/lib/ai/training/modelRegistryErrors";

try {
  await s3Operation();
} catch (error) {
  if (error instanceof S3Error) {
    console.error("1. Check AWS service health");
    console.error("2. Verify AWS credentials are configured");
    console.error("3. Review S3 bucket configuration");
  } else {
    throw error;
  }
}
```
S3_UNAVAILABLE
Category: Network
Severity: Critical
Class: S3UnavailableError
Description: S3 is temporarily unavailable because the S3 circuit breaker is open, so requests fail fast instead of reaching S3.
Common Causes:
- Too many recent S3 failures (error rate above 50%)
- S3 service degradation or outage
- Network connectivity issues
Troubleshooting Steps:
```typescript
import { S3UnavailableError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UnavailableError) {
    console.error(error.getTroubleshootingGuide());

    // Wait for circuit breaker auto-recovery (30s default)
    console.log("Waiting 30s for circuit breaker to recover...");
    await new Promise((resolve) => setTimeout(resolve, 30000));

    // Retry once; if S3 is still failing, this throws again
    await registry.saveModel(model, metadata);
  } else {
    throw error;
  }
}
```
Auto-Recovery: After a 30s cooldown the circuit breaker moves to HALF_OPEN and lets a trial request through; if that request succeeds, the breaker closes again.
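Instead of a single fixed wait, a caller can retry a bounded number of times with growing delays. A minimal sketch of exponential backoff with "full jitter"; the `retryWithBackoff` helper and its options are hypothetical, and the retry decision is delegated to an `isRetryable` predicate so the block does not depend on the registry's error classes:

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // upper bound of the first backoff delay
  maxDelayMs: number; // cap on any single delay
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (opts.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= opts.maxAttempts || !opts.isRetryable(error)) {
        throw error; // out of attempts, or an error retrying won't fix
      }
      // Exponential growth (base, 2x, 4x, ...) capped at maxDelayMs, with
      // full jitter: a random delay in [0, ceiling) so callers don't retry in lockstep.
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * ceiling);
    }
  }
}
```

For this error, a call might look like `retryWithBackoff(() => registry.saveModel(model, metadata), { maxAttempts: 4, baseDelayMs: 5000, maxDelayMs: 30000, isRetryable: (e) => e instanceof S3UnavailableError })`. Because the delays are randomized, no particular retry is guaranteed to land after the 30s cooldown; use the fixed 30s wait when that guarantee matters.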
Manual Reset (admin only):
```typescript
import { s3CircuitBreaker } from "@/lib/ai/training/S3CircuitBreaker";

// Force reset (use with caution!)
s3CircuitBreaker.reset();
```
Related Documentation:
Runbook: /docs/runbooks/s3-circuit-breaker-open.md
INVALID_S3_PATH
Category: Validation
Severity: Medium
Class: InvalidS3PathError
Description: S3 path format is invalid or malformed.
Valid Format: s3://bucket-name/path/to/object
Common Mistakes:
- Missing s3:// prefix
- No bucket name
- Invalid characters in path
- Relative paths instead of absolute
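Each of these mistakes can be caught before the path reaches the registry. A minimal sketch of such a validator; the `parseS3Path` name is hypothetical, the bucket check covers only the core AWS bucket-naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit), and rejecting control characters and relative segments in the key is a conservative choice for this example rather than an S3 requirement:

```typescript
interface S3Location {
  bucket: string;
  key: string;
}

// Core AWS bucket-naming rules only: 3-63 chars, lowercase letters, digits,
// dots and hyphens, beginning and ending with a letter or digit.
const BUCKET_NAME = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;

function parseS3Path(path: string): S3Location {
  const prefix = "s3://";
  if (!path.startsWith(prefix)) {
    throw new Error(`Missing "${prefix}" prefix: ${path}`);
  }
  const rest = path.slice(prefix.length);
  const slash = rest.indexOf("/");
  const bucket = slash === -1 ? rest : rest.slice(0, slash);
  const key = slash === -1 ? "" : rest.slice(slash + 1);

  if (bucket === "") throw new Error(`No bucket name: ${path}`);
  if (!BUCKET_NAME.test(bucket)) throw new Error(`Invalid bucket name: ${bucket}`);
  if (key === "") throw new Error(`No object key after the bucket: ${path}`);
  // Conservative checks for this sketch: control characters and "." / ".."
  // segments usually indicate a malformed or relative path.
  if (/[\u0000-\u001f\u007f]/.test(key)) throw new Error(`Control characters in key: ${key}`);
  if (key.split("/").some((segment) => segment === "." || segment === "..")) {
    throw new Error(`Relative segment in key: ${key}`);
  }
  return { bucket, key };
}
```

For example, `parseS3Path("s3://trillbot-models/model.json")` returns `{ bucket: "trillbot-models", key: "model.json" }`, while `parseS3Path("invalid-path")` throws.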
Troubleshooting Steps:
```typescript
import { InvalidS3PathError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, {
    ...metadata,
    s3Path: "invalid-path", // ❌ WRONG
  });
} catch (error) {
  if (error instanceof InvalidS3PathError) {
    console.error("Invalid path:", error.context.invalidPath);
    console.error("Use format: s3://bucket-name/path/to/object");

    // Fix and retry
    await registry.saveModel(model, {
      ...metadata,
      s3Path: "s3://trillbot-models/model.json", // ✅ CORRECT
    });
  } else {
    throw error;
  }
}
```
Validation Errors
MODEL_VALIDATION_FAILED
Category: Validation
Severity: Medium
Class: ModelValidationError
Description: Model metadata failed schema validation.
Common Causes:
- Missing required fields (modelId, version, s3Path, etc.)
- Invalid version format (must follow semantic versioning)
- Invalid model type (must be: lstm, transformer, neural, ensemble)
- Invalid input/output shapes (must be number arrays)
- Constraint violations (accuracy not between 0-1, etc.)
Troubleshooting Steps:
```typescript
import { ModelValidationError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, {
    modelId: "test",
    version: "invalid", // ❌ Not semantic versioning
    // ... other fields
  });
} catch (error) {
  if (error instanceof ModelValidationError) {
    console.error(error.getTroubleshootingGuide());

    // Review validation errors
    const validationErrors = error.context.validationErrors;
    console.error("Validation errors:", validationErrors);

    // Fix common issues:
    // 1. Use semantic versioning: "1.0.0", "2.1.3"
    // 2. Verify all required fields are present
    // 3. Check model type is valid
    // 4. Ensure shapes are number arrays
  } else {
    throw error;
  }
}
```
Valid Model Types: lstm, transformer, neural, ensemble
Version Format: Semantic versioning (e.g., 1.0.0, 2.1.3)
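The rules above can be checked locally before calling the registry. A minimal sketch in plain TypeScript; the field names `modelType`, `inputShape`, `outputShape`, and `accuracy` are assumptions about the metadata shape, `validateModelMetadata` is hypothetical, and the version pattern accepts only `MAJOR.MINOR.PATCH` (no pre-release or build suffixes):

```typescript
const MODEL_TYPES = ["lstm", "transformer", "neural", "ensemble"] as const;
// MAJOR.MINOR.PATCH without leading zeros; pre-release/build tags not accepted here.
const SEMVER = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

// Hypothetical input shape: fields are `unknown` because we validate untrusted data.
interface ModelMetadataInput {
  modelId?: unknown;
  version?: unknown;
  s3Path?: unknown;
  modelType?: unknown;
  inputShape?: unknown;
  outputShape?: unknown;
  accuracy?: unknown;
}

const isShape = (value: unknown): boolean =>
  Array.isArray(value) &&
  value.length > 0 &&
  value.every((n) => typeof n === "number" && Number.isFinite(n));

// Returns a list of human-readable problems; an empty list means valid.
function validateModelMetadata(meta: ModelMetadataInput): string[] {
  const errors: string[] = [];
  for (const field of ["modelId", "version", "s3Path"] as const) {
    const value = meta[field];
    if (typeof value !== "string" || value === "") errors.push(`${field} is required`);
  }
  if (typeof meta.version === "string" && meta.version !== "" && !SEMVER.test(meta.version)) {
    errors.push(`version "${meta.version}" is not semantic versioning (e.g. 1.0.0)`);
  }
  if (!(MODEL_TYPES as readonly unknown[]).includes(meta.modelType)) {
    errors.push(`modelType must be one of: ${MODEL_TYPES.join(", ")}`);
  }
  for (const field of ["inputShape", "outputShape"] as const) {
    if (!isShape(meta[field])) errors.push(`${field} must be a non-empty array of numbers`);
  }
  if (
    meta.accuracy !== undefined &&
    (typeof meta.accuracy !== "number" || meta.accuracy < 0 || meta.accuracy > 1)
  ) {
    errors.push("accuracy must be a number between 0 and 1");
  }
  return errors;
}
```

Collecting every problem at once, rather than stopping at the first, mirrors the `validationErrors` list the registry reports.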
Deployment Errors
DEPLOYMENT_FAILED
Category: Deployment
Severity: High
Class: DeploymentError
Description: Model deployment operation failed.
Common Causes:
- Model not found in database
- Deployment transaction failed (database issue)
- Race condition (concurrent deployment)
- Environment validation failed
Troubleshooting Steps:
```typescript
import { DeploymentError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.deployModel("my-model", "1.0.0", {
    environment: "production",
  });
} catch (error) {
  if (error instanceof DeploymentError) {
    console.error(error.getTroubleshootingGuide());

    // Step 1: Verify the model exists
    const model = await registry.getModel("my-model", "1.0.0");
    if (!model) {
      console.error("Model not found. Check model ID and version.");
    } else {
      // Step 2: Check deployment transaction logs
      console.error("Review database logs for transaction details");

      // Step 3: Ensure no concurrent deployments
      const activeModel = await registry.getActiveModel("my-model", "production");
      console.log("Currently active:", activeModel?.version);
    }
  } else {
    throw error;
  }
}
```
Runbook: /docs/runbooks/deployment-failure.md
CONCURRENT_DEPLOYMENT
Category: Deployment
Severity: Medium
Class: ConcurrentDeploymentError
Description: Race condition detected: multiple deployments were attempted at the same time.
Troubleshooting Steps:
```typescript
import { ConcurrentDeploymentError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.deployModel("my-model", "1.0.0", {
    environment: "production",
  });
} catch (error) {
  if (error instanceof ConcurrentDeploymentError) {
    console.log("Concurrent deployment detected. Waiting 2s and retrying...");
    await new Promise((resolve) => setTimeout(resolve, 2000));

    // Retry deployment once
    await registry.deployModel("my-model", "1.0.0", {
      environment: "production",
    });
  } else {
    throw error;
  }
}
```
Prevention: Use deployment locks in high-concurrency environments.
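A deployment lock serializes deployments for the same model and environment so they cannot race. A minimal sketch, assuming all deployments for a model run inside a single Node.js process; the `KeyedLock` class is hypothetical. Across multiple instances you would need a shared lock instead (for example, a database row lock or a PostgreSQL advisory lock), which this sketch does not provide:

```typescript
// Serializes async work per key (e.g. `${modelId}:${environment}`) within one process.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The next caller for this key waits for both earlier holders and this task.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);

    await previous; // wait for earlier holders of this key
    try {
      return await task();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map doesn't grow forever.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

A deploy call would then be wrapped as `deployLock.run(\`${modelId}:${environment}\`, () => registry.deployModel(modelId, version, { environment }))`, so a second deployment for the same key waits instead of failing with `CONCURRENT_DEPLOYMENT`.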
Versioning Errors
DUPLICATE_MODEL
Category: Versioning
Severity: Low
Class: DuplicateModelError
Description: Model with specified (modelId, version) already exists.
Troubleshooting Steps:
```typescript
import { DuplicateModelError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, {
    modelId: "my-model",
    version: "1.0.0", // Already exists
    // ...
  });
} catch (error) {
  if (error instanceof DuplicateModelError) {
    console.error("Model version already exists");

    // Solution 1: Use a different version number
    await registry.saveModel(model, {
      modelId: "my-model",
      version: "1.1.0", // Incremented version
      // ...
    });

    // Solution 2: Use version tagging to reference the existing version
    // (if you only want to tag the version that already exists)
  } else {
    throw error;
  }
}
```
Best Practice: Always increment version for model updates following semantic versioning.
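Incrementing by hand is easy to get wrong, for example by forgetting to reset the lower components. A minimal sketch of a bump helper for plain `MAJOR.MINOR.PATCH` strings; `bumpVersion` is hypothetical and does not handle pre-release tags:

```typescript
type VersionPart = "major" | "minor" | "patch";

function bumpVersion(version: string, part: VersionPart): string {
  const match = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/.exec(version);
  if (!match) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  switch (part) {
    case "major":
      return `${major + 1}.0.0`; // breaking change: reset minor and patch
    case "minor":
      return `${major}.${minor + 1}.0`; // new capability: reset patch
    case "patch":
      return `${major}.${minor}.${patch + 1}`; // fix only
  }
}
```

`bumpVersion("1.0.0", "minor")` yields `"1.1.0"`, the version used in the retry above.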
Network Errors
CIRCUIT_BREAKER_OPEN
Category: Network
Severity: Critical
Class: CircuitBreakerOpenError
Description: Circuit breaker is in OPEN state, preventing operations.
Common Causes:
- Too many recent failures (error rate above 50%)
- Service degradation or outage
- Network connectivity issues
Troubleshooting Steps:
```typescript
import { CircuitBreakerOpenError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await s3Operation();
} catch (error) {
  if (error instanceof CircuitBreakerOpenError) {
    console.error(error.getTroubleshootingGuide());

    // Wait for auto-recovery (30s)
    console.log("Circuit breaker is open. Waiting 30s...");
    await new Promise((resolve) => setTimeout(resolve, 30000));

    // Check circuit breaker stats (a relative URL works in the browser;
    // server-side fetch needs an absolute URL)
    const response = await fetch("/api/monitoring/circuit-breaker");
    const stats = await response.json();
    console.log("Circuit breaker stats:", stats);

    // Retry operation
    await s3Operation();
  } else {
    throw error;
  }
}
```
Auto-Recovery: Circuit automatically transitions to HALF_OPEN after 30s, then CLOSED if service healthy.
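The CLOSED, OPEN, and HALF_OPEN transitions can be modeled as a small state machine. This is an illustrative sketch only, assuming a failure-rate threshold over a sliding window of recent calls and a fixed cooldown; the real `S3CircuitBreaker` may count failures differently, and the window size, minimum call volume, and HALF_OPEN trial policy here are assumptions. `CircuitBreaker` and `CircuitOpenError` are hypothetical stand-ins:

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Hypothetical stand-in for the registry's CircuitBreakerOpenError.
class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CircuitOpenError";
  }
}

interface BreakerOptions {
  failureRateThreshold: number; // 0.5 -> open when more than 50% of the window failed
  windowSize: number; // how many recent outcomes to keep
  minimumCalls: number; // don't judge the rate on too few samples
  cooldownMs: number; // OPEN -> HALF_OPEN after this long (30s in this catalog)
  now?: () => number; // injectable clock for tests
}

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private outcomes: boolean[] = []; // true = failure
  private openedAt = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
    this.now = opts.now ?? (() => Date.now());
  }

  getState(): BreakerState {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.opts.cooldownMs) {
      this.state = "HALF_OPEN"; // cooldown over: let a trial call through
    }
    return this.state;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.getState() === "OPEN") {
      throw new CircuitOpenError("Circuit breaker is OPEN; failing fast");
    }
    try {
      const result = await operation();
      this.record(false);
      return result;
    } catch (error) {
      this.record(true);
      throw error;
    }
  }

  reset(): void {
    this.state = "CLOSED";
    this.outcomes = [];
  }

  private record(failed: boolean): void {
    if (this.state === "OPEN") return; // late result from a call started earlier
    if (this.state === "HALF_OPEN") {
      // The first trial result decides (this sketch does not cap concurrent
      // trial calls): success closes the circuit, failure re-opens it.
      if (failed) this.open();
      else this.reset();
      return;
    }
    this.outcomes.push(failed);
    if (this.outcomes.length > this.opts.windowSize) this.outcomes.shift();
    const failures = this.outcomes.filter(Boolean).length;
    if (
      this.outcomes.length >= this.opts.minimumCalls &&
      failures / this.outcomes.length > this.opts.failureRateThreshold
    ) {
      this.open();
    }
  }

  private open(): void {
    this.state = "OPEN";
    this.openedAt = this.now();
    this.outcomes = [];
  }
}
```

The `minimumCalls` guard keeps a single early failure (a 100% error rate over one call) from tripping the breaker.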
Manual Reset (admin only):
```bash
# API endpoint for manual reset ($BASE_URL is a placeholder for your deployment's origin)
curl -X POST "$BASE_URL/api/admin/circuit-breaker/reset" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```
Runbook: /docs/runbooks/s3-circuit-breaker-open.md
Database Errors
DATABASE_ERROR
Category: Database
Severity: Critical
Class: DatabaseError
Description: Database operation failed.
Common Causes:
- Connection timeout
- Query syntax error
- Constraint violation
- Transaction failure
- Supabase service unavailable
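Only some of these causes are worth retrying: connection problems and transaction conflicts are often transient, while syntax errors and constraint violations fail the same way every time. A minimal sketch that classifies an error by its PostgreSQL SQLSTATE code, assuming the Supabase client surfaces that code on `error.code` for database-originated errors; the `classifyDbError` helper and the exact set of codes treated as retryable are choices made for this example:

```typescript
// SQLSTATE codes from the PostgreSQL error-code appendix, treated as transient here.
const RETRYABLE_SQLSTATES = new Set([
  "08000", // connection_exception
  "08001", // sqlclient_unable_to_establish_sqlconnection
  "08003", // connection_does_not_exist
  "08006", // connection_failure
  "40001", // serialization_failure
  "40P01", // deadlock_detected
  "57P01", // admin_shutdown
]);

type DbErrorKind = "retryable" | "constraint" | "syntax" | "other";

function classifyDbError(error: { code?: string } | null | undefined): DbErrorKind {
  const code = error?.code ?? "";
  if (RETRYABLE_SQLSTATES.has(code)) return "retryable";
  if (code.startsWith("23")) return "constraint"; // class 23: integrity constraint violation
  if (code === "42601") return "syntax"; // syntax_error
  return "other"; // includes non-SQLSTATE codes from the API layer
}
```

A `retryable` result pairs naturally with a bounded retry, while `constraint` usually means the data itself needs fixing (for example, a unique-key conflict).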
Troubleshooting Steps:
```typescript
import { DatabaseError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof DatabaseError) {
    console.error(error.getTroubleshootingGuide());

    // Check the database connection with a head-only count query (returns no rows)
    const { error: pingError } = await supabase
      .from("trained_models")
      .select("*", { count: "exact", head: true });

    if (pingError) {
      console.error("Database connection failed:", pingError);
      console.error("1. Check SUPABASE_URL is correct");
      console.error("2. Verify SUPABASE_SERVICE_ROLE_KEY");
      console.error("3. Check Supabase service status");
    } else {
      console.log("Database connection OK");
      console.error("Review database logs for specific error");
    }
  } else {
    throw error;
  }
}
```
Related Documentation:
Runbook: /docs/runbooks/database-error.md
MODEL_NOT_FOUND
Category: Database
Severity: Medium
Class: ModelNotFoundError
Description: Requested model not found in database.
Troubleshooting Steps:
```typescript
import { ModelNotFoundError } from "@/lib/ai/training/modelRegistryErrors";

try {
  const model = await registry.getModel("my-model", "1.0.0");
  if (!model) {
    throw new ModelNotFoundError("my-model", "1.0.0");
  }
} catch (error) {
  if (error instanceof ModelNotFoundError) {
    console.error("Model not found");

    // List available versions
    const versions = await registry.getModelVersions("my-model");
    console.log("Available versions:", versions);

    // List all models
    const allModels = await registry.listModels({ limit: 100 });
    console.log("All models:", allModels.map((m) => m.modelId));

    // Check if archived
    const archivedModels = await registry.listModels({
      deploymentStatus: "archived",
    });
    console.log("Archived models:", archivedModels.length);
  } else {
    throw error;
  }
}
```
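When the requested version is missing, the available versions can be used to suggest the newest one. A minimal sketch, assuming you have extracted plain `MAJOR.MINOR.PATCH` strings from the `getModelVersions` result; `compareVersions` and `latestVersion` are hypothetical helpers. Components are compared numerically because a plain string sort would rank `1.10.0` below `1.9.0`:

```typescript
// Negative if a < b, positive if a > b, 0 if equal (numeric, per component).
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Ignores entries that are not MAJOR.MINOR.PATCH; undefined if none remain.
function latestVersion(versions: string[]): string | undefined {
  const valid = versions.filter((v) => /^\d+\.\d+\.\d+$/.test(v)).sort(compareVersions);
  return valid[valid.length - 1];
}
```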
Error Handling Best Practices
Pattern 1: Auto-Fix with Retry
```typescript
import { ModelRegistryError } from "@/lib/ai/training/modelRegistryErrors";

async function saveModelWithRetry(model, metadata, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await registry.saveModel(model, metadata);
    } catch (error) {
      if (!(error instanceof ModelRegistryError)) {
        throw error; // Unexpected error: don't retry
      }
      // Only attempt an auto-fix if another attempt remains
      const canRetry = attempt < maxRetries - 1;
      if (canRetry && (await error.attemptAutoFix())) {
        console.log(`✅ Auto-fix succeeded (attempt ${attempt + 1}), retrying...`);
        continue; // Retry
      }
      console.error(error.getTroubleshootingGuide());
      throw error; // Give up
    }
  }
}
```
Pattern 2: Graceful Degradation
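```typescript
import { S3UnavailableError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UnavailableError) {
    // Fallback: save to the local filesystem
    // (saveToLocalStorage stands for your application's own persistence helper)
    console.warn("S3 unavailable, falling back to local storage");
    await saveToLocalStorage(model, metadata);
  } else {
    throw error;
  }
}
```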
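The same pattern can be factored into a reusable wrapper so each call site only states which errors justify degrading. A minimal sketch; `withFallback` is hypothetical, and the primary and fallback operations are whatever your application supplies (for example, `registry.saveModel` and a local save):

```typescript
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  shouldFallBack: (error: unknown) => boolean,
  onFallback?: (error: unknown) => void, // e.g. log a warning or emit a metric
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    if (!shouldFallBack(error)) throw error; // unexpected errors still surface
    onFallback?.(error);
    return fallback();
  }
}
```

Keep the predicate narrow (for example, only `S3UnavailableError`) so validation or database errors are not silently masked by the fallback path.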
Pattern 3: Monitoring Integration
```typescript
import * as Sentry from "@sentry/nextjs"; // or "@sentry/node", depending on your runtime
import { logger } from "@/lib/logger";
import { ModelRegistryError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof ModelRegistryError) {
    // Log structured error
    logger.error("Model save failed", {
      ...error.toJSON(),
      environment: process.env.NODE_ENV,
    });

    // Send to Sentry/monitoring service
    Sentry.captureException(error, {
      contexts: {
        error: error.toJSON(),
      },
    });
  }
  throw error;
}
```