Model Registry Error Catalog

Comprehensive searchable catalog of all Model Registry errors with troubleshooting steps and auto-fix functions.

Last Updated: November 11, 2025


Error Code               Category    Severity  Description
S3_UPLOAD_FAILED         Storage     Critical  S3 upload operation failed
S3_DOWNLOAD_FAILED       Storage     High      S3 download operation failed
S3_DELETE_FAILED         Storage     Medium    S3 delete operation failed
S3_ERROR                 Storage     High      Generic S3 service error
S3_UNAVAILABLE           Network     Critical  S3 unavailable (circuit breaker open)
INVALID_S3_PATH          Validation  Medium    S3 path format is invalid
MODEL_VALIDATION_FAILED  Validation  Medium    Model metadata validation failed
DEPLOYMENT_FAILED        Deployment  High      Model deployment failed
CONCURRENT_DEPLOYMENT    Deployment  Medium    Race condition during deployment
DUPLICATE_MODEL          Versioning  Low       Model version already exists
CIRCUIT_BREAKER_OPEN     Network     Critical  Circuit breaker is open
DATABASE_ERROR           Database    Critical  Database operation failed
MODEL_NOT_FOUND          Database    Medium    Model not found in database
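
To make the table above searchable from code, one option is to mirror it as a typed lookup map. The following is a minimal sketch, not the registry's actual data structure: the names `ERROR_CATALOG`, `CatalogEntry`, and `findErrorCodes` are hypothetical, and only the codes, categories, severities, and descriptions come from the table.

```typescript
type Category = "Storage" | "Network" | "Validation" | "Deployment" | "Versioning" | "Database";
type Severity = "Critical" | "High" | "Medium" | "Low";

interface CatalogEntry {
  category: Category;
  severity: Severity;
  description: string;
}

// Rows copied from the catalog table; the object shape is an assumption.
const ERROR_CATALOG: Record<string, CatalogEntry> = {
  S3_UPLOAD_FAILED: { category: "Storage", severity: "Critical", description: "S3 upload operation failed" },
  S3_DOWNLOAD_FAILED: { category: "Storage", severity: "High", description: "S3 download operation failed" },
  S3_DELETE_FAILED: { category: "Storage", severity: "Medium", description: "S3 delete operation failed" },
  S3_ERROR: { category: "Storage", severity: "High", description: "Generic S3 service error" },
  S3_UNAVAILABLE: { category: "Network", severity: "Critical", description: "S3 unavailable (circuit breaker open)" },
  INVALID_S3_PATH: { category: "Validation", severity: "Medium", description: "S3 path format is invalid" },
  MODEL_VALIDATION_FAILED: { category: "Validation", severity: "Medium", description: "Model metadata validation failed" },
  DEPLOYMENT_FAILED: { category: "Deployment", severity: "High", description: "Model deployment failed" },
  CONCURRENT_DEPLOYMENT: { category: "Deployment", severity: "Medium", description: "Race condition during deployment" },
  DUPLICATE_MODEL: { category: "Versioning", severity: "Low", description: "Model version already exists" },
  CIRCUIT_BREAKER_OPEN: { category: "Network", severity: "Critical", description: "Circuit breaker is open" },
  DATABASE_ERROR: { category: "Database", severity: "Critical", description: "Database operation failed" },
  MODEL_NOT_FOUND: { category: "Database", severity: "Medium", description: "Model not found in database" },
};

// Return the error codes matching every field given in the filter, in table order.
function findErrorCodes(filter: { category?: Category; severity?: Severity }): string[] {
  return Object.entries(ERROR_CATALOG)
    .filter(
      ([, entry]) =>
        (filter.category === undefined || entry.category === filter.category) &&
        (filter.severity === undefined || entry.severity === filter.severity),
    )
    .map(([code]) => code);
}
```

For example, `findErrorCodes({ severity: "Critical" })` lists the codes that are likely to need paging, while `findErrorCodes({ category: "Storage" })` narrows a search to the S3 storage errors.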

Storage Errors

S3_UPLOAD_FAILED

Category: Storage Severity: Critical Class: S3UploadError

Description: Failed to upload model artifacts to S3 storage.

Common Causes:

  1. Network timeout or connectivity issues
  2. Bucket access denied (IAM permissions)
  3. Invalid AWS credentials
  4. Model size exceeds upload limit (100MB standard, 5GB multipart)
  5. Circuit breaker is open due to previous failures
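
Cause 4 can be checked before any network call is made. The sketch below classifies an artifact by size using the two limits quoted above. It assumes binary units (1 MB = 1024 × 1024 bytes) and inclusive limits, neither of which the catalog states, and `chooseUploadStrategy` is a hypothetical helper rather than part of the registry API.

```typescript
const MB = 1024 * 1024;
const STANDARD_LIMIT_BYTES = 100 * MB; // "100MB standard" from the list above
const MULTIPART_LIMIT_BYTES = 5 * 1024 * MB; // "5GB multipart" from the list above

type UploadStrategy = "standard" | "multipart" | "too-large";

// Decide how (or whether) an artifact of the given size can be uploaded.
function chooseUploadStrategy(sizeBytes: number): UploadStrategy {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError(`Invalid artifact size: ${sizeBytes}`);
  }
  if (sizeBytes <= STANDARD_LIMIT_BYTES) return "standard";
  if (sizeBytes <= MULTIPART_LIMIT_BYTES) return "multipart";
  return "too-large";
}
```

Running a check like this before `saveModel` turns an oversize artifact into an immediate, descriptive failure instead of a late S3_UPLOAD_FAILED.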

Troubleshooting Steps:

import { S3UploadError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UploadError) {
    console.error(error.getTroubleshootingGuide());

    // Step 1: Auto-fix - verify bucket access
    const fixed = await error.attemptAutoFix();
    if (fixed) {
      console.log("✅ Bucket access verified, retrying...");
      await registry.saveModel(model, metadata);
    } else {
      // Manual intervention required
      console.error("Manual steps:");
      console.error(
        "1. Check AWS service health: https://status.aws.amazon.com/",
      );
      console.error("2. Verify IAM role has s3:PutObject permission");
      console.error("3. Check model size is under limit");
      console.error("4. Review circuit breaker status");
    }
  }
}

Related Documentation:

Runbook: /docs/runbooks/s3-upload-failure.md


S3_DOWNLOAD_FAILED

Category: Storage Severity: High Class: S3DownloadError

Description: Failed to download model artifacts from S3 storage.

Common Causes:

  1. Object not found at specified S3 path
  2. Network timeout during download
  3. Corrupted data or incomplete upload
  4. Access denied (IAM permissions)
  5. Invalid S3 path format

Troubleshooting Steps:

try {
  const metadata = await registry.getModel("my-model", "1.0.0");
  // Download triggered automatically during validation
} catch (error) {
  if (error instanceof S3DownloadError) {
    console.error(error.getTroubleshootingGuide());

    // Manual steps:
    // 1. Verify object exists
    const s3Path = error.context.s3Path;
    console.log("Check if object exists:", s3Path);

    // 2. Check permissions
    console.log("Verify IAM role has s3:GetObject permission");

    // 3. Validate path format
    console.log("S3 path format: s3://bucket-name/path/to/object");
  }
}

Related Documentation:

Runbook: /docs/runbooks/s3-download-failure.md


S3_DELETE_FAILED

Category: Storage Severity: Medium Class: S3DeleteError

Description: Failed to delete model artifacts from S3 storage.

Common Causes:

  1. Insufficient IAM permissions (s3:DeleteObject)
  2. Object protected by versioning or object lock
  3. Bucket policy denies delete operations

Troubleshooting Steps:

// Manual steps only (no auto-fix available)
try {
  await s3Storage.deleteModel({ bucket, key });
} catch (error) {
  if (error instanceof S3DeleteError) {
    console.error("1. Verify IAM permissions include s3:DeleteObject");
    console.error("2. Check if object is protected by versioning");
    console.error("3. Review bucket policy for delete restrictions");
  }
}

S3_ERROR

Category: Storage Severity: High Class: S3Error

Description: Generic S3 service error (catch-all for S3 issues).

Troubleshooting Steps:

try {
  await s3Operation();
} catch (error) {
  if (error instanceof S3Error) {
    console.error("1. Check AWS service health");
    console.error("2. Verify AWS credentials are configured");
    console.error("3. Review S3 bucket configuration");
  }
}

S3_UNAVAILABLE

Category: Network Severity: Critical Class: S3UnavailableError

Description: S3 service is temporarily unavailable due to circuit breaker being open.

Common Causes:

  1. Too many recent S3 failures (error rate above 50%)
  2. S3 service degradation or outage
  3. Network connectivity issues

Troubleshooting Steps:

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UnavailableError) {
    console.error(error.getTroubleshootingGuide());

    // Wait for circuit breaker auto-recovery (30s default)
    console.log("Waiting 30s for circuit breaker to recover...");
    await new Promise((resolve) => setTimeout(resolve, 30000));

    // Retry
    await registry.saveModel(model, metadata);
  }
}

Auto-Recovery: After the 30s cooldown the circuit breaker transitions to HALF_OPEN, then closes if the service has recovered.

Manual Reset (admin only):

import { s3CircuitBreaker } from "@/lib/ai/training/S3CircuitBreaker";

// Force reset (use with caution!)
s3CircuitBreaker.reset();

Related Documentation:

Runbook: /docs/runbooks/s3-circuit-breaker-open.md


INVALID_S3_PATH

Category: Validation Severity: Medium Class: InvalidS3PathError

Description: S3 path format is invalid or malformed.

Valid Format: s3://bucket-name/path/to/object

Common Mistakes:

  • Missing s3:// prefix
  • No bucket name
  • Invalid characters in path
  • Relative paths instead of absolute
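
The common mistakes above can be caught with a small pre-flight check. This is a sketch only: `parseS3Path` is a hypothetical helper, not the validator behind InvalidS3PathError. It applies the standard naming rules for general-purpose S3 buckets (3 to 63 characters, lowercase letters, digits, dots, and hyphens) and rejects relative `.` and `..` segments, but does not otherwise restrict characters in the object key.

```typescript
interface S3Location {
  bucket: string;
  key: string;
}

// 3-63 characters; lowercase letters, digits, dots, hyphens; starts and ends with a letter or digit.
const BUCKET_PATTERN = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;

// Parse "s3://bucket-name/path/to/object"; return null when the path is malformed.
function parseS3Path(path: string): S3Location | null {
  const prefix = "s3://";
  if (!path.startsWith(prefix)) return null; // missing s3:// prefix

  const rest = path.slice(prefix.length);
  const slash = rest.indexOf("/");
  if (slash <= 0) return null; // no bucket name, or no object key at all

  const bucket = rest.slice(0, slash);
  const key = rest.slice(slash + 1);
  if (!BUCKET_PATTERN.test(bucket)) return null; // invalid bucket characters or length
  if (key.length === 0) return null; // path ends right after the bucket

  // Relative segments are not meaningful in an object key.
  if (key.split("/").some((segment) => segment === "." || segment === "..")) return null;

  return { bucket, key };
}
```

Returning `null` rather than throwing keeps the helper usable in form validation as well as in registry calls; a caller can translate `null` into whatever error type fits its context.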

Troubleshooting Steps:

try {
  await registry.saveModel(model, {
    ...metadata,
    s3Path: "invalid-path", // ❌ WRONG
  });
} catch (error) {
  if (error instanceof InvalidS3PathError) {
    console.error("Invalid path:", error.context.invalidPath);
    console.error("Use format: s3://bucket-name/path/to/object");

    // Fix and retry
    await registry.saveModel(model, {
      ...metadata,
      s3Path: "s3://trillbot-models/model.json", // ✅ CORRECT
    });
  }
}

Validation Errors

MODEL_VALIDATION_FAILED

Category: Validation Severity: Medium Class: ModelValidationError

Description: Model metadata failed schema validation.

Common Causes:

  1. Missing required fields (modelId, version, s3Path, etc.)
  2. Invalid version format (must follow semantic versioning)
  3. Invalid model type (must be: lstm, transformer, neural, ensemble)
  4. Invalid input/output shapes (must be number arrays)
  5. Constraint violations (accuracy not between 0-1, etc.)

Troubleshooting Steps:

import { ModelValidationError } from "@/lib/ai/training/modelRegistryErrors";

try {
  await registry.saveModel(model, {
    modelId: "test",
    version: "invalid", // ❌ Not semantic versioning
    // ... other fields
  });
} catch (error) {
  if (error instanceof ModelValidationError) {
    console.error(error.getTroubleshootingGuide());

    // Review validation errors
    const validationErrors = error.context.validationErrors;
    console.error("Validation errors:", validationErrors);

    // Fix common issues:
    // 1. Use semantic versioning: "1.0.0", "2.1.3"
    // 2. Verify all required fields are present
    // 3. Check model type is valid
    // 4. Ensure shapes are number arrays
  }
}

Valid Model Types: lstm, transformer, neural, ensemble

Version Format: Semantic versioning (e.g., 1.0.0, 2.1.3)
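
The rules above can be approximated with a dependency-free check that runs before `saveModel`. This is a sketch, not the registry's real schema: the field names `modelType`, `inputShape`, `outputShape`, and `accuracy` are assumptions (the catalog only names `modelId`, `version`, and `s3Path`), and the version pattern is a simplified MAJOR.MINOR.PATCH form without pre-release tags.

```typescript
const MODEL_TYPES = ["lstm", "transformer", "neural", "ensemble"] as const;
const SEMVER_PATTERN = /^\d+\.\d+\.\d+$/; // simplified: no pre-release or build metadata

// Hypothetical metadata shape; every field is unknown until validated.
interface MetadataInput {
  modelId?: unknown;
  version?: unknown;
  s3Path?: unknown;
  modelType?: unknown;
  inputShape?: unknown;
  outputShape?: unknown;
  accuracy?: unknown;
}

const isNumberArray = (value: unknown): boolean =>
  Array.isArray(value) &&
  value.length > 0 &&
  value.every((n) => typeof n === "number" && Number.isFinite(n));

// Collect every problem instead of stopping at the first one.
function validateMetadata(m: MetadataInput): string[] {
  const errors: string[] = [];

  for (const field of ["modelId", "version", "s3Path"] as const) {
    if (typeof m[field] !== "string" || m[field] === "") {
      errors.push(`${field} is required`);
    }
  }
  if (typeof m.version === "string" && m.version !== "" && !SEMVER_PATTERN.test(m.version)) {
    errors.push("version must use semantic versioning (e.g. 1.0.0)");
  }
  if (!(MODEL_TYPES as readonly unknown[]).includes(m.modelType)) {
    errors.push(`modelType must be one of: ${MODEL_TYPES.join(", ")}`);
  }
  if (!isNumberArray(m.inputShape)) errors.push("inputShape must be a number array");
  if (!isNumberArray(m.outputShape)) errors.push("outputShape must be a number array");
  if (
    m.accuracy !== undefined &&
    (typeof m.accuracy !== "number" || m.accuracy < 0 || m.accuracy > 1)
  ) {
    errors.push("accuracy must be between 0 and 1");
  }
  return errors;
}
```

An empty array means the metadata passed every check in this sketch; the real schema may enforce more.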


Deployment Errors

DEPLOYMENT_FAILED

Category: Deployment Severity: High Class: DeploymentError

Description: Model deployment operation failed.

Common Causes:

  1. Model not found in database
  2. Deployment transaction failed (database issue)
  3. Race condition (concurrent deployment)
  4. Environment validation failed

Troubleshooting Steps:

try {
  await registry.deployModel("my-model", "1.0.0", {
    environment: "production",
  });
} catch (error) {
  if (error instanceof DeploymentError) {
    console.error(error.getTroubleshootingGuide());

    // Step 1: Verify model exists
    const model = await registry.getModel("my-model", "1.0.0");
    if (!model) {
      console.error("Model not found. Check model ID and version.");
    } else {
      // Step 2: Check deployment transaction logs
      console.error("Review database logs for transaction details");

      // Step 3: Ensure no concurrent deployments
      const activeModel = await registry.getActiveModel("my-model", "production");
      console.log("Currently active:", activeModel?.version);
    }
}

Runbook: /docs/runbooks/deployment-failure.md


CONCURRENT_DEPLOYMENT

Category: Deployment Severity: Medium Class: ConcurrentDeploymentError

Description: Race condition detected - multiple deployments attempted simultaneously.

Troubleshooting Steps:

try {
  await registry.deployModel("my-model", "1.0.0", {
    environment: "production",
  });
} catch (error) {
  if (error instanceof ConcurrentDeploymentError) {
    console.log("Concurrent deployment detected. Waiting 2s and retrying...");

    await new Promise((resolve) => setTimeout(resolve, 2000));

    // Retry deployment
    await registry.deployModel("my-model", "1.0.0", {
      environment: "production",
    });
  }
}

Prevention: Use deployment locks in high-concurrency environments.
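
One way to read "deployment locks" is a per-model, per-environment guard that rejects a second deployment while the first is still running. The sketch below is an in-process illustration only; `DeploymentLocks` and `deployExclusively` are hypothetical names, and a multi-instance service would need a lock shared across processes (for example, one held in the database), which is not shown here.

```typescript
class DeploymentLocks {
  private readonly held = new Set<string>();

  // Returns a release function on success, or null if the lock is already held.
  tryAcquire(modelId: string, environment: string): (() => void) | null {
    const key = `${environment}:${modelId}`;
    if (this.held.has(key)) return null;
    this.held.add(key);

    let released = false;
    return () => {
      // Idempotent: releasing twice must not free a lock taken by someone else.
      if (!released) {
        released = true;
        this.held.delete(key);
      }
    };
  }
}

// Run `deploy` only if no other deployment holds the lock for this model and environment.
async function deployExclusively<T>(
  locks: DeploymentLocks,
  modelId: string,
  environment: string,
  deploy: () => Promise<T>,
): Promise<T> {
  const release = locks.tryAcquire(modelId, environment);
  if (release === null) {
    throw new Error(`Deployment already in progress for ${modelId} in ${environment}`);
  }
  try {
    return await deploy();
  } finally {
    release(); // always free the lock, even when the deployment throws
  }
}
```

A caller would pass the actual deployment as the callback, for instance `() => registry.deployModel("my-model", "1.0.0", { environment: "production" })`. Failing fast instead of queueing keeps the behavior close to CONCURRENT_DEPLOYMENT: the second caller learns immediately that it should retry later.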


Versioning Errors

DUPLICATE_MODEL

Category: Versioning Severity: Low Class: DuplicateModelError

Description: Model with specified (modelId, version) already exists.

Troubleshooting Steps:

try {
  await registry.saveModel(model, {
    modelId: "my-model",
    version: "1.0.0", // Already exists
    // ...
  });
} catch (error) {
  if (error instanceof DuplicateModelError) {
    console.error("Model version already exists");

    // Solution 1: Use different version number
    await registry.saveModel(model, {
      modelId: "my-model",
      version: "1.1.0", // Incremented version
      // ...
    });

    // Solution 2: Use version tagging to reference existing
    // (if you want to tag existing version)
  }
}

Best Practice: Always increment version for model updates following semantic versioning.
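
Incrementing a version by hand is easy to get wrong, so a small helper can do it. This is a sketch under the assumption that versions are plain MAJOR.MINOR.PATCH strings; `bumpVersion` is a hypothetical helper, not a registry method.

```typescript
type BumpKind = "major" | "minor" | "patch";

// Return the next version, resetting lower-order parts as semantic versioning requires.
function bumpVersion(version: string, kind: BumpKind): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);

  switch (kind) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

With this helper, recovering from DUPLICATE_MODEL for version 1.0.0 could use `bumpVersion("1.0.0", "minor")`, which yields `"1.1.0"`.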


Network Errors

CIRCUIT_BREAKER_OPEN

Category: Network Severity: Critical Class: CircuitBreakerOpenError

Description: Circuit breaker is in OPEN state, preventing operations.

Common Causes:

  1. Too many recent failures (error rate above 50%)
  2. Service degradation or outage
  3. Network connectivity issues

Troubleshooting Steps:

try {
  await s3Operation();
} catch (error) {
  if (error instanceof CircuitBreakerOpenError) {
    console.error(error.getTroubleshootingGuide());

    // Wait for auto-recovery (30s)
    console.log("Circuit breaker is open. Waiting 30s...");
    await new Promise((resolve) => setTimeout(resolve, 30000));

    // Check circuit breaker stats
    const response = await fetch("/api/monitoring/circuit-breaker");
    const stats = await response.json();
    console.log("Circuit breaker stats:", stats);

    // Retry operation
    await s3Operation();
  }
}

Auto-Recovery: Circuit automatically transitions to HALF_OPEN after 30s, then CLOSED if service healthy.
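
The CLOSED, OPEN, and HALF_OPEN transitions described above can be illustrated with a small state machine. This is a sketch of the general pattern, not the code in S3CircuitBreaker: the class name, the `minCalls` warm-up parameter, and the injectable clock are assumptions added for illustration, while the 30s cooldown and the 50% error-rate threshold come from this catalog.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreakerSketch {
  private state: BreakerState = "CLOSED";
  private successes = 0;
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly cooldownMs: number = 30000, // 30s cooldown from this catalog
    private readonly errorRateThreshold: number = 0.5, // >50% error rate from this catalog
    private readonly minCalls: number = 10, // assumption: ignore the rate until enough calls are seen
    private readonly now: () => number = () => Date.now(), // injectable clock for testing
  ) {}

  // OPEN lazily becomes HALF_OPEN once the cooldown has elapsed.
  getState(): BreakerState {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  canRequest(): boolean {
    return this.getState() !== "OPEN";
  }

  recordSuccess(): void {
    const state = this.getState();
    if (state === "OPEN") return; // calls are rejected while open, nothing to record
    if (state === "HALF_OPEN") {
      this.transition("CLOSED"); // probe succeeded, service looks healthy
      return;
    }
    this.successes++;
  }

  recordFailure(): void {
    const state = this.getState();
    if (state === "OPEN") return;
    if (state === "HALF_OPEN") {
      this.transition("OPEN"); // probe failed, start a new cooldown
      return;
    }
    this.failures++;
    const total = this.successes + this.failures;
    if (total >= this.minCalls && this.failures / total > this.errorRateThreshold) {
      this.transition("OPEN");
    }
  }

  private transition(next: BreakerState): void {
    this.state = next;
    this.successes = 0;
    this.failures = 0;
    if (next === "OPEN") this.openedAt = this.now();
  }
}
```

Callers check `canRequest()` before an S3 operation and report the outcome with `recordSuccess()` or `recordFailure()`. Injecting the clock makes the 30s cooldown testable without real waiting.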

Manual Reset (admin only):

# API endpoint for manual reset
# BASE_URL is a placeholder for your deployment's origin, e.g. https://your-host
curl -X POST "$BASE_URL/api/admin/circuit-breaker/reset" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Runbook: /docs/runbooks/s3-circuit-breaker-open.md


Database Errors

DATABASE_ERROR

Category: Database Severity: Critical Class: DatabaseError

Description: Database operation failed.

Common Causes:

  1. Connection timeout
  2. Query syntax error
  3. Constraint violation
  4. Transaction failure
  5. Supabase service unavailable

Troubleshooting Steps:

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof DatabaseError) {
    console.error(error.getTroubleshootingGuide());

    // Check database connection with a lightweight HEAD request (no rows returned)
    const { error: pingError } = await supabase
      .from("trained_models")
      .select("*", { count: "exact", head: true });

    if (pingError) {
      console.error("Database connection failed:", pingError);
      console.error("1. Check SUPABASE_URL is correct");
      console.error("2. Verify SUPABASE_SERVICE_ROLE_KEY");
      console.error("3. Check Supabase service status");
    } else {
      console.log("Database connection OK");
      console.error("Review database logs for specific error");
    }
  }
}

Related Documentation:

Runbook: /docs/runbooks/database-error.md


MODEL_NOT_FOUND

Category: Database Severity: Medium Class: ModelNotFoundError

Description: Requested model not found in database.

Troubleshooting Steps:

try {
  const model = await registry.getModel("my-model", "1.0.0");
  if (!model) {
    throw new ModelNotFoundError("my-model", "1.0.0");
  }
} catch (error) {
  if (error instanceof ModelNotFoundError) {
    console.error("Model not found");

    // List available versions
    const versions = await registry.getModelVersions("my-model");
    console.log("Available versions:", versions);

    // List all models
    const allModels = await registry.listModels({ limit: 100 });
    console.log(
      "All models:",
      allModels.map((m) => m.modelId),
    );

    // Check if archived
    const archivedModels = await registry.listModels({
      deploymentStatus: "archived",
    });
    console.log("Archived models:", archivedModels.length);
  }
}

Error Handling Best Practices

Pattern 1: Auto-Fix with Retry

async function saveModelWithRetry(model, metadata, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await registry.saveModel(model, metadata);
    } catch (error) {
      if (error instanceof ModelRegistryError) {
        // Attempt auto-fix
        const fixed = await error.attemptAutoFix();

        if (fixed && attempt < maxRetries - 1) {
          console.log(
            `✅ Auto-fix succeeded (attempt ${attempt + 1}), retrying...`,
          );
          continue; // Retry
        } else {
          console.error(error.getTroubleshootingGuide());
          throw error; // Give up
        }
      } else {
        throw error; // Unexpected error
      }
    }
  }
}

Pattern 2: Graceful Degradation

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof S3UnavailableError) {
    // Fallback: Save to local filesystem
    console.warn("S3 unavailable, falling back to local storage");
    await saveToLocalStorage(model, metadata);
  } else {
    throw error;
  }
}

Pattern 3: Monitoring Integration

import { logger } from "@/lib/logger";

try {
  await registry.saveModel(model, metadata);
} catch (error) {
  if (error instanceof ModelRegistryError) {
    // Log structured error
    logger.error("Model save failed", {
      ...error.toJSON(),
      environment: process.env.NODE_ENV,
    });

    // Send to Sentry/monitoring service
    Sentry.captureException(error, {
      contexts: {
        error: error.toJSON(),
      },
    });
  }
  throw error;
}
