Troubleshooting
This guide provides solutions to common problems you might encounter while running Ordinaut.
Diagnostic Checklist
When an issue occurs, start with these steps:
- Check System Health: Query the main health endpoint with `curl http://localhost:8080/health | jq` and look for any components that are not `healthy`.
- Check Container Status: Run `docker compose ps` and ensure all services are `Up` and `healthy`.
- Check Service Logs: View the logs for the specific component that seems to be failing. For example, to follow the API logs: `docker compose logs -f api`.
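If the health payload is large, a small script can scan it for entries that are not `healthy`. The sketch below is a minimal Python example under a few assumptions. It assumes the endpoint returns JSON in which component states appear under keys named `status`, with the value `healthy` when all is well. The key name, the expected value and the helper names `find_unhealthy` and `report` are assumptions, so adjust them to the payload you actually see from `curl`.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # the endpoint queried with curl above


def find_unhealthy(node, path="$"):
    """Recursively collect (json_path, value) for every "status" key that is not "healthy".

    Assumption: component states live under keys named "status"; adjust to your payload.
    """
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "status" and isinstance(value, str) and value != "healthy":
                problems.append((child, value))
            else:
                problems.extend(find_unhealthy(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems.extend(find_unhealthy(item, f"{path}[{i}]"))
    return problems


def report(url=HEALTH_URL):
    """Fetch the health payload, print non-healthy entries, return 1 if any were found, else 0."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: a degraded service may answer with a non-2xx code but still send a JSON body.
        payload = json.load(err)
    problems = find_unhealthy(payload)
    for where, status in problems:
        print(f"{where}: {status}")
    return 1 if problems else 0
```

Call `report()` from a Python shell while the stack is running. It prints the JSON path of each non-healthy entry and returns 1 if it found any, so you can reuse it as a check in your own scripts.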
Common Problems
Authentication Errors (401/403)
- Symptom: You receive a `401 Unauthorized` or `403 Forbidden` error.
- Solution:
  - `401 Unauthorized`: This means your token is missing, invalid, or expired. Ensure you are providing a valid JWT access token in the `Authorization: Bearer <token>` header.
  - `403 Forbidden`: This means your token is valid, but the authenticated agent does not have the required scopes to perform the requested action.
  - Review Security Warnings: Check the Authentication guide for critical security warnings about the current state of the authentication system, as these may be the source of your issue.
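You can tell a `401` from a `403` without guessing by inspecting the claims inside the access token. The Python sketch below decodes the JWT payload without verifying its signature, which is useful for debugging only and never for authorization. It then checks the standard `exp` claim and a scopes claim. Where the scopes live is an assumption: the sketch looks for a `scopes` list or a space-separated `scope` string. The helper names and any scope names you pass in are hypothetical.

```python
import base64
import json
import time


def decode_jwt_claims(token):
    """Return the JWT payload as a dict WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # base64url omits '=' padding; restore it
    return json.loads(base64.urlsafe_b64decode(padded))


def diagnose(token, required_scopes=(), now=None):
    """Return a short hint pointing at an expired token (401) or missing scopes (403)."""
    claims = decode_jwt_claims(token)
    now = time.time() if now is None else now
    exp = claims.get("exp")  # registered claim: expiry as seconds since the Unix epoch
    if exp is not None and exp <= now:
        return "expired: expect 401, obtain a new access token"
    # Assumption: scopes are stored as a "scopes" list or a space-separated "scope" string.
    granted = claims.get("scopes", claims.get("scope", []))
    if isinstance(granted, str):
        granted = granted.split()
    missing = sorted(set(required_scopes) - set(granted))
    if missing:
        return "missing scopes: " + ", ".join(missing) + " (expect 403)"
    return "token looks usable: check the signature and server-side configuration"
```

For example, `diagnose(token, ["tasks:write"])` (the scope name is hypothetical) reports either an expired token or the scopes the token lacks.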
Tasks are not executing
- Symptom: You create a task, but it never runs.
- Solution:
  - Check the `due_work` queue: Connect to the PostgreSQL database and run `SELECT COUNT(*) FROM due_work WHERE run_at <= now();`. If the count is high, your workers may be overloaded or stuck.
  - Check the scheduler logs: Run `docker compose logs scheduler` to see whether it is correctly calculating and enqueuing run times.
  - Check the task status: Query `GET /tasks/{id}` and ensure the task is `active`, not `paused`.
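A single count of overdue `due_work` rows does not tell you whether the workers are slow or stuck. One heuristic is to run the same query a few times and see whether the backlog shrinks. The Python sketch below assumes that rows stop matching `SELECT COUNT(*) FROM due_work WHERE run_at <= now();` once a worker has processed them. It uses the third-party `psycopg` (version 3) driver for the database call and needs a connection string that you supply. The function names are hypothetical.

```python
import time

# Overdue work items, as in the due_work check above.
BACKLOG_SQL = "SELECT COUNT(*) FROM due_work WHERE run_at <= now();"


def sample_backlog(fetch_count, samples=3, interval=10.0):
    """Call fetch_count() `samples` times, `interval` seconds apart; return the counts."""
    counts = []
    for i in range(samples):
        if i:
            time.sleep(interval)
        counts.append(fetch_count())
    return counts


def interpret(counts):
    """Heuristic reading of backlog counts (oldest first); not a definitive diagnosis."""
    if counts[-1] == 0:
        return "no overdue work"
    if counts[-1] < counts[0]:
        return "draining: workers are running but may be overloaded"
    return "not draining: workers may be stuck or not running"


def sample_from_postgres(dsn, samples=3, interval=10.0):
    """Sample the backlog from PostgreSQL; `dsn` is your own connection string."""
    import psycopg  # third-party: pip install "psycopg[binary]"

    # autocommit=True makes each query its own transaction, so now() advances
    # between samples (inside a single transaction, now() stays fixed).
    with psycopg.connect(dsn, autocommit=True) as conn:
        return sample_backlog(
            lambda: conn.execute(BACKLOG_SQL).fetchone()[0], samples, interval
        )
```

For example, run `counts = sample_from_postgres("postgresql://user:password@localhost:5432/dbname")` and then `print(counts, interpret(counts))`. The connection string is a placeholder for your deployment's values.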
Pipeline step is failing
- Symptom: A task run has a `success: false` status.
- Solution:
  - Get the run details: Query `GET /runs/{id}` for the failed run.
  - Examine the `error` field: This will contain the specific error message, such as a tool timeout, a validation error, or a connection failure.
  - Check worker logs: The worker logs will contain a detailed stack trace and context for the failure.
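The Python sketch below pulls the failure reason straight from the API. It calls `GET /runs/{id}` with the same `Authorization: Bearer <token>` header used for authentication and condenses the `success` and `error` fields into one line. It assumes the API is reachable at `http://localhost:8080`, the address used for the health check, and that both fields sit at the top level of the run object. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080"  # assumption: same host/port as the /health endpoint


def fetch_run(run_id, token):
    """GET /runs/{id} with a Bearer token and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/runs/{run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_run(run):
    """One-line summary built from the run's `success` and `error` fields."""
    if run.get("success"):
        return "run succeeded"
    error = run.get("error") or "<no error message recorded>"
    return f"run failed: {error}"
```

For example, `print(summarize_run(fetch_run(run_id, token)))` with your run ID and access token. The worker logs remain the place to look for the full stack trace.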
Service Fails to Start
- Symptom: A container (e.g., `api`) exits immediately or is in a restart loop.
- Solution:
  - Check for Missing Secrets: For production deployments (`start.sh prod` or `start.sh ghcr`), ensure you have created a `.env` file in the `ops/` directory and set a secure `JWT_SECRET_KEY`. The application will fail to start without it.
  - Check Database/Redis Health: Ensure the `postgres` and `redis` containers are `Up (healthy)` before the other services start. If they are not, check their logs for errors.
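If a missing secret is the cause, the Python sketch below can help. It generates a random value for `JWT_SECRET_KEY` with the standard library's `secrets` module. It also checks that `ops/.env` defines the key with a non-empty value. The parsing handles only plain `KEY=VALUE` lines and `#` comments, and the helper names are hypothetical. It is a quick sanity check, not a full `.env` parser.

```python
import secrets
from pathlib import Path


def generate_secret(nbytes=64):
    """Return a URL-safe random string built from `nbytes` random bytes."""
    return secrets.token_urlsafe(nbytes)


def env_has_key(env_text, key="JWT_SECRET_KEY"):
    """True if a `key=value` line with a non-empty (unquoted) value is present."""
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines that are not assignments
        name, value = line.split("=", 1)
        if name.strip() == key and value.strip().strip("\"'"):
            return True
    return False


def check_env_file(path="ops/.env"):
    """True if the file exists and defines JWT_SECRET_KEY with a non-empty value."""
    p = Path(path)
    return p.is_file() and env_has_key(p.read_text())
```

Paste the output of `print("JWT_SECRET_KEY=" + generate_secret())` into `ops/.env`. Then run `check_env_file()` from the repository root before restarting, since the default path is relative.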