Setup guides, screenshots, and operator checklists

Run Zephyr like a platform, not a collection of fixes.

This handbook is written to reduce support load. It covers the real operating flows inside Zephyr: node setup, first deployment, customer handoff, routing, troubleshooting, and how to answer common questions without retyping them every week.

Best starting points: fresh node install, first deploy, customer handoff, troubleshooting map.

Built for operators: use this page for onboarding staff and for reducing repeated support explanations.

Fresh node install

Bring a new Debian or Ubuntu server online with the one-line Zephyr installer.

First deployment

Create a product, assign a customer, and provision with fewer missing steps.

Portal handoff

Check the customer path before the service leaves provisioning and lands in support.

Troubleshooting map

Start from the most common failure patterns: nodes, provisioning, routing, and playback.

Platform overview

Zephyr is the operator layer for your hosted media service. It ties together node health, customer appboxes, product templates, provisioning, support, portal handoff, and request workflow. When the platform is documented clearly, the panel becomes self-serve. When it is not, support turns into tribal knowledge.

This handbook is intentionally direct. It is designed to help an operator open the correct page, check the correct state, and follow the correct order instead of hunting through the UI and hoping the right panel appears.

✓ Use the demo panel previews in this guide so operators can match the handbook to the real workspace.
✓ Give new staff this handbook before giving them elevated access.
✓ Route customers to the portal and customer-facing guidance instead of explaining the same login steps manually.

Operator overview

The main dashboard your team works from every day

Quick start checklist

If you are bringing up a new environment or training someone new, use this order. It catches most avoidable mistakes early.

Confirm the control plane is healthy

Open Health & Capacity. Confirm the panel is healthy, agent targets are current, and no core service is already drifting before you add load.

Add and install the first node

Go to Nodes, create the node, run the install command on the target server, and confirm the node comes online.

Create products before creating manual exceptions

Use Products to define the expected service tier, app type, CPU, RAM, stream allowance, and notes operators should verify.

Create the customer and deploy from the product template

Open Customers, create the account, then provision from Deployments or the linked workflow.

Verify route, credentials, and portal handoff

Do not call the job done when the container exists. Confirm the URL loads, the service is reachable, credentials are correct, and the portal gives the customer enough context to self-serve.

Most support savings come from standardizing the order of operations, not from adding more tools.

Daily operator routine

Operators work faster when they check the same signals in the same order each day.

✓ Check health and capacity first for node drift, agent drift, or pressure hotspots.
✓ Check deployments next for any service still starting, retrying, or failed.
✓ Review support tickets with linked customer and appbox context before replying.
✓ Review requests and automation queues so customers are not left waiting on silent failures.
✓ Only after that handle new provisioning, changes, or ad-hoc requests.

Navigation and dashboard

Overview, provisioning, and operations in one workspace

What each major area is for

Overview tracks overall state, recent services, customer summary, and capacity.
Nodes and Health are the source of truth for whether the fleet can actually take work.
Products define what gets sold and how it should be deployed.
Customers and Appboxes show ownership, runtime state, and service handoff context.
Support and Requests are where daily noise either gets organized or gets worse.

Node setup guide

Node setup is the highest-friction part of the system, so this section is intentionally procedural.

Node list

Fleet overview, agent versions, and resource visibility

Before you install a node

✓ Use a Linux server with outbound internet access.
✓ Create a fresh node record in the panel and use its real install command.
✓ If the server previously ran an Orion-era or older Zephyr agent, clean stale services before retrying.
✓ Know whether the host is a fresh Debian or Ubuntu machine so the one-line installer can bootstrap Docker safely.

Fresh server onboarding

Create the node in the panel

Open Nodes, fill in the name and target details, then copy the generated install command.

Run the one-line installer

On a fresh Debian or Ubuntu server, the installer can bootstrap Docker if it is missing, write the env file, download the latest agent binary, and register the service.

Check the service locally

Run systemctl status zephyr-agent. Look for a running service with heartbeats or websocket traffic instead of repeated auth or reconnect errors.

Confirm the panel sees the node online

Back in Zephyr, wait for the node to move from provisioning to online and begin reporting capacity and versions.

If a node stays stuck in provisioning, check for multiple stale agent processes, an old env file, or the identity of a deleted node still installed on the server.

What to verify when a node misbehaves

The agent version shown in the panel matches the current downloadable binary.
Only one agent process is running on the server.
/etc/zephyr/node-agent.env contains the correct node ID and token.
The server can reach the panel base URL and websocket URL.
The host has enough free CPU, RAM, and disk to actually accept work.
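
The env-file check in the list above can be scripted. The sketch below is a minimal illustration, assuming the file uses plain KEY=VALUE lines. The key names NODE_ID and NODE_TOKEN are hypothetical placeholders, so substitute whatever keys the real installer writes.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def env_problems(
    text: str,
    expected_node_id: str,
    id_key: str = "NODE_ID",        # hypothetical key name
    token_key: str = "NODE_TOKEN",  # hypothetical key name
) -> List[str]:
    """Return human-readable problems; an empty list means the file looks sane."""
    env = parse_env(text)
    problems: List[str] = []
    if env.get(id_key) != expected_node_id:
        problems.append(f"{id_key} does not match the node record in the panel")
    if not env.get(token_key):
        problems.append(f"{token_key} is missing or empty")
    return problems
```

Feed it the contents of /etc/zephyr/node-agent.env and the node ID shown in the panel. It only confirms the values are present and consistent; it cannot tell whether the token is still valid on the panel side.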

Products and appboxes

Appboxes and deployed services

Customer instances, statuses, and routes

Products define what gets deployed

Products are templates. Appboxes are customer instances created from those templates. If products are vague, deployments become custom support jobs instead of repeatable platform work.

Standardize these fields first

✓ Clear product name and app type.
✓ Default RAM and CPU that actually match the promised tier.
✓ Allowed streams and transcodes that align with what the customer bought.
✓ Any staff notes that should be checked before handoff.
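
One way to keep these fields consistent is to model a product as a small record and validate it before it is used. This is a sketch only: the field names are hypothetical and do not reflect Zephyr's actual product schema, and the rule that transcodes should not exceed streams is an assumption rather than a documented platform limit.

```python
from dataclasses import dataclass
from typing import List

# App types named elsewhere in this handbook.
KNOWN_APPS = {"plex", "emby", "jellyfin"}


@dataclass
class ProductTemplate:
    # Hypothetical field names, for illustration only.
    name: str
    app_type: str
    cpu_cores: float
    ram_mb: int
    max_streams: int
    max_transcodes: int
    staff_notes: str = ""


def template_problems(product: ProductTemplate) -> List[str]:
    """Return every problem found; an empty list means the template is usable."""
    problems: List[str] = []
    if not product.name.strip():
        problems.append("product name is empty")
    if product.app_type.lower() not in KNOWN_APPS:
        problems.append(f"unknown app type: {product.app_type}")
    if product.cpu_cores <= 0 or product.ram_mb <= 0:
        problems.append("CPU and RAM defaults must be positive")
    # Assumption: a tier should not allow more transcodes than streams.
    if product.max_transcodes > product.max_streams:
        problems.append("transcode allowance exceeds stream allowance")
    return problems
```

Running a check like this when a product is created catches vague templates before they turn into one-off deployments.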

First deployment recipe

Select the right product

Match the purchased service to the correct template instead of hand-editing limits in the moment.

Deploy and watch the queue

Use Deployments and the provisioning console to follow the runtime state instead of refreshing the appbox page and guessing.

Open the appbox detail

Confirm customer linkage, node placement, route, and service status before moving to handoff.

Standard naming and defaults matter more than they appear to. Inconsistent products create support debt later.

Customer handoff

Customers page

Accounts, appbox counts, and allowance overview

The cleanest support ticket is the one you prevent during handoff.

✓ Customer account exists with the correct email address.
✓ Appbox is active and not still provisioning, starting, or failed.
✓ Primary URL resolves and loads.
✓ Customer credentials are known and tested.
✓ Portal link works and the connect guide makes sense for the chosen app.
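
The handoff checklist above can be expressed as a simple gate that lists whatever still blocks handoff. A minimal sketch, assuming an operator or script has already gathered each fact; the field names are hypothetical and nothing here calls a real Zephyr API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HandoffState:
    # Hypothetical fields; populate them from whatever checks you actually run.
    email_confirmed: bool
    appbox_status: str  # for example "active", "provisioning", "starting", "failed"
    primary_url_loads: bool
    credentials_tested: bool
    portal_link_works: bool


def handoff_blockers(state: HandoffState) -> List[str]:
    """Return every checklist item that still fails; empty means ready to hand off."""
    blockers: List[str] = []
    if not state.email_confirmed:
        blockers.append("customer email is not confirmed")
    if state.appbox_status.lower() != "active":
        blockers.append(f"appbox is {state.appbox_status}, not active")
    if not state.primary_url_loads:
        blockers.append("primary URL does not load")
    if not state.credentials_tested:
        blockers.append("credentials have not been tested")
    if not state.portal_link_works:
        blockers.append("portal link does not work")
    return blockers
```

Returning all blockers at once, rather than stopping at the first, lets the operator fix everything in one pass before contacting the customer.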

What to send the customer

The portal link.
The correct app name for their service: Plex, Emby, or Jellyfin.
A short note on whether to use platform credentials or local server credentials.
Where to raise support tickets if something fails.

Portal and access

The portal is your customer-facing self-service layer. The better this page is, the less support work escapes upward.

What the portal should answer by itself

✓ What is my server URL?
✓ What username and password do I use?
✓ Which app should I download?
✓ How do I invite my household members?
✓ How do I raise a support ticket?

Docs and portal should reinforce each other. If the docs explain a flow but the portal omits the matching context, customers still open tickets.

Requests and automation

Requests only feel automatic when operators understand the path. Use this section when customers ask why something is queued, delayed, unavailable, or missing.

Request statuses explained

Queued means the request was accepted but is not processing yet.
Processing means the automation chain has picked it up.
Available means it should now be visible in the customer library.
Failed or Rejected means operator review or a policy explanation is needed.
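
The status meanings above map naturally onto a small lookup that support staff or a portal page could reuse. This is an illustrative sketch: the enum values and the wording of each explanation are assumptions, not Zephyr's internal status codes.

```python
from enum import Enum


class RequestStatus(Enum):
    # Lowercase string values are an assumption about how statuses are stored.
    QUEUED = "queued"
    PROCESSING = "processing"
    AVAILABLE = "available"
    FAILED = "failed"
    REJECTED = "rejected"


# Customer-facing wording, paraphrased from the status list above.
EXPLANATIONS = {
    RequestStatus.QUEUED: "Your request was accepted but is not processing yet.",
    RequestStatus.PROCESSING: "The automation chain has picked up your request.",
    RequestStatus.AVAILABLE: "The item should now be visible in your library.",
    RequestStatus.FAILED: "The request failed and an operator needs to review it.",
    RequestStatus.REJECTED: "The request was rejected; an operator can explain the policy.",
}


def needs_operator_review(status: RequestStatus) -> bool:
    """Failed and Rejected are the only statuses that need a human."""
    return status in (RequestStatus.FAILED, RequestStatus.REJECTED)


def explain(raw_status: str) -> str:
    """Turn a raw status string into customer-facing wording.

    Raises ValueError for a status that is not in the list above.
    """
    return EXPLANATIONS[RequestStatus(raw_status.strip().lower())]
```

Keeping the explanations in one place means the portal and support replies describe a queued or failed request in the same words.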

Requests queue

Automation state in the demo workspace

Plans and billing

Document plans in operator language

Do not just document price. Document the practical behavior of each plan: node allowance, stream allowance, transcode expectations, and whether the customer should reasonably expect 4K direct play.

Billing rules worth making explicit

✓ When upgrades take effect.
✓ When downgrades take effect.
✓ What happens on failed payment.
✓ What suspension means operationally.
✓ What data is removed on cancellation.

Any billing rule that stays implicit turns into a support argument later.

Integrations and routing

Integrations should be documented as a chain, not as isolated toggles

Operators need to understand how Cloudflare, routing, billing automation, and media tooling connect to the appbox lifecycle. If those relationships are not clear, failures get blamed on the wrong layer.

✓ Document which domain or route pattern each service should use.
✓ Document which systems are required for orders versus optional for enhancement.
✓ Document what success looks like after a route or integration change.
✓ Document rollback steps before asking staff to modify production routing.

Cloudflare should be treated as part of the delivery path, not an afterthought.
Node routing and connectivity should be checked before blaming deployment templates.
WHMCS or automation changes should always be tested with a single product flow first.

Troubleshooting

Support and issue context

Demo support page with linked customer and service context

Buffering or poor playback

Check whether the user is transcoding instead of direct playing.
Check plan stream limits and whether the service is saturated.
Check node CPU and RAM pressure if multiple heavy sessions are landing on the same machine.
Check the customer’s connection and playback device before blaming the server.

Appbox stuck provisioning

Open Deployments and the provisioning console first.
Confirm the target node is online and still reporting resources.
Check routing prerequisites like Cloudflare or domain state.
Verify the service did not start partially and fall into a restart loop, or stall on a missing mount or other prerequisite.

Node online but actions do nothing

Confirm the installed agent is current.
Check for a stale second agent process.
Confirm the env file matches the current node record.
Check websocket or heartbeat auth failures in the agent logs.
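
The stale-second-process check can be done by counting agent processes in the output of ps -eo pid,comm. A minimal sketch, assuming the agent's process name matches the zephyr-agent service name; that is an assumption, so adjust process_name if the binary is named differently.

```python
from typing import List


def agent_pids(ps_output: str, process_name: str = "zephyr-agent") -> List[int]:
    """Return PIDs whose command name equals process_name.

    Expects the two-column output of `ps -eo pid,comm`.
    """
    pids: List[int] = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and anything that is not "<pid> <command>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        if parts[1].strip() == process_name:
            pids.append(int(parts[0]))
    return pids
```

More than one PID in the result suggests a leftover agent from an earlier install is still running alongside the current one.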

Support triage sequence

✓ Identify the customer and linked appbox first.
✓ Check service status and route next.
✓ Check node health after that.
✓ Only then decide whether it is customer-side or platform-side.
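
The triage order above can be written as a short decision function that stops at the first check that fails. This is a sketch under stated assumptions: the inputs are facts an operator has already gathered from the panel, and the parameter names and returned labels are hypothetical.

```python
def triage(
    customer_identified: bool,
    appbox_linked: bool,
    service_ok: bool,
    route_ok: bool,
    node_healthy: bool,
) -> str:
    """Walk the triage sequence in order and return the first finding."""
    # Step 1: without the customer and appbox there is nothing to check yet.
    if not (customer_identified and appbox_linked):
        return "identify the customer and linked appbox"
    # Step 2: service status and route.
    if not (service_ok and route_ok):
        return "platform-side: service status or route"
    # Step 3: node health.
    if not node_healthy:
        return "platform-side: node health"
    # Step 4: every platform check passed, so look at the customer's side.
    return "likely customer-side"
```

The value of the fixed order is that a customer-side conclusion is only reached after every platform check has passed.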

Support playbook

Support quality improves when responses are consistent and grounded in the same checks every time.

Before replying to a ticket

✓ Open the customer record.
✓ Open the linked appbox or service if present.
✓ Check node health if the issue is runtime-related.
✓ Look for deployment or integration errors before asking the customer for repeated screenshots.

What customers should include in a ticket

Device and app name.
What they were trying to do.
The error message or screenshot.
Whether the issue affects only them or all invited users.

Putting these prompts directly into the support form later would reduce low-information tickets even more.

FAQ

Do I need technical knowledge to use Zephyr?

Customers need none; operators need some. The customer side should be portal-driven. The operator side is procedural if you follow the documented sequences.

What are the highest-friction areas that need the best docs?

Nodes, first deployment, routing, customer handoff, and troubleshooting buffering or provisioning issues.

Can the docs use real visuals without drifting out of date?

Yes. This page uses framed previews from the public demo panel so the visuals track the actual UI instead of static mock screenshots.

What should be documented next?

WHMCS order flow, Cloudflare tunnel recipes, product-template examples, and customer-facing device guides for Plex, Emby, and Jellyfin.

How do I reduce avoidable support volume fastest?

Improve the handoff checklist, enforce better ticket prompts, document the node install path clearly, and make the portal answer the repeated customer questions by itself.