Run Zephyr like a platform, not a collection of fixes.
This handbook is written to reduce support load. It covers the real operating flows inside Zephyr: node setup, first deployment, customer handoff, routing, troubleshooting, and how to answer common questions without retyping them every week.
Bring a new Debian or Ubuntu server online with the one-line Zephyr installer.
Create a product, assign a customer, and provision with fewer missing steps.
Check the customer path before the service leaves provisioning and lands in support.
Start from the most common failure patterns: nodes, provisioning, routing, and playback.
Platform overview
Zephyr is the operator layer for your hosted media service. It ties together node health, customer appboxes, product templates, provisioning, support, portal handoff, and request workflow. When the platform is documented clearly, the panel becomes self-serve. When it is not, support turns into tribal knowledge.
This handbook is intentionally direct. It is designed to help an operator open the correct page, check the correct state, and follow the correct order instead of hunting through the UI and hoping the right panel appears.
Quick start checklist
If you are bringing up a new environment or training someone new, use this order. It catches most avoidable mistakes early.
Daily operator routine
Operators work faster when they check the same signals in the same order each day.
What each major area is for
- Overview tracks overall state, recent services, customer summary, and capacity.
- Nodes and Health are the source of truth for whether the fleet can actually take work.
- Products define what gets sold and how it should be deployed.
- Customers and Appboxes show ownership, runtime state, and service handoff context.
- Support and Requests are where daily noise either gets organized or gets worse.
Node setup guide
Node setup is the highest-friction part of the system, so this section is intentionally procedural.
Before you install a node
Fresh server onboarding
Run systemctl status zephyr-agent. Look for a running service with heartbeats or websocket traffic instead of repeated auth or reconnect errors.
What to verify when a node misbehaves
- The agent version shown in the panel matches the current downloadable binary.
- Only one agent process is running on the server.
- /etc/zephyr/node-agent.env contains the correct node ID and token.
- The server can reach the panel base URL and websocket URL.
- The host has enough free CPU, RAM, and disk to actually accept work.
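The env-file and single-process checks in this list can be scripted. The following is a minimal Python sketch, not part of Zephyr itself. The key names NODE_ID and NODE_TOKEN and the process name zephyr-agent are assumptions made for illustration; substitute whatever your env file and agent binary actually use.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_node_env(env, expected_node_id,
                   id_key="NODE_ID", token_key="NODE_TOKEN"):
    """Return a list of problems; an empty list means the file looks sane.

    id_key and token_key are hypothetical names, not confirmed Zephyr keys.
    """
    problems = []
    if env.get(id_key) != expected_node_id:
        problems.append(
            f"{id_key} is {env.get(id_key)!r}, panel expects {expected_node_id!r}"
        )
    if not env.get(token_key):
        problems.append(f"{token_key} is missing or empty")
    return problems


def count_agent_processes(process_names, agent_name="zephyr-agent"):
    """Count processes whose name equals agent_name (assumed binary name)."""
    return sum(1 for name in process_names if name == agent_name)
```

In practice the text would come from reading /etc/zephyr/node-agent.env, the expected node ID from the node record in the panel, and the process names from something like the output of ps -eo comm=. A count above one points at a stale second agent.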
Products and appboxes
Products define what gets deployed
Products are templates. Appboxes are customer instances created from those templates. If products are vague, deployments become custom support jobs instead of repeatable platform work.
Standardize these fields first
First deployment recipe
Customer handoff
The cleanest support ticket is the one you prevent during handoff.
What to send the customer
- The portal link.
- The correct app name for their service: Plex, Emby, or Jellyfin.
- A short note on whether to use platform credentials or local server credentials.
- Where to raise support tickets if something fails.
Portal and access
The portal is your customer-facing self-service layer. The better this page is, the less support work escapes upward.
What the portal should answer by itself
Requests and automation
Requests only feel automatic when operators understand the path. Use this section when customers ask why something is queued, delayed, unavailable, or missing.
Request statuses explained
- Queued means the request was accepted but is not processing yet.
- Processing means the automation chain has picked it up.
- Available means it should now be visible in the customer library.
- Failed or Rejected means operator review or a policy explanation is needed.
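If you keep canned replies for these statuses, it helps to hold the definitions in one place. Below is a small Python sketch of that idea. The lowercase status strings and the function names are assumptions for illustration, not Zephyr's actual API.

```python
from enum import Enum


class RequestStatus(Enum):
    # String values are assumed; match them to what your panel reports.
    QUEUED = "queued"
    PROCESSING = "processing"
    AVAILABLE = "available"
    FAILED = "failed"
    REJECTED = "rejected"


EXPLANATION = {
    RequestStatus.QUEUED: "Accepted, but not processing yet.",
    RequestStatus.PROCESSING: "The automation chain has picked it up.",
    RequestStatus.AVAILABLE: "It should now be visible in the customer library.",
    RequestStatus.FAILED: "Operator review or a policy explanation is needed.",
    RequestStatus.REJECTED: "Operator review or a policy explanation is needed.",
}


def needs_operator(status):
    """True when a human has to look at the request."""
    return status in {RequestStatus.FAILED, RequestStatus.REJECTED}


def explain(raw_status):
    """Map a raw status string to its one-line explanation.

    Raises ValueError for a status this table does not know about.
    """
    status = RequestStatus(raw_status.strip().lower())
    return EXPLANATION[status]
```

Raising on an unknown status is deliberate: a new status should force an update to the table rather than fall through to a vague reply.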
Plans and billing
Document plans in operator language
Do not just document price. Document the practical behavior of each plan: node allowance, stream allowance, transcode expectations, and whether the customer should reasonably expect 4K direct play.
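One way to keep plan documentation in operator language is to store the practical behavior as structured data and render the summary from it. This is a sketch only: the Plan class, its field names, and the example values are hypothetical and do not describe real Zephyr plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    node_allowance: int
    stream_allowance: int
    transcode_expectation: str
    expects_4k_direct_play: bool


def describe(plan):
    """Render a one-line operator summary of what the plan actually allows."""
    four_k = (
        "can reasonably expect"
        if plan.expects_4k_direct_play
        else "should not expect"
    )
    return (
        f"{plan.name}: {plan.node_allowance} node(s), "
        f"{plan.stream_allowance} stream(s), "
        f"transcoding: {plan.transcode_expectation}; "
        f"customers {four_k} 4K direct play."
    )


# Made-up example values, for illustration only.
example = Plan("Example Starter", 1, 2, "light, 1080p only", False)
print(describe(example))
```

Keeping the 4K expectation as an explicit field means support does not have to infer it from price.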
Billing rules worth making explicit
Integrations and routing
Integrations should be documented as a chain, not as isolated toggles
Operators need to understand how Cloudflare, routing, billing automation, and media tooling connect to the appbox lifecycle. If those relationships are not clear, failures get blamed on the wrong layer.
- Cloudflare should be treated as part of the delivery path, not an afterthought.
- Node routing and connectivity should be checked before blaming deployment templates.
- WHMCS or automation changes should always be tested with a single product flow first.
Troubleshooting
Buffering or poor playback
- Check whether the user is transcoding instead of direct playing.
- Check plan stream limits and whether the service is saturated.
- Check node CPU and RAM pressure if multiple heavy sessions are landing on the same machine.
- Check the customer’s connection and playback device before blaming the server.
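The order of these checks matters, so it can help to encode it. The Python sketch below walks the same sequence over a snapshot of one playback session. The field names and the 85 percent pressure threshold are assumptions chosen for illustration; Zephyr may expose this data differently.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSnapshot:
    is_transcoding: bool
    active_streams: int
    plan_stream_limit: int
    node_cpu_percent: float
    node_ram_percent: float


def triage_buffering(snap, pressure_threshold=85.0):
    """Return findings in the same order as the checklist."""
    findings = []
    if snap.is_transcoding:
        findings.append("Session is transcoding instead of direct playing")
    if snap.active_streams >= snap.plan_stream_limit:
        findings.append("Plan stream limit reached")
    if (snap.node_cpu_percent >= pressure_threshold
            or snap.node_ram_percent >= pressure_threshold):
        findings.append("Node CPU or RAM is under pressure")
    if not findings:
        # Only after the server side is clean do we look at the client.
        findings.append(
            "No server-side cause found; check the customer's connection "
            "and playback device"
        )
    return findings
```

The client-side suggestion is returned only when the server-side checks are clean. A stricter version could always include it as a final step.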
Appbox stuck provisioning
- Open Deployments and the provisioning console first.
- Confirm the target node is online and still reporting resources.
- Check routing prerequisites like Cloudflare or domain state.
- Verify the service did not start only partially, for example stuck in a restart loop or blocked on a missing mount it depends on.
Node online but actions do nothing
- Confirm the installed agent is current.
- Check for a stale second agent process.
- Confirm the env file matches the current node record.
- Check websocket or heartbeat auth failures in the agent logs.
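For the last check, a quick tally of log lines is often enough to tell a healthy agent from one stuck in an auth or reconnect loop. The Python sketch below counts lines matching a few patterns. The patterns are guesses at what agent log lines might contain, not the agent's documented log format, so adjust them to what your logs actually show.

```python
import re
from collections import Counter

# Hypothetical patterns; the real agent log wording may differ.
DEFAULT_PATTERNS = {
    "auth": re.compile(r"auth|unauthorized|401", re.IGNORECASE),
    "reconnect": re.compile(r"reconnect", re.IGNORECASE),
    "heartbeat": re.compile(r"heartbeat", re.IGNORECASE),
}


def summarize_agent_log(lines, patterns=DEFAULT_PATTERNS):
    """Count how many lines match each labelled pattern (once per line)."""
    counts = Counter()
    for line in lines:
        for label, pattern in patterns.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

The input could be the output of journalctl -u zephyr-agent -n 200 --no-pager split into lines. Many auth or reconnect hits alongside few heartbeat lines suggests the token or node ID in the env file no longer matches the node record.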
Support triage sequence
Support playbook
Support quality improves when responses are consistent and grounded in the same checks every time.
Before replying to a ticket
What customers should include in a ticket
- Device and app name.
- What they were trying to do.
- The error message or screenshot.
- Whether the issue affects only them or all invited users.
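These ticket prompts can be enforced at intake. The Python sketch below checks a ticket for the items listed above and builds a follow-up request for whatever is missing. The dictionary keys and the wording of the prompt are assumptions for illustration; map them to the fields your ticket form actually has.

```python
# Keys are hypothetical form-field names; labels mirror the list above.
REQUIRED_FIELDS = {
    "device": "Device",
    "app": "App name",
    "goal": "What you were trying to do",
    "error": "The error message or a screenshot",
    "scope": "Whether the issue affects only you or all invited users",
}


def missing_fields(ticket):
    """Return the keys that are absent, None, or blank in the ticket."""
    return [
        key for key in REQUIRED_FIELDS
        if not str(ticket.get(key) or "").strip()
    ]


def follow_up_prompt(ticket):
    """Build a reply asking for missing details; empty string if complete."""
    missing = missing_fields(ticket)
    if not missing:
        return ""
    lines = ["To help us investigate, please add:"]
    lines += [f"- {REQUIRED_FIELDS[key]}" for key in missing]
    return "\n".join(lines)
```

Returning an empty string for a complete ticket lets the caller skip the follow-up and go straight to triage.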
FAQ
Do I need technical knowledge to use Zephyr?
Customers do not; operators need some. The customer side should be portal-driven. The operator side is procedural if you follow the documented sequences.
What are the highest-friction areas that need the best docs?
Nodes, first deployment, routing, customer handoff, and troubleshooting buffering or provisioning issues.
Can the docs use real visuals without drifting out of date?
Yes. This page uses framed previews from the public demo panel so the visuals track the actual UI instead of static mock screenshots.
What should be documented next?
WHMCS order flow, Cloudflare tunnel recipes, product-template examples, and customer-facing device guides for Plex, Emby, and Jellyfin.
How do I reduce avoidable support volume fastest?
Improve the handoff checklist, enforce better ticket prompts, document the node install path clearly, and make the portal answer the repeated customer questions by itself.