How to Debug the VCF Management Service (VCF 9.1)

Published by Valentin

With the release of VMware Cloud Foundation (VCF) 9.1, many customers are discovering the VCF Management Service for the first time. For most, it feels like a black box.

This article breaks down that black box and shows you how to troubleshoot and debug the VCF Management Service effectively using built-in tools like kubectl.

Understanding the VCF Management Service Architecture

Before troubleshooting, it is essential to understand how the service is structured.

The VCF Management Service is composed of two types of nodes:

  • Control Nodes
    These nodes manage and maintain the infrastructure. They form the control plane of the service. In a high-availability deployment, there are typically three control nodes.
  • Worker Nodes
    These nodes run the actual services and host the virtual IPs (VIPs) required by the platform. The number of worker nodes depends on the sizing and redundancy options selected during deployment.

All nodes (control and worker) use the CIDR block defined during deployment, typically /27 (small deployment) or /29 (large deployment).

By default, the service is deployed on the VM Management Network. However, you can specify a different network using the API with the xRegionNetwork argument.

If your CIDR range is too small, it can be expanded after deployment.
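To judge whether a range is large enough before deploying, it helps to know how many addresses a prefix length provides: an IPv4 block with prefix length n contains 2^(32 - n) addresses, including the network and broadcast addresses. The following is a minimal sketch of that arithmetic; `cidr_size` is a hypothetical helper name, not a VCF or kubectl command.

```shell
# Number of IPv4 addresses in a block with the given prefix length
# (network and broadcast addresses included).
cidr_size() {
    prefix=$1
    echo $(( 1 << (32 - prefix) ))
}

cidr_size 27   # prints 32
```

Subtract the network and broadcast addresses, plus any gateway, to estimate how many addresses remain for nodes and VIPs.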

Note: for now, the VCF Management Service can only be deployed on a VLAN-backed network, not on an overlay network.

Mandatory vs Optional Services

During your first VCF deployment, some services are mandatory while others are optional.

  • Mandatory
    • VCF Service Runtime
    • SDDC Lifecycle
    • Telemetry
    • Salt Master
    • Software Depot
    • Fleet Lifecycle
    • Salt RaaS
  • Optional
    • Log Management
    • Real-time metrics
    • Identity broker

Locate Nodes and Service Details

To identify nodes and understand their roles:

  1. Go to VCF Operations
  2. Navigate to Build → Lifecycle
  3. Select VCF Management
  4. Open VCF Services Runtime

Here, you can view all nodes along with their assigned IP addresses.

Access the Control Plane

To begin troubleshooting:

  1. SSH into one of the control nodes using the vmware-system-user account
  2. Use the password defined during deployment
  3. Elevate privileges to root, because kubectl requires root access:

vmware-system-user@vco-service-23rnd [ ~ ]$ sudo -i
[sudo] password for vmware-system-user:
root@vco-service-23rnd [ ~ ]#

Using kubectl for Troubleshooting

Once connected, you can use kubectl to inspect the system.
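A reasonable first check is node health, since a control or worker node that is not Ready can explain many pod-level symptoms. The sketch below assumes the standard `kubectl get nodes` column layout (NAME, STATUS, ROLES, AGE, VERSION); the function names are hypothetical helpers, not part of VCF.

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# name of every node whose STATUS column is not exactly "Ready"
# (this also catches values such as "Ready,SchedulingDisabled").
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Wrapper that queries the cluster; run it as root on a control node.
check_nodes() {
    kubectl get nodes --no-headers | not_ready_nodes
}
```

Calling `check_nodes` prints nothing when every node is Ready. For the full picture, including IP addresses, `kubectl get nodes -o wide` shows all nodes regardless of status.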

List all pods across namespaces:

kubectl get pods -A

Filter by namespace (example for logs):

kubectl get pods -n ops-logs

Filter pods by status:

  • Pods that are neither Running nor Completed:
    kubectl get pods -A | grep -Evi 'Run|Compl'
  • Pods not running:
    kubectl get pods -A | grep -vi 'Run'
  • Completed jobs:
    kubectl get pods -A | grep Compl
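Text matching with grep also drops any pod whose name or namespace happens to contain "run" or "compl". A slightly stricter variant, sketched here under the assumption that STATUS is the fourth column of `kubectl get pods -A` output, compares that column only. The function names are hypothetical helpers.

```shell
# Keep the header plus every pod whose STATUS column is neither
# Running nor Completed. Expects `kubectl get pods -A` output on stdin.
unhealthy_pods() {
    awk 'NR == 1 || ($4 != "Running" && $4 != "Completed")'
}

# Keep the header plus Running pods whose READY count is incomplete,
# for example 1/2.
partially_ready_pods() {
    awk 'NR == 1 { print; next }
         $4 == "Running" { split($3, r, "/"); if (r[1] != r[2]) print }'
}
```

Usage: `kubectl get pods -A | unhealthy_pods`. kubectl can also filter on the server side with `--field-selector=status.phase!=Running,status.phase!=Succeeded`, but a pod in CrashLoopBackOff still reports the Running phase, so the column-based filter tends to catch more.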

Understanding Pods and Jobs

In the VCF Management Service:

  • Some pods represent continuously running services
  • Others are jobs triggered by actions in the UI or API

Common job examples:

  • VCF Automation (VCFA) import during upgrades
  • Scaling the VCF Management Service
  • Expanding storage for a service
  • Upgrade pre-checks

Most job-related pods should end in a Completed state. If you see Error, further investigation is required.
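One way to start that investigation is to collect the description and recent log lines of every pod in the Error state. This is a minimal sketch that assumes the default `kubectl get pods -A` column order; `error_pods` and `inspect_error_pods` are hypothetical helper names.

```shell
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "<namespace> <pod>" for each pod whose STATUS column is Error.
error_pods() {
    awk '$4 == "Error" { print $1, $2 }'
}

# For each such pod, show its description (which includes recent
# events) and the last 50 log lines.
inspect_error_pods() {
    kubectl get pods -A --no-headers | error_pods |
    while read -r ns pod; do
        echo "=== $ns/$pod ==="
        kubectl -n "$ns" describe pod "$pod"
        kubectl -n "$ns" logs "$pod" --tail=50
    done
}
```

Run `inspect_error_pods` as root on a control node. The Events section at the end of each `describe` output often indicates whether the failure came from scheduling, image pulls, or the container itself.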

Retrieving Logs from Pods

To analyze a specific pod:

kubectl -n <namespace> logs <pod-name>

Example:

kubectl -n ops-logs get pods
kubectl -n ops-logs logs log-processor-0

root@vco-service-23rnd [ ~ ]# kubectl -n ops-logs get pods
NAME                           READY   STATUS      RESTARTS   AGE
log-processor-0                2/2     Running     0          14d
log-processor-1                2/2     Running     0          14d
log-processor-2                2/2     Running     0          6d19h
log-store-0                    1/1     Running     0          14d
log-store-1                    1/1     Running     0          14d
log-store-2                    1/1     Running     0          14d
ops-logs-volume-resize-6h9pz   0/1     Completed   0          14d

You can combine this with common Linux tools:

  • View logs interactively:
    kubectl -n ops-logs logs log-processor-0 | less
  • Search for errors:
    kubectl -n ops-logs logs log-processor-0 | grep -i error

These commands provide a simple but powerful way to troubleshoot issues by directly inspecting service logs.
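Pods such as log-processor-0 run two containers (READY 2/2), so it can be worth reading all containers at once and limiting the time window. The sketch below uses standard `kubectl logs` flags; the function names are hypothetical helpers.

```shell
# Count lines on stdin that mention "error", ignoring case.
# `|| true` keeps the exit status at zero when nothing matches.
count_errors() {
    grep -ci 'error' || true
}

# Errors logged by every container of a pod during the last hour.
# Arguments: namespace, pod name.
recent_errors() {
    kubectl -n "$1" logs "$2" --all-containers=true --since=1h --timestamps |
        grep -i 'error'
}
```

For example, `recent_errors ops-logs log-processor-0` prints the matching lines with timestamps. To read a single container, pass `-c <container-name>` instead of `--all-containers=true`; to read the logs of a container that has restarted, add `--previous`.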

Final Thoughts

The VCF Management Service is not a black box once you understand its Kubernetes-based architecture. By leveraging kubectl and understanding how nodes, pods, and jobs interact, you can quickly identify and resolve issues in your environment.

