How to Debug the VCF Management Service (VCF 9.1)
With the release of VMware Cloud Foundation (VCF) 9.1, many customers are discovering the VCF Management Service for the first time. For most, it feels like a black box.
This article breaks down that black box and shows you how to troubleshoot and debug the VCF Management Service effectively using built-in tools like kubectl.
Understanding the VCF Management Service Architecture
Before troubleshooting, it is essential to understand how the service is structured.
The VCF Management Service is composed of two types of nodes:
- Control Nodes
These nodes manage and maintain the infrastructure. They form the control plane of the service. In a high-availability deployment, there are typically three control nodes.
- Worker Nodes
These nodes run the actual services and host the virtual IPs (VIPs) required by the platform. The number of worker nodes depends on the sizing and redundancy options selected during deployment.
All nodes (control and worker) use the CIDR block defined during deployment, typically /27 (small deployment) or /29 (large deployment).
By default, the service is deployed on the VM Management Network. However, you can specify a different network using the API with the xRegionNetwork argument.
If your CIDR range is too small, it can be expanded after deployment.
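When judging whether a range leaves room to grow, it can help to count the addresses behind a prefix length. The following is a minimal shell sketch of that arithmetic only. The helper name cidr_size is hypothetical, it counts every address in the block (network and broadcast addresses included), and it says nothing about how many addresses the service actually consumes.

```shell
# Print the number of IPv4 addresses in a block with the given prefix length.
# cidr_size is a hypothetical helper name, not part of VCF.
cidr_size() {
  local prefix="$1"
  # A /N block contains 2^(32-N) addresses.
  echo $(( 1 << (32 - prefix) ))
}

cidr_size 27   # prints 32
cidr_size 29   # prints 8
```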
Note: at present, the VCF Management Service can only be deployed on a VLAN, not on an overlay network.
Mandatory vs Optional Services
During your first VCF deployment, some services are mandatory while others are optional.
- Mandatory
  - VCF Service Runtime
  - SDDC Lifecycle
  - Telemetry
  - Salt Master
  - Software Depot
  - Fleet Lifecycle
  - Salt RaaS
- Optional
  - Log Management
  - Real-time metrics
  - Identity broker

Locate Nodes and Service Details
To identify nodes and understand their roles:
- Go to VCF Operations
- Navigate to Build → Lifecycle
- Select VCF Management
- Open VCF Services Runtime
Here, you can view all nodes along with their assigned IP addresses.

Access the Control Plane
To begin troubleshooting:
- SSH into one of the control nodes using the vmware-system-user account
- Use the password defined during deployment
- Elevate privileges:
kubectl requires root access, so switch user:
vmware-system-user@vco-service-23rnd [ ~ ]$ sudo -i
[sudo] password for vmware-system-user:
root@vco-service-23rnd [ ~ ]#
Using kubectl for Troubleshooting
Once connected, you can use kubectl to inspect the system.
List all pods across namespaces:
kubectl get pods -A
Filter by namespace (example for logs):
kubectl get pods -n ops-logs
Filter pods by status:
- Pods not running or completed:
kubectl get pods -A | grep -Evi 'Run|Compl'
- Pods not running:
kubectl get pods -A | grep -vi 'Run'
- Completed jobs:
kubectl get pods -A | grep Compl
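One caveat with grep is that it matches anywhere on the line, so a pod whose name or namespace happens to contain "run" is hidden whatever its status. A hedged alternative is to test the STATUS column itself. The sketch below assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE), and unhealthy_pods is a hypothetical helper name.

```shell
# Keep the header line plus every pod whose STATUS is neither Running nor Completed.
# Assumes the default `kubectl get pods -A` layout, where STATUS is field 4.
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
  awk 'NR == 1 || ($4 != "Running" && $4 != "Completed")'
}

# Usage on a control node:
#   kubectl get pods -A | unhealthy_pods
```

Without -A the NAMESPACE column is absent and STATUS moves to field 3, so the field number would need adjusting.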
Understanding Pods and Jobs
In the VCF Management Service:
- Some pods represent continuously running services
- Others are jobs triggered by actions in the UI or API
Common job examples:
- VCF Appliance (VCFA) import during upgrades
- Scaling the VCF Management Service
- Expanding storage for a service
- Upgrade pre-checks
Most job-related pods should end in a Completed state. If you see Error, further investigation is required.
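As a starting point for that investigation, the sketch below reads kubectl get pods -A output and prints, for each pod in the Error state, the describe and logs commands you would typically run next. It only prints the commands and executes nothing. The helper name error_pod_cmds is hypothetical, and the default -A column layout is assumed.

```shell
# Read `kubectl get pods -A` output on stdin and print follow-up commands
# for every pod whose STATUS (field 4) is Error. Nothing is executed.
# error_pod_cmds is a hypothetical helper name.
error_pod_cmds() {
  awk 'NR > 1 && $4 == "Error" {
    printf "kubectl -n %s describe pod %s\n", $1, $2
    printf "kubectl -n %s logs %s\n", $1, $2
  }'
}

# Usage on a control node:
#   kubectl get pods -A | error_pod_cmds
```

Printing the commands rather than running them keeps the helper safe to use on a production control node and lets you pick which pods to inspect.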
Retrieving Logs from Pods
To analyze a specific pod:
kubectl -n <namespace> logs <pod-name>
Example:
kubectl -n ops-logs get pods
kubectl -n ops-logs logs log-processor-0
root@vco-service-23rnd [ ~ ]# kubectl -n ops-logs get pods
NAME                           READY   STATUS      RESTARTS   AGE
log-processor-0                2/2     Running     0          14d
log-processor-1                2/2     Running     0          14d
log-processor-2                2/2     Running     0          6d19h
log-store-0                    1/1     Running     0          14d
log-store-1                    1/1     Running     0          14d
log-store-2                    1/1     Running     0          14d
ops-logs-volume-resize-6h9pz   0/1     Completed   0          14d
You can combine this with common Linux tools:
- View logs interactively:
kubectl -n ops-logs logs log-processor-0 | less
- Search for errors:
kubectl -n ops-logs logs log-processor-0 | grep -i error

These commands provide a simple but powerful way to troubleshoot issues by directly inspecting service logs.
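For a quick sense of how noisy a pod is before reading its logs line by line, a small summary can help. The sketch below counts lines that mention "error" or "warn", case-insensitively. The helper name log_level_counts is hypothetical, and matching on these two words is an assumption about the log format rather than something the service guarantees.

```shell
# Count stdin lines that mention "error" or "warn" (case-insensitive).
# log_level_counts is a hypothetical helper name; the keywords are an
# assumption about the log format.
log_level_counts() {
  awk '
    { line = tolower($0) }
    line ~ /error/ { errors++ }
    line ~ /warn/  { warnings++ }
    END { printf "errors=%d warnings=%d\n", errors, warnings }
  '
}

# Usage on a control node (--since and --all-containers are standard
# kubectl logs flags):
#   kubectl -n ops-logs logs log-processor-0 --all-containers --since=1h | log_level_counts
```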
Final Thoughts
The VCF Management Service is not a black box once you understand its Kubernetes-based architecture. By leveraging kubectl and understanding how nodes, pods, and jobs interact, you can quickly identify and resolve issues in your environment.

