Wednesday 10 June 2020

New Homelab candidate - Dell Optiplex 7050

I have been looking for a replacement for my steam-powered DL360 for some time. Due to the noise and heat it produced, it spent most of its life powered off - so I realised it was time to find a replacement.

I like the idea of a NUC cluster, but I wanted to find something that would hopefully be a little more cost-effective - and I found the Dell Optiplex 7050.


The reasons for looking at this as a candidate were mainly its size, the i5 processor, the ability to take an M.2 SSD as well as a 2.5" SATA disk, RAM expandable to 32 GB and the Intel NIC (I was hoping to avoid having to search for drivers for yet another NIC, and was hoping and praying the NIC would be covered by the Intel drivers included in the base ESXi image).
The model I bought was from eBay and came with a 256GB SATA SSD and 16 GB RAM - a good starting point to prove a theory and see how it would run ESXi...

Unfortunately, both DIMM slots are already populated, so to increase the RAM both DIMMs need to be replaced with 16 GB modules (not cheap!)

I installed a 256 GB M.2 SSD for a cache tier (I bought this one, which seems to work well despite raising HCL warnings)...

Already it's starting to get expensive: the 32 GB RAM, the 256 GB M.2 cache, and considering upgrading the storage from 256 GB to maybe 1 TB... but still probably more cost-effective than a NUC!

To install the OS, I wanted to try the VCF CloudBuilder VIA (VMware Imaging Appliance), but it seemed to expect some specific hardware which my Optiplex didn't have; although I got the PXE boot working, the build didn't complete successfully. The only place I had both an Ethernet socket and an HDMI device was my living room - picture the size of the Optiplex next to a 55" TV...!


Eventually I built the machine from a USB key using these instructions, and was so relieved to see that ESXi had found its NIC...! View from vCenter:


There are a few hoops to jump through to configure a cluster, although vSphere 7 has made this a little simpler than previous versions.
To see if this is viable as a lab candidate, I wanted to see what performance I got. As this is a single-node vSAN cluster, the first attempt at deploying a VM resulted in the below; I knew this would be a problem, but wanted to check the starting position before changing FTT and stripe settings.
Sam McGeown has done a great write-up on a single-node vSAN cluster here, which covers these settings well and, despite being written in 2017, still works for vSphere 7.0.

I was eventually able to deploy HCIBench, a VMware Fling stress-test utility, to see what performance you can expect to get out of your cluster (albeit mine is currently a single node with half the RAM and a much smaller storage SSD than I would want).

It's configured through a very familiar UI. The HCI Bench site has a good walkthrough guide showing how this is configured.

When you start a test, in addition to the HCIBench_2.3.1 VM deployed from the OVA downloaded from the VMware Flings site, additional VMs are created which run the actual load tests.
There is an 'easy' option, which I used, that specifies a number of predefined workloads. I managed to get between 8k and 15k IOPS depending on the workload profile, which I was quite happy with (considering that when I started sizing VMware clusters on VI 3.0, 10 IOPS per VM was used as the starting point for an average VM!)


Conclusion:
The Dell Optiplex 7050 is definitely a good alternative to an Intel NUC, but I thought the cost savings would be greater than they probably will be, especially as I am looking at a memory upgrade a lot sooner than I thought I would need to (below is the memory consumption of just the hypervisor and vSAN!)



Thanks for reading and happy labbing!!

Tuesday 17 September 2019

vAPI Rejecting Logon on a Session error message from vRealize Orchestrator

I have used some of the out-of-the-box vAPI vRO workflows (and some additional ones found here - thanks to Oliver Leach!) in order to add a number of vSphere Tags to machines I provisioned using vRealize Automation, as part of a larger Event Broker workflow.

The problem I was facing was that after a certain number of calls to the vAPI endpoint, I received an error message: "Rejecting Logon on a session where login failed". Not very helpful!


I also noticed the same message in /var/log/messages:  "java.lang.RuntimeException: Rejecting login on a session where login failed".

To get things working again quickly, you can restart the vmware-vapi-endpoint service on the vCenter server...
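On the VCSA, something like the following from the bash shell should do it (service-control is the appliance service manager):

service-control --stop vmware-vapi-endpoint
service-control --start vmware-vapi-endpoint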

This seems to happen due to a threshold limit on open connections within the vAPI (I tried playing with some advanced parameters but could not resolve it; in any case, if the problem is unnecessarily open sessions, we probably want to fix it properly...!)

I tried stipulating the credentials used, to ensure it wasn't a permissions issue; it wasn't. As such, changes away from the commented-out line (see below, the fourth line beginning with //) are probably unnecessary. This may be useful, however, if you want to use credentials other than those the vAPI endpoint was added to vRO with.

Eventually, I reviewed the API guide for the vAPI endpoint and noticed there was a .close(); method which was not being used.

After reviewing all the tagging workflows and adding the client.close(); line at the end of each scriptable task, the problem went away... very simple, but not entirely clear when looking at the error message!
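To make the pattern concrete, here's a minimal sketch of a scriptable task (vRO JavaScript; 'endpoint' is assumed to be an input of type VAPI:VAPIEndpoint, as used by the out-of-the-box workflows):

var client = endpoint.client();
try {
    // ... tagging calls using 'client' go here ...
} finally {
    client.close();  // releases the vAPI session; without this, sessions leak until vAPI rejects new logins
}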

N.B., check the initial variable name is client (the third line in the above image; i.e., var client), as I believe some of the workflows use a variable name other than client.


Enjoy!

Tuesday 11 June 2019

Moving all standalone hosts into a cluster with PowerShell

The issue I hit was caused by trying to run a tool to disable TLS 1.0 and 1.1 across a number of deployments I'm working on (details on this can be found here)...

The TLS tool can target either a cluster, working through each host to disable SSL and TLS 1.0 and 1.1, or a single host. The customer I have been working with has many hundreds of hosts across a number of sites... so on each vCenter, they have 50+ hosts ready to add to a cluster for consumption, depending on where the resource is required. As such, using the TLS tool to disable everything except TLS 1.2 would have meant running the script 50+ times, specifying the administrator@vsphere.local password each time. Very time consuming!

The easy solution as I saw it, was to add these hosts to a temporary cluster, run the script against the cluster level, then remove them...  Here's how I did it:

From PowerCLI, after connecting to the vCenter in question (Connect-VIServer vcenter.domain.local):

$clusterless = Get-VMHost | Where-Object { $_.Parent.Name -eq "host" }

This ran through all hosts registered against the vCenter, and found those with a parent of 'host' - i.e. not a cluster member.

From this, you can use the 'Move-VMHost' command to move  all of these hosts into a temporary cluster.  I named mine 'tmp':
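Assuming the 'tmp' cluster has already been created, the command looks like this:

$clusterless | Move-VMHost -Destination (Get-Cluster -Name "tmp")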

This will move all of these hosts into your cluster (I had by this point restarted all of these hosts, hence them showing as not responding):

For the reverse, you can run the same command but replace "host" with the cluster name:
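For example:

$clusterless = Get-VMHost | Where-Object { $_.Parent.Name -eq "tmp" }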
N.B., I should probably have changed the variable name from $clusterless...!

After which, you can move these back to the root datacentre level by re-running the 'Move-VMHost' command:
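Something like this (the datacentre name 'DC01' here is a placeholder for your own):

$clusterless | Move-VMHost -Destination (Get-Datacenter -Name "DC01")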



Disabling TLS 1.0 and 1.1 on vCenter and ESXi

I have been working with a customer who needed to disable TLS 1.0 and 1.1 across all of their estate; I was focusing on vCenter, ESXi and NSX.

This is all covered in the official documentation: https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.security.doc/GUID-82028A21-8AB5-4E2E-90B8-A01D1FAD77B1.html

The process that I followed was vCenter --> ESXi Hosts -->  NSX Manager.

To start with, the TLS Reconfigurator tool is downloadable from here.

Choose the version of vCenter you have (.rpm for vCSA and .msi for Windows):

For me, I am using the VCSA and I think most deployments use this now - so the following instructions cover the VCSA...

To begin, copy the rpm file to the VCSA, using something like WinSCP or FileZilla. Note, you may need to enable the bash shell for root to allow this:
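For reference, this is the command to switch root's shell to bash (the reverse of the reset shown at the end of this post):

chsh -s /bin/bash root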

Copy it somewhere sensible like /tmp. Once you've done this, install the tool by running rpm -Uvh /tmp/VMware-vSphereTlsReconfigurator-**********.x86_64.rpm. This will install the tool into /usr/lib/vmware-vSphereTlsReconfigurator; cd into that directory.


There are two sub folders here; we will start in VcTlsReconfigurator...

To disable both TLS 1.0 and 1.1 against the vCenter, first take a snapshot, then run the following command:

reconfigureVc update -p TLSv1.2

This will run through all of the components of vCenter, reporting the TLS versions in use before the change and the new versions once the execution has completed. Please note, this will stop the vCenter services:

Before:

After:


Done - this should take no more than 5 minutes.  Onto the ESXi hosts; run cd ../EsxiTlsReconfigurator.

Assuming you want to disable both TLS 1.0 and 1.1, there are two options you can run here... against a cluster or against an individual host:
  • reconfigureEsx vCenterCluster -c {clustername} -u administrator@vsphere.local -p TLSv1.2
  • reconfigureEsx vCenterHost -h {hostname.fqdn} -u administrator@vsphere.local -p TLSv1.2
It is so much easier to run this against a cluster than an individual host; you will need to specify the administrator@vsphere.local password each time, so it would be very time consuming to do host by host... However, it's worth knowing the option is available.

Initially, this will not make any changes; in order to apply the changes, you need to restart each host to set the TLS settings:

After which, all of the ESXi hosts will be patched.  On the vCenter, you can remove the .rpm file from /tmp and reset the root user's shell back to the Appliance shell by running chsh -s /bin/appliancesh root.


Finally, onto the NSX Manager.  Connect to the manager UI, login with the admin credentials, click on 'Manage Appliance Settings', and click on 'Edit' next to 'FIPS Mode and TLS settings':


Again, this will restart the NSX Manager services but will not impact service.

Tuesday 16 April 2019

Getting vCloud Director vApp details through REST API

A colleague of mine asked me to verify the configuration of a vApp by checking the API, to ensure what we were seeing in the GUI matched what the API reported. I have spent a lot of time with Postman, which is a great tool for making REST calls from a client (Windows, Mac and Linux versions are available from https://www.getpostman.com/ - and I should add I am by no means affiliated with them :))

The two issues I had were that for vCD, most of the API calls need a bearer token rather than basic authentication, and that the API guide refers to a vApp number... but I wasn't sure how to get this. Performing a GET on /api/objectOfInterest will normally return all available objects in that space, but I guess because vCD is a multi-tenant platform, you have to specify very clearly which organization or VDC you're working within.

The steps involved are outlined below:


Get API Versions
You may already know this if you're familiar with the version of vCD you're using, but in case you're not... there are multiple versions of the API available, which you can check by issuing a GET to https://vcd.domain.local/api/versions using basic authentication:

In my deployment, the latest API version was 30.0.  You can see the authentication is basic using 'administrator@system' for the username, although this could be something different if you have an account with API access. 

Get Bearer token from session
Now that you have the API version, you need to get a bearer token; vCD only supports basic authentication for a basic set of calls, so to do anything of interest, a bearer token is required.

To do this, the API call is a POST to https://vcd.domain.local/api/sessions, again with basic authentication, although you will need to specify the header Accept: application/*+xml;version=30.0, where 30.0 is whatever you retrieved in the first step.
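If you prefer the command line to Postman, the equivalent call looks something like this (a sketch - the credentials are placeholders, and -k skips certificate validation for lab use):

curl -k -i -X POST -u 'administrator@system:password' -H 'Accept: application/*+xml;version=30.0' https://vcd.domain.local/api/sessions

The -i flag includes the response headers in the output, which is where the token is returned.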


This will return an 'X-VMWARE-VCLOUD-ACCESS-TOKEN' header, which I have blanked out here... copy this, as you will need it for all subsequent calls.

Get vApp URL
Now that you have the bearer token, you need to identify the URL for the specific vApp you want to work with... All of the documentation refers to /api/vApp/vapp-7, which in my opinion is a little misleading: the number is so low that it looks like a sequential ID issued to vApps as they are created. This is not the case :)...

Issue the call https://vcd.domain.local/api/extension/vapps/query, only changing your authentication to bearer and pasting in the token retrieved from the previous step:
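In curl terms, something like this (again a sketch; {token} is the value copied from the previous step):

curl -k -H 'Accept: application/*+xml;version=30.0' -H 'Authorization: Bearer {token}' https://vcd.domain.local/api/extension/vapps/query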

This will return an xml body of all vApps in your environment, so you should be able to find the vApp you need along with its specific URL:


Horribly formatted because I had to remove customer-specific details!

Now you have that URL, you can issue an API call against that URL to return all data specific to that vApp, as well as individual URLs for returning only specific data (for example, I wanted to return the startup and shutdown timer values for each VM).

Get vApp Specific URL
By hitting the vApp-specific URL (https://vcd.domain.local/api/vApp/vapp-{long guid string}/), you will be able to pull specific URLs relevant to the information you require... For me, the interesting part was the startupSection URL suffix, which returned all the values shown below:


This is the start; from here, you can go through the API guide to do anything you need to do from the API... Enjoy!

Tuesday 29 January 2019

Using vRA CloudClient REST calls to export and import property groups and definitions

I have used the CloudClient a lot in the past to export/import machine and XaaS blueprints, resource actions, etc. but there isn't a native way to export/import property groups and property definitions. 

Recently, I've needed to move config from a development to a production platform. The ideal approach would be to use vRealize Suite Life Cycle Manager (or vRSLCM if you insist on an acronym), which enables a more CI/CD approach for vRealize Suite products - moving config through development, test, acceptance and production, including the deployment and upgrade process, as well as customer-specific config such as property groups...

It is covered nicely here by @virtualhobbit. However, it's a product which is often overlooked when discussions around vRA start, so I needed an easy way to move property groups & definitions without vRSLCM...

As I mentioned, I use the CloudClient where I can because it's easy to get started with, it's something customers can become familiar with without having to understand how to build REST API calls using curl, and it has a quickly understandable command syntax...

You can still use the CloudClient's REST functionality to do this... There are plenty of blog posts covering how to connect to the CloudClient, so I won't go into details on that - but using the REST client is new to some people... First off, I tried running the below command to ensure I had the required permissions to make REST calls... This was while logged onto the source vRA instance.

vra rest get --service identity --uri tenants

This successfully returns the tenants that I have available as part of this vRA deployment.

Next, using 'vra rest get' commands, I can return the property groups and property definitions and export these to JSON:

vra rest get --service properties-service --uri propertygroups --export c:\somewhere\pg.json --format JSON
vra rest get --service properties-service --uri propertydefinitions --export c:\somewhere\pd.json --format JSON

This will export the existing property definitions and groups into JSON files:


JSON isn't the easiest thing to read & understand, so I tend to use http://jsoneditoronline.org/ which shows the JSON script graphically.  This is also very useful when having to re-import the property groups / definitions.

Unfortunately, the 'vra rest post' command expects a single object rather than the array which we have just exported. As such, I used jsoneditoronline.org to split out the individual objects:

By collapsing the individual objects in the array, you can copy them out to another editor window to verify they are correctly formatted JSON, then save them to individual files... Not ideal from a speed point of view, but this approach does ensure the consistency between platforms that is essential.
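To illustrate the shape of the problem (the field names here are purely illustrative, not the real property group schema), the export is an array:

[
  { "name": "pg-network", "label": "Network settings" },
  { "name": "pg-storage", "label": "Storage settings" }
]

...whereas each file you post must contain just one object:

{ "name": "pg-network", "label": "Network settings" }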

From the destination vRA instance, again after logging in, I ran some 'vra rest post' commands in order to post the property groups & definitions:

vra rest post --service properties-service --uri propertygroups --format JSON --data @c:\somewhere\pg.json --headers no

I can specify to include the headers which will show the full return rather than just the success code of 201:
 
vra rest post --service properties-service --uri propertygroups --format JSON --data @c:\somewhere\pg.json --headers yes

The same applies for the property definitions as well, although below I hit a case-sensitivity issue where the command didn't like '\t' in a filename path, presumably because it was interpreted as a tab character... so I had to use an uppercase letter in the path:

vra rest post --service properties-service --uri propertydefinitions --format JSON --data @c:\somewhere\file.json  (n.b., I didn't specify headers here so it defaulted to no):


The result... All property groups and definitions added to the destination vRA instance successfully. This might seem like a lot of work, but I am relying quite heavily on property groups and want to ensure consistency between deployments.
The next task I am going to look at is using vRO and its REST client to return the JSON string from the source and run through a 'for each' loop for every object in the array of definitions/groups... but for now, this is more work to automate than is really necessary for the time it would take to do manually.

The REST client through CloudClient can be used for pretty much anything else, not just properties...  This information is available in the API guide which is on the appliance:

https://vra.domain.local/component-registry/services/docs


Happy automating!



Friday 25 January 2019

Issues with vRO and Active Directory only returning 1000 items

I have seen a few references to this, but it's not a common problem, mainly because most people can stick to under 1000 items in an OU. For those who cannot, the problem is this...

By default, when any tool performs a search against AD, only the first 1000 results are returned - a default limit. I've seen a few blog posts with application-specific approaches to this, but couldn't find one for vRealize Orchestrator specifically.

Firstly, I created an empty OU and used vRO to create 1500 groups:
The script is simple. I defined an OU as an attribute and ran the script while the variable nextNumber was less than 1500.
It's worth pointing out the pad function. It is really useful when a customer wants to use a custom naming convention within vRA and insists on having, say, 4 digits; it adds the additional zeros by calling the function pad for the variable longNumber... Then I increment nextNumber with the line nextNumber++.
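Here's a minimal sketch of what that scriptable task might look like (vRO JavaScript; 'ou' is assumed to be a workflow attribute of type AD:OrganizationalUnit, and the group name prefix is purely hypothetical):

// pad: prepend zeros until the number is 'width' digits long, e.g. pad(42, 4) returns "0042"
function pad(number, width) {
    var s = number.toString();
    while (s.length < width) {
        s = "0" + s;
    }
    return s;
}

var nextNumber = 0;
while (nextNumber < 1500) {
    var longNumber = pad(nextNumber, 4);
    // createUserGroup is the AD plugin method for creating a group within an OU
    ou.createUserGroup("testGroup-" + longNumber);
    nextNumber++;
}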

The result from AD is clear - 1500 groups:

  
However, the vRO plugin only returns the first 1000:

There is a way of setting the plugin parameters from vRO, but this doesn't change anything... the limit is a restriction imposed by AD itself.

So the answer is this... On your domain controller (or whichever one vRO is targeting), use ntdsutil to change the MaxPageSize limit... The commands are below:
ntdsutil
ldap policies
connections
connect to server {your dc}
quit
set MaxPageSize to {pick a number - I went for 2000}
Commit Changes
quit

The change can be confirmed by running show values:

After rebooting the domain controller, the full 1500 results were returned (I am not sure how necessary the reboot was... it's Microsoft, it felt right :) - plus the plugin was not behaving beforehand, although that might be due to my running the DC and vRO appliance on my laptop, which was starting to struggle...)
I tested that the new limit set by ntdsutil was in effect by modifying my script to start at 1500 and run while nextNumber < 3000:
Although this hit another MS limit when I was trying to browse AD using dsa.msc (Users and Computers)...
However, it shows that the value set by ntdsutil limits what AD will return:

The problem with the above fix is this...  Most Active Directory administrators won't be happy about making changes to their domain controllers using ntdsutil without a very good reason.  In reality, having more than 1000 items in an OU is unworkable; either you have more groups than you need, or not enough granularity in your OU structure...  However I am not an authority on AD design, so I will highlight the vRO workaround I used instead!

I could add that you could deploy a read-only domain controller, which might appease your awkward AD team, although this does seem a little overkill.

Workaround: I use AD groups within configuration elements. This provides a referable item that your vRO workflows can look up every time they run. I like this approach because it means that when exporting code from one vRO platform to another, changes are easy and obvious (rather than having to find where the code references a test or dev group as opposed to a production group, for example)... What I have done below is create a configuration element with the highest item available to me (it was 2000) and export it to a .vsoconf file:

Then I modified this to update the value to one that is not returned by the default search limit (for me, the next available group, 2001). This is an XML string which contains the distinguished name of the object in question, starting with CN for commonName (you can see CN%3D, where %3D is the URL encoding of the equals sign):
By the way, the text editor is Syntra Small - someone asked before...!

Then import the configuration and overwrite it:


This will allow you to store the configuration element with the new updated value, which your workflows can then call on for whatever you might need to do...
Not an ideal approach - I would strongly suggest addressing the enormous number of groups in a single OU - but I realise that is not always possible...