Sunday, 5 November 2017

Docker Swarm clusters part 2 - running containers on Swarm

Following on from my last blog post about getting started with Docker, this post is about running containers at scale with Swarm and why this should be interesting to those deploying containers on Docker.

The current status of my Swarm cluster is that I have a total of 5 nodes, three of which are managers and two of which are workers.

To start with, on a single node, I am going to pull Mediawiki from Docker Hub to see what the default behaviour is; so docker pull mediawiki, then docker run --name medwik -p 8080:80 -d mediawiki.  If you're still getting familiar with Docker, here's what the above command has done:

docker run (run a container) --name medwik (name that container 'medwik') -p 8080:80 (map port 8080 on the Docker host through to port 80 in the container) -d (detached, i.e. run in the background) mediawiki:latest (run the latest mediawiki image...  I only ran docker pull mediawiki, which will have pulled the latest by default).

After which, browsing to the IP address of the Docker host on port 8080, I can see my container has started:

As you'd expect, on the IP address of another docker host in the Swarm cluster, the Mediawiki site is not reachable:

So that's where Docker Swarm comes into its own;  using the Docker Service command, I can start the same container, but this time as a service:

(Please note, I also ran a 'docker pull mediawiki' on all other hosts before performing this step).

docker service create --name wikicluster --replicas 3 -p 8080:80 -d mediawiki:latest

'docker run' has been replaced with 'docker service create', so this is now running as a Swarm service rather than a standalone container.  The other parameters are the same with the exception of '--replicas 3', which defines how many instances of the container will run (note that all service flags must come before the image name).  This can be verified using 'docker service ls' and 'docker service ps wikicluster':

We can see that under the 'Replicas' column of 'docker service ls' we have 3/3.  This means that the desired state is 3 running containers and the actual state is 3;  good news!  This is verified by running 'docker service ps wikicluster'.
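Pulled together as a script, the create-and-verify steps look something like the below.  This is a sketch: the service name and replica count come from the post, and the guard at the top makes it a harmless no-op on any machine that isn't an active Swarm manager.

```shell
# Sketch: create the wikicluster service and verify it (assumes an active Swarm).
SERVICE=wikicluster
if docker info 2>/dev/null | grep -q 'Swarm: active'; then
  # All service flags must precede the image name.
  docker service create --name "$SERVICE" --replicas 3 -p 8080:80 -d mediawiki:latest
  docker service ls              # REPLICAS should read 3/3 once converged
  docker service ps "$SERVICE"   # shows which node each task landed on
else
  echo "No active Swarm on this host; commands shown for reference only."
fi
```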

What has happened is shown below; although we have mediawiki running on nodes 01, 03 and 05, because it is running as part of a Swarm service, if I hit any node in the Swarm cluster, the request is routed to a Docker host running the container (Swarm's routing mesh).  All of this is managed by the Swarm managers.
To prove this, if I hit node 02 or 04 (i.e., the nodes that are not running the container), port 8080 still responds!


The actual/desired state is important.  What it means is that Swarm keeps a constant eye on the state of the cluster and, in the event of a problem with a host, will ensure that the desired state of three running replicas is met.  To prove this, host 'docker01' had a pretty fundamental problem (I restarted it with 'init 6'!)

Which is confirmed by running 'docker node ls' on another host.  However, if I run 'docker service ls', I can see that the replicas state is 3/3 again...  What has happened?  Running 'docker service ps wikicluster' shows that wikicluster is now running on nodes 02, 03 and 05 - so Swarm picked up the issue and immediately fixed it.
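The recovery check described above can be scripted too; another hedged sketch (service name from the post, guarded for machines without an active Swarm):

```shell
# Sketch: verify Swarm self-healing after a node failure (run on a manager).
SERVICE=wikicluster
if docker info 2>/dev/null | grep -q 'Swarm: active'; then
  docker node ls               # the rebooted node shows as Down until it returns
  docker service ls            # REPLICAS should be back at 3/3
  docker service ps "$SERVICE" # tasks rescheduled onto the healthy nodes
else
  echo "No active Swarm on this host; commands shown for reference only."
fi
```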

This is very powerful; within minutes, I have been able to run up a cluster that is automatically load balanced and fault tolerant.  

How does Swarm compare with Kubernetes (K8s)?  From a very high level, K8s needs a fair amount of supporting infrastructure to run, but provides additional features such as image management (not having to docker pull many times), autoscaling, a richer API (making for a more cloud-native experience), larger scale, more tooling, etc., whereas Swarm is something that you might deploy to get a small container deployment running in a highly available, fault-tolerant way, quickly!

This article has a great 'vs' section between Swarm and K8s.

Thanks for reading :-)

Friday, 20 October 2017

Getting started with Docker Swarm on PhotonOS

I have been playing with Docker as a bit of a hobby for the past few months, in between work, life, travel, etc.  so haven't had a chance to play with Swarm...  I have found a few hours each day thanks to a train commute, so that's now changed :)

So what is Swarm?  In short, Swarm gives you the ability to cluster multiple Docker hosts for native Docker-level clustering, all implemented in a few easy steps...  Something which is very powerful and which I will demonstrate in a future post (too much info for one post here!)

I started with PhotonOS (an old version - one I had on my laptop and didn't have the bandwidth to redownload!) so already I was behind...  Swarm is a feature of Docker 1.12 and above; with PhotonOS v1, Docker is on V1.11:

I had to configure the initial VM, run updates and drag Docker up to a reasonable version...  All of which took more downloading than simply pulling a later image would have...!

I had to spend a bit of time familiarising myself with systemctl, networkctl, hostnamectl, etc. in order to configure the box.  To start with, the hostname.  Although the traditional 'hostname <new hostname>' appears to work, it is not persistent across reboots.  You need hostnamectl:

hostnamectl set-hostname <new hostname> 

Reboot and the hostname remains.  I then needed to set the IP address; to do this, create a '.network' file under /etc/systemd/network/ and add the following content:
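The screenshot with the file contents hasn't survived here, but a typical systemd-networkd static configuration looks something like the below (the filename and addresses are illustrative, not the originals from the lab):

```ini
; e.g. /etc/systemd/network/10-static-eth0.network (hypothetical name/addresses)
[Match]
Name=eth0

[Network]
Address=192.168.1.101/24
Gateway=192.168.1.1
DNS=192.168.1.1
```

Restart networking (systemctl restart systemd-networkd) for the change to take effect.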


You will also have to chmod 644 the file.  This assumes your Ethernet adapter is eth0, something which you can confirm by running networkctl.

Good!  The last thing I needed to do was get the machine up to date...  On PhotonOS, yum is replaced by a cut-down tool called tdnf.  So tdnf update (I had to reboot here), then systemctl start docker, and we can see the version of Docker is where it needs to be.
N.B., I also enabled ssh on this server by using systemctl start sshd.  
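Pulled together, the PhotonOS prep steps above look like this.  A sketch only: the post used 'systemctl start'; 'enable --now' is my addition so the services also survive reboots, and the guard keeps the script a no-op on anything without tdnf.

```shell
# Sketch of the PhotonOS prep steps (requires root on a PhotonOS box).
PKG_TOOL=tdnf
if command -v "$PKG_TOOL" >/dev/null 2>&1; then
  tdnf update -y                   # bring packages, including Docker, up to date
  systemctl enable --now docker    # start Docker now and on every boot
  systemctl enable --now sshd      # optional: enable ssh access
  docker version                   # confirm Docker is at 1.12+
else
  echo "tdnf not found; these steps apply to PhotonOS only."
fi
```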

Now, I cloned a few VMs out so I also had docker02, docker03, docker04 and docker05.  I noticed that the VMs couldn't ping each other, due to the default firewall being quite strict...  Quick and dirty fix time - systemctl stop iptables.  This is a lab running on my laptop, so not the end of the world.  In a prod environment you would speak to your security team, of course ;-)

So, to create the Swarm cluster...  A couple of things to note...  Docker recommends using 3 or 5 'manager' nodes.  You will start creating a cluster with the docker swarm init command, which will make the first node a manager.  Although more managers can be added, doing so slows down cluster elections and is not recommended.  Secondly, by specifying the listen and advertise IP and port, you are ensuring that your Docker host uses the IP:port that you want.  With that said:

Above, I have run three commands...  The first is the docker swarm init command to initialise the cluster, the second gives the join string for a manager node, and the third (also output when I ran the init) gives the command to join a worker node:

docker swarm init --advertise-addr <IP>:2377 --listen-addr <IP>:2377
docker swarm join-token manager
docker swarm join-token worker 
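With a hypothetical address filled in (the real IPs are in the screenshots), the three commands might look like the below; the guard only initialises a Swarm when Docker is present and Swarm is not already active.

```shell
# Sketch: initialise the Swarm on the first manager (address is hypothetical).
SWARM_IP=192.168.1.101
if docker info 2>/dev/null | grep -q 'Swarm: inactive'; then
  docker swarm init --advertise-addr "$SWARM_IP:2377" --listen-addr "$SWARM_IP:2377"
  docker swarm join-token manager  # prints the command a new manager should run
  docker swarm join-token worker   # prints the command a new worker should run
else
  echo "Docker unavailable or Swarm already initialised; commands for reference."
fi
```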

I will add an additional two manager nodes and the final two nodes will be workers.  It's worth noting that all manager nodes are also worker nodes - so having 5 nodes in your cluster with 3 managers does not mean that you only have 2 worker nodes.

From Docker02:

I copied the command output from the previous 'join-token manager' command and suffixed it with the same 'advertise-addr' and 'listen-addr' switches to ensure that docker02 listens on the adapter/port that I want it to.  Now, when I run docker node ls, I get:

Both nodes are added to the Swarm cluster.  I continued through the remainder of the cluster, adding 03 as a manager and 04 and 05 as workers...
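As an illustration of the join command, the below uses hypothetical addresses; the real token comes from the 'join-token manager' output and is deliberately not filled in here.

```shell
# Sketch: join docker02 as a manager (addresses hypothetical, token elided).
MANAGER_ADDR=192.168.1.101:2377   # the first manager's advertise address
NODE_IP=192.168.1.102             # docker02's own address
TOKEN="${SWARM_MANAGER_TOKEN:-}"  # paste from 'docker swarm join-token manager'
if [ -n "$TOKEN" ] && docker info >/dev/null 2>&1; then
  docker swarm join --token "$TOKEN" "$MANAGER_ADDR" \
    --advertise-addr "$NODE_IP:2377" --listen-addr "$NODE_IP:2377"
  docker node ls   # both nodes should now appear
else
  echo "No join token supplied; command shown for reference only."
fi
```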

From the above, you can see in the 'MANAGER STATUS' column that we have 3 managers: one 'Leader' and two 'Reachable'.  All managers proxy commands through the leader...  In the event of a failure of the leader, a re-election takes place between the remaining managers.  We can see that nodes 04 and 05 have nothing in the manager column...  As we would expect.

One final thing to note before I close this post off (way too much to demo without a separate post here!) is the port that I used earlier, 2377...  This port is now an IANA-registered standard for Swarm (it wasn't in 1.12), although you can specify your own port here if it's more convenient.

Coming up soon... How Swarm works & why it should be interesting to those looking at containers!

Friday, 28 July 2017

VCP7-CMA Section 6 - Blueprint Dissection

The final section in the VCP7-CMA exam...

Section 6: Extend a vRealize Automation Implementation
+ Objective 6.1: Configure vRealize Orchestrator for use with vRealize Automation
·      Configure vRealize Automation to use an external vRealize Orchestrator server
Administration > vRO Configuration > Server Configuration:

Default port is 8281.
·      Configure default vRealize Orchestrator settings in vRealize Automation
From default tenant (https://vra-app/vcac/org/vsphere.local) as administrator@vsphere.local, vRO Configuration – then modify from there:

·      Set tenant specific vRealize Orchestrator settings in vRealize Automation
From a specific tenant (https://vra-app/vcac/org/engineering) as someone with tenant admin rights, then ‘Administration’ tab, vRO Configuration, Server configuration and configure as above.

Objective 6.2: Create and Manage Event Broker Subscriptions
EBS is normally used for extensibility during machine provisioning or lifecycle management; for example, when provisioning a server, you might want to talk to an IPAM or CMDB, or a Linux server might need a DNS entry added to AD.  The same applies in reverse when disposing of the server.
·      Determine appropriate subscription option based on design (blockable, replyable, schema)
Replyable = something like pre-approval, where a response from an external system needs to be returned to vRA (i.e., if a VM creation is dependent on another approval system).  Look at the schema to identify what the output parameter from vRO needs to be:

Event Topic (for example, pre-approval, machine provisioning, business group configuration):  Choose the type of event you want to execute a workflow on
Conditions – normally Life cycle stage or similar.
Blocking:  Blocking will wait for the workflow to complete before continuing the deployment
Priority:  Used to determine the order in which EBS should run; the LOWER the priority number, the sooner it runs

·      Configure subscription conditions based on the design (data, core event message values)
Administration > Events > Subscriptions > +

Select the event type…  For the example of add a server in CMDB, you can do this through machine provisioning:

Click Next…  Add conditions.  Typically, something like:

Data > Lifecycle state > Lifecycle State Name = VMPSMasterWorkflow32.MachineProvisioned

Now choose the workflow you wish to run.  In the example here, I am writing out to an SQL database with some info about the VM… 

Change details as required:

·      Configure subscription workflow including input and output parameters based on the design
Input must be a property with a sensible name (most people tend to use ‘payload’); according to the extensibility guide:
“To use a single parameter, configure one parameter with a type of Properties. You can provide any useful name. For example, you can use payload as the parameter name.”
Outputs must match the reply schema (see the picture under the ‘blockable, replyable, schema’ section)
·      Configure subscription details based on the design (priority, timeout, blocking)
Covered in ‘blockable, replyable, schema’.

+ Objective 6.3: Configure Virtual Machine Lifecycle Automation
All of the below assume you have vRO workflows for specific lifecycle events - i.e., a workflow which will create a DNS entry in AD for a Linux server, or interact with a CMDB to create or delete entries.
·      Configure automatic post-provisioning actions based on design criteria
·      Configure automatic deactivation of a virtual machine based on condition criteria
·      Configure automated event brokering for different status or event criteria based on design requirements
Covering off all of the above - from the Administration tab, click on ‘Events’ and then ‘Subscriptions’.  Depending on when your specific use case applies, you can select a particular event to trigger a workflow.


This can then be selected from an Event Broker Subscription workflow:

Add conditions to trigger the particular workflows:
·      Data > Lifecycle state > Lifecycle State Name = VMPSMasterWorkflow32.something (see below)
·      Data > Lifecycle state > State Phase = PRE or POST
·      Anything else that you need to specify, i.e., I tend to use things like the blueprint names (assuming they have the OS type in there).

and specify a workflow to run in the event:

Objective 6.4: Install and Configure Plugins in vRealize Orchestrator
·      Install and configure plug-in in vRealize Orchestrator
o   Install and configure vRealize Automation plugin
For vRO 7.0: get the automation plugin - either from a vRO server that is shipped with vRA, from the vRA server (connecting through WinSCP or similar and browsing to /usr/lib/vco/app-server/plugins), OR from the Solution Exchange

And get the vCAC plugin…

N.B., by default the standard vRO server will not have these plugins so for a simple quick install, the vRA version of vRO is going to save you a lot of time…

Click on ‘Plugins’:

Find the .vmoapp plugin file (or .DAR file extension) that you want to install…

From vRO 7.1+, the plugin is already installed in vRO, so there is no need to download or install it.

o   Install and configure VMware NSX plugin
Get the plugin from VMware Downloads:

And install in the same method as above:

·      Run configuration workflows in vRealize Orchestrator client
o   Run configuration workflows for vRealize Automation plugin
This has already been demonstrated *** using the vRO console, but can also be done using vRA > Administration > vRO Configuration > Server Endpoints > New+:

And workflows can be run from vRO directly here:

o   Run configuration workflows for NSX plugin
Outlined in section 5.1
o   Run configuration workflows for vSphere plugin
From within vRO, run the workflow Library > vCenter > Configuration > Add a vCenter Server Instance:

·      Determine if a plugin is enabled
Ensure the check-box is ticked in the plugins screen of the vco-controlcenter

Objective 6.5: Modify and Run Basic vRealize Orchestrator Workflows
An enormous topic covered by a few lines, and vRO is definitely something that you will need to work with extensively...  There are week-long training courses for datacentre automation with vRO, but this VMware freebie (albeit slightly old) is well worth a look:
·      Execute vRealize Orchestrator workflows with defined parameters
From within vRO, select a workflow, right-click and select ‘Start Workflow’

·      Troubleshoot vRealize Orchestrator workflow errors
Again, a huge topic; from each WF run:

Review the logs here. 
·      Modify vRealize Orchestrator workflows
Varies depending on vRO version; find the ‘Edit’ pencil icon:

Depending on what needs to be modified, click on the ‘per-item’ edit button:

(This has changed with vRO 7.2, where half the screen is reserved for the WF and half for the edit field (In, Out, Exception, Visual Bindings, Scripting, etc.))

Sunday, 7 May 2017

VCP7-CMA Section 5 - Blueprint Dissection

Section 5: Configure and Administer Fabric Groups and Endpoints
+ Objective 5.1: Create and Manage VMware Endpoints
·      Integrate vRealize Automation with NSX
Starts at the vSphere endpoint (outlined below), on the network profile (Infrastructure > Reservations > Network Profiles to define Routed and NAT profiles) and on a blueprint

·      Add a vRealize Orchestrator endpoint to vRealize Automation
Administration > vRO Configuration > Endpoints

If you are using the internal vRO server then the address is https://<IP or FQDN of vRO>/vco
If you are using an external vRO server then the address is https://<IP or FQDN of vRO>:8281/vco
Add VMware.VCenterOrchestrator.Priority as a custom property with a value of 1 – this is essential for NSX.

·      Configure the NSX plugin in vRealize Orchestrator
This is possible from vRO, but should be done from vRA...  To do it from vRO, run the workflow Library > NSX > Configuration > Create NSX Endpoint:

And you can verify it’s added in the Inventory tab:

·      However, assuming you’ve set vRA up correctly (i.e., the vRO Endpoint as mentioned above)…   Add NSX to the vSphere (vCenter) endpoint as mentioned in the below section ‘Configure NSX Network and Security for the vSphere endpoint’

After you’ve added the NSX Networking and Security to the vSphere endpoint, you can do a data collection against Network and Security Inventory:

When that succeeds, when you look in vRO, you’ll see the NSX plugin has registered itself with the NSX Endpoint:

·      Perform data collection in vRealize Automation
Infrastructure > Compute Resources > hover over the arrow next to the compute resources and select ‘Data Collection’:

and click on ‘Request Now’ under the required item you wish to ‘data collect’…  Normally, this is useful if you change templates in vSphere and want the templates to be reflected in vRA.

·      Create and configure a vSphere Endpoint
Infrastructure > Endpoints > Endpoints > +New > Virtual > vSphere (vCenter)

·      Configure NSX Network and Security for the vSphere endpoint
The checkbox above ‘Specify manager for network and security platform’

(typo on the IP address :-))
·      Create and configure a vCloud Air Endpoint
Same as for vSphere, but select +New > Cloud > vCloud Air

Objective 5.2: Create and Manage Fabric Groups, Reservations and Network Profiles
·      Create and configure a fabric group
You need to be an IaaS Administrator
Infrastructure > Endpoints >  Fabric Groups > + New

Choose name, description, Fabric administrators, plus the actual compute resource you wish to use. 

Also, you can see in the above screenshot the AWS resources are added to a fabric group in the same way as vSphere resources.

·      Select compute resources to include in the fabric group
See above - plus compute resources can be retrospectively added by editing the FG:

·      Configure compute resource Data Collection
Infrastructure > Endpoints > Endpoints - hover over the arrow next to the compute resource > View compute Resources
Hover over the Compute resource and select  'Data Collection'

From here, you can run a data collection, adjust the frequency, or enable/disable data collections

·      Create a vSphere reservation
Covered in section 4.5
·      Assign a business group to the vSphere reservation
Covered in section 4.5
·      Create a vCloud Air Reservation
First, the vCloud Air Endpoint needs to be added...  Infrastructure > Endpoints > Endpoints > +New:

After which, you can create the reservation... Infrastructure > Reservations > Reservations > + New > vCloud Air

Complete the reservation in the same way as a vSphere reservation, selecting the vCA resource from the Resources tab.

·      Assign a business group to the vSphere reservation
Covered in section 4.5
·      Create and configure network profile types
Infrastructure > Reservations > Network Profiles > +New >
Covered (incorrectly) in section 1 - section 1 is actually looking for blueprint config (i.e., add a network to the blueprint and select which network you want to use)
o   For static IP address assignment
o   External network profiles
o   NAT network profiles
o   Routed network profile
·      Create and configure machine prefixes
Covered very briefly in section 2.3
Infrastructure > Administration > Machine Prefixes > +New
Select the prefix name, the number of digits vRA will append, and the next number (i.e., on first creation, where the numbering will start from).  After which, this can be configured on the business group or on the blueprint: