Case Study – Boosting the performance of a client management system and hardening the environment to improve its cybersecurity posture.
A client recently approached us looking for a way to improve the performance of their client management system (CMS). Staff were complaining of long waits for reports, sluggish menu navigation, and regular system crashes.
Because the CMS stores personally identifiable information (PII) and other sensitive data, our priority was to safeguard that information while improving the user experience.
A move to the cloud was already in progress (as a straight clone of the on-premises installation), so the task was considerably larger than the original brief suggested. We accepted the challenge and began work.
The environment as we found it was simple: a single application server running a Java application, and a single back-end server hosting both the file store and the database. Backups went to a centralised backup solution that was soon to be decommissioned.
Our recommendation, which was ultimately accepted, was threefold: add multiple application servers behind an Nginx load balancer, add a second database server to form a cluster, and split file storage onto a dedicated server with a single role. A sketch of the load-balancer side is below.
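For illustration only (hostnames, ports, and certificate paths are placeholders, not the client's actual values), the Nginx side of that design looks something like this:

    # /etc/nginx/conf.d/cms.conf -- illustrative sketch; all names are placeholders
    upstream cms_app_servers {
        server app1.internal:8080;   # Java application server 1
        server app2.internal:8080;   # Java application server 2
        server app3.internal:8080;   # Java application server 3
    }

    server {
        listen 443 ssl;
        server_name cms.example.com;
        ssl_certificate     /etc/nginx/tls/cms.crt;   # placeholder paths
        ssl_certificate_key /etc/nginx/tls/cms.key;

        location / {
            proxy_pass http://cms_app_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }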
The challenges were significant. Because the CMS was accessible from the Internet, security was our highest concern. We placed a firewall with geo-fencing (and other features) in front of the Nginx load balancer, and shipped all logs (everything) to a newly installed SIEM. The SIEM was configured to raise alerts to the helpdesk for prioritised remediation.
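This write-up does not name the log shipper, so purely as an assumption: with rsyslog, the "send everything" rule can be a single line (the SIEM hostname is a placeholder):

    # /etc/rsyslog.d/90-siem.conf -- forward every facility and priority to the SIEM
    *.*  @@siem.internal:514    # @@ means TCP; a single @ would send over UDP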
The Java application did not share session state between application servers, so staff were constantly asked to re-authenticate as the Nginx load balancer moved their session between back ends. Our solution was to configure “sticky” sessions on Nginx, as sketched below, which worked well. It is not a perfect solution, but staff ended up reasonably evenly balanced across the application servers.
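In open-source Nginx the simplest form of session affinity is ip_hash (the sticky cookie directive is an NGINX Plus feature); assuming that approach, the change is one directive in the upstream block:

    upstream cms_app_servers {
        ip_hash;                     # pin each client IP to one back end
        server app1.internal:8080;
        server app2.internal:8080;
        server app3.internal:8080;
    }

This also explains why the balance was “fairly” rather than perfectly even: every client behind a shared NAT gateway hashes to the same application server.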
Backups remained an issue, and in fact so did the disaster recovery (DR) plan. We configured two additional servers in the on-premises environment: one application server, and one combined server for the files and the database. The database and files replicated in real time from the production environment to the DR environment, and both one of the production database servers and the DR database server took regular backups throughout the day to ensure the recovery time objective (RTO) and recovery point objective (RPO) were met.
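The case study does not name the database engine, so purely as an illustration, assuming MySQL, the intra-day dumps could be driven by a cron entry like this (schedule and paths are placeholders):

    # /etc/cron.d/cms-db-backup -- illustrative; engine, paths, and schedule are assumptions
    # Dump the CMS database every four hours so the RPO target is always within reach.
    0 */4 * * *  root  mysqldump --single-transaction cms | gzip > /backups/cms-$(date +\%F-\%H).sql.gz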
Server patching was managed through Canonical’s Landscape, which can schedule patching for Ubuntu servers. Using a combination of Ansible playbooks, Landscape, and the AWS APIs, we were able to schedule servers to power off, power on, patch, and rejoin the load-balanced pool, as sketched below.
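As a minimal sketch of the AWS side only (instance IDs and the region are placeholders; in the real project Ansible orchestrated this and Landscape applied the packages), the power cycling might look like:

    # patch_cycle.py -- illustrative sketch; instance IDs and region are placeholders.
    # Landscape applies the packages; this handles the AWS power cycle around it.
    import boto3

    ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption
    APP_SERVERS = ["i-0123456789abcdef0", "i-0fedcba9876543210"]  # placeholders

    def power_cycle(instance_id: str) -> None:
        """Stop an instance, wait, then start it and wait until it is running."""
        ec2.stop_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
        ec2.start_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    for server in APP_SERVERS:
        # One server at a time, so the pool never loses more than one member.
        power_cycle(server)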
Staff reported no outages from that point on. Some performance issues remained, such as with large reports, but further database tuning reduced them to negligible levels.
The project was completed on time and within budget. Maintenance of the cluster is ongoing, and it continues to be a positive asset to the business.