Mission possible: a project with no simple tasks

Yevgen Mospan
Solution Architect @EPAM


A global project is a huge responsibility. In our case, it covers a shipment delivery business and addresses every way of sending shipments in every region. The solution consists of nine functional components that must be compatible with one another in each of their environments and work together.

The components were implemented with Play! Framework, Spring MVC, and Adobe AEM; there is also a database-layer component, a logging component, and so on. Front-end engineers work primarily with AngularJS; back-end engineers must know Quartz Scheduler and Terracotta in addition to Spring, Play, and CQ; DBA engineers work with Oracle. DevOps engineers automate deployment with Ansible.

On a project with such a wide range of technologies, Solution Architects (SA) and Delivery Managers (DM) have to be seriously technical. Architects must know what functionality will be added and how it will affect the system globally. Delivery Managers must organize the work of distributed teams and keep them from blocking each other. The SA-DM connection here is tighter than usual.

Architecture-wise, the customer needed us to account for a large number of quality attributes, a difficult task when they often contradict each other. Let’s say the client asks for a three-second end-to-end response time: all the complex functionality and configuration has to fit in that window. On top of that, a multitude of non-EPAM systems has to be integrated with ours.

Another quality attribute is 24/7 availability, along with specific requirements for deployment. The client has its own data centers and needs active-active replication to ensure failover. To satisfy this requirement, we have to replicate huge volumes of data in real time, which constrains the database and how it is designed.

The challenges motivate us! The Play! Framework, used in one of the components, brings a reactive approach to writing business logic out of the box, and that made some functionality difficult to implement. Our logic is sequential, because the business runs on end-to-end transactions, whereas a reactive approach implies an event model. Our engineers and architects had to come up with a way to use the advantages of the reactive approach while still meeting the business goals of the various transactions. Bringing Play together with Spring Security, Spring Integration, Spring Cache Abstraction, Spring Data, and other frameworks was a whole other challenge.
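To make the tension between sequential transactions and a reactive model concrete, here is a minimal sketch using only the JDK's `CompletableFuture`. It is not the project's code and not Play's API; the class `ShipmentFlow`, the step names, and the rate formula are hypothetical. It shows one way to keep a strictly ordered business flow while each step stays non-blocking: every step returns a future, and `thenCompose` chains them in order.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a sequential business transaction expressed as a
// chain of asynchronous stages. Names and values are illustrative only.
final class ShipmentFlow {

    // Step 1: validate the address; fails the whole chain if it is empty.
    static CompletableFuture<String> validateAddress(String address) {
        return CompletableFuture.supplyAsync(() -> {
            if (address == null || address.trim().isEmpty()) {
                throw new IllegalArgumentException("address is required");
            }
            return address.trim();
        });
    }

    // Step 2: compute a rate in cents (made-up formula for the sketch).
    static CompletableFuture<Integer> calculateRateCents(String address) {
        return CompletableFuture.supplyAsync(() -> 500 + 10 * address.length());
    }

    // Step 3: book the shipment using the results of both earlier steps.
    static CompletableFuture<String> book(String address, int rateCents) {
        return CompletableFuture.supplyAsync(
                () -> "BOOKED:" + address + ":" + rateCents);
    }

    // The end-to-end flow: each stage starts only after the previous one
    // completes, but no thread blocks while waiting.
    static CompletableFuture<String> send(String rawAddress) {
        return validateAddress(rawAddress)
                .thenCompose(addr -> calculateRateCents(addr)
                        .thenCompose(rate -> book(addr, rate)));
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the demo.
        System.out.println(send(" Kyiv ").join()); // prints BOOKED:Kyiv:540
    }
}
```

If any step throws, the remaining steps are skipped and the returned future completes exceptionally, which is roughly the behavior a sequential transaction needs.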

With the CMS, we searched long and hard for a middle ground between the content management platform and the business code of the AngularJS front-end application. Eventually a balance was struck, and now everyone knows the business functionality and what options content authors have. Content requirements were extremely tough, as different countries require different formats and directions.

The product has no simple tasks. Any seemingly simple task becomes extremely difficult in a global context. Caching primary keys in Hibernate is a common ‘out of the box’ feature for improving performance, but with active-active replication it’s not so easy: there are additional requirements for forming primary keys.
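The article does not say which key scheme the project chose. As an illustration of the kind of constraint involved, the sketch below shows one common approach for active-active setups: interleaved key ranges, where every site uses the same step (the number of sites) and a different offset, so keys generated independently never collide. The class `SiteKeyAllocator` and its parameters are assumptions made for this example, and it is plain Java, not a Hibernate API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: each active site generates keys from its own
// interleaved sequence, so no cross-site coordination is needed.
final class SiteKeyAllocator {
    private final int siteIndex;  // 0-based index of this data center
    private final int siteCount;  // total number of active sites
    private final AtomicLong counter = new AtomicLong();

    SiteKeyAllocator(int siteIndex, int siteCount) {
        if (siteCount <= 0 || siteIndex < 0 || siteIndex >= siteCount) {
            throw new IllegalArgumentException("invalid site configuration");
        }
        this.siteIndex = siteIndex;
        this.siteCount = siteCount;
    }

    // With two sites: site 0 yields 1, 3, 5, ... and site 1 yields 2, 4, 6, ...
    long nextKey() {
        return counter.getAndIncrement() * siteCount + siteIndex + 1;
    }

    public static void main(String[] args) {
        SiteKeyAllocator siteA = new SiteKeyAllocator(0, 2);
        SiteKeyAllocator siteB = new SiteKeyAllocator(1, 2);
        System.out.println(siteA.nextKey() + " " + siteA.nextKey()); // prints 1 3
        System.out.println(siteB.nextKey() + " " + siteB.nextKey()); // prints 2 4
    }
}
```

A real allocator would also have to persist its counter and handle the addition of new sites; the sketch leaves both out.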

We need expertise in many areas: a person managing feature development should know AEM and AngularJS, the nuances of Java and its frameworks, Hibernate, and the Oracle database layer. It’s nearly impossible to fit all of this into one engineer. For reference: the biggest use case concerned the process of sending a shipment; it covered 90 pages and was divided into nearly the same number of sub-use cases with around a thousand stories. What at first glance seems like a simple matter involves a colossal amount of functionality that has to be configured somehow. A separate component was developed for configuration management for every country and every individual user (for the client, the ability to provide individual service to each user is critically important). With two million users, this is definitely not a trivial task.
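The article does not describe how the configuration component works internally. A minimal sketch of the general idea, assuming a simple layered lookup, might resolve a setting from the most specific level to the least specific: user override first, then country, then a global default. The class `LayeredConfig`, the key `maxWeightKg`, and all values below are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: settings resolved as user -> country -> default.
final class LayeredConfig {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, Map<String, String>> byCountry = new HashMap<>();
    private final Map<String, Map<String, String>> byUser = new HashMap<>();

    void setDefault(String key, String value) {
        defaults.put(key, value);
    }

    void setForCountry(String country, String key, String value) {
        byCountry.computeIfAbsent(country, c -> new HashMap<>()).put(key, value);
    }

    void setForUser(String userId, String key, String value) {
        byUser.computeIfAbsent(userId, u -> new HashMap<>()).put(key, value);
    }

    // Returns the most specific value available, or empty if none is set.
    Optional<String> resolve(String userId, String country, String key) {
        Map<String, String> none = Collections.emptyMap();
        String value = byUser.getOrDefault(userId, none).get(key);
        if (value == null) {
            value = byCountry.getOrDefault(country, none).get(key);
        }
        if (value == null) {
            value = defaults.get(key);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        LayeredConfig config = new LayeredConfig();
        config.setDefault("maxWeightKg", "30");
        config.setForCountry("DE", "maxWeightKg", "31.5");
        config.setForUser("u42", "maxWeightKg", "70");
        System.out.println(config.resolve("u42", "DE", "maxWeightKg").get()); // prints 70
        System.out.println(config.resolve("u1", "DE", "maxWeightKg").get());  // prints 31.5
        System.out.println(config.resolve("u1", "FR", "maxWeightKg").get());  // prints 30
    }
}
```

In-memory maps keep the example short; at the scale of two million users mentioned above, the per-user layer would need a persistent store and caching.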

In these conditions, close interaction between Solution Architects and Delivery Managers is a must for the entire duration of the project. The high-level architecture was set in the summer of 2015 and has since undergone only iterative changes. It turned out that low-level design can’t always be entrusted to team leads: they run teams whose developers build various functionality and handle business tasks. Do all of them really understand what is happening on a global level? Sure, they can handle a story, but not everyone knows how it affects other stories and quality attributes.

This was the task for a Solution Architecture project group. They handled design, complex functional tasks, and quality attributes, but they also had an educational mission: explaining how to implement tasks in a big and demanding ecosystem. The group included tech leads brought in from the market as well as those who had risen through the ranks on the project. A group of engineers wanted to grow in architecture, and I became their mentor in the SA mentoring program and SA School. Our team gradually became a bridge between business and implementation, converting business requirements into technical requirements. Now the customer asks us how any new functionality will affect the system overall.

At this scale, ensuring CI/CD is a tall order, so the SAs and DMs created a transparent delivery process. Any change travels a long route from development on a local machine to deployment in an environment. After development, developers must cover their code with unit and integration tests to make sure it doesn’t break the CI process (there are a lot of developers, and we try to minimize the number of commits that break builds). Code also undergoes static analysis in Sonar and review in Gerrit, with checks from Sonar, unit tests, and build compilation to catch obvious gaps. Once code reaches CI, all artifacts are built and the system is deployed as a whole. At this stage, the full set of integration tests and end-to-end smoke tests from our automation engineers is run.

If the artifact passes all quality gates (with requirements for quality attributes, security, and performance) and SonarQube doesn’t show any new issues, it receives green status and goes to the QA engineers for manual testing and recommendations on regression automation. Only then is the artifact ready for deployment to further environments. We deploy to the customer environment during the first iteration to make sure we aren’t breaking anything, and at the end of every iteration we form a release candidate. One important quality gate is the absence of critical, blocker, and major issues in the code. Conditions for developers are severe, but that’s because we recognize the scale of the system.
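The quality gate on issue severity can be sketched as a simple check. This is an illustration only: the class `QualityGate`, its method, and the way test results are passed in are assumptions for the example, not the SonarQube API or the project's actual pipeline code. The rule it encodes comes from the text above: an artifact is green only if tests pass and no blocker, critical, or major issues are present.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a severity-based quality gate.
final class QualityGate {
    enum Severity { BLOCKER, CRITICAL, MAJOR, MINOR, INFO }

    // Severities that fail the gate, as described in the article.
    private static final Set<Severity> FAILING =
            EnumSet.of(Severity.BLOCKER, Severity.CRITICAL, Severity.MAJOR);

    // Green only if all tests passed and no failing-severity issue exists.
    static boolean isGreen(List<Severity> issues, boolean testsPassed) {
        return testsPassed && issues.stream().noneMatch(FAILING::contains);
    }

    public static void main(String[] args) {
        List<Severity> minorOnly = Arrays.asList(Severity.MINOR, Severity.INFO);
        List<Severity> withMajor = Arrays.asList(Severity.MINOR, Severity.MAJOR);
        System.out.println(isGreen(minorOnly, true));  // prints true
        System.out.println(isGreen(withMajor, true));  // prints false
        System.out.println(isGreen(minorOnly, false)); // prints false
    }
}
```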

We worked for a long time to build communications with the business. Given the number of stakeholders and requirements, the client would be satisfied only with an ideal solution. At first they weren’t ready to compromise, but eventually we showed them that it’s better to have something that’s stable than everything that doesn’t work. In the end, we prioritized the requirements and now have a stable, working solution.

There were a lot of tasks, and the product is complex and interesting. We always need strong specialists with either technical or management skills. Working on such a huge system is a great opportunity to improve your skills.