
Introducing Azure AD – Understanding User Authentication

Azure AD is a cloud-based mechanism that provides the tools to address our security needs. Backed by Microsoft AD, an industry-standard and, importantly, proven secure authentication and authorization system, it supports both cloud-first (that is, stored and managed entirely in the cloud) and hybrid (a mix of cloud and on-premises) solutions.

Some of these tools are included by default when you create an Azure AD tenant. Others require a Premium add-on, which we will cover later.

These tools include the following:

  • Self-service password resets: Allowing your users to reset their passwords themselves (through the provision of additional security measures) without needing to call the helpdesk.
  • MFA: MFA enforces a second form of identification during the authentication process—a code is generated and sent to the user, and this is entered along with the password. The code is typically delivered to the user’s device as a text message or through an MFA authenticator app on their mobile device. You can also use biometric devices such as fingerprint or face scanners.
  • Hybrid integration with password writebacks: When Azure AD is synchronized to an on-premises AD with AD Connect, changes to the user’s password in Azure AD are sent back to the on-premises AD to ensure the directories remain in sync.
  • Password protection policies: Policies in Azure can be set to enforce complex passwords or the period between password changes. These policies can be integrated with on-premises directories to ensure consistency.
  • Passwordless authentication: For many organizations, the desire to remove the need for passwords altogether in favor of alternative methods is seen as the ultimate solution to many authentication issues. Credentials are provided through the use of biometrics or a FIDO2 security key. These cannot be easily duplicated, and this removes the need for remembering complex passwords.
  • Single sign-on (SSO): With SSO, users only need to authenticate once to access all their applications—regardless of whether they sign on through their on-premises directory or Azure AD, the single authentication process should identify the user across different environments.
  • CA: To further tighten security, CA policies can apply additional restrictions to user sign-in, or define when different rules apply. For example, MFA can be set so that it is not required when signing in from specific Internet Protocol (IP) ranges, such as a corporate network range; a minimal sketch of this kind of rule follows the list.
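To make the CA example concrete, the following minimal Python sketch mimics the decision such a policy makes: skip the second factor only for sign-ins from trusted corporate ranges. The IP ranges and function name are invented for illustration and bear no relation to how Azure AD implements Conditional Access internally.

```python
import ipaddress

# Hypothetical trusted corporate ranges; in a real tenant these would be
# configured as named locations in a Conditional Access policy.
TRUSTED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def mfa_required(sign_in_ip: str) -> bool:
    """Return True when a second factor should be demanded for this sign-in."""
    address = ipaddress.ip_address(sign_in_ip)
    # Skip the extra factor only when the sign-in originates from a trusted
    # range; every other location must complete MFA.
    return not any(address in network for network in TRUSTED_RANGES)

print(mfa_required("203.0.113.42"))  # False - corporate range, MFA skipped
print(mfa_required("192.0.2.10"))    # True - unknown location, MFA enforced
```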

Differentiating authentication from authorization – Understanding User Authentication

User security is perhaps one of the most critical aspects of a system and, therefore, its architecture. Security has, of course, always been important to protect sensitive information within an organization. However, as we move our applications online and widen our audience, the need to ensure only the correct people gain access to their data has become crucial.

In this chapter, we explore the key differences between authentication and authorization, what tooling we have available within Azure to ensure the safety of user accounts, and how we design solutions according to different business needs.

In this chapter, we will examine the following topics:

  • Differentiating authentication from authorization
  • Introducing Active Directory (AD)
  • Integrating AD
  • Understanding Conditional Access (CA), Multi-Factor Authentication (MFA), and security defaults
  • Using external identities

Differentiating authentication from authorization

A significant and essential role of any platform is that of authentication and authorization. These two terms are often confused and combined as a single entity. When understanding security on platforms such as Azure, it’s vital to know how the different technologies are used.

Authentication is the act of proving who you are, often performed with a username/password combination. If you can provide the correct details, a system authenticates you.

Authentication does not give you access to anything; it merely proves who you are.

Once a system knows the who, it then checks to see what you have access to—this is termed authorization.

In Azure, authorization is the act of checking whether you have access to a particular resource such as a storage account, and what actions you can perform, such as creating, deleting, modifying, or even reading the data in the storage account.

Because of the number of different services and associated actions available to a user in Azure, and the importance of ensuring a user's validity, the mechanisms that control all this can become quite complicated.

Luckily, Azure provides a range of services, broken down into authentication and authorization services, that enable you to strictly control how users authenticate and what they can then access, in a very granular way.
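As a purely illustrative sketch of that distinction (the roles, assignments, and data structures below are invented for this example and are not the Azure RBAC API), authorization can be thought of as a lookup from an already authenticated identity to the actions it may perform on a resource:

```python
# Hypothetical role definitions: each role maps to the actions it permits on a
# storage account. Real Azure built-in roles are far richer than this.
ROLE_ACTIONS = {
    "Reader":      {"read"},
    "Contributor": {"read", "create", "modify"},
    "Owner":       {"read", "create", "modify", "delete"},
}

# Hypothetical role assignments: which role a user holds on which resource.
ASSIGNMENTS = {
    ("alice@example.com", "storage-account-1"): "Contributor",
    ("bob@example.com",   "storage-account-1"): "Reader",
}

def is_authorized(user: str, resource: str, action: str) -> bool:
    """Authorization: given an authenticated user, check the requested action."""
    role = ASSIGNMENTS.get((user, resource))
    return role is not None and action in ROLE_ACTIONS[role]

print(is_authorized("alice@example.com", "storage-account-1", "delete"))  # False
print(is_authorized("bob@example.com",   "storage-account-1", "read"))    # True
```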

Traditionally, authentication has been via simple username/password combinations; however, this is no longer sufficient on its own, and therefore you need to consider many factors and strategies when designing an authentication mechanism. For example, the following scenarios may apply:

  • A user may choose too simple a password, increasing the chances of it being compromised.
  • Complex passwords or regular changes mean users are more likely to forget their password.
  • There may be delays in the authentication process if a user needs to call a helpdesk to request a password reset.
  • A username/password combination itself is open to phishing attacks.
  • Password databases can be compromised.

Important note

A phishing attack is an action whereby a malicious person will attempt to steal your password by sending you to a dummy website that looks like the one you want to access but is, in fact, their site. You enter your details, thinking it is the correct site, and now they have your personal information and can then use this to log in to the real site.

When systems are hosted on a physically isolated network, some of these issues are mitigated as you first need physical access to a building or at least a device set up with a virtual private network (VPN) connection that, in turn, would require a certificate.

However, in cloud scenarios, and especially hybrid systems, whereby you need external authentication mechanisms that must also map or sync to internal systems, this physical firewall cannot always be achieved.

With these scenarios in mind, we need to consider how we might address the following:

  • Managing and enforcing password complexity rules
  • Providing additional layers over and above a password
  • How to securely store and protect passwords (the first and last of these points are sketched below)
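The first and last of these points can be sketched with nothing more than the Python standard library. The minimum length, character-class rules, and iteration count below are illustrative values, not recommendations tied to any particular Azure policy:

```python
import hashlib
import re
import secrets

def meets_complexity_policy(password: str, min_length: int = 12) -> bool:
    """Enforce a simple rule: minimum length plus mixed character classes."""
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^a-zA-Z0-9]"]
    return (len(password) >= min_length
            and all(re.search(pattern, password) for pattern in classes))

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Never store the password itself: store a random salt and a slow hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return secrets.compare_digest(candidate, digest)

salt, digest = hash_password("Sufficiently-C0mplex-Pass!")
print(meets_complexity_policy("Sufficiently-C0mplex-Pass!"))        # True
print(verify_password("Sufficiently-C0mplex-Pass!", salt, digest))  # True
```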

Now that we understand some of the issues we face with authentication systems, especially those that rely on username/password combinations, we can investigate what options are available to mitigate them. First, we will examine Microsoft’s established security platform, AD.

Network monitoring – Principles of Modern Architecture

CPU and RAM utilization are not the only sources of problems; problems can also arise from misconfigured firewalls and routing, or misbehaving services causing too much traffic.

Traffic analytics tools will provide an overview of the networks in the solution and help identify sources that generate high traffic levels. Network performance managers offer tools that allow you to create specific tests between two endpoints to investigate particular issues.

For hybrid environments, VPN meters specifically monitor your direct connection links to your on-premises networks.

Monitoring for DevOps and applications

For solutions with well-integrated DevOps code libraries and deployment pipelines, additional metrics and alerts will notify you of failed builds and deployments. Incidents, support tickets, or work tasks can be automatically raised and linked to the affected build.

Additional application-specific monitoring tools allow for an in-depth analysis of your application’s overall health, and again will help with troubleshooting problems.

Application maps, artificial intelligence (AI)-driven smart detection, usage analytics, and component communications can all be included in your designs to help drive operational efficiencies and warn of future problems.

We can see that for every aspect of your solution design—security, resilience, performance, and deployments—an effective monitoring and alerting regime is vital to ensure the platform’s ongoing health. With proper forethought, issues can be prevented before they happen. Forecasting and planning can be based on intelligent extrapolation rather than guesswork, and responding to failure events becomes a science instead of an art.

Summary

In this chapter, we looked at a high-level view of the architecture and the types of decisions that must be considered, agreed upon, and documented.

By thinking about how we might design for security, resilience, performance, and deployment, and how we monitor all our systems, we get a greater understanding of our solution as a whole.

The last point is important—although a system design must contain the individual components, they must all work together as a single, seamless solution.

In the next chapter, we will look at the different tools and patterns we can use in Azure to build great applications that align with best-practice principles.

Architecting for monitoring and operations – Principles of Modern Architecture

For the topics we have covered in this chapter to be effective, we must continually monitor all aspects of our system. From security to resilience and performance, we must know what is happening at all times.

Monitoring for security

Maintaining the security of a solution requires a monitoring solution that can detect, respond to, and ultimately recover from incidents. When an attack happens, the speed at which we respond will determine how much damage is incurred.

However, a monitoring solution needs to be intelligent enough to prioritize incidents and filter out false positives.

Azure provides several different monitoring mechanisms, both general-purpose and security-specific, which can be configured according to your organization’s capabilities. Therefore, when designing a monitoring solution, you must align with your company’s existing teams so that alerts are directed appropriately and pertinent information is sent as required.

Monitoring requirements cover more than just alerts; the policies that define business requirements around configuration settings such as encryption, passwords, and allowed resources must be checked to confirm they are being adhered to. The Azure risk and compliance reports will highlight any items that deviate so that the necessary team can investigate and remediate.

Other tools, such as Azure Security Center, will continually monitor your risk profile and suggest advice on improving your security posture.

Finally, security patching reports also need regular reviews to ensure VMs are being patched so that insecure hosts can be investigated and brought in line.

Monitoring for resilience

Monitoring your solution is not just about being alerted to any issues; the ideal scenario is to detect and remediate problems before they occur—in other words, we can use monitoring as an early warning system.

Applications should include in their designs the ability to output relevant logs and errors; this then enables health alerts to be set up that, when combined with resource thresholds, provide details of the running processes.

Next, a set of baselines can be created that identify what a healthy system looks like. When anomalies occur, such as long-running processes or specific error logs, they are spotted earlier.
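A baseline of this kind can be as simple as a mean and standard deviation recorded during a known-healthy period, with anything far outside that range flagged for attention. The sample values and the three-sigma threshold in this sketch are illustrative only:

```python
import statistics

def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize a known-healthy period as its mean and standard deviation."""
    return statistics.mean(samples), statistics.pstdev(samples)

def is_anomalous(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag readings more than k standard deviations away from the baseline mean."""
    mean, stdev = baseline
    return abs(value - mean) > k * stdev

# Hypothetical response times (ms) collected while the system was healthy.
healthy_latency = [120, 130, 125, 118, 122, 127, 124, 121]
baseline = build_baseline(healthy_latency)

print(is_anomalous(126, baseline))  # False - within normal variation
print(is_anomalous(480, baseline))  # True  - long-running request, worth an alert
```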

As well as defined alerts that will proactively contact administrators when possible issues are detected, visualization dashboards and reporting can also help responsible teams see potential problems or irregular readings as part of their daily checks.

Monitoring for performance

The same CPU, RAM, and input/output (I/O) thresholds used for early warning signs of errors also help identify performance issues. By monitoring response times and resource usage over time, you can understand usage patterns and predict when more power is required.

Performance statistics can be used either to manually schedule scaling events or to set automated scaling rules more accurately.

Keeping track of scaling events throughout the life cycle of an application is useful. If an application is continually scaling up and down or not scaling at all, it could indicate that thresholds are set incorrectly.
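One common way to avoid that kind of flapping is to act only on sustained breaches rather than single readings. The following sketch is a generic illustration of the idea; the thresholds and window size are arbitrary, and Azure's own autoscale rules express the same logic declaratively rather than in code:

```python
from collections import deque

class ScaleDecider:
    """Scale out only when CPU stays above a threshold for several consecutive
    readings, and scale in only when it stays well below it."""

    def __init__(self, high: float = 75.0, low: float = 30.0, window: int = 5):
        self.high, self.low = high, low
        self.readings = deque(maxlen=window)

    def decide(self, cpu_percent: float) -> str:
        self.readings.append(cpu_percent)
        if len(self.readings) < self.readings.maxlen:
            return "hold"                              # not enough data yet
        if all(r > self.high for r in self.readings):
            return "scale out"
        if all(r < self.low for r in self.readings):
            return "scale in"
        return "hold"

decider = ScaleDecider()
for cpu in [80, 82, 85, 90, 88]:
    action = decider.decide(cpu)
print(action)  # "scale out" once the high load has been sustained for 5 readings
```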

Again, creating and updating baseline metrics will help alert you to potential issues. If resources for a particular service are steadily increasing over time, this information can predict future bottlenecks.

Architecting for deployment – Principles of Modern Architecture

One area of IT solutions in which the cloud has had a dramatic impact is deployment. Traditional system builds, certainly at the infrastructure level, were mostly manual in their process. Engineers would run through a series of instructions to build and configure the underlying hosting platform, followed by another set of instructions for deploying the software on top.

Manual methods are error-prone because instructions can be misunderstood or implemented wrongly. Validating a deployment is also a complicated process as it would involve walking back through an installation guide, cross-checking the various configurations.

Software deployments led the way on this with automated mechanisms that are scripted, which means they can be repeated time and time again consistently—in other words, we remove the human element.

We can define our infrastructure in code within Azure, too, using Azure Resource Manager (ARM) templates or third-party tools; the entire platform can be codified and deployed by automated systems.

The ability to deploy and re-deploy in a consistent manner gives rise to some additional opportunities. Infrastructure as Code (IaC) enables another paradigm—immutable infrastructure.

Traditionally, when modifications are required to the server’s configuration, the process would be to manually make the configuration on the server and record the change in the build documentation. With immutable infrastructure, any modifications are made to the deployment code, and then the server is re-deployed. In other words, the server never changes; it is immutable. Instead, it is destroyed and recreated with the new configuration.

IaC and immutable infrastructure have an impact on our designs. PaaS components are more straightforward to automate than IaaS ones. That is not to say you can’t automate IaaS components; however, PaaS’s management does tend to be simpler. Although not a reason to use PaaS in its own right, it does provide yet one more reason to use technologies such as web apps over VMs running Internet Information Services (IIS).

You also need to consider which deployment tooling you will use. Again, Microsoft has its own native solution in the form of Azure DevOps; however, there are other third-party options. Whichever you choose will have some impact on connectivity and on any additional agents and tools you use.

For example, most DevOps tools require some form of deployment agent to pull your code from a repository. Connectivity between the repository, the agent, and the Azure platform is required and must be established in a secure and resilient manner.

Because IaC and DevOps make deployments quicker and more consistent, it is easier to build different environments—development, testing, staging, and production. Solution changes progress through each environment and can be checked and signed off by various parties, thus creating a quality culture—as per the example in the following diagram:

Figure 2.5 – Example DevOps flow

The ability to codify and deploy complete solutions at the click of a button broadens the scope of your solution. An entire application environment can be encapsulated and deployed multiple times; this, in turn, provides the opportunity to create various single-tenant solutions instead of a one-off multi-tenant solution. This aspect is becoming increasingly valuable to organizations as it allows for better separation of data between customers.

In this section, we have introduced how deployment mechanisms can change what our end-state solution looks like, which impacts the architecture. Next, we will look in more detail at how monitoring and operations help keep our system healthy and secure.

Architecting for performance – Principles of Modern Architecture

As we have already seen, resilience can be closely linked to performance. If a system is overloaded, it will either impact the user experience or, in the worst case, fail altogether.

Ensuring a performant solution is more than just increasing resources; how our system is built can directly impact the options available and how efficient they are.

Breaking applications down into smaller discrete components not only makes our solution more manageable but also allows us to increase resources just where they are needed. If we wish to scale in a monolithic, single-server environment, our only option is to add more random-access memory (RAM) and CPU to the entire system. As we decompose our applications and head toward a microservices pattern whereby individual services are hosted independently, we can apportion additional resources where needed, thus increasing performance efficiently.

When we need to scale components, we have two options: the first is to scale up—add more CPU and RAM; the second option is to scale out—deploy additional instances of our services behind a load balancer, as per the example in the following diagram:

Figure 2.3 – Scale-out: identical web servers behind a load balancer

Again, our choice of the underlying technology is important here—virtual servers can be scaled up or out relatively easily, and with scale sets this can be dynamic. However, virtual servers are slower to scale since a new machine must be imaged, loaded, and added to the load balancer. With containers and PaaS options such as Azure Web Apps, this is much more lightweight and far easier to set up; containers are exceptionally efficient from a resource usage perspective.

We can also decide what triggers a scaling event; services can be set to scale in response to demand—as more requests come in, we can increase resources as required and remove them again when idle. Alternatively, we may wish to scale to a schedule—this helps control costs but requires us to already know the periods when we need more power.

An important design aspect to understand is that it is generally more efficient to scale out than up; however, to take advantage of such technologies, our applications need to avoid client affinity.

Client affinity is a scenario whereby the service processing a request is tied to the client; that is, it needs to remember state information for that client from one request to another. In a system built from multiple backend hosts, the actual host performing the work may change between requests, so any such state must be stored somewhere all hosts can reach rather than on an individual server.

Particular types of functions can often cause bottlenecks—for example, processing large volumes of data for a report, or actions that must contact external systems such as sending emails. Instead of building these tasks as synchronous activities, consider using queuing mechanisms instead. As in the example in the following diagram, requests by the User are placed in a Job Queue and control is released back to the User. A separate service processes the job that was placed in the Job Queue and updates the User once complete:

Figure 2.4 – Messaging/queueing architectures

Decoupling services in this fashion gives the perception of a more responsive system and reduces the resources needed to service each request. Scaling patterns can now be based on the number of items in a queue rather than the immediate load, which is more efficient.
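The sketch below shows the shape of this pattern using an in-process queue and a worker thread purely as a stand-in for a real message broker such as an Azure Storage queue or Service Bus; the job fields and function names are invented for the example:

```python
import queue
import threading
import time

job_queue: "queue.Queue[dict]" = queue.Queue()

def submit_report_request(user: str) -> None:
    """Runs in the request path: enqueue the job and return control immediately."""
    job_queue.put({"user": user, "job": "monthly-report"})
    print(f"{user}: request accepted, you will be notified when it is ready")

def worker() -> None:
    """Runs as a separate service: drains the queue and notifies the user."""
    while True:
        job = job_queue.get()
        time.sleep(0.1)  # stand-in for slow report generation
        print(f"{job['user']}: {job['job']} complete, notification sent")
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
submit_report_request("alice@example.com")  # control returns to the caller at once
job_queue.join()                            # only so this demo exits after the job finishes
```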

By thinking about systems as individual components and how those components respond—either directly or indirectly—your solution can be built to not just scale, but to scale in the most efficient manner, thereby saving costs without sacrificing the user experience.

In this section, we have examined how the right architecture can impact our solution’s ability to scale and perform in response to demand. Next, we will look at how we ensure these design considerations are carried through into the deployment phase.

Using architectural best practices – Principles of Modern Architecture

Through years of research and experience, vendors such as Microsoft have collected a set of best practices that provide a solid framework for good architecture when followed.

With the business requirements in mind, we can perform a Failure Model Analysis (FMA). An FMA is a process for identifying common types of failures and where they might appear in our application.

From the FMA, we can then start to create a redundancy and scalability plan; designing with scalability in mind helps build a resilient solution and a performant one, as technologies that allow us to scale also protect us from failure.

A load balancer is a powerful tool for achieving scale and resilience. This allows us to build multiple service copies and then distribute the load between them, with unhealthy nodes being automatically removed.

Consider the cost implications of any choices. As mentioned previously, we need to balance the cost of downtime versus the cost of providing protection. This, in turn, may impact decisions between the use of Infrastructure-as-a-Service (IaaS) components such as VMs or Platform-as-a-Service (PaaS) technologies such as web apps, functions, and containers. Using VMs in our solution means we must build out load balancing farms manually, which are challenging to scale, and demand that components such as load balancers be explicitly included. Opting for managed services such as Azure Web Apps or Azure Functions can be cheaper and far more dynamic, with load-balancing and auto-scaling technologies built in.

Data needs to be managed effectively, and there are multiple options for providing resilience and backup. Replication strategies involving geographically dispersed copies provide the best RPO as the data is always consistent, but this comes at a financial cost.

For less critical data or information that does not change often, daily backup tools that are cheaper may suffice, but these require manual intervention in the event of a failure.

A well-defined set of requirements and adherence to best practices will help design a robust solution, but regular testing should also be performed to ensure the correct choices have been made.

Testing and disaster recovery plans

A good architecture defines a blueprint for your solution, but it is only theory until it is built; therefore, solutions need to be tested to validate our design choices.

Work through the identified areas of concern and then forcefully attempt to break them. Document and run through simulations that trigger the danger points we are trying to protect against.

Perform failover and failback tests to ensure that the application behaves as it should, and that data loss is within allowable tolerances.

Build test probes and monitoring systems to continually check for possible issues and to alert you to failed components so that these can be further investigated.

Always prepare for the worst—create a disaster recovery plan to detail how you would recover from complete system failure or loss, and then regularly run through that plan to ensure its integrity.

We have seen how a well-architected solution, combined with robust testing and detailed recovery plans, will prepare you for the worst outcomes. Next, we will look at a closely related aspect of design—performance.

Architecting for resilience and business continuity – Principles of Modern Architecture

Keeping your applications running can be important for different reasons. Depending on your solution’s nature, downtime can range from a loss of productivity to direct financial loss. Building systems that can withstand some form of failure has always been a critical aspect of architecture, and with the cloud, there are more options available to us.

Building resilient solutions comes at a cost; therefore, you need to balance the cost of an outage against the cost of preventing it.

High Availability (HA) is the traditional option and essentially involves doubling up on components so that if one fails, the other automatically takes over. An example might be a database server—building two or more nodes in a cluster with data replication between them protects against one of those servers failing as traffic would be redirected to the secondary replica in the event of a failure, as per the example in the following diagram:

Figure 2.2 – Highly available database servers

However, multiple servers are always powered on, which in turn means increased cost. Quite often, the additional hardware is not used except in the event of a failure.

For some applications, this additional cost is less than the cost of a potential failure, but it may be more cost-effective for less critical systems to have them unavailable for a short time. In such cases, our design must attempt to reduce how long it takes to recover.

The purpose of HA is to increase the Mean Time Between Failures (MTBF). In contrast, the alternative is to reduce the Mean Time To Recovery (MTTR)—in other words, rather than concentrating on preventing outages, spend resources on reducing the impact and speeding up recovery from an outage. Ultimately, it is the business that must decide which of these is the most important, and therefore the first step is to define their requirements.

Defining requirements

When working with a business to understand their needs for a particular solution, you need to consider many aspects of how this might impact your design.

Identifying individual workloads is the first step—what are the individual tasks that are performed, and where do they happen? How does data flow around your system?

For each of these components, look for what failure would mean to them—would it cause the system as a whole to fail or merely disrupt a non-essential task? The act of calculating costs during a transactional process is critical, whereas sending a confirmation email could withstand a delay or even complete failure in some cases.

Understand the usage patterns. For example, a global e-commerce site will be used 24/7, whereas a tax calculation service would be used most at particular times of the year or at the month-end.

The business will need to advise on two important metrics—the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO dictates an acceptable amount of time a system can be offline, whereas the RPO determines the acceptable amount of data loss. For example, a daily backup might mean you lose up to a day’s worth of data; if this is not acceptable, more frequent backups are required.
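The arithmetic behind that example is worth making explicit: with interval-based backups, the worst case is a failure just before the next backup runs, so the potential data loss equals the backup interval. A tiny sketch, with hypothetical numbers:

```python
def worst_case_data_loss_hours(backup_interval_hours: float) -> float:
    """A failure just before the next backup loses everything since the last one,
    so the worst-case loss equals the backup interval."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    return worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours

print(meets_rpo(backup_interval_hours=24, rpo_hours=1))   # False - daily backups cannot meet a 1-hour RPO
print(meets_rpo(backup_interval_hours=0.5, rpo_hours=1))  # True  - 30-minute backups satisfy it
```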

Non-functional requirements such as these will help define our solution’s design, which we can use to build our architecture with industry best practices.

Defense-in-Depth – Principles of Modern Architecture

Modern solutions, especially those built in the cloud using microservice patterns, will be made from many components. Although these provide excellent power and flexibility, they also offer numerous attack points.

Therefore, when considering our security controls, we need to consider multiple layers of defense. Always assume that your primary measures will fail, and ensure you have backup controls in place as well.

This approach is known as Defense-in-Depth (DID). As an example, consider data protection in a database that serves an e-commerce website. Enforcing an authentication mechanism on the database might be your primary control, but you need to consider how to protect your application if those credentials are compromised. An example of a multilayer implementation might include (but not be limited to) the following:

  • Network segregation between the database and web app
  • Firewalls to only allow access from the web app
  • TDE on the database
  • Field-level encryption on sensitive data (for example, credit card numbers or passwords)
  • A WAF

The following diagram shows an example of a multilayer implementation:

Figure 2.1 – Multiple-layer protection example

We have covered many different technical layers that we can use to protect our services, but it is equally important to consider the human element, as this is often the first point of entry for hacks.

User education

Many attacks originate from either a phishing/email attack or social data harvesting.

With this in mind, alongside an excellent technical defense, a solid user education plan is an invaluable way to prevent attacks at their origin.

Training users in acceptable online practices helps prevent them from leaking important information; therefore, any plan should include the following:

  • Social media data harvesting: Social media platforms are a gold mine for hackers; users rarely consider that the information they routinely supply could be used to access password-protected systems. Birth dates, geographical locations, even pet names and relationship information are routinely supplied and advertised—all of which are often used when confirming your identity through security questions, and so on.

Some platforms present quizzes and games that, again, ask questions whose answers match common security challenges.

  • Phishing emails: A typical exploit is to send an email stating that an account has been suspended or a new payment has been made. A link will direct the user to a fake website that looks identical to an official site, so they enter their login details, which are then logged by the hacker. These details can not only be used to access the targeted site in question but can also be used to obtain additional information such as an address, contact information, and, as stated previously, answers to common security questions.
  • Password policies: Many people reuse the same password. If one system is successfully hacked, that same password can then be used across other sites. Educating users about password managers and the dangers of password reuse can protect your platform against such exploits.

This section has looked at the importance of designing for security throughout our solutions, from understanding how and why we may be attacked, to common defenses across different layers. Perhaps the critical point is that good design should include multiple layers of protection across your applications.

Next, we will look at how we can protect our systems against failure—this could be hardware, software, or network failure, or even an attack.

Patching – Principles of Modern Architecture

When working with virtual machines (VMs), you are responsible for managing the operating system that runs on them, and attackers can seek to exploit known vulnerabilities in that code.

Regular and timely patching and security updates with anti-virus and anti-malware agents are the best line of defense against this. Therefore, your solution design needs to include processes and tools for checking, testing, and applying updates.

Of course, it is not just third-party code and operating systems that are susceptible; your application code is vulnerable too.

Application code

Most cloud services run custom code, in the form of web apps or backend application programming interface (API) services. Hackers often look for programming errors that can open holes in the application. As with other forms of protection, multiple options can be included in your architecture, and some are listed here:

  • Coding techniques: Breaking code into smaller, individually deployed components and employing good development practices such as Test-Driven Development (TDD), paired programming, or code reviews can help ensure code is cleaner and error-free.
  • Code scanners: Code can be scanned before deployment to check for known security problems, either accidental or malicious, as part of a deployment pipeline.
  • Web application firewalls (WAFs): Unlike layer 3 or 4 firewalls that block access based on Internet Protocol (IP) or protocol, WAFs inspect network packet contents, looking for arbitrary code or common exploits such as SQL injection attacks; a short coding example that blunts this particular exploit follows this list.
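As a small illustration of the coding-techniques point, the snippet below contrasts string concatenation with a parameterized query, the standard defense against SQL injection. It uses an in-memory SQLite database purely for demonstration; the table and payload are invented, and the same principle applies to any SQL engine and driver:

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_supplied = "nobody' OR '1'='1"  # a classic injection payload

# Vulnerable: string concatenation lets the payload rewrite the query.
vulnerable = f"SELECT email FROM users WHERE name = '{user_supplied}'"
print(conn.execute(vulnerable).fetchall())  # returns every row in the table

# Safe: the driver treats the supplied value strictly as data, never as SQL.
safe = "SELECT email FROM users WHERE name = ?"
print(conn.execute(safe, (user_supplied,)).fetchall())  # returns no rows
```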

Application-level security controls help protect you against code-level exploits; however, new vulnerabilities are uncovered daily, so you still need to prepare for the eventuality of a hacker gaining data access.

Data encryption

If the data you hold is sensitive or valuable, you should plan for the eventuality that your security controls are bypassed by making that data impossible to read. Encryption will achieve this; however, there are multiple levels you can apply. Each level makes your information more secure, but at the cost of performance.

Encryption strategies should be planned carefully—standard encryption at rest is lightweight but provides a basic protection level and should be used for all data as standard.

For more sensitive data such as credit card numbers, personal details, passwords, and so on, additional levels can be applied. Examples of how and where we can apply controls are given here, with a brief field-level encryption sketch after the list:

  • Databases: Many databases now support Transparent Data Encryption (TDE), whereby the data is encrypted by the database engine itself; consuming applications are unaware of the encryption and therefore do not need to be modified.
  • Database fields: Some databases provide field-level encryption that can be applied by the database engine itself or via client software. Again, this can be transparent from a code point of view but may involve additional client software.
  • Applications: Applications themselves can be built to encrypt and decrypt data before it is even sent to the database. Thus, the database is unaware of the encryption, but the client must be built specifically to perform this.
  • Transport: Data can be encrypted when transferring between application components. HyperText Transfer Protocol Secure (HTTPS) using Secure Sockets Layer (SSL) certificates is the most commonly known for end-user websites, but communications between elements such as APIs should also be protected. Other transport layer encryption is also available—for example, SQL database connections or file shares.
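As a minimal sketch of the application-level approach in the list above, the snippet below encrypts a sensitive field before it is stored, assuming the widely used third-party cryptography package is installed. In a real system the key would be fetched from a secure store such as a key vault rather than generated next to the data:

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Illustration only: in production the key comes from a key vault and is never
# generated and kept alongside the data like this.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt the sensitive field in the application, before it reaches the database.
card_number = "4111 1111 1111 1111"
ciphertext = fernet.encrypt(card_number.encode())

# The database only ever stores ciphertext; decryption happens in the client.
print(ciphertext)                           # opaque token, useless if the database leaks
print(fernet.decrypt(ciphertext).decode())  # original value, recoverable only with the key
```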

Data can be encrypted using either string keys or, preferably, certificates. When using certificates, many cloud vendors, including Azure, offer either managed or customer-supplied keys. With managed keys, the cloud vendors generate, store, and rotate the certificates for you, whereas with customer-supplied keys, you are responsible for obtaining and managing them.

Keys, secrets, and certificates should always be stored in a suitably secure container such as a key vault, with access explicitly granted to the users or services that need them, and access being logged.
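A hedged sketch of that practice using the Azure SDK for Python is shown below; it assumes the azure-identity and azure-keyvault-secrets packages, an existing vault, and a signed-in identity that has been granted access to secrets. The vault URL and secret name are placeholders:

```python
# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name - substitute your own.
vault_url = "https://my-key-vault.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# The calling identity must have been explicitly granted permission to read
# secrets, and Key Vault logs the access - matching the guidance above.
secret = client.get_secret("payments-db-connection-string")
print(secret.name)  # the value itself is in secret.value - avoid printing it
```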

As with other security concerns, the variability and ranges of choices mean that you must carefully plan your encryption techniques.

On their own, each control can provide some protection; however, to give your solution the best defense, you need to implement multiple tactics.
