So without further ado, here are some central concepts of systems engineering that software developers should absolutely know. At the end of the day, they help you write better code.
Your software is only efficient if it works in harmony with the system underneath it.
The non-prod environments where the software gets built can be vastly different from how the production machines are set up. Your code needs to account for these differences beforehand. Although Docker alleviates some of the problems, it is still crucial to be aware of the following:
1.1 Disk I/O
If your software writes or reads from the filesystem, be aware of the fs type and its underlying disk I/O limitations. For instance, if your disk is an NFS mount, the I/O operations will be relatively slow.
As a safe rule, always consider batching requests for I/O operations.
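As a rough sketch of that rule, the hypothetical helper below buffers records and writes them in batches, so a slow filesystem (such as an NFS mount) sees a few large writes instead of one write per record. The function name and default batch size are illustrative, not from the original text:

```python
def write_records_batched(path, records, batch_size=1000):
    """Write newline-delimited records in batches, so the filesystem
    sees a few large writes instead of one write per record.
    (Illustrative helper; batch_size is a tunable, not a magic number.)"""
    batch = []
    with open(path, "w") as f:
        for rec in records:
            batch.append(rec)
            if len(batch) >= batch_size:
                f.write("\n".join(batch) + "\n")  # one large write
                batch.clear()
        if batch:  # flush the final partial batch
            f.write("\n".join(batch) + "\n")
```

The same idea applies to reads and to network I/O: amortize the fixed per-operation cost over many items.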
1.2 Memory allocation
How much of the OS’s physical memory will the app use? Depending on the use case and hardware resources available to you, work out if you need to run it as a standalone application or a microservice.
This will determine how many operations you can perform concurrently in your code. Your I/O buffer sizes and in-memory caching all depend on this.
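One common way to keep memory use bounded, sketched below with an illustrative helper, is to stream data in fixed-size chunks instead of loading whole files. The chunk size here is an assumption you would tune to the hardware available:

```python
def iter_chunks(path, chunk_size=64 * 1024):
    """Yield fixed-size chunks of a file, so memory use stays bounded
    by chunk_size no matter how large the file is.
    (Illustrative helper; 64 KiB is an arbitrary default.)"""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                return
            yield chunk
```

This keeps the application's resident memory roughly constant, which matters when many instances share one host.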
1.3 Handling external connections
Most modern apps work by exposing HTTP APIs. The webserver that handles this needs to have limits set on important parameters, such as maximum concurrent connections, request timeouts, and keep-alive behavior.
At a much lower level, the OS kernel tracks connection states and logical flows. For instance, on Linux, connection tracking is done by
conntrack, which defines connection limits. On RHEL, you can check it this way:
$ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 167012
net.netfilter.nf_conntrack_max = 262144
1.4 OS signals
The operating system issues various types of signals to running processes in order to maintain overall system stability.
The most important ones are SIGTERM, SIGINT, and SIGHUP, which ask a process to terminate (or, in SIGHUP's case, often to reload its configuration). These signals give your code a chance to shut down gracefully and complete any critical in-progress transaction.
Failing to handle these signals is often the root cause of problems like stale file handles, ghost processes, memory leaks, and data corruption.
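A minimal Python sketch of graceful shutdown might look like the following; the handler and the work loop are illustrative, not a prescribed pattern:

```python
import signal

shutdown_requested = False

def handle_termination(signum, frame):
    # Just record the request; the main loop decides when to stop,
    # so no transaction is left half-done.
    global shutdown_requested
    shutdown_requested = True

# SIGTERM is what `kill` and most orchestrators send; SIGINT is Ctrl+C.
signal.signal(signal.SIGTERM, handle_termination)
signal.signal(signal.SIGINT, handle_termination)

def main_loop(work_items):
    completed = []
    for item in work_items:
        if shutdown_requested:
            break  # stop taking new work; in-flight work already finished
        completed.append(item)  # stand-in for a real transaction
    # flush buffers, close files and connections here before exiting
    return completed
```

Because the handler only sets a flag, the loop always finishes the transaction it is in before exiting, avoiding the stale handles and corruption described above.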
While deploying your application into integrated environments, there are multiple factors that will determine connectivity to your applications.
2.1 Network policies
Understand key concepts such as IP subnets, routing policies, and firewall and DNS rules. These determine external connectivity to and from your application.
2.2 SSL certificates
Your application might need client certs to be installed on the machine to connect and trust remote HTTPS endpoints. Or if you are a server, you might require a server cert signed by your organization’s root CA.
These are usually enforced by the security team in your organization. Be aware of these and evaluate what your software might need beforehand.
If your application runs on Docker, these certificates will have to be loaded into your container.
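As a hedged illustration using Python's standard `ssl` module, the snippet below builds a TLS context from a custom CA bundle. The bundle path is deployment-specific, typically wherever your container image copies the organization's root-CA certificate:

```python
import ssl
import urllib.request

def make_tls_context(ca_bundle=None):
    """Build a TLS context. Pass the path to your organization's
    root-CA bundle (e.g. a file baked into the container image) to
    trust internally signed server certs; with None, the system
    trust store is used."""
    return ssl.create_default_context(cafile=ca_bundle)

def fetch(url, ca_bundle=None):
    # Hypothetical wrapper: verify the remote endpoint against the
    # given CA bundle before trusting it.
    ctx = make_tls_context(ca_bundle)
    return urllib.request.urlopen(url, context=ctx)
```

`create_default_context` enables certificate verification and hostname checking by default, which is what you want against HTTPS endpoints.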
The build process and deployment of your software involve repetitive tasks that can be automated using modern tools like Jenkins and Travis CI. These popular CI/CD (continuous integration and continuous deployment) tools work by creating pipelines that orchestrate your build and deploy events.
The goal is to be able to check in your code changes and have a system automatically trigger a build, run tests, and deploy into integrated environments seamlessly. This significantly reduces time to market and increases the overall productivity of the organization.
Where does your code run?
Learn about the infrastructure landscape that your app runs on. This can greatly improve your software code and design.
4.1 Multi-datacenter deployments

For high availability, modern application deployments are often spread across multiple geolocations or regions. This means you will need to design your software to run active-active across datacenters.
4.2 Serverless architecture
The serverless architecture allows your application code to run as functions in the cloud without having to provision servers or virtual machines for it. For instance, instead of a new microservice, your use case might be better suited to run as Lambda functions on AWS (or Cloud Functions on GCP).
4.3 Load balancers
LBs (load balancers) play a critical role in determining how external traffic is routed to your applications. They are usually in front of all APIs to distribute the load across your app instances.
But to perform load balancing effectively, they require a health-check endpoint (for example,
https://my-app-1:5601/health) to be exposed by your application. This lets the load balancer assess whether an instance is dead or alive. Keep your app's health-check logic quick and efficient.
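A minimal health-check endpoint can be sketched with Python's standard library. The port and JSON body mirror the example URL above but are otherwise illustrative; a real app would serve this from its existing web framework:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Keep this fast: no database calls or heavy checks here.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep frequent health-check probes out of the logs

# To serve (port matches the example URL above):
#   HTTPServer(("0.0.0.0", 5601), HealthHandler).serve_forever()
```

Load balancers poll this endpoint every few seconds, which is why the handler should return quickly and never block on downstream dependencies.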
The two most common types of load balancing algorithms are:
- Round robin — cycles through your instances in order, spreading requests evenly across all of them
- Least connections — forwards each request to the instance with the fewest open connections. This way, if an application instance is swamped with load, it's unlikely to receive new requests.
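Both algorithms can be sketched in a few lines of Python. These toy classes are purely illustrative; real load balancers implement this inside the proxy layer:

```python
import itertools

class RoundRobinBalancer:
    """Cycle through instances in a fixed order."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Send each request to the instance with the fewest open connections."""
    def __init__(self, instances):
        self.connections = {inst: 0 for inst in instances}

    def pick(self):
        instance = min(self.connections, key=self.connections.get)
        self.connections[instance] += 1  # track the new open connection
        return instance

    def release(self, instance):
        self.connections[instance] -= 1  # request finished
```

Round robin is stateless and cheap; least connections needs per-instance bookkeeping but adapts when some requests are much slower than others.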