This article is about using feature flags to improve delivery stability and speed in a microservice-based SaaS platform. It combines traffic routing and feature flags to achieve “Testing in Production” and “Progressive Delivery” and so minimize release risk.
Introduction
Microservice
Microservices — also known as microservice architecture — is an architectural style that structures an application as a collection of services that are: Independently deployable; Loosely coupled; Organized around business capabilities; Owned by a small team; Highly maintainable, and testable. Lots of software products have a microservice-based architecture. It’s more or less a fashion.
Reduce release risk: Testing in Production & Progressive release
To mitigate release stress and reduce release risk, we can take two steps before releasing a new version/feature to the entire public/final audience: Testing in Production and Progressive Delivery.
Testing in production (TIP) is a software development practice in which new code changes are tested on live user traffic rather than in a staging environment. It is one of the testing practices found in continuous delivery.
Progressive release offers an additional strategy to help mitigate the risk of unexpected issues affecting users. It helps by making a release available only to a specific percentage of the user base. This percentage can be initially small, perhaps 10% or 20%, and increases as confidence in the release grows.
Solution — Traffic routing
One solution is using traffic routing to switch between the current and the new release progressively. It often uses a service mesh (App Mesh, Istio, Linkerd, Open Service Mesh) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) for traffic routing.
For example, suppose we have a new version of the microservice “sentence emoji classification algorithm”. We can use a traffic routing solution to shift traffic progressively from the old algorithm to the latest version.
Flagger (flagger.app) is an excellent tool for implementing this solution.
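To make the idea concrete, a progressive switch can be pictured as a weighted choice between two versions, where the weight of the new version is raised step by step while it stays healthy. The sketch below is only an illustration under that assumption; it is not how Flagger or any service mesh is implemented, and the names `pickVersion`, `nextCanaryWeight`, and the default step of 10 are hypothetical.

```typescript
// Hypothetical sketch of weighted routing between two versions of a service.
type Version = "stable" | "canary";

// Route one request. `canaryWeight` is the percentage (0-100) of traffic sent
// to the new version; `roll` is a number in [0, 1), for example Math.random().
function pickVersion(canaryWeight: number, roll: number): Version {
  return roll * 100 < canaryWeight ? "canary" : "stable";
}

// Advance the rollout: raise the weight by `step` while the new version is
// healthy, and drop back to 0 (full rollback) as soon as a check fails.
function nextCanaryWeight(current: number, healthy: boolean, step: number = 10): number {
  if (!healthy) {
    return 0;
  }
  return Math.min(100, current + step);
}
```

Passing the random number in as `roll` keeps the routing decision a pure function, which makes it easy to test.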
Solution — Feature flags
A feature flag is a modern engineering technique that decouples code deployments from feature releases, giving you control over who sees each feature and when they see it.
With feature flags, you can progressively release or roll back individual features to or from a specific group of users without redeployment. You can toggle a feature on and off to subsets of users.
There are several platforms for feature flag management, such as LaunchDarkly, Unleash, and FeatBit. In this article, I will use FeatBit as the demo tool in a real-world example because, among these, only FeatBit’s open-source version supports everything I need for the demonstration.
Combined solution
The table above lists the main differences between traffic splitting and feature flags.
Many companies adopt both solutions together to minimize release risk. Here are the steps to deliver the new version and its features progressively:
- Deploy the new version of the services in new containers.
- Turn off the feature flags of the new features deployed in the new containers.
- Use a traffic routing solution to roll out the new version in production progressively.
- Turn on the new features individually.
- Test the features individually in production.
- Progressively release the new features individually.
Steps 1–3 prevent you from delivering a version that doesn’t run. Steps 4–6 prevent you from delivering a new feature that has a negative effect. Steps 4–6 also spare you from rolling back the entire new version when only one of the new features runs into a problem (especially when only a small group of users is impacted): you turn off that one feature instead.
Real-world example
I will use a housekeeping service platform to demonstrate how to implement feature flags to reduce release risk.
Introduction of Housekeeping service platform
The housekeeping service platform (HKSP) is an open-source project developed by a cloud provider developer community. HKSP is a Shopify-like platform for housekeeping services. Housekeeping companies can register on the platform and build their own housekeeping “online shop” (which can have its own domain name).
Scenario and context
HKOSS has developed a new version that includes bug fixes, new features, and UI/UX enhancements. The notification module is one of the new features, requested by VanveHKC (one of the HKOSS team’s customers). The team wants to deliver this feature to this customer without the risk of impacting the customer’s end-users. The new version includes the following changes:
- The housekeeping staff office and the customer center have been updated for the “notification” feature and other new features.
- The message service has been added for the “notification module” feature.
- The publish service has been updated for the “notification” feature and other new features. The message center needs to call the service provided by the new version of the publish service.
- There are also other changes related to other features, which I won’t describe here.
Problem to resolve
- Different new features are delivered to different housekeeping company tenants. The “notification module” feature will be delivered to the company VanveHKC; other features will be delivered to other companies.
- If one feature encounters a bug, it shouldn’t impact the delivery of other features.
Strategy of delivery
- Step-1. Use feature flags to wrap up new features and turn off new features for all users.
- Step-2. Build the new version and deploy it online.
- Step-3. Use traffic routing solutions to progressively transfer the traffic from the old version to the new version. This prevents the new version containers from breaking the existing services.
- Step-4. Turn on the feature flag for the notification module, and let only the QA team and beta users access this feature. Test it in production.
- Step-5. Use a feature flag to deliver the “notification module” to end-users of VanveHKC progressively.
Looking at it in more detail, there are several solutions we can adopt depending on the situation. For the “notification module”, I propose the solution below:
- Add a feature flag for notification in the web app project. Use traffic routing to switch between the old and the new version progressively.
- No feature flag in the message service project, because it’s a new independent service.
- No feature flag in the publish service project, because it’s a new independent interface. Use traffic routing to switch between the old and the latest version progressively.
Step 1 — Use feature flags to wrap up new features.
1.1 Build your feature flag environment
You can learn how to build the system through FeatBit’s GitHub page.
1.2 Connect a feature flag SDK to projects
We need a feature flag to control the notification feature, so we first install the feature flag SDK in the project. Here we demonstrate how to install the SDK in a front-end application. Run the command:
npm install featbit-js-client-sdk
Then we need to initialize the SDK with the user’s basic information. Here is a screen capture of what it looks like. The initialization takes:
- Secret, a key to connect to the remote feature flag server.
- API URL, the address of the remote feature flag server.
- Configurable user information you will use to roll out features.
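As a rough orientation, the three items above can be pictured as a single options object. The sketch below is an assumption made for illustration: the field names (`secret`, `apiUrl`, `user`, `keyId`, `customAttributes`) and the `buildOptions` helper are hypothetical and are not taken from the FeatBit SDK, whose documentation defines the real initialization call.

```typescript
// Hypothetical shape of the three pieces of information listed above.
// Field names are illustrative assumptions, not the SDK's documented API.
interface FlagUser {
  keyId: string; // stable unique identifier of the user
  name: string;
  customAttributes: Record<string, string>; // e.g. tenant domain, team region
}

interface FlagClientOptions {
  secret: string; // key used to connect to the remote feature flag server
  apiUrl: string; // address of the remote feature flag server
  user: FlagUser; // user information used when rolling out features
}

// Check the values once, before handing them to whichever SDK you use.
function buildOptions(secret: string, apiUrl: string, user: FlagUser): FlagClientOptions {
  if (secret.trim() === "") {
    throw new Error("secret must not be empty");
  }
  if (!/^https?:\/\//.test(apiUrl)) {
    throw new Error("apiUrl must start with http:// or https://");
  }
  if (user.keyId.trim() === "") {
    throw new Error("user.keyId must not be empty");
  }
  return { secret: secret, apiUrl: apiUrl, user: user };
}
```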
⚠️ I won’t go into detail here because this isn’t a tutorial on how to use FeatBit. If you’re interested in FeatBit, you can visit its GitHub page for more information. FeatBit provides SDKs for JavaScript, Java, C#, Python, Go, and other languages.
1.3 Use the feature flag to wrap up and control the feature code.
In the web app project, find the code which runs the “notification module”, and use a feature flag to wrap up the related code. The image below shows an example:
The front-end app is written with Vue (a JavaScript/TypeScript framework). If the flag code featureStore.flags['notification'] executes and returns true, the "notification module" runs and is displayed in the front-end app. The "notification module" won't be executed if it returns false.
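In plain TypeScript, the wrapping amounts to a conditional around the feature's entry point. The sketch below is a simplified illustration rather than the project's actual Vue code: `featureStore` is a hypothetical stand-in for the store that holds the evaluated flag values, and `renderNotificationModule` and `renderPage` are made-up names.

```typescript
// Hypothetical stand-in for the store that holds evaluated flag values.
const featureStore: { flags: Record<string, boolean> } = {
  flags: { notification: false }, // the new feature starts turned off
};

// Hypothetical placeholder for the real "notification module".
function renderNotificationModule(): string {
  return "notification-module";
}

// Build the list of page sections. The new module is included only when its
// flag evaluates to true; a missing flag is treated as false.
function renderPage(): string[] {
  const sections: string[] = ["header", "orders"];
  if (featureStore.flags["notification"] === true) {
    sections.push(renderNotificationModule());
  }
  return sections;
}
```

Because the old code path stays untouched when the flag is false, deploying this build changes nothing for users until the flag is turned on.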
1.4 Turn off the feature flag.
Before step 4 of the strategy, the new feature “notification module” shouldn’t be delivered to users when the new version is deployed to production. We need to configure it on the feature flag management platform. As shown in the image below, the feature flag “notification” is turned off and will return the value false.
Step 2 — Build a new version and deploy it to the microservice system.
In this demonstration, we must rebuild the front-end app projects, the message service, the publish service, etc. We then deploy the new version to separate containers on the microservice system.
Many cloud solutions, DevOps tools, and microservice frameworks help us implement this operation. Most of the methods I have used involved pushing the Dockerized app/service images to an online registry, then deploying the images to an online container service (such as Azure Web Apps, AWS ECS, or Kubernetes).
I won’t describe more about it in this post.
Step 3 — Use traffic routing to deploy the new version progressively.
Once you deploy the new version to the microservice system, you need a framework to manage the traffic so you can switch the traffic routing from the old version to the new version.
You can use a service mesh (App Mesh, Istio, Linkerd, Open Service Mesh) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) to achieve this goal. The choice depends on the microservice architecture. Many cloud providers also offer traffic routing for microservices.
You can try the Istio/flagger.app combined solution for practice. I won’t describe more about it in this post.
Step 4 — Testing in production
Do you remember the code we wrote in step 1? We now need to configure the flag in the portal UI to control who sees the feature and when they see it (a user sees the feature only if the flag returns true).
Turn on the feature flag for the notification module, and let only the QA team and beta users see (access) this feature. Test it in production.
- If the user is in the QA group or is one of the beta users for the “notification module”, featureStore.flags['notification'] should return true.
- If not, featureStore.flags['notification'] returns false.
4.1 On the Segment page of FeatBit, create a new segment named QA team, add the QA members to the “Including users” list in the “Targeting users” tab, then click Review and save.
4.2 On the feature flag list page, choose the “Notification” flag and go to its Targeting page.
- a. Create a customized “Targeting rule”: if the user belongs to the QA segment, then the feature flag featureStore.flags['notification'] returns true; otherwise it returns false.
- b. You can add beta users directly to the true value list in the Individual targeting tab. If the user belongs to this list, featureStore.flags['notification'] returns true; otherwise it returns false.
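The targeting configured in 4.1 and 4.2 boils down to a small decision function. The sketch below is a simplified model for illustration, not FeatBit's data model or evaluation engine: the `FlagConfig` shape, the `evaluate` function, and the order of checks (individual targeting first, then the segment rule, then the default) are assumptions.

```typescript
// Hypothetical model of the targeting described above.
interface TargetUser {
  keyId: string;
}

interface FlagConfig {
  enabled: boolean; // master switch of the flag
  individualTrue: string[]; // beta users added through individual targeting
  qaSegment: string[]; // members of the "QA team" segment
  defaultValue: boolean; // served when nothing else matches
}

function evaluate(flag: FlagConfig, user: TargetUser): boolean {
  if (!flag.enabled) {
    return false; // flag off: nobody sees the feature
  }
  if (flag.individualTrue.indexOf(user.keyId) !== -1) {
    return true; // beta user targeted individually
  }
  if (flag.qaSegment.indexOf(user.keyId) !== -1) {
    return true; // member of the QA segment
  }
  return flag.defaultValue; // everyone else
}
```

With `defaultValue` set to false, only the QA team and the listed beta users get true, which matches the "test in production" stage.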
4.3 Complete the setting and save it. Below are demonstrations in the video:
⚠️ You can check FeatBit’s official open-source repo to get more information on how to use FeatBit.
Step 5 — Deliver the “notification module” to one HKC progressively.
It’s easy to progressively release the “notification module” feature to users. You can change the default rule to serve: Rollout percentage, then input the percentage of all users who receive a true value. As shown below, we roll out the feature to 10% of users.
5.1 Progressive release in a single tenant
But in this scenario, we won’t progressively release the “notification module” to all end-users, because we want to deliver the new feature only to the business customer (a housekeeping service company, HKC) who requested it.
So we need to create a rule under which only users under the sho.saas-housekeeper.chouldbu domain can see the "notification module". To release the feature progressively, we configure the rule to serve: Rollout percentage. With this rule in place, the "notification module" is delivered progressively only to users under the domain sho.saas-housekeeper.chouldbu.
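Conceptually, such a rule is a tenant check followed by a deterministic percentage check. The sketch below illustrates that idea under stated assumptions: the hash in `bucketOf` is a simple illustrative one, not the algorithm FeatBit uses, and `EndUser`, `servesNotification`, and the `domain` field are hypothetical names.

```typescript
// Deterministic bucket in [0, 100) derived from a string key, so the same key
// always lands in the same bucket. This is a simple illustrative hash, not the
// algorithm of any particular feature flag platform.
function bucketOf(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % 100;
}

interface EndUser {
  userKey: string;
  domain: string; // domain of the tenant the user belongs to
}

// Only users under `tenantDomain` are eligible, and among them only
// `percentage` percent (0-100) receive true.
function servesNotification(user: EndUser, tenantDomain: string, percentage: number): boolean {
  if (user.domain !== tenantDomain) {
    return false;
  }
  return bucketOf(user.userKey) < percentage;
}
```

Because the bucket depends only on the key, raising the percentage from 10 to 20 keeps the first 10% of users in the rollout and adds new ones, instead of reshuffling everybody.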
5.2 Progressive release by a custom attribute
Service staff in a housekeeping company can be grouped by small region (for example, a street). I call such a group a regional team. Staff users in the same regional team should have the same features. If we roll out the feature by userKey, members of one team will see different versions of the app and struggle to work together.
So instead of a percentage rollout keyed by userKey, I can use a custom attribute as the rollout key. In this scenario, I use team-region as the rollout key.
Now, service staff in the same regional team will have the same feature.
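The effect of the rollout key is easiest to see in code: the bucket is computed from the team's region instead of from each user's key, so teammates always fall on the same side of the percentage. The sketch below is an illustration only; the hash is a simple stand-in rather than FeatBit's algorithm, and `StaffUser`, `isInRollout`, and the attribute map are hypothetical.

```typescript
// Deterministic bucket in [0, 100) for a string key (illustrative hash only).
function bucketOf(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % 100;
}

interface StaffUser {
  userKey: string;
  attributes: Record<string, string>; // e.g. { "team-region": "..." }
}

// Bucket by the value of `rolloutAttribute` when the user has it,
// and fall back to userKey otherwise.
function isInRollout(user: StaffUser, percentage: number, rolloutAttribute?: string): boolean {
  const custom: string | undefined = rolloutAttribute ? user.attributes[rolloutAttribute] : undefined;
  const key = custom !== undefined ? custom : user.userKey;
  return bucketOf(key) < percentage;
}
```

Two staff members with different `userKey` values but the same `team-region` value get the same answer at every percentage when `"team-region"` is passed as the rollout attribute.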
Conclusion
Minimizing delivery risk is always critical for the teams who serve clients seriously. Combining traffic routing and feature flag solutions helps teams significantly reduce delivery risk.
In practice, this combined solution also helps you ship new features faster. Because the risk is under control, you can ship new features without fear. This allows you to innovate significantly faster and grow your business income.
To minimize the delivery risk, we still have a lot of work to do. I will try to write more articles about this topic.
I hope this article is helpful for you in practice.
Leave a comment here or join our GitHub repo (https://github.com/featbit/featbit) to keep in contact.
If you find this article helpful, please star us.