Feature Toggles

Observations from Using Feature Toggles in the Real World

To build reliable software quickly, we must separate shipping code from delivering features.

What Are Feature Toggles?

Feature Toggles (aka Feature Flags) are a practice that encourages deploying code with new functionality disabled or inactive. Turning toggles on is a secondary activity that happens only after the system as a whole is ready.

 

The benefit of this practice is that it allows Engineering teams to focus on continuously shipping code while maintaining production integrity.

Actual Benefits

The benefits of feature toggles far exceed the textbook theory. Here are some examples of situations where toggles helped me personally.

 

  • Pre-release testing found a new feature wasn't working correctly.
    • We just turned it off, and the release continued as planned. We fixed it in the next release.
  • A capability required changes in more than one service, but one of them wasn't ready in time for a coordinated release.
    • We deployed the ready services with the new capabilities disabled.
  • We changed how production worked, and we broke it in unforeseen ways.
    • We turned the toggle off, and production went back to normal.
  • Testing teams didn't have time to test new functionality before we pushed the release to production.
    • We deployed with the toggle off, then tested with it on later when there was more time.

The 2 (really 3) Types of Toggles

There are primarily 2 (really 3) types of Feature Toggles, each with its own benefits and drawbacks. Some allow for more fine-grained control than others.

  • Request/Transaction Level*
  • User Level
  • System Level

*Whether Request Level parameters are really toggles is debatable. Either way, we can use them in the same way.

Request Level Toggles

Request Level toggles are oftentimes not the standard true/false type of toggle you'd expect. Instead, these are parameters within a request that determine how applications should route and process the request.

Example: New functionality can be rolled out to Mobile users first using HTTP headers. When processing the request, use the header to identify that the request is from a mobile client.

Request Level toggles are great for controlling when functionality is turned on, but it is often harder to turn that functionality off again (unless you control both client and server).
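As a rough sketch of the mobile-first example, the handler below routes on a request header. The header name X-Client-Type, the value "mobile", and the function names are assumptions made up for illustration; a real client may identify itself differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// clientTypeHeader is a hypothetical header name chosen for this sketch.
const clientTypeHeader = "X-Client-Type"

// useNewFlow reports whether a request carrying the given client type
// should take the new code path. Only mobile clients take it.
func useNewFlow(clientType string) bool {
	return strings.EqualFold(strings.TrimSpace(clientType), "mobile")
}

// handler routes each request based on a parameter inside the request
// itself, rather than on a stored true/false toggle.
func handler(w http.ResponseWriter, r *http.Request) {
	if useNewFlow(r.Header.Get(clientTypeHeader)) {
		fmt.Fprint(w, "new flow")
		return
	}
	fmt.Fprint(w, "old flow")
}

func main() {
	for _, client := range []string{"mobile", "web", ""} {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		if client != "" {
			req.Header.Set(clientTypeHeader, client)
		}
		rec := httptest.NewRecorder()
		handler(rec, req)
		fmt.Printf("%q -> %s\n", client, rec.Body.String())
	}
}
```

Notice that the server has no switch of its own here: any client that sends the header gets the new flow, which is why turning it off is hard unless you also own the client.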

User Level Toggles

Most systems have some form of User, and a User often comes with a "profile". A very powerful method of implementing toggles is to do so at the User Profile level. This allows Engineering teams to enable functionality at a very fine-grained level, either per User or per User Group.

Examples of using User Level Toggles:

  • Enabling a new feature for internal users with an internal-users group
  • Allowing users to opt-in or opt-out of features via their own settings
  • Rolling out a new feature to a single user, monitoring the user for issues
  • Enabling a new feature for groups with unique usage patterns (e.g., Premium vs. General users)
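The examples above could be modeled roughly like this. The User type, the groupToggles table, and the Enabled method are hypothetical names for this sketch, not part of any particular user management system.

```go
package main

import "fmt"

// User is a hypothetical profile type. Toggles set directly on the
// profile represent a user's own opt-in or opt-out choice.
type User struct {
	Name    string
	Groups  []string
	Toggles map[string]bool
}

// groupToggles maps a group name to the toggles enabled for that group.
var groupToggles = map[string]map[string]bool{
	"internal-users": {"new-feature": true},
}

// Enabled reports whether a toggle is on for the user. A value on the
// user's own profile wins (so users can opt out); otherwise any of the
// user's groups can enable it. Unknown toggles are false.
func (u User) Enabled(name string) bool {
	if on, ok := u.Toggles[name]; ok {
		return on
	}
	for _, g := range u.Groups {
		if groupToggles[g][name] {
			return true
		}
	}
	return false
}

func main() {
	users := []User{
		{Name: "staff", Groups: []string{"internal-users"}},
		{Name: "opted-out", Groups: []string{"internal-users"},
			Toggles: map[string]bool{"new-feature": false}},
		{Name: "opted-in", Toggles: map[string]bool{"new-feature": true}},
		{Name: "general"},
	}
	for _, u := range users {
		fmt.Printf("%s: new-feature=%t\n", u.Name, u.Enabled("new-feature"))
	}
}
```

Letting the profile value override the group value is one possible design choice; a system that must force a feature on for a group would check groups first.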

System Level Toggles

Many talks and articles on Feature Flags focus on System Level Toggles: on/off switches that control features across the system as a whole. These toggles manifest in many ways, from dedicated services offering Feature Flags as a Service to specific libraries or even config parameters.

 

System Level toggles can be very powerful when used correctly, but they offer much less control than User Level toggles.
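At the simple end of that range, a System Level toggle can be a plain config parameter. The sketch below assumes a made-up "name=bool" comma-separated format and hypothetical Config and ParseConfig names; a real system might read a file, environment variables, or a flag service instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config holds system-wide toggles loaded from configuration.
type Config struct {
	toggles map[string]bool
}

// ParseConfig reads comma-separated "name=bool" pairs. Malformed
// entries are skipped, which leaves those toggles off.
func ParseConfig(raw string) Config {
	cfg := Config{toggles: make(map[string]bool)}
	for _, pair := range strings.Split(raw, ",") {
		name, value, ok := strings.Cut(pair, "=")
		if !ok {
			continue
		}
		on, err := strconv.ParseBool(strings.TrimSpace(value))
		if err != nil {
			continue
		}
		cfg.toggles[strings.TrimSpace(name)] = on
	}
	return cfg
}

// Enabled reports whether a toggle is on for the whole system.
// Toggles that were never configured are false.
func (c Config) Enabled(name string) bool {
	return c.toggles[name]
}

func main() {
	cfg := ParseConfig("new-feature=true, beta-ui=false, broken=maybe")
	for _, name := range []string{"new-feature", "beta-ui", "broken", "missing"} {
		fmt.Printf("%s=%t\n", name, cfg.Enabled(name))
	}
}
```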

Toggles by Example (MicroServices Edition)

[Diagram: Orchestrator, New Service, and Existing Service, with an on/off toggle]

Use Case: We have an existing backend flow between our Orchestrator and existing services, but we need to introduce a new service that will process the transaction before the existing services.

User Level: We can roll out this new service to individual users or groups of users by adding a User Toggle to the user's profile. Requests from these users will go through the new service, and requests from other users will skip it.

System Level: With System Toggles, the code to call the new service can be added to the Orchestrator before the new service even exists. Once ready, we can turn the new service on or off with system toggles.
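A minimal sketch of that Orchestrator flow might look like the following. The Service and Orchestrator types, the NewServiceEnabled hook, and the stub services are all assumptions for illustration; real services would be remote calls.

```go
package main

import "fmt"

// Service is a hypothetical processing step in the backend flow.
type Service func(txn string) (string, error)

// Orchestrator calls the new service (when toggled on) before the
// existing service.
type Orchestrator struct {
	NewService      Service
	ExistingService Service
	// NewServiceEnabled decides, per request, whether to call the new
	// service. A User Level check inspects the user; a System Level
	// check ignores the user and returns one global value.
	NewServiceEnabled func(userID string) bool
}

// Process runs a transaction through the flow.
func (o Orchestrator) Process(userID, txn string) (string, error) {
	if o.NewServiceEnabled != nil && o.NewServiceEnabled(userID) {
		out, err := o.NewService(txn)
		if err != nil {
			return "", fmt.Errorf("new service: %w", err)
		}
		txn = out
	}
	return o.ExistingService(txn)
}

// demoOrchestrator wires in stub services that tag the transaction so
// the path it took is visible.
func demoOrchestrator(enabled func(string) bool) Orchestrator {
	return Orchestrator{
		NewService:        func(txn string) (string, error) { return txn + "+new", nil },
		ExistingService:   func(txn string) (string, error) { return txn + "+existing", nil },
		NewServiceEnabled: enabled,
	}
}

func main() {
	userLevel := demoOrchestrator(func(userID string) bool { return userID == "alice" })
	for _, id := range []string{"alice", "bob"} {
		out, err := userLevel.Process(id, "txn")
		fmt.Println("user toggle:", id, out, err)
	}

	systemLevel := demoOrchestrator(func(string) bool { return true })
	out, err := systemLevel.Process("bob", "txn")
	fmt.Println("system toggle on:", "bob", out, err)
}
```

With the toggle off, the Orchestrator never touches the new service, so this code can ship before the new service exists.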

Toggles by Example (Code Edition)

Use Case: We have an existing service that must process requests in a new way.

User Level: With User Level toggles, only requests from specific users with the toggle enabled will execute the new logic.

System Level: When the System Level toggle is enabled, every request will process using the new logic. 

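// User Level: only users with the toggle enabled run the new logic.
func SomeFunc(...) {
	if user.Toggles("new-feature") {
		// do new work and get
		// out of here
		return
	}

	// do old work
}

// System Level: once the toggle is on, every request runs the new logic.
func SomeFunc(...) {
	if cfg.Toggles("new-feature") {
		// do new work and get
		// out of here
		return
	}

	// do old work
}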

Should I Use User or System Toggles?

As a platform, which type of toggle should I use? Which is best for my needs? The answer is all of them. As a platform, you will get the most value out of using each of these three toggle types.

When to use the various toggles:

  • Request Level - Advantageous, but remember, you can rarely turn them off after they are turned on (unless you own both client and server).

  • User Level - Highly encouraged; use this whenever possible. This should be your default approach.

  • System Level - Sometimes necessary; use this when User Level doesn't make sense.

Effectively Using Toggles

Using Toggles for Development

In an earlier example, we discussed adding a new service to an existing flow. In this example, Team A (client) needs to call a service managed by Team B (server). Team B needs to add a new API endpoint for this functionality, which will take 4 sprints; Team A needs to add a call to the new endpoint, which will take less than 1 sprint.

Without Toggles, Team A can either write the code now, leaving it unmerged and getting stale (or worse, merged and undeployed), or wait until Team B is done. Both options take considerable coordination between the teams and leave Team A's workload dependent on Team B.

With Toggles, Team A can write the code now, deploy it turned off, and wait until Team B is ready to test and enable the call. Team A can work on and deploy other features while Team B completes its task.

Using Toggles to Iterate and Improve Features

Beta Users have been a concept in software engineering for a long time. GitHub is an example of a company that uses toggles to let users opt in to Beta Features, with the understanding that these features may not be perfect and that feedback is welcome.

Without Toggles, any feature deployed to production is available to all users. Users expect these features to be final; any iterations can't change the fundamental user experience without creating issues for users.

With Toggles, teams can deploy new features to users who understand they are Beta testing. These users know that functionality or user experience may change. Oftentimes, Beta Users are happy to provide valuable feedback, which can make the official feature rollout even more robust.

Testing With Toggles

As you incorporate toggles into your codebase, it is important to test code behavior with and without toggles turned on.

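func TestMyAPI(t *testing.T) {
	t.Run("Test Default Behavior", func(t *testing.T) {
		// toggle is off (the default): test the code's existing behavior
	})

	t.Run("Test New Behavior", func(t *testing.T) {
		// Turn the toggle on for this subtest only, and reset it
		// afterward so other tests still see the default.
		user.SetToggle("sometoggle", true)
		t.Cleanup(func() { user.SetToggle("sometoggle", false) })

		// test the code's new behavior
	})
}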

When Developing with Toggles:

  • Toggle default values are false; false behavior must always be acceptable.
  • Test code paths both with and without the toggles turned on.
  • Testing with toggles applies to both functional and unit-style testing.
  • Follow TDD-style patterns: add a toggle, run tests, then add logic.
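One way to make both paths easy to exercise is to have the code depend on a small toggle interface and pass in a fake during tests. The Toggler interface, fakeToggles type, and Greeting function below are hypothetical names for this sketch; the same table of cases would drop straight into a testing.T table test.

```go
package main

import "fmt"

// Toggler is a hypothetical interface the code under test depends on,
// so tests can swap in a fake instead of a real toggle store.
type Toggler interface {
	Enabled(name string) bool
}

// fakeToggles is an in-memory fake. Names missing from the map are
// false, matching the "default is false" rule.
type fakeToggles map[string]bool

func (f fakeToggles) Enabled(name string) bool { return f[name] }

// Greeting returns the old output unless "new-greeting" is on.
func Greeting(t Toggler) string {
	if t.Enabled("new-greeting") {
		return "hello, world"
	}
	return "hello"
}

func main() {
	cases := []struct {
		name    string
		toggles fakeToggles
		want    string
	}{
		{"toggle off (default)", fakeToggles{}, "hello"},
		{"toggle on", fakeToggles{"new-greeting": true}, "hello, world"},
	}
	for _, c := range cases {
		got := Greeting(c.toggles)
		fmt.Printf("%s: got %q, pass=%t\n", c.name, got, got == c.want)
	}
}
```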

Managing Toggles

But what about managing toggles? "We can't use toggles because we will eventually have hundreds of toggles to manage."

Yeah... Like anything else, Feature Toggles are Technical Debt, and Technical Debt needs to be managed. Eventually, you have to clean up and remove old toggles.

 

But... You have controlled rollouts and a reduced likelihood of breaking production with each code deployment. These benefits far outweigh the inconvenience of cleaning up toggles every so often.

Long Live Toggles

Not all toggles need to be removed. Ops teams can use toggles to control system behavior during production issues.

 

Our microservices example is a prime case for using toggles as controls for the platform. If our New Service fails for some reason and the Orchestrator takes a long time to time out requests, we can use our original toggle to disable the New Service until it is fixed.

[Diagram: Orchestrator, New Service, and Existing Service, with an on/off toggle]
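As a sketch of that Ops use, the toggle below can be flipped while the process is running. The names newServiceEnabled, callNewService, and process are hypothetical, and the "healthy" argument only simulates an outage; how Ops actually flips the value (admin endpoint, config reload, flag service) is left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// newServiceEnabled is a long-lived system toggle. atomic.Bool makes it
// safe to flip while requests are in flight; its zero value is false.
var newServiceEnabled atomic.Bool

var errNewServiceDown = errors.New("new service unavailable")

// callNewService stands in for the remote call to the New Service.
// The healthy flag simulates whether the service is currently working.
func callNewService(txn string, healthy bool) (string, error) {
	if !healthy {
		return "", errNewServiceDown
	}
	return txn + "+new", nil
}

// process is the Orchestrator flow: call the New Service only while the
// toggle is on, then always call the existing step.
func process(txn string, healthy bool) (string, error) {
	if newServiceEnabled.Load() {
		out, err := callNewService(txn, healthy)
		if err != nil {
			return "", err
		}
		txn = out
	}
	return txn + "+existing", nil
}

func main() {
	newServiceEnabled.Store(true)
	_, err := process("txn", false)
	fmt.Println("toggle on, new service down:", err)

	newServiceEnabled.Store(false) // Ops turns the toggle off
	out, err := process("txn", false)
	fmt.Println("toggle off, new service down:", out, err)
}
```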

Feature Toggle Best Practices

  • Keep Toggles True or False; it makes them easier to remove later.

  • Build the concept of enabling user features into your user management system. UIs to manage user features and functions to check a user's feature status make this even easier to use.

  • When possible, create user groups that allow you to turn functionality on a group at a time.

  • The default value of toggles is False. When checking values of toggles, ensure that the default execution is always acceptable.

  • Not everything requires a feature toggle. Don't add toggles just because you can, but core logic and breaking changes should always be behind a toggle.

  • Testing (especially functional testing) should happen with toggles on & off.

  • Make sure your unit tests cover toggles on and off as well; mocks are your friend.

  • Sometimes, you may want to add a toggle purely for Ops purposes. It's fine, do it.

EOF

Benjamin Cane
