Feature toggles
The content here is under the Attribution 4.0 International (CC BY 4.0) license
Feature flags have become a standard practice for teams that want to deliver software quickly into users’ hands without disruption. With this approach, new challenges and ways of working come into play when writing code.
Pete Hodgson’s article, published on Martin Fowler’s blog, provides an introduction to feature toggle types and implementation patterns. We’ll use this taxonomy alongside the Feature Flag Best Practices handbook by Hodgson and Echagüe to explore the subject (Hodgson, 2017).
Why flags?
The first and foremost benefit of flags is the decoupling of deployment and release of a feature. For many software development teams, releasing new functionality means not only opening it up to users; it is also a stressful moment for the entire team, from the moment the code is pushed until it reaches users’ hands.
This model is all or nothing. If the released feature needs to be reverted, a new version has to be deployed. If a version contains an error or causes an unplanned outage in production, the team needs to roll back the deployed code. If data is involved, it becomes even more complex. Kent Beck described this scenario as a “D-day”.
In my experience, D-day is often associated with git-flow as a standard practice. With git-flow, the cycle to release changes to users goes like this:
- branch off from the stable branch
- apply the changes
- spin up an environment with just this code
- do the manual testing
- generate a tag for this release
- integrate the work back into the development branch
- integrate the work into any branch that requires it
At first, it sounds like a good plan, but teams that work this way often end up in integration hell. More recently, practitioners have scrutinized this approach, with many advocating for simpler workflows such as trunk-based development (Hammant, 2017). Multiple practitioners and organizations have documented challenges with Git Flow in favor of simpler branching strategies (GitFlow considered harmful, Please stop recommending Git Flow!). At the other end of the spectrum there are simpler approaches to delivering software, but adopting them requires certain disciplines to be in place first.
Software development teams often overlook that technical practices matter. For example, it is important to have a continuous delivery pipeline with tests that developers trust and an environment that replicates what production looks like. The feedback loop is one of the most important things while developing software. This becomes crucial for developers who are onboarding onto a new team and exploring unknown areas of the code base.
The right feedback loop, and tests that developers trust, are the first step towards reducing this complexity and exploring trunk-based development. Trunk-based development abstracts away the complexities of many versions, live-syncing patches, and environment differences by using a single branch and keeping everyone on the same page at all times.
It sounds simple, and it actually is. However, the complexity now shifts to the team and the technical practices the team uses. This becomes even more important in the AI era.
Team maturity and practices
With the advent of AI, agents, and LLMs, discussion about the difficulties of extensive AI-tool usage has reached social media. In a post on X, Joven asks: “How do you deal with colleagues that are using AI tools to generate code that they can’t fully explain how it works?”. This is a common problem teams face when they start using AI tools to generate code. The code generated by AI tools can be complex and difficult to understand, especially if the developers are not familiar with the underlying algorithms and techniques used by the AI. My answer to that was based on eXtreme Programming practices:
That’s what I value most about eXtreme Programming: the values and the principles lead to the practices, not the opposite.
You use a tool based on the principles and values that you have. Using AI is a reality and there is no going back, but the fundamentals, at the time I am writing this, are not yet automated.
One big shift is starting to use the feature flag concept. Flags are switches that someone can turn on or off to expose new functionality to specific users, specific environments, or a percentage of users in an application. Note that the complexity now lies in the code and no longer in the version control system. One positive side effect is that there is no need to keep merging and syncing different branches with different code. The only worry is to make changes compatible with what already exists and avoid introducing breaking changes.
In the realm of trunk-based development and feature toggles, code duplication is a particular issue worth highlighting. Maintaining proper feedback loops and comprehensive testing ensures code remains clean, effectively managing code duplication. In the open-source community, multiple projects use feature flags to control when specific features are enabled or disabled.
ReactJS, for example, uses feature flags to experiment with new features. In their documentation, there is a reference to their style of approaching it: How to Contribute to React (accessed 14 Apr, 2024).
How does it work?
At its core, a feature flag is a conditional statement that wraps code, allowing you to enable or disable functionality at runtime without deploying new code. The mechanism is conceptually simple but powerful in practice.
Basic Implementation Pattern
The most fundamental pattern looks like this:
if (featureFlags.isEnabled('new-checkout-flow')) {
// new implementation
return newCheckoutProcess(cart);
} else {
// old implementation
return legacyCheckoutProcess(cart);
}
This simple conditional allows the team to deploy both implementations and switch between them dynamically. When the new checkout flow is ready for production, you enable the flag. If issues arise, you disable it instantly—no deployment needed.
Storage Mechanisms
Feature flags need to be stored somewhere accessible at runtime. Common approaches include:
- Environment Variables: Simple boolean flags stored in your deployment configuration. Best for basic on/off switches that rarely change.
- Configuration Files: JSON or YAML files that can be updated without code changes. Suitable for flags that need occasional updates.
- Database: Flags stored in your application database, allowing runtime updates through admin interfaces.
- Dedicated Services: Tools like LaunchDarkly, ConfigCat, or AWS AppConfig provide specialized infrastructure for flag management with advanced features like percentage rollouts and user targeting.
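Whichever backend you start with, it helps to hide the storage choice behind a thin abstraction so call sites do not change when the mechanism does. A minimal sketch, assuming nothing beyond Node’s process.env; the FlagProvider and EnvFlagProvider names are illustrative, not from any particular library:
interface FlagProvider {
  isEnabled(flagName: string): boolean;
}

// Simplest backend: environment variables such as FLAG_NEW_CHECKOUT_FLOW=true
class EnvFlagProvider implements FlagProvider {
  isEnabled(flagName: string): boolean {
    // "new-checkout-flow" -> FLAG_NEW_CHECKOUT_FLOW
    const envName = 'FLAG_' + flagName.toUpperCase().replace(/-/g, '_');
    return process.env[envName] === 'true';
  }
}

// Call sites depend only on the interface, so moving to a config file,
// database, or SaaS SDK later means writing another implementation.
const flags: FlagProvider = new EnvFlagProvider();
if (flags.isEnabled('new-checkout-flow')) {
  // route to the new implementation
}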
Evaluation Context
The power of feature flags extends beyond simple booleans. Modern implementations evaluate flags based on context:
function isFeatureEnabled(flagName: string, userContext: UserContext): boolean {
const flag = getFlagConfig(flagName);
// Environment-based
if (flag.enabledEnvironments && !flag.enabledEnvironments.includes(currentEnv)) {
return false;
}
// User-based targeting
if (flag.targetUsers && flag.targetUsers.includes(userContext.id)) {
return true;
}
// Percentage rollout
if (flag.percentageRollout) {
const userHash = hashCode(userContext.id) % 100;
return userHash < flag.percentageRollout;
}
return flag.defaultEnabled;
}
This allows for sophisticated rollout strategies: start with 5% of users, monitor metrics, gradually increase to 25%, then 50%, and finally 100% once confidence is high.
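The snippet above leans on a hashCode helper that is not shown. For percentage rollouts the only hard requirement is that it is deterministic and non-negative, so the same user always lands in the same bucket as the rollout grows. A minimal sketch, assuming a simple FNV-1a-style string hash:
// One possible hashCode for the percentage rollout above: an FNV-1a-style
// string hash, forced to an unsigned 32-bit value so `% 100` always yields
// a stable bucket between 0 and 99 for the same user id.
function hashCode(input: string): number {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0;
}

// The same user always falls in the same bucket, so raising percentageRollout
// from 5 to 25 only adds users; it never flips someone who already had the feature.
const bucket = hashCode('user-42') % 100;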
The Lifecycle
A typical feature flag lifecycle involves:
- Configuration: Define the flag in your system with initial state (usually disabled in production)
- Implementation: Wrap new code with flag conditionals, keeping old code as fallback
- Testing: Validate both code paths work correctly
- Deployment: Ship code with flag disabled (or enabled for internal users only)
- Gradual Rollout: Enable for progressively larger user segments
- Monitoring: Track metrics and user feedback
- Full Release: Enable for 100% of users
- Cleanup: After confidence is established, remove the flag and old code path
A Concrete Example
Let’s walk through implementing a new payment provider:
Step 1: Add the feature flag configuration:
payment-provider-v2:
enabled: false
environments: [staging, production]
percentage: 0
Step 2: Implement with the flag:
function processPayment(order: Order, customer: Customer): Promise<PaymentResult> {
if (featureEnabled('payment-provider-v2', customer)) {
return NewPaymentProvider.process(order);
} else {
return LegacyPaymentProvider.process(order);
}
}
Step 3: Test both paths in your test suite:
describe('PaymentProcessor', () => {
describe('with payment-provider-v2 enabled', () => {
beforeEach(() => enableFeature('payment-provider-v2'));
it('uses new provider', () => {
// test new provider
});
});
describe('with payment-provider-v2 disabled', () => {
beforeEach(() => disableFeature('payment-provider-v2'));
it('uses legacy provider', () => {
// test legacy provider
});
});
});
Step 4: Deploy to staging with flag enabled, validate thoroughly, then deploy to production with flag disabled.
Step 5: Enable for 5% of production traffic, monitor error rates and payment success metrics.
Step 6: If metrics look good after 24 hours, increase to 25%, then 50%, then 100% over the following days.
Step 7: Once at 100% for a week with no issues, remove the flag and legacy code in a cleanup PR.
Development Workflow Changes
When working with feature flags, the first thing developers write is a test verifying the flag system works correctly. This includes testing that the flag properly enables/disables the feature and that both code paths function as expected. The testing strategy becomes more comprehensive as you need to validate all possible flag states.
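The enableFeature and disableFeature helpers used in the earlier test example are not from any particular framework. One possible sketch is an in-memory override map that the flag lookup consults first (simplified here to ignore the user-context argument):
// In-memory overrides consulted before the real flag provider. Names and
// shape are assumptions, not a specific testing framework's API.
const overrides = new Map<string, boolean>();

export function enableFeature(flagName: string): void {
  overrides.set(flagName, true);
}

export function disableFeature(flagName: string): void {
  overrides.set(flagName, false);
}

// Application code checks overrides first so tests can force either path.
export function featureEnabled(flagName: string): boolean {
  const override = overrides.get(flagName);
  if (override !== undefined) {
    return override;
  }
  return false; // in real code, fall back to the actual flag provider here
}

// Call from afterEach so flag state never leaks between tests.
export function resetFeatures(): void {
  overrides.clear();
}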
Tools
The feature flag ecosystem ranges from simple configuration files to sophisticated SaaS platforms. The right choice depends on your team’s needs, scale, and budget. Let’s explore the spectrum of options.
The Spectrum: From Simple to Enterprise
At the simplest level, feature flags can be environment variables or boolean values in configuration files. As your needs grow—requiring dynamic updates, percentage rollouts, or A/B testing—dedicated tools become valuable. Understanding where your requirements fall on this spectrum guides your decision.
Open Source Solutions
Unleash
An open-source feature flag service that can be self-hosted or used as a managed service. Unleash provides a robust API, SDKs for multiple languages, and a web interface for managing flags. It’s particularly strong for teams that want control over their infrastructure while avoiding building from scratch.
- Primary use case: Teams wanting self-hosted solutions with enterprise features
- Key differentiator: Strong community, battle-tested in production environments
- Pricing: Open source with optional commercial support
Flagsmith
Another popular open-source option offering both self-hosted and cloud variants. Flagsmith emphasizes ease of use and provides features like remote config, A/B testing, and multi-environment support out of the box.
- Primary use case: Startups and mid-size companies wanting simple deployment
- Key differentiator: User-friendly interface, quick setup
- Pricing: Open source, with paid cloud hosting options
SaaS Platforms
ConfigCat
A developer-friendly service with a free tier that includes unlimited flags. Known for fast flag evaluation and simple APIs.
- Primary use case: Startups to mid-size companies wanting managed services affordably
- Key differentiator: Unlimited flags on free tier (up to 1,000 requests/month), transparent pricing, easy integration
- Pricing: Free tier available, usage-based pricing beyond free limits
GrowthBook
Open-source platform specializing in A/B testing and experimentation, with feature flags as a core component.
- Primary use case: Product teams focused on experimentation and metrics
- Key differentiator: Built-in statistical engine for experiment analysis
- Pricing: Open source with optional cloud hosting
PostHog
An all-in-one product analytics platform that includes feature flags as part of a broader toolkit.
- Primary use case: Teams wanting combined analytics and feature flag management
- Key differentiator: Single platform for analytics, session replay, and flags
- Pricing: Open source and cloud options, usage-based
Cloud Provider Solutions
AWS AppConfig
Part of AWS Systems Manager, AppConfig provides feature flag and configuration management integrated with AWS services. If you’re already on AWS, it offers native integration with other AWS tools.
- Primary use case: Teams heavily invested in AWS ecosystem
- Key differentiator: Native AWS integration, no additional vendor
- Pricing: Pay-per-use, typically lower cost for AWS-centric architectures
- Best practices for validating AWS AppConfig feature flags
Azure App Configuration and Google Cloud Config offer similar capabilities for their respective ecosystems.
Simple Solutions (Start Here)
For teams just adopting feature flags, don’t over-engineer:
Environment Variables
const ENABLE_NEW_DASHBOARD = process.env.ENABLE_NEW_DASHBOARD === 'true';
const ENABLE_BETA_FEATURES = process.env.ENABLE_BETA_FEATURES === 'true';
Works for simple on/off switches. Requires deployment to change but costs nothing and has zero complexity.
Configuration Files
interface FeatureConfig {
features: {
newCheckout: boolean;
experimentalUI: boolean;
rolloutPercentage: number;
};
}
const config: FeatureConfig = {
features: {
newCheckout: true,
experimentalUI: false,
rolloutPercentage: 50
}
};
Allows runtime updates if you implement config reloading. Good for early-stage feature flag adoption.
Database Flags
A simple feature_flags table with admin UI for toggling. You’re building basic infrastructure but maintain full control.
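As a rough sketch of that option, the lookup can be a single query against an assumed feature_flags table with name and enabled columns; db.query stands in for whatever client you already use (pg, knex, Prisma, and so on):
interface FlagRow {
  name: string;
  enabled: boolean;
}

// `db` is any client exposing a parameterized query method.
async function isFlagEnabled(
  db: { query(sql: string, params: unknown[]): Promise<FlagRow[]> },
  flagName: string,
): Promise<boolean> {
  const rows = await db.query(
    'SELECT name, enabled FROM feature_flags WHERE name = $1',
    [flagName],
  );
  // Unknown flags default to disabled: the safer failure mode.
  return rows.length > 0 && rows[0].enabled;
}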
Decision Framework
Starting out or small team (<10 developers)
- Environment variables or config files
- Graduate to simple database-backed solution
- Consider ConfigCat’s free tier if you want managed service
Mid-size team (10-50 developers)
- Unleash or Flagsmith for self-hosted
- ConfigCat or GrowthBook for managed service
- AWS/Azure/GCP native if committed to one cloud
Enterprise or heavy experimentation focus
- LaunchDarkly for maximum features and support
- Split.io if experimentation is core to your process
- GrowthBook if you want open-source with enterprise features
Critical considerations:
- API latency: Local evaluation (SDKs cache flags) vs. remote evaluation (every check hits API)
- Data residency: Some industries require self-hosted for compliance
- Lock-in risk: Open source solutions reduce vendor lock-in
- Team expertise: Managed services reduce operational burden
Types of feature toggles
Pete Hodgson’s taxonomy, as outlined in the Martin Fowler article on feature toggles, categorizes flags into four distinct types based on their purpose and longevity. Understanding these categories is crucial because conflating them leads to technical debt and maintenance headaches.
Release Toggles
Purpose: Enable trunk-based development by allowing incomplete features to be merged into the main branch without being visible to users.
Lifespan: Short-lived (days to weeks). Should be removed immediately after full rollout.
Dynamism: Can be changed without redeployment, typically toggled through configuration.
Who controls: Engineering team, sometimes product managers during rollout.
Example: You’re building a new dashboard layout. The work takes three weeks across multiple developers. Rather than
maintaining a long-lived feature branch, you merge incrementally to main with the feature behind a new-dashboard-layout
flag. Once complete and validated, you enable it for all users and remove the flag within days.
Key characteristic: These flags have an expiration date. If a release toggle lives for months, it’s a code smell indicating either over-engineering or poor cleanup discipline.
Experiment Toggles (A/B Testing)
Purpose: Run controlled experiments to measure the impact of different implementations on user behavior and business metrics.
Lifespan: Medium-lived (weeks to months), lasting for the duration of the experiment plus analysis period.
Dynamism: Highly dynamic, often requiring sophisticated distribution logic to ensure proper experimental cohorts.
Who controls: Product managers, data scientists, or growth teams.
Example: You want to test whether a green “Buy Now” button converts better than the current blue one. You create an experiment flag that shows green to 50% of users and blue to the other 50%, tracking conversion rates for each cohort. After two weeks of data collection, you analyze results and make the winning variant permanent.
Key characteristic: These flags require careful user bucketing to avoid contaminating experimental results. A user must consistently see the same variant throughout the experiment.
Ops Toggles (Circuit Breakers)
Purpose: Control operational aspects of the system, often for managing system behavior during incidents or peak load.
Lifespan: Long-lived, potentially permanent. They’re part of your operational toolkit.
Dynamism: Must be changeable without deployment—often in real-time during incidents.
Who controls: Operations team, SREs, or on-call engineers.
Example: Your application integrates with a third-party recommendation engine. You implement an enable-recommendations
ops toggle. During Black Friday, the recommendation service becomes overloaded and slow, dragging down your page load
times. You disable the toggle instantly, falling back to showing popular items instead. Your site stays fast while the
third party recovers.
Key characteristic: These flags are defensive mechanisms. They should gracefully degrade functionality rather than cause failures.
Permission Toggles (Feature Access)
Purpose: Control which users or accounts have access to specific features, often for premium tiers or beta programs.
Lifespan: Long-lived to permanent. These often evolve into your product’s access control system.
Dynamism: Changed through administrative interfaces, not requiring deployments.
Who controls: Customer success teams, account managers, or automated based on billing system.
Example: You offer a “Teams” feature only available to users on your Pro plan. The teams-feature permission toggle
checks the user’s subscription tier and enables/disables accordingly. This might also be used for invite-only beta
programs: “AI Assistant is available to users in the beta-testers group.”
Key characteristic: These flags are tied to your business model and user access patterns. They’re features themselves, not temporary development aids.
Comparison Table
| Type | Lifespan | Dynamic? | Who Controls? | Primary Purpose |
|---|---|---|---|---|
| Release | Days-weeks | Yes | Engineering | Safe integration |
| Experiment | Weeks-months | Yes | Product/Data | Measure impact |
| Ops | Long-lived | Yes (critical) | Operations | System stability |
| Permission | Long-lived/Permanent | Yes | Business teams | Access control |
Why This Matters
Teams that treat all flags the same accumulate technical debt rapidly. A common anti-pattern is keeping release toggles around “just in case,” turning them into permanent code complexity. The flag that was meant to ease a two-week feature merge becomes a permanent conditional that developers must understand forever.
Equally problematic: using release toggles for permissions. This creates specific security risks:
- Authorization bypass risks: If flag evaluation happens client-side or without proper authentication checks, users may manipulate flags to access features they shouldn’t
- Scattered authorization logic: Permission checks become distributed across feature flag code rather than centralized in access control systems, making security audits difficult
- Compliance gaps: Traditional IAM systems provide audit logging, role management, and compliance reporting that ad-hoc flag-based permissions lack
- Inconsistent enforcement: Different parts of the system might check flags differently, creating authorization gaps
The solution is ruthless cleanup of short-lived flags and proper infrastructure for long-lived ones. Release and experiment toggles should have automated reminders or CI checks that alert teams when they’ve overstayed their welcome.
Trade-offs and Challenges
While feature flags enable powerful capabilities, they introduce complexity and risks that teams must actively manage. Understanding these trade-offs is essential for successful adoption.
Increased Code Complexity
Every feature flag creates multiple code paths. What appears as a simple if statement doubles the possible execution
paths through your code:
// One flag: 2 paths
if (flagA) {
path1();
} else {
path2();
}
// Two flags: 4 paths
if (flagA) {
if (flagB) {
path1();
} else {
path2();
}
} else {
if (flagB) {
path3();
} else {
path4();
}
}
With n independent flags in a code path, you have 2^n possible combinations. This exponential growth makes reasoning about system behavior increasingly difficult. While you won’t typically have many independent flags in a single code path, even a few create substantial complexity.
Testing Burden
Each flag combination should ideally be tested, but exhaustive testing becomes impractical quickly. Teams must balance thorough testing with pragmatism:
- Test the most critical paths (flag enabled and disabled)
- Test interactions between related flags
- Use property-based testing to explore flag combinations
- Accept that some edge cases won’t be tested until production (with careful monitoring)
A common pattern is testing each flag independently while ensuring good observability to catch unexpected interactions in production.
Technical Debt Accumulation
The most insidious challenge is flag proliferation. Feature flags that aren’t removed become permanent complexity. Teams need discipline and processes to prevent this:
Anti-patterns that cause debt:
- “We might need to toggle this again someday” (YAGNI violation)
- Forgetting flags exist after successful rollouts
- Lacking ownership or process for flag cleanup
- Using release toggles for permanent configuration
Mitigation strategies:
- Set expiration dates for release and experiment toggles
- Create Jira tickets or reminders for flag removal
- Add CI warnings for flags older than expected lifespan
- Make flag cleanup part of definition-of-done
- Regular “flag debt” cleanup sprints
Some teams implement automated flag removal: if a release toggle has been 100% enabled for 30 days without issues, an automated PR removes it.
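One lightweight step in that direction is a CI check that reads a flag registry and fails the build when a short-lived toggle exceeds its expected lifespan. A minimal sketch, assuming a flags.json registry whose format and field names are made up for illustration:
import { readFileSync } from 'fs';

interface FlagEntry {
  name: string;
  type: 'release' | 'experiment' | 'ops' | 'permission';
  createdAt: string;   // ISO date, e.g. "2025-02-01"
  maxAgeDays: number;  // expected lifespan for short-lived toggles
}

const flags: FlagEntry[] = JSON.parse(readFileSync('flags.json', 'utf8'));
const now = Date.now();

const stale = flags.filter((flag) => {
  // Only short-lived toggle types are expected to expire.
  if (flag.type !== 'release' && flag.type !== 'experiment') return false;
  const ageDays = (now - Date.parse(flag.createdAt)) / (1000 * 60 * 60 * 24);
  return ageDays > flag.maxAgeDays;
});

if (stale.length > 0) {
  console.error('Stale feature flags:', stale.map((f) => f.name).join(', '));
  process.exit(1); // fail the pipeline so cleanup cannot be silently forgotten
}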
Configuration Sprawl
Managing flag states across multiple environments (development, staging, production) becomes complex:
- Which flags should be enabled where?
- How do you keep configuration synchronized?
- What happens when staging differs from production?
- How do you test production-like configurations locally?
Configuration management discipline becomes critical. Many teams adopt “configuration as code” approaches, storing flag states in version control and using CI/CD to deploy them consistently.
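A minimal sketch of that configuration-as-code idea, with per-environment flag states kept in a typed, version-controlled file (structure and names are illustrative):
type Environment = 'development' | 'staging' | 'production';

// Flag states per environment, reviewed and versioned like any other code,
// so drift between staging and production shows up in code review.
const flagStates: Record<Environment, Record<string, boolean>> = {
  development: { 'new-checkout-flow': true, 'payment-provider-v2': true },
  staging: { 'new-checkout-flow': true, 'payment-provider-v2': true },
  production: { 'new-checkout-flow': false, 'payment-provider-v2': false },
};

// CI/CD reads this file and pushes the desired states to the flag store,
// so any environment's configuration is reproducible from the repository.
export function desiredState(env: Environment, flag: string): boolean {
  return flagStates[env][flag] ?? false;
}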
Performance Considerations
Every flag evaluation has a cost, even if small:
- Local evaluation: Minimal cost (microseconds) but requires SDK integration and cache updates
- Remote evaluation: Network latency on each check (milliseconds to seconds)
- Complex targeting: Sophisticated rules increase evaluation time
Best practices:
- Cache flag values when possible
- Use local SDKs that periodically sync from central service
- Avoid flag evaluation in hot code paths
- Pre-compute flag values at request start rather than checking repeatedly
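The last point deserves a sketch: evaluate the flags a request needs once, up front, and pass the frozen snapshot down instead of re-querying in hot paths. The function names here are assumptions:
interface FlagSnapshot {
  readonly [flag: string]: boolean;
}

// Evaluate each needed flag exactly once at the start of the request.
function snapshotFlags(
  userId: string,
  names: string[],
  evaluate: (flag: string, userId: string) => boolean,
): FlagSnapshot {
  const snapshot: Record<string, boolean> = {};
  for (const name of names) {
    snapshot[name] = evaluate(name, userId);
  }
  return Object.freeze(snapshot);
}

// At request start: const flags = snapshotFlags(user.id, ['new-checkout-flow'], isFeatureEnabledForUser);
// Downstream code reads flags['new-checkout-flow'] with no extra lookups and
// sees a consistent value even if the flag is toggled mid-request.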
Operational Risks
Incorrect flag configuration can cause production incidents:
- Accidentally disabling a feature for all users
- Enabling incomplete features
- Targeting rules that match wrong users
- Flag dependencies creating unexpected behavior
Mitigation requires:
- Audit logging of all flag changes
- Gradual rollout strategies (start small, expand gradually)
- Rollback procedures (know how to instantly disable a flag)
- Monitoring and alerting on flag changes
- Testing flag configurations in staging before production
Visibility and Discovery
As flag count grows, teams lose track of what flags exist and their purposes:
- What does enable_exp_v2 actually control?
- Is this flag still in use?
- Who owns this flag?
- What’s the flag’s business purpose?
Documentation becomes essential:
- Descriptive flag names (new-checkout-flow, not flag123)
- Metadata: owner, purpose, creation date, expected expiration
- Centralized flag registry or dashboard
- Regular audits of flag inventory
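As an illustration of that metadata, a registry entry could look like the following; the field names are illustrative, not a standard:
interface FlagMetadata {
  name: string;             // descriptive, e.g. 'new-checkout-flow'
  owner: string;            // team or person accountable for cleanup
  purpose: string;          // one-line business or technical purpose
  createdAt: string;        // ISO date
  expectedRemoval?: string; // omitted only for ops/permission toggles
}

const registryEntry: FlagMetadata = {
  name: 'new-checkout-flow',
  owner: 'checkout-team',
  purpose: 'Release toggle for the rebuilt checkout flow',
  createdAt: '2025-02-01',
  expectedRemoval: '2025-03-01',
};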
The 2^n Problem in Production
The combination explosion isn’t just a testing problem—it’s a production reality. With multiple flags, you cannot exhaustively test all combinations, meaning some paths only execute in production. This requires:
- Comprehensive observability (logging, metrics, tracing)
- Error tracking that captures flag states during incidents
- Gradual rollouts to limit blast radius of unexpected interactions
- Quick rollback capabilities
When Not to Use Feature Flags
Sometimes feature flags are the wrong tool:
- Simple, low-risk changes: Adding a logging statement doesn’t need a flag
- Database migrations: Flags can’t help with irreversible schema changes
- Security patches: These need immediate, consistent deployment
- When technical debt is already high: Adding flags to legacy code might worsen maintainability
Feature flags are powerful but not free. Teams must weigh the benefits (deployment flexibility, safe rollouts, A/B testing) against the costs (complexity, testing burden, technical debt risk). For teams practicing continuous delivery with robust testing and operational discipline, the trade-offs favor flag adoption. For teams lacking these foundations, investing in testing and deployment automation first may be more valuable.
Resources
- Feature Toggles (aka Feature Flags)
- Feature Flag Best Practices - Pete Hodgson, Patricio EchagĂĽe
- Trunk-Based Development: the ins and outs
Changelog
- Feb 02, 2025 - Major content expansion: added comprehensive “How does it work?” section with code examples, completed “Types of feature toggles” taxonomy, expanded “Tools” section with detailed comparisons, added “Trade-offs and Challenges” section, fixed grammar issues, added proper citations
- Jan 30, 2026 - Added AWS AppConfig
- May 08, 2024 - Initial version
References
- Hodgson, P. (2017). Feature Toggles (aka Feature Flags). https://martinfowler.com/articles/feature-toggles.html
- Hammant, P. (2017). Trunk Based Development. https://trunkbaseddevelopment.com/