Server Driven UI – Streamlining Mobile Development and Release

May 3, 2024



Transcript

Chao: We’re going to be talking about server driven UI primarily for mobile experiences. Some of the topics we’re going to cover will also tie in with web. How many people here are mobile developers who have dabbled a little bit with Android, iOS? How many people are web developers? I’m a mobile developer. Hopefully, I get the web parts relatively accurate. What are we going to be talking about? We’re going to be talking about server driven UI, and really what are some of the tradeoffs that you can think about when you’re trying to pick your different server driven UI technologies. Then we’ll go through, talk about what my personal experiences are, and when to use those different tradeoffs, as well as when server driven UI is actually applicable.

What Is Server Driven UI (SDUI)?

Let’s start with definitions. One of the big questions that people always ask is, what is server driven UI? Isn’t everything driven by the server? Yes, obviously, all the smartphones, they talk to the server in the cloud, and they get tons of information back from there. Historically, what we have is that we have mobile apps that more have like server driven data, I would describe it as. The client queries the backend. The backend responds with some data, and then the client just renders it. This works pretty well for all of us. It’s very easy for us to reason about the data flow, there’s clear responsibilities for both the client and the backend. You’re making a query. You’re getting a response. You’re rendering it. Unfortunately, as products have gotten more elaborate, there’s this greater and increasing appetite across industry to experiment and to find unusual experiences, ultimately, to make more money. Since we’re talking about experimentation, data is only really one piece of that puzzle.

Terminology

When I think about mobile apps, I actually break it down into a few different pieces. There are generally three areas that I like to think about. The first one is the UI or layout. For example, you open up Gmail, it might show a list of different emails that you’re rendering. You open up Uber, it might show a carousel of different vehicle options that you want to try and choose. Then you’ve got the data piece. For example, with the Lyft application, you might be able to see the ETA for when a vehicle is going to arrive, or the estimated fare for a given trip. Finally, there’s actions. When you click on a button, what does it do? Does it redirect to another page? Does it fire off a network call? What analytics are fired when a button is clicked, or a view is rendered, or a list is scrolled? Most of these tend to be traditionally controlled on the client. What you end up with is this thick backend with an equally thick client.

Challenges

What’s wrong with that? After all, mobile has been around for about 15 years, and we’ve got some world-class mobile developers in the industry now. Mobile is really nice. It’s actually very powerful. You can do a lot of things with it. There are a few problems with this, though. The first one is iteration speed. The mobile release cycle is incredibly slow. Normally, what happens is this: you land a diff, and after a while a build is cut. After that build has been cut, it sits for a few days just soaking in dogfood. It’s really critical to get that testing because mobile is unique: you can’t roll it back. Once you deploy a binary out to production and someone has updated to it, they have that. It doesn’t downgrade. There is none of that. You have to be very careful that you’re not shipping out something that’s going to crash, or has a security vulnerability, or leaks information, or any of that. What happens then? You’ve built it. You’ve waited a few days. You’ve cut it. It has soaked in dogfood, and everything’s all good. Then you send it over to Apple and Google to get it approved. These days, they’re pretty fast. Google tends to approve builds within about 24 hours, Apple within about 72 hours, though they have a long tail. Unfortunately, in the worst-case scenario, this can take up to about a week around times like Thanksgiving, WWDC, or Christmas. Then you do a phased rollout. The phased rollout is just the standard thing: we launch it to 1%, 5%, 10%. Once again, we want to make sure that we’re catching any production issues as early as possible, before the blast radius is worldwide.

Finally, a few weeks later, as you can see on the diagram, it’s rolled out to 100%, we’re done. It’s amazing. No. Once it’s actually rolled out to 100%, now the users can actually start updating. Unfortunately, the usual adoption curve for mobile applications is relatively asymptotic. You’ll get a spike of users updating very quickly at the beginning. As you can see here, it’s about 85% over the first couple of weeks, and then it tails off as people refuse to update. There are a lot of reasons for that. People might not update because they have no space on their phones, or because they’re running a very old operating system. They’ve disabled auto-updates, or they don’t have good Wi-Fi connectivity, so the auto-updates aren’t happening. Whatever the reason is, the stats that I’ve seen from multiple companies and multiple products show that about 1% of all users continue to run a build that’s more than 6 months old. You have to think about that. You’ve launched something new, and six months later, people still haven’t gotten it. You end up having all these problems of having to maintain your old endpoints and support backwards compatibility. I used to work in Gmail inbox, and we had a fun problem where someone actually had a phone where they had the newest experience, and then they had a tablet where they had a much older version. They made a mutation on the phone. Then they saw the old thing on the tablet, because the old tablet didn’t support that new mutation, and they were very upset. This person that I was talking about actually ended up being an executive at Google, and we got yelled at. These are the kinds of fun things that you have to deal with due to this problem.

There are a few other challenges that come from the fact that mobile is unique. First of all, we’re not perfect. If a problem is found post-release, you either have to turn off that feature, or you have to do a full-on binary respin. Like we talked about in the previous slide, that binary respin can take multiple weeks to really get out to everyone. Often, what ends up happening is teams will actually turn off their feature, and to be able to do that, they hide almost everything behind a parameter flag, or an FP. This is actually a good general practice. I’m sure many of you do it in both your web and mobile development. The end result is that you end up with hundreds if not thousands of parameters or FPs that are sitting in the wild, just living there almost in perpetuity. It becomes really hard to debug what’s going wrong. I’ve personally spent more hours than I can think of trying to debug the wrong configuration. The next thing is, like we were talking about, old versions. Some users just don’t upgrade, and so you have to worry about the old, inconsistent experiences. Eventually, you end up biting the bullet and saying, we’re just going to force upgrade people. We’re going to accept the fact that we’re going to lose that 1% of users. Maybe they’ll upgrade in the future, maybe they won’t, and we’re just going to lose that revenue, because the cost of maintaining everything is too high.

The last one, which is actually fairly common is you start to get this interesting divergence between Android and iOS. You’ll end up with teams where you’ve got more Android engineers than iOS engineers, or vice versa. Where maybe iOS has a really good framework that allows you to develop a feature much more easily than Android does. This divergence becomes problematic, because it’s obviously not great for the end user. You also start to hear these excuses from your teams, your product managers, on, we’re going to launch on one platform and learn what’s good, and just apply it to the other platform. That’s fine. These are all little challenges. I’ll give you another concrete example. If anyone is crazy enough to actually carry both an Android and an iPhone, if you open up Gmail on Android, you’re going to see tons of settings that allow you to enable different notifications. Ok, I want to get notified when this label gets sent email, or I want to only get notified but I want it to be silent. On iOS, none of this exists. It’s been an outstanding request for probably close to a decade at this point. Why? Because it’s very easy for these to diverge and it’s very hard for them to be kept in sync. Overall, it becomes really hard to experiment and iterate on mobile. I’m not talking about the initial launch of a feature. If you’re launching something brand new, it’s perfectly reasonable that it takes a few weeks, maybe a couple months to launch and get out there. If I’m trying to make a small change like, I want to change the color of this, or I want to change some text, or I want to add a new line in here. It’s really painful that we’re talking about multiple months for our end users to get that experience, and then start to give us feedback.

Let’s take a quick example from my previous company, Uber. We had a card that we were rendering for the different products that you had. We wanted to experiment with a few things. For example, we wanted to experiment by changing the ETA to be the estimated drop-off time. The problem was that today, the backend is sending down an ETA, and you’ve got some client code that pulls out that field from the proto, maps it to the UI, and just renders that. Now we need to change the backend to send new fields, but we also need to change the client to be able to read those new fields and bind them to the UI. Or, in the approval section, we wanted to add a banner at the bottom that said this is the most popular product, except the client didn’t have a concept of that third row, so we couldn’t do it. It was trivial to add a TextView to a layout on the mobile side. It didn’t take us a day to do it, probably more like 20 minutes. Once we did that, though, we had to respin it, let it soak, and push it out. It was super slow.

Enter Server Driven UI, a Panacea

Native development is really powerful. Don’t get me wrong. I actually really like native development. I’ve loved Jetpack Compose. I love SwiftUI. I think there’s really good things being done there, but there are always some problems. Is SDUI a panacea? There are a lot of teams in companies, in the industry that are trying to at least explore this area. Beyond the various codenames I’ve got listed here, I’ve seen blog posts from companies like Lyft, Netflix, Google, Uber, Airbnb, Microsoft. There’s a lot of people trying to make mobile development act a little bit more like web development, where you can just push instructions from the backend, and the client can just run them. It allows us to iterate more quickly. It allows us to avoid some of these backwards compatibility problems. Hopefully, it allows us a bit more consistency in between Android and iOS. Great, it’s a panacea? Not really, and it depends. It’s great for some things, and not for others. That’s really what we’re going to be talking about here.

The first major problem that I wanted to hit on was the terminology, SDUI. A lot of people hear the term server driven UI, and they think of it as this one-size-fits-all solution. If you actually sit down and chat with other engineers, you’re going to find that different people have a different mental model when they hear that term. No two people tend to agree on what it actually means. In some people’s minds, you get something on the left-hand side of the spectrum, like a WebView, where you can render anything you want. You’ve got a Turing complete language. You could send down any sort of layout. If you want to do something that’s pixel perfect, you can do that. If you want to write a hyper-customized lambda that you send down and evaluate at runtime, you can do that as well. That’s perfect. At the other end of the spectrum, you get something like Airbnb’s Ghost Platform. We’ll talk about that a little bit more. What this does at a high level is it provides a well-defined component library that developers can use and mix and match to render out their UI. That’s really nice, because it forces a lot of consistency. You’ve got this really good set of building blocks. If you want to build anything outside of those predefined, pluggable components, though, you’re stuck making a client change again, and we’re back to the slow release cycle. In the middle, we end up with something kind of like Uber’s Sindarin, which tries to provide atomic components, and this ends up trying to balance out the flexibility and consistency ends of the spectrum. Once again, I’m not going to say that any of these are perfect. They all have their tradeoffs, but that’s what we’re going to be talking about.

Server Driven UI

Jumping back briefly to the terminology, we defined three different areas that we were talking about. There’s the UI, the data, and the actions. I personally find it very useful to think about that spectrum amongst these three different dimensions. Let’s go with the UI piece. On the left, you’ve got a WebView. It’s fairly self-explanatory. You can style the layout however you want by just addressing your CSS. If some team wants to build their own custom UI, they can do that. They just put in their own WebView, build whatever they want. They’re not going to disrupt any other teams, any other developers. Everyone gets their own little sandbox to play in. This is really flexible and really powerful. It’s nice, because upfront, you understand as a developer, any possible idea I have that I could possibly come up with in the future, I’m guaranteed that I will be able to implement it using this, without having to do a binary respin. Because it’s a Turing complete language, I can do anything. That flexibility comes at a bit of a cost. I’ve learned personally a long time ago that if you let every team do their own locally optimized maxima, every team is going to build their own locally optimized maxima. Everyone is going to do what’s best for them, not what’s best for the product, or what’s best for the company. Over time, the product becomes a mishmash of all those different experiences. It’s not bad, per se. Amazon is actually super famous for using WebView tech across their mobile apps. You can see that when you open their app, that the checkout page looks quite different from their settings page, or their product pages. Once again, if you’re thinking a little bit more on the flexibility, iteration speed side, maybe this is interesting for you. If you’re thinking more on the consistency side, maybe not so much. 
Another challenge that you end up running into is performance, because instead of using native components, you’re sending down all these instructions that get interpreted on the client at runtime. One common example is animation performance. You get really nice animations when you use native components, because you’ve got all the nice transitions that are built into the binary. With web, you get relatively subpar animations, because you’re sending down various matrices that you’re actually applying and running at runtime.

In the middle, you’ve got something that’s a little bit more like Uber’s Sindarin framework. It’s not open source, so we never published too much about it. Uber has a thing called the Base design language. It’s a design language that defines a ton of building blocks or UI component primitives, things like a button, a label, a divider, a list. These are all defined natively on the client. What we did with Sindarin was we took that Base design language and mapped it to the backend via a Go DSL. That means any primitive component can now be built either with a backend command, or with a native command on the client. You can see here on the left that what I’ve got is some pseudocode. I’ve got a vertical stack with some children. The first one has a label that says, choose your ride, then there’s some space, then there’s a card concept. Within that, it’s got an illustration, so on, so forth. We’re rendering something that looks a little bit like what’s being shown in the picture there. One nice thing about this, because it’s all ultimately native, is that you can actually mix and match. If your UI is already built natively, like I’ve already built out an entire page, and I only want to experiment with one subcomponent, I can do that. I just swap out that subcomponent with the backend representation of it and send it down. Then we can experiment with that part of the layout without affecting the rest of it.
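The pseudocode from the slide is not part of the transcript. As a rough illustration of the idea only, here is a minimal sketch of a backend DSL that builds a component tree and serializes it for the client. It is written in Python rather than Go, and every name in it (`Component`, `vstack`, `card`, the `"Economy"` label, the JSON shape) is an assumption made for this example, not Sindarin's actual API or wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


# Hypothetical primitive: one node in the UI tree the backend sends down.
@dataclass
class Component:
    type: str
    props: dict = field(default_factory=dict)
    children: List["Component"] = field(default_factory=list)


# Hypothetical builder functions, one per primitive the client knows natively.
def label(text):
    return Component("label", {"text": text})


def spacer(size):
    return Component("spacer", {"size": size})


def illustration(name):
    return Component("illustration", {"name": name})


def card(*children):
    return Component("card", children=list(children))


def vstack(*children):
    return Component("vstack", children=list(children))


def to_payload(root):
    """Serialize the tree to JSON (an assumed wire format)."""
    return json.dumps(asdict(root))


# A vertical stack: a title label, some space, then a card with an illustration.
screen = vstack(
    label("Choose your ride"),
    spacer("medium"),
    card(illustration("car"), label("Economy")),
)
payload = to_payload(screen)
```

Because the tree is just nested primitives, swapping one subcomponent for a backend-defined version would amount to replacing a single child node while the rest of the page stays native.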

This gives us a few benefits. The big ones are performance and debuggability. Performance, because these are all native components, so they run the same way and they’re optimized to run on the devices. Debuggability, because ultimately the backend is just sending down native instructions. You end up being able to use a lot of the tools that you’re familiar with, things like the IntelliJ debugger, or HPROF. As a mobile developer, you don’t have to learn this brand-new ecosystem. You’ll still have to learn a few new things. Ultimately, these are generated from backend DSL commands, but hopefully it’s a little bit easier. We also ended up trying to make the DSL commands, as you can see, look a little bit more like mobile commands. Think Jetpack Compose or SwiftUI. The other thing that you get from doing this is an increased degree of consistency, because ultimately on the client, you have a well-defined set of components. I’ve got a button with text size medium, or text size large. I don’t have a button with text size 12 SP. On the backend, your Go DSL can provide that exact same thing. You’re guaranteed that when it renders on the client, it looks similar or exactly the same as the native components that you’ve built. The major drawback to something like this, and this is why it sits in the middle of the spectrum, is that developers can still mix and match what they want. What we provided is a bunch of what I would describe as atomic units, your button, your label, so on, so forth. We haven’t actually described the larger molecular components. One feature might show a dialog popup with an ok button on top and a cancel button below it. Another feature might show that same dialog with an ok on the right and a cancel on the left. That does lead to some end user confusion, and it doesn’t feel quite as good.

Then on the other end of the spectrum, like we talked about a moment ago, is Airbnb’s Ghost Platform. If Uber’s Sindarin was providing atoms, then I would describe this framework as taking the approach of providing the higher-level molecules. Instead of providing a singular binary divider, they’ll provide something like the HERO section that you see at the top, and that’s going to contain a carousel of images. Or maybe they’ll provide a title section, which has a title along with a star rating for the property that you’re looking to book. Taking this approach is great, because it forces far more consistency across the entire product. There is now a singularly well-defined set of building blocks that everyone can use for their UI. As a developer, I have to worry about fewer permutations. I don’t have to think about, did someone swap these two buttons, and were there any problems that arose from that decision? I can make an assumption that the footer section is always going to be at the bottom. It’s always going to have a higher Z index. It’s always going to have a call-to-action button, so on, so forth. You also end up saving a decent bit of bandwidth, and we’ll talk about this a little later. If you’re not sending down all these primitives, there’s less instructions that you have to send down to the client. I want to say I want to render a HERO section, and it knows what a HERO section is. I don’t have to say, I want to render a horizontal carousel that has four images, and the images themselves have these certain properties. There’s a click listener for these images to expand it out into a full screen view. You just have a HERO section.
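To make the bandwidth point concrete, here is a small sketch that compares a molecule-level payload with the same UI spelled out in primitives. The field names (`"hero"`, `"carousel"`, `"on_click"`, `"open_fullscreen"`) and the image file names are invented for illustration and are not Ghost Platform's actual schema.

```python
import json

# Molecule-level payload: the client already knows what a "hero" section is,
# so the backend only names the section and supplies its content.
section_payload = {
    "type": "hero",
    "images": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
}


def expand_hero(section):
    """Spell out the same hero section using primitive components only."""
    return {
        "type": "carousel",
        "axis": "horizontal",
        "children": [
            {
                "type": "image",
                "src": src,
                # Each image needs its own click listener to open full screen.
                "on_click": {"action": "open_fullscreen", "src": src},
            }
            for src in section["images"]
        ],
    }


primitive_payload = expand_hero(section_payload)

# Compare the serialized sizes of the two representations.
section_bytes = len(json.dumps(section_payload))
primitive_bytes = len(json.dumps(primitive_payload))
```

The tradeoff shows up in `expand_hero`: with molecules, that expansion lives in the client binary, so changing it means shipping a new build.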

Once again, there’s downsides to this. Like mobile development today, taking this approach makes it a little bit harder to experiment. Imagine if you’re a team and you’re building out a brand-new component. First of all, that component doesn’t exist on the client, so you’re going to have to build it into the client, and you’re going to have to push out a new binary. That’s problem number one. Problem number two is, if you want to build that new component, either you end up building it as a brand-new component, in which case now there is yet another primitive that people have to think about when they’re building out their features, or you modify an existing component. If you modify an existing component, like the title section, then every other call site across your entire application that’s using it is going to get modified. You get the consistency. Sometimes there’s a downside to that, you get the blast radius of making that change, and that causes problems. Ultimately, you can see how there’s this bit of a spectrum, if you have a much stronger opinion of what kind of UI you want to build and what kind of experiences you want to build. It’s great to use something like Ghost Platform because it forces a lot more rigor into your design language, into your design systems. You can probably build tools into Figma, or whatever tooling your design teams are using to encourage the designers to actually build things using these basic building blocks. On the other extreme, if you really don’t know what you’re trying to build, you’re a startup, you’re really trying to explore a lot more in a certain area. Putting a WebView in that area, isn’t the worst thing in the world. 
I personally prefer to use the component library that we built at Uber, mostly because I found that it was good enough that we could build out a bunch of these common primitives and let teams mix and match those, and then we would end up building a lot more consistency by actually forcing the designers to design consistency at the Figma or design level, rather than trying to encode this at the developer level. It’s a tradeoff.

Server Driven Data

Let’s move on to the data piece, because, why not? We’ve already made this very clear that there’s no right answer. It’s very confusing on the UI piece. Rather than solve that, let’s make it even more complicated. On the data piece, there tends to be two different options when developers are looking to build out an SDUI experience. The first one is server-side rendering. The second one is two-phase rendering. Let’s define those. Server-side rendering is what many of us think of when we think of web 1.0. You’ve got the backend that’s sending down a fully hydrated payload. Going back to our HTML example from before, you can see that payload already has the image source URL baked into it. It’s already got the title of the hotel that I’m looking at baked into it right there as well. It’s pretty nice. One of the big things that you get from server-side rendering is, it’s a lot easier to debug, because you get a single blob of data that comes down. It’s also a lot easier to set up a sandbox, because your single blob of data is hermetic, it doesn’t have any dependencies on things outside of that sandbox. It’s also a lot easier to test. A lot of what I just said will make sense in contrast with two-phase rendering.

Let’s talk about two-phase rendering. The opposite of server-side rendering is two-phase rendering. The idea here is that you’re going to send down data separate from the UI. For example, let’s take a look at this picture that we have here. I’ve got the Uber product opened up. If I did this with server-side rendering, we would just send this down and it’d work perfectly. Imagine we want the fare to update every few minutes, or the ETA to update every 60 seconds. You don’t want to open it up, have it show that it’s going to cost you 10 bucks, and then still be looking at it 5 minutes later when surge has kicked in, and it still claims it’s 10 bucks. Maybe all of us as riders would want that. Speaking from the other side, where we were trying to make money, we don’t want to keep telling you that it’s going to cost 10 bucks when, clearly, it’s going to cost us a lot more to get drivers to come pick you up. If we did it with server-side rendering, we would send down that entire payload. Then every time we want to update, we would regenerate everything and send down another huge payload. It’d be wildly inefficient in terms of bandwidth, but it would also lead to other problems, like forcing us to preserve client state. For example, if I’d scrolled halfway down the screen, or if I had expanded a different card, when you send down that data, you don’t want to re-render. You don’t want to have the user’s viewport snap.

A wildly simplified view of two-phase rendering looks like this. The client makes two separate network calls. The first one goes to the backend and gets the layout, which in this mock is just a single button. That button has a label, which is keyed off of some key. Then there’s a second call, which goes to the backend, gets data, and puts it in some data cache. Then on the client, we take the data from the data cache via the key and we bind the two of them together. Intentionally bifurcating the data and the layout addresses the update rate problem, because we can now push or pull the data completely separately from the UI. That’s great. It also helps with another problem that I’ll talk about briefly, which is that it allows us to cache the two of them separately. What you end up finding in a lot of client development, not just mobile, but web as well, is that the lifecycles of the UI and the data are wildly different. Even if we want to experiment with UI, we’re not experimenting with pushing out new UI layouts every day, or every week. A product manager will come up with a brilliant new idea maybe once a month, and then we’ll iterate on that. You can save bandwidth by just caching the layout, and only updating the data. It’s great. As a quick fun aside, by doing this, you also get some very interesting little things that you can do. For example, in this pseudocode that we’ve put together, we actually ended up being able to have some of these keys look at the values of other keys within that data cache, because everything runs on the client. In this specific one, what it said was, ok, we’re going to have an integer comparison. We’re just going to compare some other value to 5. If it’s less than that, then we’re going to show the text in red. If it’s greater than that, then we’re going to show the text normally. You can evaluate this on the client. The ETA is probably something that already exists on the backend, so you don’t need to do that.
You can come up with other scenarios where you actually want to look at client-side state. For example, imagine I have a screen and I only want to show a certain component if it’s going to be above the fold. To know if it’s above the fold, you have to know how tall the screen is. You’re not going to have the client figure out how tall the screen is and send that information to the backend to be cached, just so we can calculate whether this is above the fold, then do a server-side render and send that back down. Why not just have that Boolean evaluation run client-side?
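The mock described above is not reproduced in the transcript. The following is a minimal sketch of the same flow under assumed names: a cached layout that refers to data by key, a data cache that is updated independently, and a client-evaluated rule that turns the text red when a value is below 5. The key `eta_minutes`, the `color_rule` shape, and the `lt` operator are hypothetical.

```python
# Phase 1: the layout, fetched rarely and cached. It holds keys, not values.
layout = {
    "type": "button",
    "text_key": "eta_minutes",
    # Hypothetical rule evaluated on the client against the data cache.
    "color_rule": {
        "key": "eta_minutes",
        "op": "lt",
        "value": 5,
        "then": "red",
        "else": "default",
    },
}

# Phase 2: the data, pushed or pulled on its own schedule.
data_cache = {}


def update_data(new_values):
    """Merge fresh values into the cache without touching the layout."""
    data_cache.update(new_values)


def evaluate_rule(rule, cache):
    """Evaluate the integer comparison on the client."""
    if rule["op"] != "lt":
        raise ValueError("unsupported operator: " + rule["op"])
    return rule["then"] if cache[rule["key"]] < rule["value"] else rule["else"]


def render(layout, cache):
    """Bind the cached layout to the current data."""
    return {
        "type": layout["type"],
        "text": str(cache[layout["text_key"]]),
        "color": evaluate_rule(layout["color_rule"], cache),
    }


update_data({"eta_minutes": 3})
first = render(layout, data_cache)

# Only the data changes; the layout is reused as-is.
update_data({"eta_minutes": 8})
second = render(layout, data_cache)
```

Only the second, small data call repeats here. The layout never has to be resent, so client state such as scroll position is not disturbed by an update.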

We’ve just been talking a lot about two-phase rendering. Really, using two-phase rendering, we get the ability to update data at a different rate than the layout is being updated. We get the ability to cache them separately. This is great. The downside of two-phase rendering is that now you have to ensure that your data and your layout are always in sync with one another. What do I mean by that? Imagine I send down layout version one, and it references some text called key A. Then I send down layout version two, and it references text called key B. Unless the data cache has key B in there already, now you’re out of sync. It’s not immediately clear what’s going to render. It could render with stale information. It might crash. It might render nothing. You end up having to reason about this relatively intricate interaction, or you end up having to build a fairly intricate versioning system to keep the versions of the layout and the data in sync with each other. Unlike server-side rendering, where everything comes down as a single hermetic blob, you’ve got two pieces that have to talk to each other.
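One simple guard against the key A / key B problem is to check, before switching layouts, that every key the new layout references is already in the data cache. This is a sketch of that one idea under assumed names (`text_key`, `children`, `choose_layout`). It is not the versioning system the talk alludes to, and a real system would likely need more than this.

```python
def referenced_keys(node):
    """Collect every data key referenced anywhere in a layout tree."""
    keys = set()
    if "text_key" in node:
        keys.add(node["text_key"])
    for child in node.get("children", []):
        keys |= referenced_keys(child)
    return keys


def missing_keys(layout, cache):
    """Keys the layout needs that the data cache does not have yet."""
    return referenced_keys(layout) - set(cache)


def choose_layout(new_layout, current_layout, cache):
    """Keep the current layout until the data for the new one has arrived."""
    return current_layout if missing_keys(new_layout, cache) else new_layout


layout_v1 = {"type": "label", "text_key": "key_a"}
layout_v2 = {
    "type": "vstack",
    "children": [{"type": "label", "text_key": "key_b"}],
}
cache = {"key_a": "hello"}
```

With this guard the client keeps rendering `layout_v1` until `key_b` shows up, rather than rendering nothing or crashing.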

Which data should I use? My favorite phrase: it depends. For me, personally, I’ve found that the benefits of two-phase rendering far outweigh its technical complexity. You’re going to get the performance and offline benefits that we talked about from reduced payloads and improved caching. You’re going to be able to get different behavior that’s dependent on client-only state, like we talked about. I tend to personally reserve server-side rendering for scenarios where the UI is probably going to be static, at least for the lifecycle of that session. Something like, ok, I want to render a help center page. That’s going to be static, it doesn’t really need to be dynamic. That’s a perfect use case for server-side rendering.

Server Driven Actions

Lastly, let’s talk about actions. The primary decision here is whether to link the actions directly in the layout or via lambda. What do I mean by directly linked in the UI? Let’s take this button component, for example, that we see on the right. Obviously, when you click on it, you need to take an action. We can define an interface that looks something like this, where you can say, ok, when I click on it, there’s an onclick reference, and that’s going to route to a certain location. Maybe in the future, I want to add something where I say, ok, onclick, I’m going to fire off a specific well-defined analytics ID. Because of all this, your contract is very well defined. It’s actually very nice because, upfront, I can actually guarantee that when someone creates a button, in my server driven UI component, it’s guaranteed to have a destination it’s going to route to. It’s guaranteed to have an analytics event it’s going to fire. The downside of all this, unfortunately, is once again flexibility. If you add more functionality, you’re going to have to change the interface, which changes the API contract. That means, at the end of the day, server has to update, client has to update. We’re back to the binary respin problem that we talked about.
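The interface shown on the slide is not in the transcript. A minimal sketch of a directly linked, strongly typed action contract might look like the following, where a button cannot be constructed without a destination and an analytics ID. The class and field names (`OnClick`, `route`, `analytics_id`) and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnClick:
    """Hypothetical click contract: where to route and which event to fire."""
    route: str
    analytics_id: str

    def __post_init__(self):
        # Reject empty values so every button has a real destination and event.
        if not self.route or not self.analytics_id:
            raise ValueError("route and analytics_id are both required")


@dataclass(frozen=True)
class Button:
    label: str
    on_click: OnClick  # no default: a button without an action cannot be built


confirm = Button("Confirm", OnClick("/checkout", "confirm_tap"))
```

The flexibility cost is visible in the type itself: adding a new capability, such as a prefetch step, means adding a field to `OnClick`, which changes the contract on both the server and the client.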

The alternative to that model is actually trying to model it in a far less typesafe way. In this example, what you end up with is you’ve got a button, and you just send down an array of events that get executed. In this case we have a button event that says, ok, I’m going to do a redirect to this specific page, or I’m going to have an analytics event that gets fired during the onclick event, or I’m going to have an analytics event that gets fired when the button itself is rendered. Similar to the server driven data discussion that we had earlier, this lack of type safety unlocks a lot of really powerful things, because actions essentially just become a lambda that gets evaluated on the client. You can build action chains where you can say, ok, I want to click a button. When that happens, fire off a network call to start prefetching data and redirect to another page. If the redirect succeeds, fire off one analytics event, or if it fails, fire off another analytics event. The challenge with this is you’re intentionally weakening the API contract. I’ve seen situations where a developer will end up creating a button, but forget to add the onclick action. They’ve got a button in there, you’re clicking it, and it’s not doing anything. Or someone ends up creating an onclick event that’s supported in binary version 1000, but the developer forgets that there are versions older than that out in the wild. They send this down, and it doesn’t get executed, because it isn’t supported. Once again, the end user is clicking on something that’s not working. There are a lot of things that we can do with [inaudible 00:36:16]. I haven’t played with it much myself, but I really like the Rust model, where you try and guarantee a lot of the type safety and correctness at compile time, in this case on the backend. Ultimately, across the wire and on the client, you’re not typesafe. That’s intentional, but it’s dangerous.
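The two failure modes described above, a button with no click action and an event that older binaries cannot execute, can both be checked on the backend before a payload is sent. Here is a minimal sketch of such checks. The event shape, the `MIN_CLIENT_VERSION` table, and the specific version numbers other than 1000 are hypothetical.

```python
# Hypothetical backend-side table: the first client build that can execute
# each event type.
MIN_CLIENT_VERSION = {
    "redirect": 900,
    "analytics": 900,
    "prefetch": 1000,
}

# Loosely typed payload: a button plus an array of events to run.
button = {
    "type": "button",
    "events": [
        {"on": "click", "type": "prefetch", "target": "/trip/data"},
        {"on": "click", "type": "redirect", "target": "/trip"},
        {"on": "click", "type": "analytics", "target": "confirm_tap"},
        {"on": "render", "type": "analytics", "target": "confirm_shown"},
    ],
}


def has_click_action(component):
    """Catch the 'button that does nothing' mistake before sending."""
    return any(e["on"] == "click" for e in component.get("events", []))


def unsupported_events(component, client_version):
    """List event types this client build would silently fail to execute."""
    result = []
    for event in component.get("events", []):
        min_version = MIN_CLIENT_VERSION.get(event["type"])
        if min_version is None or client_version < min_version:
            result.append(event["type"])
    return result
```

A backend could run both checks per request and fall back to a simpler payload for old builds. The wire format stays untyped either way, which is the danger the talk points out.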

Once again, which one should I choose? It depends. If you’re going to do a lot of experimentation with your actions, you’re probably going to want to consider the lambda version. It gives you a lot more flexibility. Personally, having built both of these, I’ve found that often, especially with actions, you don’t need that level of flexibility. Often, there’s really just a small handful of actions that we all end up supporting. I want to redirect to another page. I want to fire an analytics event. I want to make a network call, or I want to write some state back into the local cache. I click a button, and I want to write some state here so that something else can read from that and use that. What I’ve personally found is that by building all that flexibility, the maintenance cost was far higher than I had anticipated. We ended up not really using the vast majority of that flexibility.
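
If the small handful of actions listed above really is the whole set, one middle ground is a closed set of action types. The following TypeScript sketch is an illustration only: the CoreAction union, its field names, and executeCoreAction are assumptions, and the function returns a description string where a real client would navigate, log, or issue a request.

```typescript
// Hypothetical closed set of the four actions named in the talk.
type CoreAction =
  | { kind: "redirect"; destination: string }
  | { kind: "analytics"; eventId: string }
  | { kind: "networkCall"; endpoint: string }
  | { kind: "writeState"; key: string; value: string };

// Execute one action. Only writeState has a real effect here (it writes to
// the local cache); the other branches just describe what they would do.
function executeCoreAction(action: CoreAction, cache: Record<string, string>): string {
  switch (action.kind) {
    case "redirect":
      return "redirect to " + action.destination;
    case "analytics":
      return "fire analytics " + action.eventId;
    case "networkCall":
      return "call " + action.endpoint;
    case "writeState":
      cache[action.key] = action.value;
      return "wrote " + action.key;
    default: {
      // Compile-time exhaustiveness check: adding a new kind to CoreAction
      // without handling it above makes this assignment fail to compile.
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

The set stays small and every client handles every kind, at the cost of needing a contract change (and a client release) to add a fifth action.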

Should I Use SDUI?

I’ve just talked about all the tradeoffs when dealing with server driven UI, and it sounds a little bit like it’s really hard. Maybe the juice isn’t worth the squeeze. You’re not going to get that much benefit. Should I not use it? No, I actually think it’s pretty nice. I’ve had good experiences. We’ll talk about that. I think it’s a really valuable tool to have in your toolkit. I strongly encourage people to use it when it’s valuable, when it’s appropriate.

Recommendation

SDUI is a very powerful tool. It allows us to make changes on the backend, which avoids this multi-week deployment problem that we were talking about at the beginning. It lets us speed up experimentation. It enables backwards compatibility. It supports better consistency between the different platforms. More importantly than all of this, which I didn’t mention, it actually forces some really good best practices on our developers, because it forces us to decouple the different components in the UI, rather than having these larger monoliths that you end up with as you build out mobile applications, where everything’s somewhat tightly coupled with each other. By forcing it to the backend, you have a contract: ok, this component is defined entirely by the backend, therefore, I can test it in isolation. I can run it in its own sandbox. It cannot talk to things outside of its little subdomain. This helps a lot with testability, binary size growth, reuse, and so on. The really important thing, though, is you have to actually have a plan for where your feature or product is going to be evolving over time. If your feature isn’t going to experiment, why not build it natively? If it’s going to experiment, and you don’t really know what you’re going to build, maybe you should use something a lot more flexible, like WebViews. If you do know what you’re going to build, then now’s the time to start defining how much flexibility you actually want to use. A lot of people, like I was saying earlier, think of this as a panacea, “I can now do anything I want.” That’s true, but it comes with a cost.

At Uber, we were able to 10x our feature velocity on about two dozen features by adopting SDUI. I want to be very clear that that required a lot of effort at the design phase where we actually tried to talk with product, design, other engineers, to understand, what is the expected lifecycle or evolution of this specific feature? Do you expect to see a lot more experimentation? What types of experimentation do you expect to see? For example, we had some features where they wanted to do a lot more experimentation purely on the UI piece but the data itself was relatively static. Great, you can do that by just using the server driven UI piece of this, but doing all the bindings natively. You don’t have to control those from the backend. Likewise, we had features where we wanted to actually do the opposite. You want to have a well-defined UI, but we wanted to change the data that you’re rendering, whether that’s showing an ETD instead of an ETA, or something like that. In those cases, we let developers mix and match and choose to only do the server-side data while continuing to use their existing native layouts. This mix and match approach allowed us to experiment very quickly without going down the path of forcing everyone into a one-size-fits-all solution, which was greatly appreciated by our developers. Because no one likes to be told, “We have a great solution for you. Rewrite the entire app in a brand-new technology stack, and in two years, once you’re done with that, then you can experiment. It’ll be awesome.”

One last note that I want to quickly hit on, because we’ve been talking a lot about performance and stuff like that, is, when is SDUI useful? Obviously, it’s useful for experimentation. Another area where it’s useful, which most people don’t think about, is personalized experiences. This is very important as the world becomes more personalized. Everyone expects that. What do I mean by that? I’m going to use Uber as an example again. A couple years ago, if you opened up the Uber app in Dubai, you could call a helicopter to come pick you up. If you did that in London, you could call a boat to come pick you up. These were really cool features that we supported. They were ultimately marketing ploys, but I still liked them. The problem was that you ended up with all that functionality built into the binary that every single person in the world downloaded. If any of you had the Uber app and you open the app, and you’re like, why is this 150 Megs? It’s because you have the ability to call a helicopter that’s not turned on, the ability to call a boat that’s not turned on, and all this other stuff that’s not turned on. You’ve got dozens of different credit card providers, tons of different functionality that you can’t use. By moving this to the server, you get the performance gains from doing that. I can now say, this user is only going to use a certain piece of functionality. Great, I’m going to ship it down to them. Overall, I personally enjoyed building server driven UI technologies. I’ve personally enjoyed using them. I find it very useful when used in a targeted or limited fashion. Part of the reason I don’t actually talk about specific technologies in this presentation is because what I’ve personally found is that different teams end up with such different requirements and different native hooks that they need to bind in with. For example, I might need to bind in with Google Maps, or Apple Pay, or whatever; these are native components. It ends up being actually easier right now for teams to build out their own server driven UI frameworks and adapt them to what they need.

Questions and Answers

Participant 1: I’m part of the mini app group working with W3C. I think that this kind of technology, server-side rendering, is essential for creating super apps and mini apps. There are many advances in China in this kind of technology. I was wondering, you didn’t mention any particular framework, but do you have some particular framework already done to be exposed to some open source program [inaudible 00:45:11]? Because it seems like the versioning stuff, the protocols, the actions is a lot of work that we’re traversing or doing that, and we are trying to find some sunny path. I was just wondering if you already have some [inaudible 00:45:33].

Chao: At Uber, we actually had that exact conversation about which piece of this do we want to open source? How much, and when do we want to open source it? Some of the challenges were things like, ok, the design library that we use, the SDUI piece, we specifically built that to work with the Base design system that Uber has. Other pieces like the actions and the data versioning problem, we actually built those to be standard. I don’t know where we are right now. Open sourcing that was something that we had discussed, because you’re absolutely right, a lot of these problems are not unique to us. We spent a ton of time researching all the other work that other teams were doing, because it’s a very interesting area in the mobile community.

Participant 2: Can you please elaborate on the possibility of breaking the store contract with Google or Apple? How do you deal with that part?

Chao: Apple and Google tend to frown upon this idea that you can change the behavior of the app, compared to what they’ve actually tested when they’re approving the binary. It’s absolutely true. What we’ve ended up finding is that they’re pretty much ok with it as long as you tell them in advance: these are the different permutations that we’re going to be supporting, and here are test accounts that you can run in those specific scenarios. That’s because almost every company in the wild does experimentation: Facebook, WeChat, Netflix. Everyone does experimentation. The only question is, obviously, on the spectrum, how much experimentation are we talking about? Are we talking about different UIs or different data? Also, because we could prove that some of the stuff we were doing wasn’t Turing complete, they were a little bit more ok with us doing that.

Participant 3: We already use server driven UI, together with WebView and native code. In our case, we decided to create a decision tree for when you should use one or the other. It’s very hard to ensure that people are making the right decision just by looking at it, and in the long run, you find people using stuff that is harder to maintain because they wanted to test something new. How did you deal with this problem in your experience?

Chao: We faced the same problem. I don’t think I did a great job of dealing with that specific problem. It was relatively similar. We came up with a decision tree, but we also tried to talk a lot about the different tradeoffs that you would actually get from making certain decision paths. We spent a lot of time working. I just said that there’s about 20 or so features that were using it. Probably about 40 features actually came to us and we actually sat down with every single one of them as the core team and said, you’re going to be one of our first partners. It’s imperative that you have a great experience but we also learn what you want to do, and we can help refine the decision tree for everyone else. It was very much a white glove service approach at the very beginning where we would actually sit with them, understand what they’re trying to build, understand why they’re trying to make those decisions, and then try to codify those patterns into this decision tree. By doing that, not only did they get a good experience, they also became evangelists for us across the company, and they could actually tell other people, “It was great for this. It was not so good for that.”
