Common Sense Driven Development

Nowadays, every day or week we're getting a new framework or tool everyone is hyped about. https://dayssincelastjavascriptframework.com/ is a great example of trolling JS people about that. Development is a lot about these new and exciting technologies, but day-to-day life is not as simple as using the cutting-edge, shiny things.

The double-edged sword of Cargo Cults

For the definition I’ll fall back to good old Wikipedia:

(…) attempt to emulate more successful development houses, either by slavishly following a software development process without understanding the reasoning behind it, or by attempting to emulate a commitment-oriented development approach (in which software developers devote large amounts of time and energy toward seeing their projects succeed) by mandating the long hours and unpaid overtime, when in successful companies these are side-effects of high motivation and not requirements.

As important as the management issues are, I'd like to focus on the first part of the definition.

From time to time new tools and practices are released and the world goes crazy about them. I'd say React.js is one of them. Others may be the Netflix cloud tooling, or good old Docker and Kubernetes on the DevOps side.

And don't get me wrong, I like them all. The difference is between what you can use and what you should use to make your project successful. The context of a decision is more important than the decision itself.

Having a technology that solves your problem is great, but you may still fail because of a very steep learning curve. The tool may lose support in a few months, or a new version will be released and you'll have nice and shiny legacy code even before your own release.

What to look for

  1. Make sure you're not trying to use the same hammer for every nail – there are a lot of technologies, and some are better at certain tasks than others. Take PHP and multithreading or long-running processes: you don't want to do that to yourself. Maybe a better solution would be to get people to learn a bit of Java or Node.js to build that subsystem?
  2. Support – is the library you want to use "mainstream" enough for you to rely on it and be sure it will still exist in a few years? On the other hand, ask yourself whether you really need a library for some very simple functionality you can write in about 20 seconds.
  3. Learning curve – check with your team whether the new solution can be understood and implemented the correct way. As an example I can take CQRS and Event Sourcing, which are quite complicated topics used mostly in enterprise environments. Still, people often think it's a silver bullet for their problems and go through with it. Often they are right, but as it takes time for people to learn about its pitfalls, it's better to take the middle ground and start with just emitting events before switching to the ultimate solution.
  4. Look at yourself first – there are a lot of companies and a lot of ideas. None of them is a silver bullet. There are also old, "bad" ideas. Like the monolith. And those bad ideas are good in some cases. Like when you have quite a big application to write with a small team.
  5. Take authorities with a grain of salt – aka the Cargo Cult of the person. It happens when the opinion of one person becomes the opinion of the community. You know examples of that from global politics. And I'm not saying those people are wrong. They are just preaching one solution which they like. And it doesn't really matter whether it's the correct solution or merely a pragmatic one. Their acolytes will quote them in every meeting. Arguments about the need for and correctness of a solution will be pushed back against with the argument that a well-known person holds a different opinion.

There is only one correct answer – it depends

I'd assume there are as many coding styles and tools as there are developers in the industry. Some are better than others. Some are evolving and getting better and better. Some are legacy at the idea level but still generating revenue for the company.

The bottom line is that there is no single answer to a problem. The context of the problem changes everything, and I think it's the most important thing to look at when making technical and process decisions. And only then choose which hyped tech to use in the next project.

CPS #1 – Spring Boot, Logback configuration


Welcome to Copy/Paste Snippets!

It's the first in a series of articles without much narration and with a lot of code snippets. The goal is to have a nicely searchable list of snippets of commonly used classes and/or configuration files.

If you find an error or an improvement, please let me know and we'll improve it :)

pom.xml
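Assuming a standard Spring Boot project, Logback is already pulled in through spring-boot-starter (via spring-boot-starter-logging), so a minimal dependency entry is just:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>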

Configuration file src/main/resources/logback.xml
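A minimal sketch that reuses Spring Boot's defaults and tunes one logger (com.example is a placeholder for your own package):

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <!-- Reuse Spring Boot's default console and file appenders -->
        <include resource="org/springframework/boot/logging/logback/base.xml"/>
        <!-- Raise verbosity for your own packages; com.example is a placeholder -->
        <logger name="com.example" level="DEBUG"/>
    </configuration>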

Use it in your class

Imports:
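Assuming standard SLF4J, which Logback implements:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;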

And at the top of your class:
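Something along these lines, where MyService stands in for your own class:

    // MyService is a placeholder for the enclosing class name
    private static final Logger log = LoggerFactory.getLogger(MyService.class);

After that, log.info("Started") and friends will go through Logback with the configuration above.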

From PHP to Java

So I've been a Java developer for some time now, and I'd like to share with you a few things about the process I went through. I want to point out a few differences in the process, libraries and ways of working between the languages.

A bit of background

I'm a senior developer, and I had been writing Java for some time already before I switched. Usually side projects, but not that small. It turns out that, as I was already writing quite complicated code structures in PHP 5.6/7, the change from PHP to Java wasn't that big.

Another nice surprise was that Symfony and Doctrine are PHP clones of Spring and Hibernate. Knowing both frameworks helped me a lot in catching up with rapid application development in Java.

To be honest, a lot of currently hyped tools for PHP (Laravel aside) are based on Java tools. Just like Composer being a replacement for Maven/Gradle.

So what’s the difference?

First is the depth of the language. There is way more than the associative array. Java has, for each collection type, multiple implementations with different characteristics, so you need to learn what is what.
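A quick illustration – the same interfaces, different implementations picked for different trade-offs:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class CollectionsTour {
        public static void main(String[] args) {
            List<String> byIndex = new ArrayList<>();         // O(1) access by index
            List<String> byEnds = new LinkedList<>();         // cheap add/remove at both ends

            Map<String, Integer> unordered = new HashMap<>(); // fastest general-purpose map
            Map<String, Integer> sorted = new TreeMap<>();    // iterates in key order
        }
    }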

The OO model also has way more to offer. At the beginning you can play with the PHP style of composing classes, but when you have enums as first-class citizens of the language you just want to use them :) Inner classes, static nested classes, and I haven't even started talking about streams or generics. On top of that there is concurrency, parallelism, and the need, or lack of need, to be thread safe.
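For instance, an enum can carry data and behaviour – something a PHP 5.6/7 developer had to emulate with class constants:

    public enum OrderStatus {
        NEW(false),
        PAID(false),
        SHIPPED(true);

        private final boolean terminal;

        OrderStatus(boolean terminal) {
            this.terminal = terminal;
        }

        // Behaviour lives right next to the values it describes
        public boolean isTerminal() {
            return terminal;
        }
    }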

A lot of stuff and I’m still discovering new keywords of the language from time to time.

Is it that bad?

Not really. Writing PHP in Java is very easy. It works a bit worse the other way around. Simple Spring apps are easy and fast to write. As I landed in a team of experienced developers, it took me a few weeks to learn what I was missing and what is used in day-to-day work.

I think the biggest change for me was the move from Composer to Maven. As PHP doesn't require a lot of build features, Composer is nice and simple. Maven, on the other hand, is a 20-year-old monster with XML configuration. It's not bad, but it takes a while to get used to. After some time and a few mistakes it's pretty straightforward to use, and there are a lot of tutorials.

Documentation

At the beginning of every journey into a new language you'll follow the Stack Overflow Driven Development methodology. And it works very well in Java, as it's one of the most popular languages in the world. 99.9% of the questions you may ask have already been answered there.

When it comes to libraries it's more hit and miss. Sometimes the documentation is amazing and you can use it to basically copy/paste base solutions. Otherwise you end up digging through big manuals written in a very corporate way and without many examples. From time to time you'll probably end up reading the code as well. In my opinion, though, that's the best solution anyway. I learn a lot faster by reading other people's code than by reading dry manuals or books.

Tooling

As I mentioned before, Maven and Gradle are build monsters. You need them, as it's not only about FTPing files to the server ;) You need to compile your code and run it as a binary package. The learning curve is steep, but after a few weeks of daily tasks you'll learn a lot. Or at least enough to know where to look for a template for a new project.

The IDE is a great help. While you have a lot to choose from for free, like Eclipse or NetBeans, I'd say IntelliJ IDEA from JetBrains is simply the best. The Community Edition lets you work in Java for free as well, but without some features. I own my own licence for the Ultimate version and it's the best £20 I spend a month. Especially as a polyglot, since I use quite a few of the languages it supports.

What next?

Kotlin! You probably heard about it when Google made it one of the officially supported languages for Android. It will soon be fully supported in Spring. With full interop with Java and a more Scala-like syntax, it's an amazing tool to code with. And it's fun.

The JVM also has Groovy, Clojure, Scala and a lot of other languages to play with. They will let you learn more about coding while still using your new favourite platform.

Summary

It's more a journey than travelling from place to place. Although I've learned quite a few things about the language and libraries, there is still a lot to do. Stay tuned. I'll post another part in a few months on the next iteration of my learnings.

Serializing metadata and domain messages

Once upon a time you get a task where you need to deal with data used in the domain process and some data which are irrelevant to it. The first approach is to put them all in the main model of the event or request, but there is a better way.

Creating an event

Let's take a look at how to serialise data in a language-agnostic way for the event bus in our system.

Of course we need the payload of our event, which holds all the important data about what happened in our system. Let's then serialise it to JSON.
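For example, a payload like this (field names are illustrative):

    {
        "customerId": "UUID1234",
        "email": "jane@example.com",
        "activatedAt": "2017-06-01T12:00:00Z"
    }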


Can you tell me what just happened in the system? I don't think so, and the same applies to any other system.

What we need is a description of the payload we've just seen. The one important thing we need in order to figure out how to deal with this payload is the event name.

Question is – where to put it?

Step 1 – Adding a field

It’s the most obvious way and the worst in my opinion.
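Something like this:

    {
        "eventName": "customer.activated",
        "customerId": "UUID1234",
        "email": "jane@example.com",
        "activatedAt": "2017-06-01T12:00:00Z"
    }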


The one good thing about it is that it's simple and requires the least effort at the start. It can give you a quick win when you're prototyping or changing a legacy system.

On the downside, there is the mix of infrastructure and domain data in your payload, and the strange behaviour where you need to read one field first before you know how to handle the rest of the payload. Another issue is a possible name collision (which can be avoided by prefixing with "_" etc.) and the difficulty of extending it with new fields.

Step 2 – Extracting name outside of the payload

Let's take a look at how we can improve the event payload so that we separate infrastructure and domain-related data.
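Along these lines:

    {
        "name": "customer.activated",
        "payload": {
            "customerId": "UUID1234",
            "email": "jane@example.com",
            "activatedAt": "2017-06-01T12:00:00Z"
        }
    }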


As you can see, we've made some improvement: the event name is no longer part of the event's payload, and they live on different paths. Deserialization is a bit easier too, as after reading just one field we can dump the "payload" into our favourite tool for transforming JSON into objects in our favourite language. So far so good.

The only problem we can encounter along the way is when we need to add more metadata: the root of our message will grow, and it may be hard to model when not all the data is always needed.

Step 3 – The header way

Let's step back for a moment and take a look at the HTTP protocol. Each message, being just text, contains a lot more than just the payload.
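For example:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Length: 16

    {"status": "ok"}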


The first line defines the protocol version and the status of the response. Then we can see the headers of the HTTP response, containing all the information needed by the client without being part of the content to be consumed. At the end we have the content of the response, which is separated from the headers by an empty line.

Back in our system and event bus, we can take a similar approach and define our event as headers and payload. The headers will contain all the information needed by the infrastructure, while the payload will contain only the domain data.
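For example (header names are illustrative):

    {
        "headers": {
            "name": "customer.activated",
            "correlationId": "abc-123",
            "occurredAt": "2017-06-01T12:00:01Z"
        },
        "payload": {
            "customerId": "UUID1234",
            "email": "jane@example.com",
            "activatedAt": "2017-06-01T12:00:00Z"
        }
    }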


As you can see, with this separation you can add a lot of information around your business logic while keeping separation of concerns in check. I've used HTTP-like headers in my example as they are quite common knowledge and should be easy to understand for everyone who works with web technologies.

Headers implementation details

In my example I've used headers as a part of the JSON document sent through the bus. Depending on what you use, you can use this solution or native message headers.

Kafka, with all its features and with how amazing it is, is solely focused on the payload of the message. There is no headers feature, so you have no choice but to embed the headers in the message itself, as I did.

It's a different story with AMQP queues, like RabbitMQ, where you can use the AMQP message structure and separate the headers at the bus level. It's a similar story when you use the Spring Integration framework, where you can use headers on all the messages and pass additional information alongside the data.

Conclusion

Messaging is not only about the payload of the message. Metadata, here called headers, is a very important part of the communication. Maybe it's not crucial for your domain code, but I'd say it's required for your systems to work properly at scale. Separating it will give you a lot of freedom and ease of change, and the complexity of a slightly trickier deserialization is a small price to pay for the value you'll get.

CQRS explained – slides

Yesterday I gave a talk about the basics of CQS and CQRS. It was part of the Code Mastery meetup I'm co-organising.

Here are my slides. The video should come up soon as well.

Modular monoliths

We all know, and have all worked with, a monolithic application. One big codebase which, in time, looks more like a hairball than a real application. Usually at this point we want to rewrite it into a microservices architecture. But I think the first thing we can look at is how to write better monoliths.


Why do I even think about building a monolith?

There are a few reasons why monoliths are so common. One is that they are easy to manage and deploy. They are easy to reason about as well. There is one project, one place where things happen. When a change is coming, you know exactly where it will happen and how to test it. At the beginning of a project, especially in a startup environment, it's faster to develop when you don't need to think about issues related to distributed systems.


So where did it all go wrong?

Usually monoliths are written in a hurry. Features are done quickly and without proper planning. Everything is created in one application as if it were one big bounded context.

The dependency tree, as well, is all over the place because of that. If you want to use some model, you just use it. Because it's in the same codebase, right?

I think this is the sole reason why monoliths go wrong.


How can we do this better?

Alright. So we have microservices and the monolith. Let's think about how to take advantage of the best parts of both architectures while keeping ourselves in a single codebase.


Single responsibility principle

One of the best parts of distributed systems is that every service has only one responsibility and does it well. It's similar to the Single Responsibility Principle from SOLID. What it means at the service level is that every service has its own data and contracts, and encapsulates the bounded context of what needs to be done.


We can use this in our monolith with ease. All we need to do is separate each module/component/bounded context into a separate package of code. It has its own models, its own data and contracts, in the same way a microservice does.
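In a Java codebase that can be as simple as one top-level package per bounded context, for example:

    com.example.shop.billing
    com.example.shop.inventory
    com.example.shop.customers

Each package keeps its models and persistence to itself and exposes only a small public API to the rest of the application.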

We can do the same with interface modules representing departments or groups of users of our application, with adapters for the specific modules they need to use and nothing more.


Integration

As we have our modules nicely separated, we need to let them talk to each other, since there are usually a lot of cases where more than one is involved.

When it comes to straightforward calls, we can use an anti-corruption layer (an adapter interface) to deal with it. All we know is some method on some interface, and we don't really care about the implementation. We're safe when it changes, as all we need to do is change the class implementing the mentioned interface, or create a new one.
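A minimal sketch of such an adapter interface (all names are illustrative):

    // Lives inside the customers module.
    interface CustomerService {
        String emailOf(String customerId);
    }

    // Contract the billing module depends on – its anti-corruption layer.
    interface CustomerDetailsProvider {
        String emailOf(String customerId);
    }

    // Today: a plain in-process call into the customers module.
    class InProcessCustomerDetailsProvider implements CustomerDetailsProvider {
        private final CustomerService customers;

        InProcessCustomerDetailsProvider(CustomerService customers) {
            this.customers = customers;
        }

        @Override
        public String emailOf(String customerId) {
            return customers.emailOf(customerId);
        }
    }
    // When the module is extracted, only this class changes to make a network call.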

When it comes to sharing data (which we'll touch on next), we can simply use events, as Event Sourcing does, just at the code level. You can even implement or find an event bus which will take care of event propagation in your system. It will be synchronous, but in a monolith everything is.
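If you're on Spring, one option is its built-in application events – a synchronous, in-process event bus (event and listener names are mine):

    import org.springframework.context.event.EventListener;
    import org.springframework.stereotype.Component;

    // Plain event object published by the customers module.
    class CustomerRegistered {
        final String customerId;

        CustomerRegistered(String customerId) {
            this.customerId = customerId;
        }
    }

    // The billing module reacts without the customers module knowing about it.
    @Component
    class BillingAccountCreator {

        @EventListener
        public void on(CustomerRegistered event) {
            // create a billing account for event.customerId here
        }
    }

Publishing is then just applicationEventPublisher.publishEvent(new CustomerRegistered(id)) from the owning module.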


Data

In a microservices architecture, every service holds its own data. Having a monolith, we have one database to deal with. Even if it may look interesting to split the database, it adds unnecessary complexity to our simple modular application.

A better thing to do is to prefix tables with the module name and let prefixed tables depend only on one another. What it means is that joins, for example, can be done only inside a specific prefix namespace (say, billing_invoice may join billing_customer, but never auth_user). When data from a different part of the system is needed, we make an in-code call for it.

If we really want to join, as we do pretty often, we can use the Event Sourcing pattern of materialised views, where a module reacts to a public event of another one. Just like a microservice reacting to an event from a global event bus.

Some may say it's data duplication. I'd say data depends on context. For an authorisation system you need the user's email and password, while for billing you need way more. Keeping only one User representation holds you back when it comes to change in one module or another.


Putting it all together

Keeping a microservices architecture inside a monolith may give you the best of both worlds. One codebase, one server, one database from the monolith, and ease of change and domain relevance from distributed systems. Simply use the patterns used by microservices, replacing tools and network calls with in-code communication.

It should get you through the mono part of the project and give you a simple way to migrate to separate codebases, where you can extract package after package and change the implementation of the adapter interfaces from method calls to network calls.

Designing REST API

With the current trend of microservices, REST APIs are taking the leading position as their implementation. There are always things that need to be done in a synchronous way. Public interfaces are also a thing, and REST is the most obvious way to expose your functionality to third-party users. In this article I want to show you the result of my research on how to design a REST API that will last and be comfortable to work with over a longer time.

HTTP and this thing we call contract

Let's start from the very beginning. HTTP is our transport layer: a simple text protocol with requests and responses.

Status codes

An awful number of times I've seen APIs with 200 as the response code and the real status of the operation in the response body.

One can say it has everything that's needed on the client side to react in case of an error, but it's not that obvious. The principle of least astonishment is one of the most important terms related to this topic. There are millions of developers around the world and probably quite a few in your organisation. Everyone has some idea about how an API should work. Probably most of them know what 200, 201 Created and 400 Bad Request are. And everyone knows what 500 is ;)

Just this, speaking the same language, makes life easier and removes one thing to learn when trying to integrate. With status codes you also enable yourself to create response strategies. A successful one will contain your resource data. If validation fails, you can say the client sent a bad request and express this by returning exactly that – HTTP/1.1 400 Bad Request – with a response body optimised to show errors. I'd also say that it's what most clients of your API would expect from you.
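For example, with an error body whose exact shape is up to you (this one is invented):

    HTTP/1.1 400 Bad Request
    Content-Type: application/json

    {
        "errors": [
            { "field": "email", "message": "must be a valid email address" }
        ]
    }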

Headers

When you take a look at an HTTP response, those come right after the status code. The role of headers is to provide metadata for your request and response. What is metadata? Everything not directly related to what your API should do. Authentication, current user, correlation IDs, some additional tokens. I've seen a lot of (and did, in the past) authentication data as _token: "value". It has an underscore, so I know it's not part of the contract, right? Not really. Now I need to model all the representations to have this field. What's worse, I need to deserialise the body to get this information, when it may not even be needed because of an authorisation error right after.

One more thing happening here, though not in a very visible fashion, is non-functional requirements creeping into the business logic. Because it's easier to deserialise everything, we'll have this token in our DTO, and then we use this generic field to make decisions. And then, when we want to change the way security works, we're screwed, because _token is all over the place in our code. OK, so let's use headers for everything that is not related to the domain of our API. If you need something non-standard, feel free to use an X-Custom-Header for fun and profit and to get your data around.

Here also a quick mention of the Content-Type and Accept headers, as they are recognised in most cases and there is not that much need to rant about them ;)

Contract

Let's talk now about what your API really does: the body of your response. In common language it's called the Contract, basically because it's an agreement you're making with your clients on how you will communicate. I mentioned headers previously for non-business-specific data. The body should be only about the business domain of your API, and possibly as short as possible. Sometimes it's hard, but ask yourself whether all those data are really needed. I'd also say to try to keep nesting as minimal as possible, as traversing big graphs of objects is pretty annoying in the long term. It's also important to avoid exposing internal data. But let's talk about this more in the next chapter.

Resources

Tutorials showing you how to dump your entities as JSON straight from the database are just wrong. If you just want to save your data via HTTP and keep no logic in the API – create a repository in your code and use the database directly. Adding network overhead is pointless, and your logic will sit in the clients, so you're not getting any centralisation of logic.

What is a Resource then?

For me it's related to aggregate roots from Domain-Driven Design. Those are the entities of your business domain. An entity is an element of your system which holds its identity. Usually it's an ID. And on top of that, it's the ID you'll search by when asking for data about something.

Let's take the example of a Customer. You're interested in your customer and can think of many operations related to her. In contrast, an Address usually isn't that interesting on its own. We talk about it as a Value Object. Let's take a look at what it could look like.
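A sketch in Java (names invented):

    // Entity: has identity, you query for it by ID.
    class Customer {
        private final String id;
        private final String name;
        private final Address address;   // value object owned by the customer

        Customer(String id, String name, Address address) {
            this.id = id;
            this.name = name;
            this.address = address;
        }
    }

    // Value object: no identity of its own, only meaningful inside a Customer.
    class Address {
        private final String street;
        private final String city;
        private final String postCode;

        Address(String street, String city, String postCode) {
            this.street = street;
            this.city = city;
            this.postCode = postCode;
        }
    }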

Even if your data model stores the address in a separate table with IDs, you will never ask about it without the specific context of the entity it belongs to.

At the API level we'll model our resources as URIs. Each resource has a unique ID, so when we combine it with its name we'll get a unique access point to data about it. GET http://api.lol/customer/UUID1234 could return something like:
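    {
        "id": "UUID1234",
        "name": "Jane Doe",
        "address": {
            "street": "1 Example Street",
            "city": "London",
            "postCode": "E1 6AN"
        }
    }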

And as we said, Address is a subresource of Customer. So the following is a way to access only its data: GET http://api.lol/customer/UUID1234/address, which could return:
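    {
        "street": "1 Example Street",
        "city": "London",
        "postCode": "E1 6AN"
    }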

HTTP verbs

A resource is a noun. As REST says, we use verbs to interact with it. Those verbs are GET, POST, PUT, PATCH, DELETE and so on. GET reads data. POST creates a new resource (and returns 201 Created if it succeeds). PUT replaces a resource. PATCH makes a partial change. DELETE removes a resource.

Those are used as another part of the HTTP language of interactions. With them we can do all the CRUD operations on our entity. If your API and business case are simple enough, you are probably good to go.

Business logic in REST API

I think the CRUD approach to an API is wrong. Especially with the previously mentioned exposure of data, and business logic living on the client side instead of inside the service.

To hide behaviour inside the REST API we can do one of two things: use a resource as a business process, or use "resource functions".

Business process as a resource

This concept comes from the world of CQS, where write and read operations are separated. This world also lacks the concept of data mutability: all you can do is add more data. So the first thing you'll do is drop support for the PUT, PATCH and DELETE request types. This blocks third parties from modifying data as they please – one of the major reasons business logic leaks to the clients. We're left with only the GET and POST request types. With commands and queries separated, that's all we need: GET requests query for data, and POST dispatches commands to our system. The thing I struggled to grasp was how to model URIs in this type of API. It turns out you can treat the command as a separate resource, for example:
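    POST http://api.lol/customer-registrations

    {
        "name": "Jane Doe",
        "email": "jane@example.com"
    }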

This way we separate our code and the access point to it. It is also easy to see and document what the API is doing. And from the other side – think what you could do with this endpoint called with GET. For example, showing all successful registration requests for reporting purposes. Pretty neat if you ask me. And everything in one place.

Functions

Another approach, which a friend of mine uses, is based on the idea that resources have functions. Those functions live as subresources of the entity they interact with. The previous example done as a function could look something like this (the exact URI shape is my guess):
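    POST http://api.lol/customers/register

    {
        "name": "Jane Doe",
        "email": "jane@example.com"
    }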

The good thing about it is that it keeps the context of the resource the function interacts with, so it's quite easy to reason about. Not being fully compliant with the REST style is a downside. As I showed before, subresources can be shown in the body of a resource. Having a function here would imply we show this list of operations as a field, and usually we don't need that to be the case.

Anyway, I think both approaches are good. The main goal we want to achieve is to hide the business logic, and it can be done nicely in both styles.

Hypermedia as the Engine of Application State

It sounds complicated and it has a strange acronym (HATEOAS), but after all it's just about using another feature of the common web in our API. This feature is linking between resources and other operations. Let's take the example of a list of customers. One of the most pointless examples in all of the tutorials is having it mapped 1 to 1 to the findAll() method of a repository. It will never be the case that you'll show a list of all customers, assuming you're outside of a development environment. The feature you need here is pagination.

Without hypermedia

So let's add optional parameters to our API: page_number and num_per_page, with defaults set to 1 and 20 respectively. So far so good. Clients implement this feature and everything works. Now let's take a look at the interface of common pagination. We usually show links to the previous and next pages. But currently we have no information about pages at all. The only time we're sure there is no next page is after a call we make to the API: when an empty list or a pagination error is encountered, we stop paginating. All of this logic must sit all over the client code, even though all this data is available from the API's perspective.

Hypermedia in API

How can we do better? We can use something like HAL – the Hypertext Application Language. Instead of putting parameters in the documentation, we'll add another key to our response – _links – which will hold the URLs for the purposes of our pagination.
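An illustrative response (loosely HAL-shaped; field names invented):

    {
        "_links": {
            "prev": { "href": "http://api.lol/customers?page_number=1&num_per_page=20" },
            "next": { "href": "http://api.lol/customers?page_number=3&num_per_page=20" }
        },
        "customers": [
            { "id": "UUID1234", "name": "Jane Doe" }
        ]
    }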

Now our clients need to read this field and call the endpoint presented as its value to get the next page. Easy. It seems like not that big a change, but now we can determine whether there is a previous or next page based on the API call. With a metadata structure we can add even more details about endpoints, like templates for URLs to specific pages. The important thing is that clients are completely unaware of our naming convention and URL schema. We can change the URI of the next page or rename pagination-related variables without breaking the client.

Hypermedia and POST requests

A similar thing can happen for requests where we want to create new resources. Take a look at https://api.github.com. At the home page you can see all the different things the API provides. All you need to do is follow the links to find what you're interested in. Just as you do on any website. Again, as with the pagination example, we're freeing the API clients' implementation from knowledge about the detailed structure of the API.
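As an illustration, such a customers endpoint could answer a GET with its available operations (URIs invented):

    {
        "_links": {
            "create": { "href": "http://api.lol/customers" },
            "register": { "href": "http://api.lol/customers/register" }
        }
    }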

Here we have an example of a customers endpoint which tells us we can perform two operations: just create a Customer, or register a Customer. Let's say the requirements for both processes are different, but for the sake of time we put registration in the same API. From the client's perspective, we'll call /customers first to get the list of available operations, and then we send our payload to the register endpoint. Everything works fine.

Introducing a new API

Now we want to add a registration API, as it turns out it's not the concern of Customer to register all the other data. Let's think first about what happens if we have our link to the operation hardcoded in a client library (at least we have one). After our updated library is released, we need to update all of the client applications with the new version number and redeploy them. If we miss one, we'll have a problem. If you're working in a bigger company, you'll also mess with other teams' release plans. Also, because you're not able to make the change in one click, you need to keep the old endpoint working for some time until the migration finishes. And knowing life, there will be a new feature coming which needs to be implemented in both places, and one of the teams can't release because of reasons.

Just one small change and you've got yourself and half of the company in trouble. Now let's take a look at what happens if you use the contract presented in the previous example – sketched below with the register link simply repointed at the new service:
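    {
        "_links": {
            "create": { "href": "http://api.lol/customers" },
            "register": { "href": "http://registrations.lol/customer-registrations" }
        }
    }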

All we changed is one value, in one place, in one deployment. Now everyone who wants to register a customer will be sent to another server and their job will be done. You don't need to know who is using your service to make changes. Other teams also have time to migrate to calling the new service directly. No dependencies beyond what is necessary.

On the chattiness of hypermedia APIs

One argument I hear a lot against this kind of solution is that instead of one direct call you need to make many, and with quite a few of them between the end-user interaction and the finished process, it adds a lot of time to the whole thing. In general it's true. It's also a tradeoff between speed and flexibility. As with everything in software – there is no silver bullet. What we can do is mitigate the effect of the multiple calls at the client level. At least a bit.

What we're doing is discovering the URI of the resource we want to call. Most of the time it will be the same, so why not cache it locally? If the cache is hit, we go straight to the URI; if it's a miss, we make a few additional calls to get it. In case of a 404 returned by the currently cached URL, we do the same. As long as we keep the links path in place, we're guarded against any issues related to moving endpoints between applications. At some point we may even retire an API and, instead of providing any functionality, just return _links values pointing to the new APIs. The application is dead but everything keeps working. Not that it happens that often :)
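A rough sketch of that caching idea in Java (all names are mine; the discovery walk itself is left out):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Client-side cache of discovered URIs, keyed by link relation.
    public class LinkCache {

        private final Map<String, String> hrefsByRel = new ConcurrentHashMap<>();

        // Returns the cached URI for a link relation, discovering it on a miss.
        public String resolve(String rel) {
            return hrefsByRel.computeIfAbsent(rel, this::discover);
        }

        // Call this when the cached URI starts returning 404, then resolve again.
        public void invalidate(String rel) {
            hrefsByRel.remove(rel);
        }

        private String discover(String rel) {
            // Placeholder: a real client would GET the API root here and
            // follow the _links entries until it finds the given relation.
            throw new UnsupportedOperationException("discovery left out of this sketch");
        }
    }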

Design and documentation

I've put those two together because they should live together. Too often documentation is the black sheep of the process, left for the last step and sometimes not even touched for several months after changes are made. Been there; done that ;)

OpenAPI standard

Over the years many people have tried to create a single standard for API specifications. Today the most widely accepted and proven one is OpenAPI. What it does is provide an easily writeable JSON markup to describe API contracts. As it's a very wide topic, I won't go into specifics; you can check all the details of the current specification here.

What is important is to use it before you start creating your APIs. Sit down with your team and the people who will use your API and have a brainstorm about what data and operations are needed from your new app. During this process, create the JSON document which describes what you're discussing. Later you can put it in the repo so it's visible to everyone (you could say it should be the first commit), and if something changes along the way of development, everyone will see the changes made.
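A minimal Swagger/OpenAPI 2.0 document, just to show the shape (title and path invented):

    {
        "swagger": "2.0",
        "info": { "title": "Customer API", "version": "1.0.0" },
        "paths": {
            "/customer/{id}": {
                "get": {
                    "parameters": [
                        { "name": "id", "in": "path", "required": true, "type": "string" }
                    ],
                    "responses": {
                        "200": { "description": "A single customer" }
                    }
                }
            }
        }
    }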

This way of creating APIs also enables parallel work between the API and client teams, as the contract is discussed prior to the work. A nice thing is also the possibility of generating client libraries from the JSON document you've just created. And last but not least – beautiful documentation.

Swagger

As the Swagger website states:

Swagger is a simple yet powerful representation of your RESTful API. With the largest ecosystem of API tooling on the planet, thousands of developers are supporting Swagger in almost every modern programming language and deployment environment. With a Swagger-enabled API, you get interactive documentation, client SDK generation and discoverability.

I mentioned some of those features before. Now I want to touch on documentation and Swagger UI, which simplified the way documentation is created to the level where there is no way back ;) Swagger UI creates web documentation from your OpenAPI specification document. And not only that. The documentation website created from it is fully interactive. Besides providing all the important information, it allows you to test the API in real time, providing everything it does in copy/paste format. Generating client libraries is a blessing, as everything can be verified by the client team before the first line of code. Also, your QA team will be very happy, as they can now focus on their work more than on typing endless JSONs into Postman. This makes the road to production way faster and makes everyone in the business happy with working systems.

Putting it into practice

Practice makes perfect. All the ideas mentioned in this article will stay ideas until you start working with them. As I'm always saying – there is no silver bullet. Especially in our industry, which is changing on a daily basis. Try them out, evaluate whether they are good for your scenario, and let others know what worked for you.