Wednesday, December 14, 2016

Lessons learned from using IoT devices in the real world

This post was originally published on Red Hat Developers, the community to learn, code, and share faster.

As a consultant, I’ve spent a decent amount of time working on a full stack development project in the realm of IoT. Over the years, our teams have run into a lot of avoidable issues. Here are some lessons I learned from using IoT devices in industry.
I define IoT as “connecting a device to a larger system with the goal of that device providing information to the system that is then leveraged in some way”. This can range from something like a FitBit to a Tesla to a smart fridge — all of which connect and report information to a cloud or back end, making them “internet of things” devices.

Lesson 1: Start Small

Although starting small is good practice in all development, I found it to be even more important once devices are involved. As a developer, changing something after the fact no longer means "just a code adjustment"; it can also mean changing a device's configuration, or even its embedded system. So how do you start small?
First, get your device connected to a simple mock service. Mocks are easier to control and set up than complete systems.
Second, add security to the service.  Once that is done, write tests, error handling, and documentation for the base code.  This ensures a few things: your foundational code is solid and properly tested (so you won't need to go back later and change it); security is in place early, allowing for confidence in trusting the device; and you'll have a framework set up to use the development patterns you've laid out.
With the project I worked on, we did not start small everywhere like we should have. This led to a lack of testing, error handling, and documentation all around.
In addition, trying to "go too fast" led us to take shortcuts, like putting passwords in the URLs that devices used to connect to the system. Of course, this all had to be changed later to make the system production-ready, which meant more work not only in the code, but also on the devices themselves. These are examples of things you want to do right, starting small, from the beginning.

Lesson 2: Define What You Can

Another major issue I ran into on this project was a lack of clear requirements. Without knowing how a device should respond to a particular situation, it is difficult to build an integrated system. Focus on getting requirements around who holds ownership for what (in this case, “what section of the system” holds ownership).
For example, who is the owner of a device’s configuration or who is responsible if a bad message comes through? Also, consider what type of messaging requirements exist (zero loss, in order, no duplicates) and the impact of that on the larger system.
Consider the situation where a user's account balance is at $0. What happens if they try to use their card? If it is a debit card it would get denied, but what if it is something like a train pass? How should the farebox react if it has lost its data connection to the system for a moment? Knowing that up front will help you build a more efficient IoT system. Without defined requirements, the system won't know how to react, and the development and testing teams won't know what to expect.

Lesson 3: Check Capabilities

When integrating devices as IoT, it is important to check the capabilities of the frameworks being used, as well as the capabilities of the devices themselves. How stable an internet connection can reasonably be expected of the device? What type of storage does it have? What built-in components can be used from the software framework chosen? Answering these questions will help avoid scrambling to fix issues down the line.
One area my project could have dug into deeper was our frameworks. We re-wrote functionality that our frameworks already provided from the start, duplicating it simply because we did not check what was built in.
After moving the system into production, we realized our messaging framework (as well as the storage on our devices) was limited. The messaging framework could not handle the message sizes being sent to it, and messages were getting rejected. We had to scramble to figure out a quick solution when this could have been avoided had everyone on the project known about the limitations of the frameworks and devices we’d selected.

Lesson 4: Trust Your Device

The lesson I find to be most crucial from my experience with IoT is the importance of trusting the device. What happens when the overall system thinks one thing and the device tells it another? Trust the device.
Why bother connecting a device to a system to gather and send information, if the system isn’t going to trust and use that information? That being said, trusting the device showcases the need for security.
In order to fully trust what a device is sending to the cloud, the communication between the two must be secure. Otherwise, the system cannot know that the information being sent is not garbage — or worse, compromised.

Consider FitBit. Why bother buying a FitBit if you did not want it to tell you how many steps you took? You buy a FitBit to tell you how active you have been and you trust it by default. This is also how the system should work. Tesla gathers information when the driver runs it in autopilot mode. Then, that information is used by Tesla to make autopilot better. If Tesla did not trust the information being sent to its cloud system, autopilot would never improve.

Persistence vs. Durability in Messaging. Do you know the difference?

This post was originally published on Red Hat Developers, the community to learn, code, and share faster.

Messaging is a critical aspect of integrating systems, and while there are many different messaging platforms and infrastructures, a common request is for “zero loss of messages.”  From there, the terms “Persistence” and “Durability” often get thrown around, but what do those two things really mean?

Persistence

At a basic level, persistence means that when failure occurs during message processing, the message will still be there (where you found it the first time) to process again once the failure is resolved.
Take JBoss A-MQ (based on Apache ActiveMQ), for example. In A-MQ we have brokers that do the communicating of the messages. For simplicity's sake, let's assume we only have a single broker doing the communication to and from a queue. Should this broker be shut down while a message is sitting in the queue ready to be processed, the message will be processed normally once the broker comes back up.
So how does this work?  In order for messages to “persist” they must be stored somewhere other than just broker memory.  Depending on the platform this could be a temporary folder, a database, a log file, etc.
Now, why would anyone not use persistent messaging?  Well, for one thing it tends to slow things down. Or maybe some messages are okay to lose in the event of a broker shutting down, and it's not worth the complexity.
Think of messaging in the context of status checking: the system may want to periodically ensure that a device is up and running, so the device sends a status message every few minutes. In the event of a broker restart without persistent messaging we may lose one status message, but that might not be a problem since another message is probably on its way; in this case some data loss may be acceptable.
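To make this concrete, here is a minimal JMS sketch against an ActiveMQ broker (the broker URL and queue name are assumptions for illustration) showing how the delivery mode controls whether a message persists:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PersistenceExample {
  public static void main(String[] args) throws JMSException {
    // assumed broker URL and queue name -- adjust for your environment
    ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
    Connection connection = factory.createConnection();
    connection.start();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    MessageProducer producer = session.createProducer(session.createQueue("device.status"));

    // PERSISTENT: the broker writes the message to its store before
    // acknowledging, so it survives a broker restart
    producer.setDeliveryMode(DeliveryMode.PERSISTENT);
    producer.send(session.createTextMessage("order received"));

    // NON_PERSISTENT: faster, but held only in broker memory and lost
    // if the broker goes down -- acceptable for a periodic status ping
    producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
    producer.send(session.createTextMessage("device is up"));

    connection.close();
  }
}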

Durability

Queues and Topics are important parts of messaging (particularly JMS). A queue by itself is great for point-to-point messaging, often one producer to one consumer. Topics, on the other hand, are most often used when you have a single producer (or multiple for the same purpose) and many consumers.
A common pattern is to have the producer send the message to the topic and then have the queues subscribe to the topic. This allows each queue to receive its own copy of the message. But what happens to the message if it is sent to the topic, but no queues are online (remember our queues subscribe to the topic)?
This is where durability comes into play. When a durable subscription is set up between a queue and a topic, the queue can be offline when the message hits the topic. Once the queue comes back online, the message can be received.
If the subscription is non-durable, then any messages received to the topic while the topic subscriber is offline will not be received by the subscriber (in this case the queue).
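As an illustration, here is a small JMS sketch of a durable versus non-durable topic subscription (the client ID, topic name, and subscription name are made up):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class DurabilityExample {
  public static void main(String[] args) throws JMSException {
    ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
    Connection connection = factory.createConnection();
    // a durable subscription is identified by client ID plus subscription
    // name, so the broker can hold messages while the subscriber is offline
    connection.setClientID("billing-service");
    connection.start();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    Topic topic = session.createTopic("orders");

    // durable: messages published while this subscriber was offline are
    // delivered once it reconnects under the same client ID and name
    MessageConsumer durable = session.createDurableSubscriber(topic, "billing-sub");

    // non-durable: only sees messages published while it is connected
    MessageConsumer nonDurable = session.createConsumer(topic);

    Message fromDurable = durable.receive(5000); // wait up to 5 seconds
    connection.close();
  }
}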

Preventing Message Loss

So what do we need to prevent message loss?  If you are using both queues and topics, then using both persistent messaging and durable subscriptions is your best bet. This ensures you have a backup of the message in case of broker failure, and that your subscriptions will always receive the proper messages. Just remember that certain messaging systems, such as Amazon's SQS and SNS, may not support durable subscriptions.

Thursday, May 12, 2016

Persistent Custom MDC Logging in Apache Camel

This post was originally published on Red Hat Developers, the community to learn, code, and share faster.

Logging is a ubiquitous need in any production-quality application, and one common scenario is to log the active (logged in) username, or to log the user and order IDs for customer order event details. This is typically done to create an audit trail so that issues can be more easily traced should something go wrong, but there are any number of reasons why you might decide to create a custom log.
Mapped Diagnostic Contexts (MDCs) in Apache Camel are great for creating custom logging statements, and will easily meet our needs for these use cases.  MDC is offered by both slf4j and log4j, and is also supported by JBoss Logging. (Apache Camel is a part of the Red Hat JBoss Fuse integration platform.)
In addition, you can use something like GELF to automatically index any MDC, thus allowing them to be easily searched using ElasticSearch (logging configuration is not required for this feature), so there are a few reasons why this might be an appealing solution.
This article will demonstrate how to set up MDC to perform custom logging.

Using MDC

MDC primarily stores a key-value map of contextual data (strings mapping to strings), and the context map key can be added to your logging format (logging.properties, or other logging configuration), like in the following example (orderId is the context key):
%d{HH:mm:ss,SSS} %-5p [%c] %X{camel.routeId} | %X{orderId} | (%t) %s%E%n
Adding an MDC in Java is as simple as:
MDC.put("myKey", "myValue");
With JBoss Logging, MDCs persist until you remove them.  However, due to a bug, this is not the case for Apache Camel on Karaf. Camel itself does provide some persistent MDCs by default, and these can be found at http://camel.apache.org/mdc-logging.html.
Now, what if you want your own custom MDC to persist on Camel/Karaf?  If you have a route that is processing an order, logging the order ID throughout the route flow would be very helpful when troubleshooting issues in production. Unless we can persist the MDC, this isn’t going to work.

Making MDC Persistent

Luckily, all you need to do to get your custom MDC values to persist is extend Camel’s MDCUnitOfWork.  At a minimum you will want your extension to look something like the example below that shows your custom “orderId” MDC.  You can also extend the clear() method, and others if you desire, but the methods below are the basic ones you’d need to get the job done.
import org.apache.camel.AsyncCallback;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.impl.MDCUnitOfWork;
import org.apache.camel.spi.UnitOfWork;
import org.slf4j.MDC;

public class CustomUnitOfWork extends MDCUnitOfWork implements UnitOfWork {

  public static final String MDC_ORDERID = "orderId";

  private final String originalOrderId;

  public CustomUnitOfWork(Exchange exchange) {
    super(exchange);
    // remember the value that was present when this unit of work began
    this.originalOrderId = MDC.get(MDC_ORDERID);
  }

  @Override
  public UnitOfWork newInstance(Exchange exchange) {
    return new CustomUnitOfWork(exchange);
  }

  @Override
  public AsyncCallback beforeProcess(Processor processor, Exchange exchange, AsyncCallback callback) {
    return new MyMDCCallback(callback);
  }

  /**
   * {@link AsyncCallback} which preserves {@link org.slf4j.MDC} when the
   * asynchronous routing engine is being used. This also includes the
   * default Camel MDCs.
   */
  private static final class MyMDCCallback implements AsyncCallback {
    private final AsyncCallback delegate;
    private final String breadcrumbId;
    private final String exchangeId;
    private final String messageId;
    private final String correlationId;
    private final String routeId;
    private final String camelContextId;
    private final String orderId;

    private MyMDCCallback(AsyncCallback delegate) {
      this.delegate = delegate;
      this.exchangeId = MDC.get(MDC_EXCHANGE_ID);
      this.messageId = MDC.get(MDC_MESSAGE_ID);
      this.breadcrumbId = MDC.get(MDC_BREADCRUMB_ID);
      this.correlationId = MDC.get(MDC_CORRELATION_ID);
      this.camelContextId = MDC.get(MDC_CAMEL_CONTEXT_ID);
      this.routeId = MDC.get(MDC_ROUTE_ID);
      this.orderId = MDC.get(MDC_ORDERID);
    }

    @Override
    public void done(boolean doneSync) {
      try {
        if (!doneSync) {
          // when done asynchronously, restore information from the
          // previous thread
          if (breadcrumbId != null) {
            MDC.put(MDC_BREADCRUMB_ID, breadcrumbId);
          }
          if (orderId != null) {
            MDC.put(MDC_ORDERID, orderId);
          }
          if (exchangeId != null) {
            MDC.put(MDC_EXCHANGE_ID, exchangeId);
          }
          if (messageId != null) {
            MDC.put(MDC_MESSAGE_ID, messageId);
          }
          if (correlationId != null) {
            MDC.put(MDC_CORRELATION_ID, correlationId);
          }
          if (camelContextId != null) {
            MDC.put(MDC_CAMEL_CONTEXT_ID, camelContextId);
          }
        }
        // finally, set up the routeId
        if (routeId != null) {
          MDC.put(MDC_ROUTE_ID, routeId);
        }
      } finally {
        // must ensure the delegate is invoked
        delegate.done(doneSync);
      }
    }

    @Override
    public String toString() {
      return delegate.toString();
    }
  }
}

Accessing the MDC

Now how do we use this? If you are using Spring, getting your CustomUnitOfWork in use is easy. First implement the UnitOfWorkFactory, like below:
public class CustomUnitOfWorkFactory implements UnitOfWorkFactory {
  @Override
  public UnitOfWork createUnitOfWork(Exchange exchange) {
    return new CustomUnitOfWork(exchange);
  }
}
Then create your Spring bean:
 <bean id="unitOfWorkFactory" class="com.redhat.example.CustomUnitOfWorkFactory"/>
Once this is all in place, you can verify your UnitOfWork is in use for your bundle by checking for a log statement starting with ‘Using custom UnitOfWorkFactory:’ followed by your class name.
MDCs will persist in logging throughout the Camel route.  Keep in mind there are some exceptions to this: the use of SEDA, or anything else that acts as the start of a brand new route, will clear the context for that route.  Your custom MDCs are treated the same way as camel.breadcrumbId (a unique ID used for tracking messages across transports), so you can think of them in those terms.
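To tie it together, a route might set the order ID once at the start and rely on it appearing in every later log line. A rough sketch (the endpoint URIs and header name are invented, and MDC logging must be enabled on the CamelContext via setUseMDCLogging(true)):

from("activemq:queue:orders").routeId("processOrder")
    .process(new Processor() {
        @Override
        public void process(Exchange exchange) throws Exception {
            // hypothetical header: pull the order ID off the incoming message
            // and stash it in the MDC for all subsequent log statements
            String orderId = exchange.getIn().getHeader("orderId", String.class);
            MDC.put(CustomUnitOfWork.MDC_ORDERID, orderId);
        }
    })
    .to("bean:orderService?method=process")
    // with CustomUnitOfWork registered, %X{orderId} in the logging format
    // is still populated here, even after asynchronous steps
    .log("Finished processing order");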

Should I learn OSGi? What’s the point?

This post was originally published on Red Hat Developers, the community to learn, code, and share faster.
Recently, I have been hearing a lot of debate around whether it is worth someone’s time to learn OSGi.  Doing a simple Google search on “OSGi usability” returns results filled with phrases such as “not easy to use”, “unproductive”, “developer burden”, and “going away”.  However, you will also find that it solves a lot of common issues in the JVM, particularly issues around class loading.  So is learning OSGi worth your time?

What is OSGi?

OSGi is meant to solve common class loading issues that are seen in traditional Java EE environments; it is a set of specifications that are used in the creation of jars with extra manifest information to define dependencies and class loading behavior. These “special” jars are called “bundles”, which are the primary packaging structure for OSGi-enabled applications.
Think about a large Maven project, for example. Often you will come across dependency chains where multiple versions of the same dependency are built into the same application; your system will then choose the dependency that is listed first and load that one (typically alphabetically). This can cause behavior that differs between development and production if the ordering changes at all between the two environments. Wouldn't you rather be making that decision yourself, or leave it to a skilled architect?
In an OSGi application, if your service is exposed to 2 different versions of the same package you are required to specify which version to use. Conflicts must be resolved, meaning your build, or potentially the startup of your service itself, will fail  (depending on where the dependency issue is).
OSGi is meant to be modular, and bundles typically have high cohesion and loose coupling.  Each bundle performs its own function, and another will do something different. Bundles are encouraged to interact with each other through exposed and imported types on the class path, and through services (shared instances of a class with a managed life-cycle).

Benefits of OSGi

  • Modularity – Code is broken down into very small chunks, making it easier to reuse code later.  In addition, many third party applications already provide bundles that can be used in your application.
  • Less downtime – OSGi containers such as Karaf allow for the stopping, updating, restarting, etc of a single bundle without taking down the whole application (assuming that it is properly designed). In addition, multiple versions of your code can run at the same time on the same container. Keep in mind if you are exposing your services on URLs, the URLs for each service version must differ; otherwise, you’d never know which version of the service you are trying to hit.
  • Isolated class loading – If you build a service that needs one version of `org.json` and build a second service that needs a newer version of that same package, you can have both versions deployed in your container at the same time without causing class loading conflicts. Since the versions for each dependent bundle will be specified and loaded as defined in their individual manifests, OSGi will ensure that the correct version is provided to each.
  • Fine-grained import/export control – When creating a bundle, both import and export packages are defined.  ‘Import’ packages are what your bundle pulls in from other bundles, while ‘export’ packages are what your bundle makes available to other bundles that depend on it.  Instead of having all the code in your bundle exposed to other bundles, you have control to pick and choose what is visible to your consumers, as the manifest sketch after this list shows.
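As a rough illustration of those headers (all package names and version ranges here are invented), a bundle's MANIFEST.MF might contain:

       Bundle-SymbolicName: com.example.orders
       Bundle-Version: 1.2.0
       Import-Package: org.json;version="[1.0,2.0)",
        org.apache.camel;version="[2.15,3)"
       Export-Package: com.example.orders.api;version="1.2.0"

Only com.example.orders.api is visible to consumers; everything else in the bundle stays private, and the versioned imports are what let the container wire different versions of the same package to different bundles.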

Downfalls of OSGi

  • Developer burden & Learning curve – I mentioned this above, and it is truly a downfall worth mentioning; learning OSGi is not easy. It takes time to understand how class loading works, and why you are running into dependency chain issues.  For those coming from a Java EE background, the learning curve can be steep.
  • Compatibility of Spring Dynamic Modules (DM) – The use of Spring is fairly common in enterprise projects, so compatibility with Spring is important. Although a pure XML Spring implementation is supported in Fuse 6.2, the annotation-based implementation of Spring is only supported through Spring 3.x, not Spring 4.  In Spring 4, Spring DM was removed.  This means that getting an annotation-based Spring implementation using Spring 4 to run on an OSGi container is not only a pain, but nearly impossible and unsupported.
  • Technical risk & Client burden – Not all architects, production support groups, or clients are willing to take a risk on OSGi if they know nothing about it.

How is OSGi related to Microservices?

Recently there has been a lot of buzz around microservices being the future of integration.  While that may be true, OSGi is a gateway to microservices.  What is OSGi's focus?  Modularity, class loading control, and small services that have a very focused purpose or even just a single purpose.  That sounds a lot like microservices.
The main difference is that microservices are expected to each run in their own containers, allowing for more control over development and updates, whereas an OSGi container might be host to multiple services.

Why is OSGi relevant to Red Hat?

Red Hat’s JBoss Fuse is an open source, lightweight, and modular integration platform with a built in Enterprise Service Bus (ESB). Fuse has traditionally run on Apache Karaf, an OSGi container implementation, which has historically meant that most consultants and developers working with JBoss Fuse had to also learn OSGi; however, Fuse 6.2 allows the ESB to run on Karaf or JBoss EAP – a Java application server with a light-weight modular core. OSGi is not required.
This is not without limitation, as running on the EAP Camel subsystem places some restrictions on Apache Camel components – some are compatible and some are not. The JBoss R&D teams have put some effort into fully implementing OSGi in JBoss EAP, but there has been some debate over whether or not this will continue.

Conclusion

Learning OSGi is definitely worth your time, just don't get caught up in the details (at least in the beginning). First, focus on learning something that takes advantage of OSGi, like Camel.  It has the flexibility to run in both OSGi and Java EE containers, so even if you decide OSGi is not for you, you'll still have learned something useful.
Then, once you have gotten your feet wet with some basics, start slow and with simple examples for OSGi.  Pay attention to the modularity, class loading, and concepts of OSGi rather than looking at how to configure a complex project to run in Karaf. Even if the future is microservices, learning and understanding more about modularity and class loading will help with all of your future Java development.

Wednesday, March 16, 2016

Camel Exception Handling

Camel has some nice built-in components for exception handling.  The onException clause is what I use most often (see http://camel.apache.org/exception-clause.html).

How to use onException?


The onException clause is fairly self-explanatory for the basics.  You are specifying what Camel should do in the event of a particular exception.  Should it retry the message?  Should it wait before retrying?  Should it log an error?  And so on.

The handled flag is one to definitely pay attention to.  If you specify .handled(true) then your application will not throw an ugly server error when the exception occurs.  However, in situations where you are using something like Amazon's Simple Queue Service (SQS), using .handled(true) would also prevent the message from going back through the SQS queue again.  My general rule of thumb is to handle exceptions that you know about, but keep unknown exceptions as .handled(false) and log them and/or notify appropriate parties of the failure.

One common approach is to handle known errors and then, below those, list an onException for the overall Exception class that logs what went wrong for anything unknown, as sketched below.
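A minimal sketch of that layering (the exception type and endpoint name are invented):

        // known failure: handle it and route to a compensating action
        onException(InvalidOrderException.class).handled(true)
            .log(LoggingLevel.WARN, "Rejecting invalid order: ${exception.message}")
            .to("direct:rejectOrder");

        // anything else: log it, but leave it unhandled so the error
        // still surfaces (and the message can be redelivered)
        onException(Exception.class).handled(false)
            .log(LoggingLevel.ERROR, "Unexpected failure: ${exception.message}");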


More Complex onException Implementations

One of the more complex implementations of onException I have written is below. It was critical in this case to get everything right, because my route called a service which did some credit card processing.

        onException(MyCustomerException.class).handled(true)
            .bean(processor, "incrementRetries")
            .choice()
                .when(header("canRetry"))
                    .to("aws-sqs://mySQSQueue?amazonSQSClient=#sqsClient")
                .otherwise()
                    .bean(processor, "cancelOrder")
             .end();


First, I implemented a custom exception.  This allowed me to appropriately handle anything that failed within a particular section of code. Second, I decided to mark the exception as handled, to avoid having SQS retry it itself, as I mentioned above.  Before implementing this we saw a large backup in the queue and a ton of reprocessing attempts by SQS itself.  That did not seem very safe considering we were dealing with charging people's credit cards.

Also, I am keeping track of the number of retries here myself.  There are much better ways to do this, such as .maximumRedeliveries(5); however, I needed a way to cancel the order in the event the message had been retried too many times. As you can see, if a retry is allowed, the message gets sent back to SQS and goes through the queue again.  This particular approach allowed for complete control over what SQS did with requeueing and when to stop retrying.

I was able to set "canRetry" to false if I knew my exception was the result of a terminal error.  Another approach, which looking back may have been better, would have been:

     onException(TerminalException.class).handled(true)
             .bean(processor, "cancelOrder");

     //let SQS handle the retries
     onException(NonTerminalException.class).handled(false);

Key Points

- Be careful what you set to handled vs. not
- Creating custom exceptions is your friend 
- onException clauses listed first take priority when more than one could apply; Camel also prefers the closest match in the exception hierarchy
- When using a queuing service that will automatically re-queue messages, be careful about having Camel also re-queue them.

Tuesday, March 1, 2016

Setting the Logging Level for a Package in Karaf

I came across this task the other day and was amazed at the lack of simple documentation around it.  Eventually I figured out the steps and wanted to share with others, in hopes that you will not have to go searching for this like I did.

There are two simple ways to change the logging level for a package in Apache Karaf.  You can of course get more complex with your configurations and log to different files, but these are the simple ways to change the logging level for one package and its sub-packages.

1. Inside the console run:
       log:set <level> <package>


2. Inside fuse/etc/org.ops4j.pax.logging.cfg add the following line:
       log4j.logger.<package> = <level>
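
For example, to turn on DEBUG logging for a hypothetical com.example.orders package, either run this in the console:
       log:set DEBUG com.example.orders

or add this line to org.ops4j.pax.logging.cfg:
       log4j.logger.com.example.orders = DEBUG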


Tuesday, January 19, 2016

The Lifecycle of a Bundle


When using JBoss Fuse you may be running Apache Karaf as your container.  In this case it is very important to understand what a bundle is and its lifecycle. 

What is a Bundle?
- A bundle is a small module of code. 
- You can think of a bundle as a jar with additional manifest information specifying packages to import and export
- Bundles can contain camel routes, but do not have to
- If you are using Maven to create/define your bundle you will need to specify the import and export packages
- You can make a regular plain jar into a bundle using Maven's wrap feature, but this is not recommended if you can avoid it.  Doing this leaves a bit of 'magic' to be done by Maven in defining the manifest.
- Bundles should be as independent from one another as possible, without allowing for duplicate code in your bundles.

The Lifecycle
1. INSTALLED - The bundle is installed in Karaf, but may not have all of its required dependencies met.
2. RESOLVED -  The bundle is installed and has all needed dependencies resolved. ***NOTE: This means all known dependencies are resolved; you may still hit runtime dependency issues.
3. STARTING -  The bundle is starting up and has all dependencies resolved.
4. ACTIVE - The bundle is started and running.  Any services can be hit at this point and any routes will be running.  For a bundle with Camel routes you may see both "ACTIVE" and "STARTED" listed as statuses for the bundle in different columns, indicating that the routes are running.
5. STOPPED - The bundle is stopped, but still installed.  A bundle which is stopped may or may not have all of its required dependencies.
6. UNINSTALLED - The bundle has been uninstalled.  It will no longer show up in the karaf console.
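
You can walk a bundle through these states from the Karaf console. A rough sketch (the Maven URL is made up, and newer Karaf versions use the bundle: command prefix instead of osgi:):
       osgi:install mvn:com.example/orders-bundle/1.0.0
       osgi:start <bundle-id>
       osgi:stop <bundle-id>
       osgi:uninstall <bundle-id>
Here install puts the bundle in INSTALLED (and RESOLVED once its dependencies are met), start moves it to ACTIVE, stop returns it to a stopped state, and uninstall removes it from the console listing entirely.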

Notes:
- Understanding the bundle life cycle can help make the development and deployment processes go more smoothly.  Knowing what each status means greatly helps to nail down the root issue when troubleshooting an error.
- Remember, not all bundles have Camel routes.  You can use a bundle for dependencies such as your DTOs.
- When first creating a new bundle, it is best to ensure it gets to the ACTIVE stage before moving on to another bundle.  This will help cut down on spending too much time trying to solve multiple dependency chain issues at once.  This is particularly true for custom (developed by you) bundles that depend on other custom bundles.