Tuesday, December 16, 2014

...how to make a successful Open Source project?

I’ve spent the last year at my current employer working with open source projects and trying to make them successful, and after that year I think I have some ideas about the common pitfalls that need to be avoided for a project to survive. Here are some areas that need to be worked on if you are in Open Source.

Share the project goals, roadmap, vision, …

One of the most important topics regarding an open source project is that it needs to be clear what it is for and how it will grow over time. I’ve seen many Open Source projects where you can see in their home page/docs what the project is about, but not what the roadmap for the project is.
If I were a company trying to build something on top of an Open Source project, I would not only care about what the project does today, but also about what it will do in the future, as that future might be very different from mine, and in that case I would not take a risk on that project.
One example of a project that I like for how they show this is apiman

Having a strong community

This means that communication between the project committers and any contributor or user (the community) has to be agile, and that needs very special attention.
I’ve seen some projects where questions lie around on the forums without answers forever. Of course this can happen, but it cannot happen to the vast majority of questions, as it makes it look like there is no interest in helping people adopt the project. Once people have adopted a project, they tend to contribute more, as they act out of necessity rather than mere curiosity.
An example project is Fabric8

Making the project look alive

A homepage that looks like we are still in the 20th century is not healthy. A project that releases a version once a year is not healthy. A project whose forums are not active, with questions and answers flowing around, is not healthy.
Usually a project has daily code contributions. The site and the docs also need these daily contributions. If the code is very active but the site/docs are stale, it will be difficult to adopt the project.
An example of this is Openshift

Advertise

We are in the era of web 3.0, which is very social, so we need to share information about the project as much as possible: share anything about the project that anybody mentions, make it show up on Twitter, re-tweet, get interesting people to follow it, blog about it. There are people who are trend setters, so use them in favor of your project. I know people with hundreds or thousands of followers in the IT industry, so get them to help you if you can.
An example can be Wildfly

Demo, sample, show how to use, document, either through videos or blogs, …

I went to the website of one project I wanted to contribute to, and the documentation was awful: there were 4 videos from 2 years back, from a very old release, there were no demos that could be used, and there was a very small number of quickstarts of very little value, as they were very trivial. In the end, it took me more than half a year to be productive with the project. Recently, I have been following a project that had very good videos/tutorials showing the usage, a docker image to start using the project right away, and a lot of meaningful demos using those docker images to demonstrate meaningful use cases. I think that after a week I can use the project. Of course, I cannot develop on it yet, but so far I understand all the concepts behind the project from a user perspective.
If I were leading an Open Source project, I would have every committer spend one hour a week writing a blog post, recording a video or whatever else can be done to make as much information available as possible. Usually the forums are full of use cases and missing samples/scenarios/quickstarts that can be written in that one hour. A committer should be productive enough to produce something valuable in that one hour.
Another option I’ve seen is having an "Evangelist". But so far, I tend to think that if a person is not a contributor/committer on a project it is very difficult for them to show how it works, or its business value, so if you want to go the Evangelist way, have them learn the code :-D
An example can be Docker

Work with companies that want to adopt your project, and adopt those companies in return

Companies are the best resource that you can have. If you can get a company to adopt your project, and you work very closely with them to realize their use cases, they will probably dedicate resources to the project.
You need to have more than one company, or their use cases will become your use cases, so you need to stay focused on your own goals. If you adopt a small number of companies to support your project, you’ll be very healthy.

Conclusions

Open Source can mean many things, but in the end the most important one is that you need people to work with you, so you need to work for those people as well. You need to attract them to you, as there are many open source projects to choose from, so why yours?
And of course, if you are a company using Open Source as the foundation of your projects, you need to invest and put all of this information into action. And once you have customers and contributors to your projects, you are devoted to them, so don’t fail them, as trust is very difficult to regain.

Thursday, November 13, 2014

...use a Proxy for speeding up docker image creation

Sometimes it is very convenient to use a proxy (squid or any other) to speed up development. When creating an image you might download some packages and then, maybe, change some steps, which requires you to re-download those packages. To ease this a bit, you can use a proxy and instruct docker to use it.
I started by looking at configuring a proxy for the docker daemon, but once I finally had it working I realized that it was only proxying the images I downloaded from the internet, so not too much benefit. Anyway, it is documented below.
I then tried to hook a proxy between the build process and the internet. After some hours, I got to this nice post from Jerome Petazzo. His work, linked on github, is more advanced than what is mentioned in that post, and the docs are not very clear, so I will summarize it here and comment on a small issue that I had on my Fedora 20 (docker 1.3.0).

Proxy for images

Here is a description of the steps required to use a proxy with the daemon.

Install the proxy and configure

Installation steps are quite easy, just use yum in Fedora to install squid:
$ yum -y install squid
In Fedora (20), the squid config file is /etc/squid/squid.conf. We will configure it for our usage.
Configuration depends on your preferences; this is just an example of mine.
  • Uncomment cache_dir directive and set the allowed max size of the cache dir. Example:
cache_dir ufs /var/spool/squid 20000 16 256
sets the max cache dir size to 20 GB.
  • Add maximum_object_size directive and set its value to the largest file size you want to cache. Example:
maximum_object_size 5 GB
allows caching DVD-sized files.
  • Optional: Disable caching for some domains. If you have some files/mirrors already on your local network and you don’t want to cache those files (the access is already fast enough), you can specify it using acl and cache directives. This example disables caching of all traffic coming from the .redhat.com domain:
acl redhat dstdomain .redhat.com
cache deny redhat
  • Start the squid service:
$ service squid start
We will not start squid on boot, as we only want to use it for Docker image development purposes.
  • Make sure iptables or SELinux do not block Squid operating on port 3128 (the default value).
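For example, on my Fedora box a quick (non-persistent) way to open the port and check the SELinux mode is something along these lines; adapt it to your own firewall setup:
$ iptables -I INPUT -p tcp --dport 3128 -j ACCEPT    # allow incoming connections to squid's port
$ getenforce                                         # if this prints Enforcing, check the audit log for squid denials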

Configure Docker to use a proxy

By now, we will have squid running on port 3128 (the default). We just need to instruct docker to use it when the containers go to the internet for things.
You need to set an environment variable for the docker daemon, specifying the http_proxy.
In Fedora 20, you can modify your /etc/sysconfig/docker configuration file with the following:
HTTP_PROXY=http://localhost:3128
http_proxy=$HTTP_PROXY
HTTPS_PROXY=$HTTP_PROXY
https_proxy=$HTTP_PROXY

export HTTP_PROXY HTTPS_PROXY http_proxy https_proxy

# This line already existed. Only the lines above it have been added.
OPTIONS=--selinux-enabled
Now you need to restart the daemon:
$ systemctl daemon-reload
$ systemctl restart docker.service

Create images

Now, if you pull an image, it will get proxied. If you delete it from your local machine and want to fetch it again, it will now come from the proxy cache.
This might not seem like a big benefit, but if you have a local LAN, you can use this to have a proxy/cache for the Hub (or a registry).
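A quick way to see the cache at work (the image name is just an example) is to pull an image, remove it, pull it again and watch squid’s access log; if the layers went through the cache, the second pull should show HIT entries:
$ docker pull busybox
$ docker rmi busybox
$ docker pull busybox
$ tail /var/log/squid/access.log    # look for TCP_HIT / TCP_MEM_HIT lines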

Proxy for images build contents

As I said before, it is usually more interesting to proxy what will be inside the images you are developing, so if you invalidate a layer (modify the Dockerfile), the next build will not go to the internet.
Following Jerome’s blog and his github, what I did was:
I cloned his github repo to my local machine:
$ git clone https://github.com/jpetazzo/squid-in-a-can.git squid-in-a-can.git
And then I run:
fig up -d squid && fig run tproxy
You need fig, but who does not have it?
Then you just need to do a normal docker build. The first time, every download will go into the "squid" container, and later builds will fetch it from there.
While doing this, I hit an issue. I do not really know if it was specific to my environment or to any Fedora 20/Docker 1.3.0 setup. The issue was that I was getting an unreachable host. It turned out that my iptables had rules rejecting everything with icmp-host-prohibited. I solved it by removing those lines from iptables.
I used:
$ iptables-save > iptables-original.conf
$ cp iptables-original.conf iptables-new.conf
Commented out these lines in iptables-new.conf:
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
And loaded the new iptables conf:
$ iptables-restore < iptables-new.conf
I also opened a bug in the squid-in-a-can github to see if Jerome has an answer to this.
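As an illustration of the benefit (this is not one of Jerome’s files, just a hypothetical Dockerfile with arbitrary packages), the downloads triggered by the RUN line should go through the transparent squid the first time you build; if you then edit the Dockerfile and rebuild, the invalidated layer should be rebuilt from the cache instead of the internet:
FROM fedora:20
RUN yum -y install wget tar && yum clean all

$ docker build -t proxy-test .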

Options

Now there are 2 options, as the container created this way stores the cached data inside it, so if you remove the container, you remove the cache.
  • First option is to use a volume to a local dir. For this, edit the fig.yml in the project’s source dir (see the sketch after this list).
  • Second option is to use your local squid (if you already have one), so you only need to run that second container, or just add/remove the iptables rule:
    • Start proxying (assuming squid is running):
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to 3128
  • Stop proxying:
iptables -t nat -D PREROUTING -p tcp --dport 80 -j REDIRECT --to 3128
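For the first option above, a minimal sketch of the kind of change in fig.yml would be adding a volumes entry to the squid service, mapping a host directory to the container’s cache directory (both paths below are assumptions; check the image to see where it actually keeps its cache):
squid:
  volumes:
    - /home/user/squid-cache:/var/spool/squid3   # host dir : assumed cache dir inside the container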

Thursday, November 6, 2014

...proxying a SOAP service with 3 faults in SwitchYard

This document describes a problem that I’ve faced with SwitchYard due to one of its known issues/features/limitations.

Problem

I needed to create a proxy service for a SOAP Web Service where I had a contract with 3 faults.

public interface CustomerListServicePortType {
    public CustomerListResponse getCustomerList(CustomerListRequest parameters) throws
            GeneralError, CustomerNotFound, InvalidUserCredentials;
}
SwitchYard has one limitation: it only accepts contracts with one Exception type (as well as only accepting one single input type). When I created the initial service for this and deployed it, SwitchYard told me about this:
org.switchyard.SwitchYardException: SWITCHYARD010005: Service operations on a Java interface can only throw one type of exception.
One option could be to modify the contract, but as this is to proxy a legacy service, I need to keep my contract, so I looked into various options, out of which I’ll describe the one that was easiest for me.

Solution

I created an internal contract for my service, with one single Exception:
public interface CustomerListServicePortType {
    public CustomerListResponse getCustomerList(CustomerListRequest parameters) throws
            CustomerListException;
}
Then I used transformers to map the original exceptions to and from my new "unique" exception. As what really gets marshalled/unmarshalled when doing SOAPFault handling is the FaultInfo, I decided to keep the original FaultInfo in my new Exception:
import org.w3c.dom.Element;

public class CustomerListException extends Exception {

    private Element faultInfo;

    public CustomerListException(Element cause) {
        faultInfo = cause;
    }

    public Element getFaultInfo() {
        return faultInfo;
    }
}
And my transformers were so simple that I was happy not having to deal with DOM parsing, Element and all that stuff.
public final class ExceptionTransformers {

   @Transformer(from = "{http://common/errorcodes}invalidUserCredentials")
   public CustomerListException transformInvalidUserCredentialsToCustomerListEx(Element from) {
      CustomerListException fe = new CustomerListException(from);
      return fe;
   }

   @Transformer(from = "{http://common/errorcodes}generalError")
   public CustomerListException transformGeneralErrorToCustomerListEx(Element from) {
      CustomerListException fe = new CustomerListException(from);
      return fe;
   }

   @Transformer(from = "{http://common/errorcodes}customerNotFound")
   public CustomerListException transformCustomerNotFoundToCustomerListEx(Element from) {
      CustomerListException fe = new CustomerListException(from);
      return fe;
   }

   @Transformer(to = "{http://common/errorcodes}customerNotFound")
   public Element transformCustomerListExToCustomerNotFound(CustomerListException e){
      return e.getFaultInfo();
   }

   @Transformer(to = "{http://common/errorcodes}generalError")
   public Element transformCustomerListExToGeneralError(CustomerListException e){
      return e.getFaultInfo();
   }

   @Transformer(to = "{http://common/errorcodes}invalidUserCredentials")
   public Element transformCustomerListExToInvalidUserCredentials(CustomerListException e){
      return e.getFaultInfo();
   }
}
These transformers get registered as Java transformers (due to the @Transformer annotation).
And everything works like a charm.

Tuesday, November 4, 2014

...Docker layer size explained

When you create a docker image, the final size of the image is very relevant, as people will have to download it from somewhere (maybe the internet), at least the first time, and also every time the image changes (at least they will have to download all the changed/new layers).
I was curious about how to optimize the size of a layer, because I read at some point that docker internally uses a "Copy-on-Write filesystem", so every write that you made while creating a layer stayed there, even if you removed the software afterwards.
I decided to validate this, to explain how it works, and to see how to optimize the size of a layer.
I have 3 main tests to validate the concept, using the JBoss Wildfly image, available on github, as a base. But as this image is composed of 2 base images on top of fedora, plus the wildfly image itself, I decided to merge everything into one single Dockerfile.

Test 1 - Every command in a separate line

This first test demonstrates how every command creates a layer, so if you split commands into separate lines, you end up with many more layers, plus much more space being used.
The code for these Dockerfiles is available on github:

Image sizes
The conclusion is to avoid creating unnecessary layers, or to combine shell commands inside docker commands, like multiple yum install && yum clean all in a single RUN.
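As an illustration (these are not the exact Dockerfiles from the test, and the package names are arbitrary), splitting commands looks like this, where the yum cache written by the install lines stays in their layers even after the clean:
RUN yum -y install wget
RUN yum -y install tar
RUN yum clean all
while the combined form keeps install and clean in the same layer, so the cache never ends up in the image:
RUN yum -y install wget tar && yum clean all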

Test 2 - Uncompressing while downloading vs removing the downloaded file after decompressing

In this test, I wanted to check whether the "copy-on-write" behaviour meant that even if I removed a file, it still occupied some disk space. For this purpose, I uncompressed a file while downloading it directly from the internet, versus saving the file, decompressing it and then removing it.
The code for these Dockerfiles is available on github:

Image sizes
The conclusion is that in terms of size it is the same, as long as it is done in a single docker command.
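In Dockerfile terms (the URL and paths are placeholders), the two variants compared are roughly these, both in a single RUN:
# stream the download straight into tar, so the tarball is never written to the layer
RUN curl -L http://example.com/wildfly.tar.gz | tar xzf - -C /opt
# download, extract and remove in the same RUN; the tarball is deleted within the same
# layer, so it does not add to the final image size either
RUN curl -L -o /tmp/wildfly.tar.gz http://example.com/wildfly.tar.gz \
    && tar xzf /tmp/wildfly.tar.gz -C /opt \
    && rm /tmp/wildfly.tar.gz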

Test 3 - One single RUN command for most of the stuff

In this test, I have modified the image description to contain only one single RUN command with everything in it.
The code for this Dockerfile is available on github:

Image size
The conclusion for this test is that the benefit we obtain from having a single layer is not that big, and every change will create a whole new layer, so it is worse in the long run.

Overall conclusions

This is a summary of the conclusions I have reached:
  • Layer your images for better reusability of the layers
  • Combine all the yum install && yum clean all that you’ll have in an image in a single RUN command
  • When installing software, it has a smaller footprint to download it (via curl) than to ADD/COPY it from the local filesystem, as you can combine the download with the install and the removal of stale data.
  • Don’t combine commands in a single RUN more than needed, as the benefit in terms of size may not be huge, but the loss in terms of reusability is.

Friday, October 24, 2014

...make JBDS faster for SwitchYard

While working with SwitchYard, JBoss Developer Studio can be a pain in the ass. Red Hat is working on providing a better user experience, but in the meantime, you can try some of these tips.

Increase heap memory

Eclipse needs a lot of memory, and maven and SwitchYard projects even more, so give your eclipse a good amount of it. Modify jbdevstudio.ini in the appropriate section:
-vmargs
-XX:MaxPermSize=256m
-Xms2G
-Xmx2G
-XX:-UseParallelGC
-XX:+AggressiveOpts
-XX:-UseConcMarkSweepGC
Provide a good amount of memory, so if you can give 3 or 4 GB instead of 2, even better.

Disable automatic updates

Faster startup time. You’ll update or check for updates whenever you want.
Preferences --> Automatic updates --> (Disable) Automatically find new updates and notify me

Disable auto build

If you build whenever you want, and only the projects you want, then disable it. If you have a big amount of projects, you can skip having eclipse doing background builds all the time. If you have few projects, you can keep it.
Project --> Build automatically (Uncheck)

Refresh workspace on startup

If you don’t do things on the command line, then your workspace does not need to be refreshed on startup. If you use git (command line) or maven (command line), maybe you want to keep it:
General -> Startup and shutdown -> Refresh workspace on startup (Enable)

Disable validations

If there is a lot of background task processing going on validating your project (due to any facet your project has, like JPA, …), suspend it:
Validation -> Suspend all validations (check)

Disable startup features not needed (FUSE, …)

Use the fewest plugins needed for your work.
General -> Startup & shutdown ->  Plugins activated on Startup (Remove FUSE, Fabric8, JBoss Central, Forge UI, JBoss Tools reporting, Equinox autoupdate)

Disable XML Honour schemas

There is a known bug between JBDS and SwitchYard, so avoid it with:
XML -> XML Files -> Validation -> Honour all XML schema locations (uncheck)

Close JPA and EAR projects if not working on them

Every project that you have open is both eating resources and having to be checked by the background tasks, so close the ones you don’t need (as dependencies, or at all).