OpEd: OpenStack wishes for Monasca

This is my first attempt at an editorial post for my blog. I usually just sprinkle my opinions into whatever I'm writing, but as you can see from this site, I don't blog much anyway. This subject was prompted by a conversation in the Monasca IRC meeting this week (http://eavesdrop.openstack.org/meetings/monasca/2018/monasca.2018-09-05-15.00.log.html), so I thought I'd take a stab at it.

In some ways, this is my wishlist for Monasca and how I would like to see the project get better. In other ways, it is a bit of a gripe about Telemetry.

Summary: better advertising of 'official' projects, recognition that small but active projects can still be useful, and project consolidation.

Quick history:

Ceilometer started in 2013 as the way to gather metering data in OpenStack that could be used for billing/chargeback/rating (whatever you want to call it). From the start, though, Ceilometer made it clear that it did not want to be a billing system; that was left to another project or product. CloudKitty, for example, can consume these collected meters and create rating reports. Ceilometer has changed over time: it has been split into Aodh (alarming) and Panko (events), all under the Telemetry project, and it has spawned Gnocchi as a separate time series database that has since left OpenStack (that probably deserves its own rant).

Monasca was designed for "Monitoring at Scale" around 2014. HP had plans to sell big clouds and needed to monitor those large deployments, so it made some big contributions to Monasca. The project has proven useful, and a number of companies now contribute, including SUSE, Fujitsu, StackHPC, and others.

Back in 2014, engineers at HP decided the Ceilometer stack was not scalable enough (the same problem that eventually led to the creation of Gnocchi), while Monasca was scaling well. After attempts to work directly with the Ceilometer team were rejected, HP decided to contribute Ceilosca, which uses the Ceilometer agent to publish meters to the Monasca API. This has worked well, has been used by a few companies, and is even supported by CloudKitty. As you can imagine, it has also taken some work to keep Ceilosca in sync with changes in both Ceilometer (including the deprecation of the Ceilometer v2 API when it was dropped in favor of Gnocchi) and the Monasca API.

A few things projects struggle with in OpenStack:

Visibility and Mind Share and “officialness”

Ceilometer managed to position itself early on as the metering solution for OpenStack. It continues to have a lot of mind share in OpenStack, though many consumers seem to be unaware of the changes going on in the Telemetry project.

Monasca never seems to have gotten that level of 'officialness', likely because it was initially seen as a single-vendor option from HP. The single-vendor thing bugs me a little. I've been seeing discussions on the openstack-dev mailing list that seem to suggest no one should use a project whose contributors all come from one company, and that single-vendor projects should be labeled as such by the Technical Committee. But that is going to be increasingly difficult as contributions shrink (see the next point). And look at Telemetry (only RedHat contributors) and CloudKitty (only Objectif Libre). Monasca now has a broader base of contributors than Telemetry.

Contributors

This is a big one in 2018. The OpenStack community is 'maturing', which unfortunately means that many companies that previously made big contributions are weighing their commitments against their return on investment and reducing their contributions going forward. This can be very painful for a project when bugs go unfixed and support requests go unanswered because there aren't enough contributors left.

Case in point: Telemetry. The Telemetry project is really down to just two people from RedHat, who are mostly focused on Gnocchi development and have their own roadmap for Ceilometer. Aodh and Panko are unstaffed and unsupported. The current roadmap for Ceilometer seems to be to simplify it by removing any "unused" features and to focus it on gathering metrics to feed to Gnocchi (which is not an OpenStack project).

See also these blog posts from April 2018:
https://julien.danjou.info/lessons-from-openstack-telemetry-incubation/
https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/

One thing I'm not sure OpenStack is well equipped for is the rotation of engineers. Many big companies don't expect an engineer to work on the same project for their whole career, and many engineers (especially in Silicon Valley) move from one company to another fairly regularly. So we should expect that any contributor may leave a project. But I haven't seen much effort to capture the specialized knowledge a long-time contributor may have, or to plan for the case where nobody backfills a departing contributor. The simple answer might be documentation, but I wonder if there is a way to encourage "what happens if you leave?" conversations in each project before it becomes a problem.

Working across projects

OpenStack is a collection of interdependent projects, yet it has always taken effort to get them to work together. Some of that is just the nature of development: engineers feel they already have enough work in their own project.

Sometimes it is political. The history between Telemetry and Monasca goes back to 2013 (before my involvement, so much of what I know is third-hand). The lore is that the HP team working on Monasca tried to work with the RedHat team working on Ceilometer but was told to take a hike. That put up a wall between the projects that remains to this day. Earlier this year, I tried to simplify things by taking the publisher portions of Ceilosca and contributing them back to Ceilometer (https://review.openstack.org/#/c/562400/), and I was blocked. The stated reasons were that the Telemetry project didn't want to support any new code and that they didn't have time to review it. The first reason was unfounded: this was a proven publisher from Ceilosca, the Monasca team wants to keep supporting it, and the new Prometheus publisher had just been added by someone else. There was definitely a sense of politics, and that the Telemetry team didn't want anything to do with a Monasca-related feature. (See the email exchange between jd and witek on openstack-dev on 20 June 2018 and http://eavesdrop.openstack.org/irclogs/%23openstack-telemetry/%23openstack-telemetry.2018-05-29.log.html.)

Sometimes it is just a matter of humans not having enough attention or time. At the Dublin PTG I became more aware of other projects, like Watcher and Vitrage, that do interesting things based on data collected in the cloud. These projects could benefit from the data and alarming features in Monasca. But here we are at the Denver PTG, and nothing has happened between the PTGs to move toward more cooperation between monitoring and health tracking or resource optimization. We humans are busy doing all the tasks involved in creating a new release of our OpenStack distribution and fixing bugs (and taking a 3-week vacation), so taking on new interactions with another project isn't high on the priority list.

Support

Projects with few contributors are going to lack support. That is somewhat the nature of Open Source development: it is easier to get someone to develop new software out in the open, but you often need to pay someone to do the unglamorous work of supporting it.

http://eavesdrop.openstack.org/irclogs/%23openstack-telemetry/%23openstack-telemetry.2018-07-25.log.html  (“we are shrinking ceilometer feature to our own needs”)

http://eavesdrop.openstack.org/irclogs/%23openstack-telemetry/%23openstack-telemetry.2018-08-31.log.html (panko unmaintained)

Round it all up

So what do we do about all this? Some things are just part of the nature of the large collection of projects we call OpenStack. Here are a few wild wishes.

  • Better advertisement of projects within the community. I know there is a lot of information and documentation out there about all the projects (some of it outdated, but that is a different problem), but sometimes it is a matter of getting the idea in front of someone to prompt them to go learn more. I learned that I might be interested in Vitrage from seeing it on the schedule at the Dublin PTG and by word of mouth.
  • Recognition that sometimes single vendor is OK. With contributions shrinking, if a project has active development and support, and the project team isn't stuck on its own agenda, that may be great.
  • Simplification of projects. While throwing away features in Ceilometer makes me a bit uneasy, I admit it might be best for Ceilometer going forward to be as simple and easy to maintain as possible. (See, I'm not all negative about Telemetry.)
  • Consolidation of projects. This might be the most radical notion. But if we want OpenStack projects with similar goals or big dependencies to work well together, maybe they should consolidate. I considered staging a coup in Telemetry and trying to bring it under the Monasca project. That may still be a good idea (and oddly, gordc gave me that idea). It may also be worth looking at the efforts and goals of projects like Watcher, Congress, and Vitrage and finding a common program that could make a better OpenStack solution.
  • And of course, I think we all wish for more contributors. But that is business: if a company sees an area where it wants value, it will contribute there. We just hope that doesn't mean an existing feature has to go unmaintained until it becomes broken and painful enough that someone has to invest in fixing it.

Since this was a long ramble, I may come back and revisit some of these items later. Feel free to contact me if you think I got something completely wrong. 🙂