Fixed: ExecJS RuntimeError and FATAL ERROR: v8::Context::New() when deploying Rails to Shared Hosting
I’ve recently been working on setting up a Rails app for a site I’m building for one of my side projects. While I’ve mostly been working on it in my spare time, I decided it was important to one day be able to deploy it to a larger audience (like friends and family). After trying to get Rails set up on my old hosting and being told that Rails 3 wasn’t supported, I moved to Vexxhost. Two things enticed me: 1) they’re Canadian; 2) their hosting is bloody cheap.
So after moving my PHP sites over, I got to work on deploying my (currently simple) Rails app. And that’s when I hit a problem.
The Problem
ExecJS::RuntimeError in Home#index
Showing /home/jrstarke/apps/___/releases/20120128224905/app/views/layouts/application.html.erb where line #6 raised:
FATAL ERROR: v8::Context::New() V8 is no longer usable (in /home/jrstarke/apps/___/releases/20120128224905/app/assets/javascripts/charities.js.coffee)
Now, if you find yourself in a similar situation, stop. Take a breath. You’re not very far from a fix.
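You’re seeing this because Rails is clever. The .erb and .js.coffee files in your application’s assets are compiled and compressed, and this all happens dynamically. Rails calls this the ‘Asset Pipeline’. Compiling CoffeeScript (and compressing JavaScript with Uglifier) goes through ExecJS, which hands the work to a JavaScript runtime, here V8, and that is the step failing in the error above.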
The problem is that if your shared hosting is anything like Vexxhost (and I imagine most hosts that use cPanel are), their shared hosting servers are tuned for efficiency. Compiling assets dynamically takes time and resources, and really, once you’ve deployed to the server, the assets don’t usually change much (except with a new version, and then they won’t change much again until the next one).
The Solution
It turns out that the solution isn’t all that hard; in fact, I found most of it in a great article on the Ruby on Rails site about The Asset Pipeline – In Production Environments.
The trick is that you essentially need to turn off the asset pipeline, trading dynamic, on-the-fly compilation of assets for a one-time static compilation. Without further ado, on to the solution.
First, go into your config/application.rb and disable the asset pipeline. Look for the following line:
config.assets.enabled = true
and change it to false:
config.assets.enabled = false
Next, go to your Gemfile and comment out the following 3 lines:
gem 'sass-rails', "~> 3.1.1"
gem 'coffee-rails', "~> 3.1.1"
gem 'uglifier'
These are all the changes we need to make, but before we can deploy the application, we need to compile the assets ourselves. This is where the following command comes in:
bundle exec rake assets:precompile
At this point, use whatever tool you normally use to deploy your app (in my case, Capistrano). If you’re using Capistrano, don’t forget to commit the new assets that were just created in your public/assets folder, since Capistrano typically deploys from your repository.
Next, log into your cPanel (or whatever actually controls your app) and stop and restart the app. Even if your deploy tool (like Capistrano) says it restarted your app, don’t believe it; restart it yourself. This step cost me a few hours while I was trying to fix this bug: even though Capistrano said it had restarted the app, the app still seemed to hold onto the old gems (sass and coffee) until I restarted it manually.
Now try your app. With any luck, if your system is anything like mine, it will work.
Realizing quality improvement through test driven development
Nachiappan Nagappan, E. Michael Maximilien, Thirumalesh Bhat, and Laurie Williams. “Realizing quality improvement through test driven development: results and experiences of four industrial teams”. ESE 2008.
Test-driven development (TDD) is a software development practice that has been used sporadically for decades. With this practice, a software engineer cycles minute-by-minute between writing failing unit tests and writing implementation code to pass those tests. Test-driven development has recently re-emerged as a critical enabling practice of agile software development methodologies. However, little empirical evidence supports or refutes the utility of this practice in an industrial context. Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.
In the test-driven development (TDD) chapter of Making Software, Turhan & Co. reported that the evidence for it is mixed: there is moderate support for the claim that it improves quality, and it is not quite clear whether this entails a productivity cost. To reach this conclusion, the authors went through all the papers they could find on the topic: some reported experiments with students, some with practitioners, and they varied widely in quality.
In my opinion, one of the stronger papers in their sample was this one, by Nagappan & Co. It reports on four teams, one at IBM and three at Microsoft, and it contrasts TDD vs. comparable non-TDD teams post-hoc (so the study did not bias data collection). As the abstract points out, there were far fewer defects in all four products, though managers at all teams reported an increase in development time.
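To make the abstract’s percentages concrete, here is a small Ruby sketch of the arithmetic. It assumes defect density is measured as defects per thousand lines of code (KLOC), a common convention; the project sizes, defect counts, and day counts below are made-up numbers for illustration, not data from the paper.

```ruby
# Illustrative arithmetic only: all inputs below are hypothetical,
# not figures from Nagappan et al.

# Defect density as defects per thousand lines of code (KLOC).
def defect_density(defects, lines_of_code)
  defects / (lines_of_code / 1000.0)
end

# Relative decrease of the TDD project's density versus a comparable
# non-TDD project; 0.4 means "40% lower".
def relative_decrease(tdd_density, baseline_density)
  1.0 - tdd_density / baseline_density
end

# Development time after a relative increase (0.15 means "15% longer").
def time_after_increase(baseline_days, increase)
  baseline_days * (1.0 + increase)
end

baseline = defect_density(200, 50_000) # hypothetical non-TDD project: 4.0 defects/KLOC
tdd      = defect_density(80, 50_000)  # hypothetical TDD project: 1.6 defects/KLOC

puts format("density decrease: %.0f%%", relative_decrease(tdd, baseline) * 100)
# prints "density decrease: 60%"
puts format("100 days at +15%%: %.0f days", time_after_increase(100, 0.15))
# prints "100 days at +15%: 115 days"
```

Keep in mind that each decrease is relative to the comparison project: a 90% decrease leaves one tenth of the baseline defect density, while a 40% decrease leaves six tenths of it.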
The conservatism in the Making Software chapter is warranted: there is still conflicting empirical evidence about TDD, as with most other practices in software development. But studies like Nagappan & Co.’s show that TDD is likely to be beneficial. Just note that (at least in the Microsoft teams they studied) “there was no enforcement and monitoring of the TDD practice; and decisions to use TDD were made as a group.” In other words, developers applied TDD because they wanted to, not because of a decree from their manager. Whether it would’ve been as effective otherwise is an open question.
MacBook Pro unable to connect to internet after waking from sleep

Recently, I’ve been having some problems with my MacBook Pro. At first, I seemed to get disconnected from the internet at random. After a few weeks, I finally figured out the pattern behind the problem.
One morning I sat down on the couch and opened my MacBook and, as was becoming normal, I was unable to connect to the internet. The AirPort icon on the menu bar was fully black (meaning that the signal was good).
So I went into the browser and tried to go to the router’s address. To my surprise, I could fully connect to the router. I mean fully. I could log in. I could change the settings. I could ping different websites. It all worked flawlessly. But as soon as I went back to my browser and attempted to connect to a website, in any browser, it would complain that I wasn’t connected to the internet.
I would turn the Wi-Fi off and on a few times, and eventually it would connect fully. Then an idea came to me: maybe it was related to the fact that I had just woken it from sleep. I thought back. Every time I could remember, I had just opened the computer, and it would have difficulty connecting.
So I tested it. I made sure I could access different websites. All signs were go. Then I closed the lid and waited for the status light to start flashing. After it did, I waited a few seconds and opened the lid. I logged in, opened Chrome, and tried Gmail. “Gmail is currently unavailable,” it told me. So I opened Safari and tried again. Same story. After turning AirPort off and on a few times from the menu bar icon, the connection came back. I decided to try it again. Sure enough, same story. So I tried it a few more times.
After reproducing the problem this exact way 6 times, I decided it was time to call Apple. I talked to one rep, who walked me through a few steps that didn’t get any results, and before we could go through everything, I had to leave for a meeting. I called back a few days later and got a wonderful rep named Ryan. After 17 minutes on the phone, my problem was solved. I was ecstatic.
The Solution
The solution to the problem was actually pretty simple. I’m posting it here so that it might help others (or me, if the problem recurs). Remember to back up beforehand, as it is possible that you might lose something.
I would also recommend keeping a copy of your Wi-Fi password handy (especially if your password is as complex as mine), since the steps below delete the copy saved in your keychain.
Here’s what worked for me:
1. Open Finder.
2. From the menu at the top, choose ‘Go’ -> ‘Computer’.
3. Open your hard drive (probably called ‘Macintosh HD’).
4. Open the ‘Library’ folder.
5. Open the ‘Preferences’ folder.
6. Locate the ‘SystemConfiguration’ folder (likely at the bottom).
7. Drag this folder to your desktop (as far as I can tell, this is to back it up in case something goes wrong).
8. Once the copy is complete, drag the original to the Trash.
9. In Finder, open Applications (if it isn’t in the sidebar, use ‘Go’ -> ‘Applications’).
10. Open the ‘Utilities’ folder.
11. Open ‘Keychain Access’.
12. Select ‘login’ from the list of keychains.
13. In the list to the right, locate the item with the name of your wireless network.
14. Select it and press ‘delete’ on your keyboard.
15. Select ‘System’ from the list of keychains.
16. Repeat steps 13 and 14 above.
17. Restart your MacBook.
18. Once your computer has restarted, connect to your wireless network.
With any luck, your computer will no longer have issues with your wireless network. This worked perfectly with my D-Link DIR-655 router, with which I could reproduce the wireless issues reliably. Since I performed the steps above, my wireless connection has worked flawlessly.
A Survey of the Practice of Computational Science
A survey of the practice of computational science. In International Conference for High Performance Computing, Networking, Storage and Analysis, pages 19:1–19:12, 2011. (doi:10.1145/2063348.2063374)
Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation than professional programmers. In order to accelerate the progress of science, computer scientists must understand these problems, needs, and beliefs. To this end, this paper presents a survey of scientists from diverse disciplines, practicing computational science at a doctoral-granting university with very high research activity. The survey covers many things, among them, prevalent programming practices within this scientific community, the importance of computational power in different fields, use of tools to enhance performance and software productivity, computational resources leveraged, and prevalence of parallel computation. The results reveal several patterns that suggest interesting avenues to bridge the gap between scientific researchers and programming tools developers.
Several studies of scientific programmers and scientific programming have come out in the past few years [1]. This in-depth analysis, which is based on semi-structured interviews with 114 researchers in science and engineering at Princeton University, is probably the most insightful to date. It explores the languages and tools researchers use, their debugging techniques, the environments they use, their performance tuning strategies, their use of parallelism [2], and many other aspects of their work. While some of its conclusions are unsurprising (e.g., the fact that scientists don’t test their programs rigorously), others highlight fruitful directions for future research—most particularly, the need to integrate performance analysis and tuning tools into everyday programming. More studies like this in other areas would be very welcome.
[1] Disclosure: one of us (GW) co-authored one of these studies, a web-based survey of over 1900 scientists conducted in 2008-09.
[2] Not surprisingly, job-level parallelism (i.e., running a sequential program many times with slightly different parameters) is by far the most common.
Got Issues? Do New Features and Code Improvements Affect Defects?
Daryl Posnett, Abram Hindle, and Prem Devanbu. “Got Issues? Do New Features and Code Improvements Affect Defects?” WCRE 2011.
There is a perception that when new features are added to a system that those added and modified parts of the source-code are more fault prone. Many have argued that new code and new features are defect prone due to immaturity, lack of testing, as well unstable requirements. Unfortunately most previous work does not investigate the link between a concrete requirement or new feature and the defects it causes, in particular the feature, the changed code and the subsequent defects are rarely investigated. In this paper we investigate the relationship between improvements, new features and defects recorded within an issue tracker. A manual case study is performed to validate the accuracy of these issue types. We combine defect issues and new feature issues with the code from version-control systems that introduces these features; we then explore the relationship of new features with the fault-proneness of their implementations. We describe properties and produce models of the relationship between new features and fault proneness, based on the analysis of issue trackers and version-control systems. We find, surprisingly, that neither improvements nor new features have any significant effect on later defect counts, when controlling for size and total number of changes.
One piece of common wisdom in the software industry is that new code tends to be buggier than old code, because it is immature and more poorly tested. But in this short paper, Posnett, Hindle, and Devanbu present an interesting twist on this. In the open source projects they studied, they found that although code changes in general are associated with future defect-fixing activity, as we might expect, changes that correspond to new feature development and to code improvements are not. That’s interesting and counter-intuitive: one would expect new feature commits to be among the buggiest. The authors offer a possible explanation: well-established open source projects tend to be quite conservative, and new feature code is heavily scrutinized, so most defects are found and sorted out before the code is integrated. This suggests that projects that are less careful might experience much more new-feature pain.
The Effects of Stand-Up and Sit-Down Meeting Formats on Meeting Outcomes
Allen C. Bluedorn, Daniel B. Turban, and Mary Sue Love. “The Effects of Stand-Up and Sit-Down Meeting Formats on Meeting Outcomes”. Journal of Applied Psychology 84(2), 1999.
The effects of meeting format (standing or sitting) on meeting length and the quality of group decision making were investigated by comparing meeting outcomes for 56 five-member groups that conducted meetings in a standing format with 55 five-member groups that conducted meetings in a seated format. Sit-down meetings were 34% longer than stand-up meetings, but they produced no better decisions than stand-up meetings. Significant differences were also obtained for satisfaction with the meeting and task information use during the meeting but not for synergy or commitment to the group’s decision. The findings were generally congruent with meeting-management recommendations in the time-management literature, although the lack of a significant difference for decision quality was contrary to theoretical expectations. This contrary finding may have been due to differences between the temporal context in which this study was conducted and those in which other time constraint research has been conducted, thereby revealing a potentially important contingency—temporal context.
If there’s one practice that caught on with every software team that calls itself Agile, it’s got to be daily stand-up meetings. If you hold your meetings standing up, the argument goes, they will go briskly, which is great because nobody likes meetings that drag on and on, especially if you hold them daily. This paper provides valuable evidence with respect to the efficacy of stand-up meetings: they are significantly shorter than sit-down meetings, and the decisions taken in them are just as good. Their only downside in the experiment is that participants were less satisfied with the meeting than those in sit-down meetings.
These were all 5-person meetings lasting 10-20 minutes and concerning a well-defined problem. The authors warn: “…additional research is needed to determine whether the stand-up meeting can be used for longer meetings dealing with problems that vary in their structure.”
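One detail in the abstract is easy to misread: “sit-down meetings were 34% longer than stand-up meetings” is not the same as “stand-up meetings were 34% shorter.” Here is a small Ruby sketch of the arithmetic. The 10-minute stand-up is a hypothetical example, 1.34 is simply the abstract’s 34% written as a ratio, and applying an average difference to a single meeting is a simplification.

```ruby
# The abstract reports sit-down meetings as 34% longer than stand-up ones.
SITDOWN_TO_STANDUP = 1.34

# Expected length of a sit-down meeting covering the same ground as a stand-up.
def sitdown_minutes(standup_minutes)
  standup_minutes * SITDOWN_TO_STANDUP
end

# The same gap seen from the other side: the fraction of time a stand-up saves
# relative to the equivalent sit-down meeting.
def standup_savings
  1.0 - 1.0 / SITDOWN_TO_STANDUP
end

puts format("10-minute stand-up ~ %.1f-minute sit-down", sitdown_minutes(10))
# prints "10-minute stand-up ~ 13.4-minute sit-down"
puts format("stand-ups are about %.0f%% shorter", standup_savings * 100)
# prints "stand-ups are about 25% shorter"
```

So the same result can be read as “sit-downs take about a third longer” or “stand-ups take about a quarter less time”; both describe one gap.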
(Thanks to Laurent Bossavit for pointing me to this paper. If you know of interesting papers that are relevant for software practitioners, even—or especially—if they’re from other disciplines, please send them our way! Also, note that we try to post links to freely downloadable versions of the papers we discuss. Sometimes, as in this case, we found none—but e-mailing the authors and asking nicely usually gets you a copy.)
Programming in a Socially Networked World: the Evolution of the Social Programmer
Since I first blogged about Stack Overflow in February 2011, the number of questions on the Q&A portal has more than doubled (from 1 million to almost 2.5 million), as has the number of answers (from 2.5 million to 5.2 million). According to a recent study by Lena Mamykina and colleagues, over 92% of the questions on Stack Overflow are answered, in a median time of a staggering 11 minutes.
The virtually real-time access to a community of other programmers willing and eager to help is an almost irresistible resource, as shown by the 12 million visitors and 131 million page views in December 2011 alone. Also, as we found in a recent study for Web2SE 2011, Stack Overflow can reach high levels of coverage for a given topic. For example, we analyzed the Google search results for one particular API, jQuery, and found at least one Stack Overflow question on the first page of the search results for 84% of the API’s methods.
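The growth figures above are easy to check, and the coverage measure is easy to state precisely. Here is a small Ruby sketch: the growth factors use the numbers quoted in the text (rounding “almost 2.5 million” to 2.5), while the search results passed to the hypothetical `coverage` helper are made up, and the study’s actual procedure may have differed in its details.

```ruby
# Counts quoted in the text, in millions ("almost 2.5 million" rounded to 2.5).
QUESTIONS = { feb_2011: 1.0, at_writing: 2.5 }
ANSWERS   = { feb_2011: 2.5, at_writing: 5.2 }

# How many times larger the count is at the time of writing than in February 2011.
def growth_factor(counts)
  counts[:at_writing] / counts[:feb_2011]
end

# Hypothetical coverage measure: the share of API methods whose first page of
# search results contains at least one Stack Overflow link.
def coverage(first_page_results)
  covered = first_page_results.count do |_method, urls|
    urls.any? { |url| url.include?("stackoverflow.com") }
  end
  covered.to_f / first_page_results.size
end

# Made-up search results for four jQuery methods (not the study's data).
sample_results = {
  "addClass" => ["https://api.jquery.com/addClass/", "https://stackoverflow.com/questions/hypothetical-1"],
  "ajax"     => ["https://stackoverflow.com/questions/hypothetical-2"],
  "wrapAll"  => ["https://api.jquery.com/wrapAll/"],
  "toggle"   => ["https://stackoverflow.com/questions/hypothetical-3"]
}

puts format("questions grew %.2fx, answers grew %.2fx",
            growth_factor(QUESTIONS), growth_factor(ANSWERS))
# prints "questions grew 2.50x, answers grew 2.08x"
puts format("coverage of sample: %.0f%%", coverage(sample_results) * 100)
# prints "coverage of sample: 75%"
```

Both growth factors exceed 2, which is what “more than doubled” claims; in the study, the equivalent of the sample above would be one entry per jQuery method, giving the reported 84%.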
The access to such a vast repository of knowledge that is just a web search away raises several research questions:
- Will developers who focus on reusing content from the web have sufficient understanding of the inner workings of the software they produce?
- Are web resources going to cover all important aspects of a topic?
- What meta-data is needed to facilitate technical information-seeking?
- How can we address security and copyright concerns that come with using other developers’ code?
In a recent position paper, Fernando, Brendan, Peggy and I discuss the past, present, and future of software developers that have access to an unprecedented amount and diversity of resources on the web. The paper is available as a pre-print, and will be presented at the Future of Collaborative Software Development workshop co-located with CSCW 2012 in Seattle in February.
This is the abstract of the paper:
Social media has changed how software developers collaborate, how they coordinate their work, and where they find information. Social media sites, such as the Question and Answer (Q&A) portal Stack Overflow, fill archives with millions of entries that contribute to what we know about software development, covering a wide range of topics. For today’s software developers, reusable code snippets, introductory usage examples, and pertinent libraries are often just a web search away. In this position paper, we discuss the opportunities and challenges for software developers that rely on web content curated by the crowd, and we envision the future of an industry where individual developers benefit from and contribute to a body of knowledge maintained by the crowd using social media.
Factors that Affect Software Systems Development Project Outcomes
Laurie McLeod and Stephen G. MacDonell. “Factors that Affect Software Systems Development Project Outcomes: A Survey of Research.” ACM Computing Surveys, 2011.
Determining the factors that have an influence on software systems development and deployment project outcomes has been the focus of extensive and ongoing research for more than 30 years. We provide here a survey of the research literature that has addressed this topic in the period 1996-2006, with a particular focus on empirical analyses. On the basis of this survey we present a new classification framework that represents an abstracted and synthesized view of the types of factors that have been asserted as influencing project outcomes.
Reading this literature review was a strange experience. Despite its 56-page length, and the fact that it was published only a couple of months ago, it manages to miss most of the interesting research in software development of recent years. There seem to be two reasons for this. First, the paper focuses almost entirely on research coming from the Information Systems community, which for reasons I’ve never understood is fairly disconnected from the Software Engineering research community (such as the TSE and ESE journals and the ICSE and FSE conferences). Second, the paper only considers research published between 1996 and 2006. It took me a while to realize this, but most of the exciting developments in our field (such as the link between organizational and code structure, the exploitation of data mining techniques to predict defects, and the rich and detailed qualitative evaluations of Agile practices) have only flourished in the last five years or so, and therefore would be out of scope for this survey.
In any case, McLeod and MacDonell’s survey provides a long list of factors that have been found to affect software projects, along with citations for each of them, and in that sense it is a useful gateway to research on these topics. Just be aware as you read it that, despite its recent publication date, it is fairly dated already.
PS: The paper is still only available behind a paywall, but it may eventually be posted in the authors’ lab site.
Research In Progress: How Mozilla Builds Software
WebFWD recently posted a video presentation by UC Berkeley’s Prof. Homa Bahrami and her student Claire Rudolph, who studied how Mozilla builds software. It’s full of useful insights about how a distributed mix of volunteers and paid professionals builds world-class software without drowning in information, and is a great example of research in progress. We’d welcome pointers to more presentations of this kind.
A decade of research and development on program animation: The Jeliot experience
Mordechai Ben-Ari, Roman Bednarik, Ronit Ben-Bassat Levy, Gil Ebel, Andrés Moreno, Niko Myller, and Erkki Sutinen: “A decade of research and development on program animation: The Jeliot experience”. Journal of Visual Languages & Computing, 22(5), 2011.
Jeliot is a program animation system for teaching and learning elementary programming that has been developed over the past decade, building on the Eliot animation system developed several years before. Extensive pedagogical research has been done on various aspects of the use of Jeliot including improvements in learning, effects on attention, and acceptance by teachers. This paper surveys this research and development, and summarizes the experience and the lessons learned.
Like our two previous papers, this one is about software engineering education rather than software engineering per se, but (a) we’re unlikely to improve the latter until we start getting the former right, and (b) education research has always had a strongly empirical flavor, which people studying “grown up” programmers could learn a lot from. What makes this paper interesting for me is that it describes how a specific research program has evolved over more than ten years. Ideas are turned into tools; how people use those tools, and what impact they have, are studied in situ; those studies produce new insights, which are turned into a new generation of tools, and the cycle repeats. Along the way, the researchers evolve as well: they learn how to ask more penetrating questions, and (hopefully) how to iterate more rapidly. Jonathan Weiner’s book Time, Love, Memory does a great job of describing this process at greater length in genetics; young researchers (and those of us who are not so young) can learn a lot about our craft from reading both.
So what does this paper actually cover? It opens with an eight-paragraph summary of program visualization (tools and methods to draw pictures of the states of programs as they execute), followed by a brief discussion of the difference between program animation and algorithm animation. Section 3 then summarizes the evolution of their software testbed, while Section 4 shows readers what it looks like now. Sections 5-10 are the meat of the paper: what do users learn, and what effect does program visualization have on attention (both in the classroom as a whole and at the individual level), on teachers, and on collaboration? Section 11 is an in-depth summary of lessons learned. In a way, it’s the whole point of the paper, and everything that comes before it is scene-setting. I wish there were more summaries and retrospectives like this, since every shared insight can save other designers or researchers months of wasted effort going down blind alleys.
Empirical Software Engineering’s Greatest Hits
Here are a couple of videos (the first about 8 minutes long, the second over an hour) discussing empirical studies in software engineering, and why they matter.
Online vs. Face-to-Face Pedagogical Code Reviews: An Empirical Comparison
Christopher Hundhausen, Pawan Agarwal, and Michael Trevisan: “Online vs. Face-to-Face Pedagogical Code Reviews: An Empirical Comparison.” SIGCSE 2011.
Given the increased importance of communication, teamwork, and critical thinking skills in the computing profession, we have been exploring studio-based instructional methods, in which students develop solutions and iteratively refine them through critical review by their peers and instructor. We have developed an adaptation of studio-based instruction for computing education called the pedagogical code review (PCR), which is modeled after the code inspection process used in the software industry. Unfortunately, PCRs are time-intensive, making them difficult to implement within a typical computing course. To address this issue, we have developed an online environment that allows PCRs to take place asynchronously outside of class. We conducted an empirical study that compared a CS 1 course with online PCRs against a CS 1 course with face-to-face PCRs. Our study had three key results: (a) in the course with face-to-face PCRs, student attitudes with respect to self-efficacy and peer learning were significantly higher; (b) in the course with face-to-face PCRs, students identified more substantive issues in their reviews; and (c) in the course with face-to-face PCRs, students were generally more positive about the value of PCRs. In light of our findings, we recommend specific ways online PCRs can be better designed.
Like our previous selection, this paper comes from software engineering education rather than software engineering per se, but has a lot to say about the latter. Code review is now a regular part of most open source projects, thanks in part to online code review tools like ReviewBoard. Here, the authors compare those kinds of reviews with face-to-face reviews, and find that the latter are more effective in several ways: people enjoy them more, they find more issues, and they are more likely to come away believing that reviews are worth doing. It would be fascinating to replicate this study with both junior programmers joining established teams, and developers with more experience who are undertaking reviews systematically for the first time.
The FCS1: A Language Independent Assessment of CS1 Knowledge
Allison Elliott Tew and Mark Guzdial: “The FCS1: A Language Independent Assessment of CS1 Knowledge”. SIGCSE’11, March 2011.
A primary goal of many CS education projects is to determine the extent to which a given intervention has had an impact on student learning. However, computing lacks valid assessments for pedagogical or research purposes. Without such valid assessments, it is difficult to accurately measure student learning or establish a relationship between the instructional setting and learning outcomes.
We developed the Foundational CS1 (FCS1) Assessment instrument, the first assessment instrument for introductory computer science concepts that is applicable across a variety of current pedagogies and programming languages. We applied methods from educational and psychological test development, adapting them as necessary to fit the disciplinary context. We conducted a large scale empirical study to demonstrate that pseudo-code was an appropriate mechanism for achieving programming language independence. Finally, we established the validity of the assessment using a multi-faceted argument, combining interview data, statistical analysis of results on the assessment, and CS1 exam scores.
People have been studying how we learn programming for even longer than they’ve been studying how we do it, and while the two aren’t exactly the same, there’s a lot of overlap in both methodologies and findings. Some of the best work I know has come out of the group at Georgia Tech led by Mark Guzdial (who is also a prolific and informative blogger). In this paper, he and his student Allison Tew present the results of a multi-year project to develop an instrument that can be used to assess how well students have learned basic programming concepts, regardless of whether the language they learned in was Java, Python, or MATLAB. The long-term goal is to create a concept inventory for computing similar to those that have been developed in physics, biology, and other sciences.
I think this is critically important work, and deserves a lot more attention from the software engineering community as a whole, not just that portion of it also interested in teaching. To paraphrase Dobzhansky, nothing in software engineering makes sense except in light of human psychology, so while measuring outputs like bugs per module is important, we won’t know why some people are so much more productive than others until we get a handle on what people actually know.
Best software for On-line Communities?
I’m looking for advice on the best software for creating online communities.
I’m currently trying to set up a Graduate Students’ Association in my Computer Science department at the University of Victoria.
After talking to a number of Graduate students in my department about what they think would be useful to them, I received a number of great suggestions. Based on the suggestions, I came up with 3 features that I think would be useful:
- A blog: This will allow the Grad. Association to keep grad students informed about important upcoming events.
- A forum or discussion board: This will allow grad students to discuss whatever they feel is important to them or their work.
- A wiki: This will allow grad students to easily create useful resources for other grad students.
There are many possible choices, but it all seems to come down to trade-offs.
As a Service or Host Your Own
There are two major options for hosting the community. You can either use an ‘as a service’ model, where someone else manages the infrastructure, or host your own, where you take the software and run it on infrastructure you control.
The obvious trade-off here is control versus maintenance and cost. Going with a service will decrease cost and maintenance, but it also decreases your control. Hosting your own increases all three.
In the maintenance category, I would like to reduce maintenance as much as possible, as this will likely be run by busy grad students. At the same time, though, if we give up all control, we are completely at the mercy of the service to decide what we can do and how long we can keep it up.
Separate Services or an All-in-One
Again, we can either use a number of different systems (such as a blog, a forum, and a wiki), or find a single system that does all three.
Here the trade-off is consistency and maintenance versus precision. If you use three systems, you have to maintain three systems, and it is unlikely that they will be consistent with one another.
On the precision side, though, any single system will likely be good at a small number of features and less good at others. Three separate systems, by contrast, will likely each be better at the particular thing they do.
Here, I’d like to make the site as easy to use as possible. If it is hard for users to move from one part to another, such as from the discussion forum to the wiki or the blog, they are less likely to do so unless it is absolutely necessary. This will be particularly hard if the systems cannot link to one another consistently.
Community Advice
I’ve used WordPress for blogs before, and it works pretty well. I’ve also considered CMSs like TikiWiki, but it appears to be particularly buggy. I’ve also used Drupal, and it’s particularly bad for maintenance (which makes me die a little inside).
So this is where I look for advice from the community. What software can you recommend?
Codermetrics?
As a new parent, I haven’t had much chance to go to the movies lately, and among the many new releases I’ve missed is “Moneyball”. But I read enough about the movie to learn that it was about baseball and the folks behind Sabermetrics, so it did not surprise me when, shortly after the film came out, Greg Wilson pointed me to the article “Moneyball for software engineering”, by Jonathan Alexander, who also wrote a book on the same topic. I decided to write about it here because it is, sadly, an illustrative example of the “you can’t control what you can’t measure” trap that we’re too prone to fall into in our domain.
In his article, Alexander argues that the statistical approach featured in Moneyball can be applied to the software development domain. By gathering the right stats, he says, software companies can better assess the contributions of their employees and create “more competitive teams”. Here are a few of Alexander’s proposed measurements:
- Productivity by looking at the number of tasks completed or the total complexity rating for all completed tasks.
- Utility by keeping track of how many areas someone works on or covers.
- Teamwork by tallying how many times someone helps or mentors others, or demonstrates behavior that motivates teammates.
- Innovation by noting the times when someone invents, innovates, or demonstrates strong initiative to solve an important problem.
If you’re going down this route, you’ll also need some way to assess success, and software development does not have the simple win/loss record that baseball has. Alexander has a few metrics in mind, though:
- Looking at the number of users acquired or lost.
- Calculating the impact of software enhancements that deliver benefit to existing users.
- …and so on.
And once you have all these metrics, you could play around with them, assessing performance, identifying different kinds of “roles”, coaching on skills that the team is lacking, et cetera. That is Alexander’s proposal, in short, and he says that a “growing number of companies” are starting to use it.
But there is no data on the efficacy of this approach, and frankly I cannot see how it could possibly work. There are two major problems with it.
The first problem is assuming that a technique that works for baseball will also work for software development. Baseball is the perfect home for a stats-heavy approach. It is a very discrete sport—that is, you can get discrete data fairly easily. There are clear win/loss conditions, and every single play can be classified according to given criteria and assigned to individual players with relative ease. That’s not the case with software development. Exactly what counts as an innovation? What counts as an area of work? How do you assign a complexity rating for a completed task? And how could you ever get agreement on your answers to questions like these?
The second problem is that measurements can be gamed, and measurements used to shape policy will be gamed. Perhaps in baseball this is not an issue, and that may be because Sabermetrics measurements (as far as I know) tend not to focus on interpersonal or subjective criteria. That is not the case here, and it can’t be the case here, as good software development often depends on interpersonal and subjective criteria. See here for a wonderful illustration of a nightmare scenario that nonetheless would do great on the performance metrics above.
This is not to say that measurements are not useful in our domain; we’ve covered several examples of the opposite in this blog already. But we often jump to the numbers a bit too quickly, no matter how careless the process that produced them was. Perhaps this is because seeing percentages or trends gives us a warm, fuzzy illusion of control, and we tend to forget that we’re dealing with complex constructs that can’t be captured easily, and with intelligent professionals who will react to our observations in unintended ways. My advice: always be suspicious of your subjective appraisals, but if you start collecting metrics, be extra suspicious. All those seemingly hard numbers might make you forget that they are probably still subjective, just dressed up in objectivity: wolves in sheep’s clothing.
A field study of API learning obstacles
Martin P. Robillard and Rob DeLine. “A field study of API learning obstacles” ESE 16 (6), 2011.
Large APIs can be hard to learn, and this can lead to decreased programmer productivity. But what makes APIs hard to learn? We conducted a mixed approach, multi-phased study of the obstacles faced by Microsoft developers learning a wide variety of new APIs. The study involved a combination of surveys and in-person interviews, and collected the opinions and experiences of over 440 professional developers. We found that some of the most severe obstacles faced by developers learning new APIs pertained to the documentation and other learning resources. We report on the obstacles developers face when learning new APIs, with a special focus on obstacles related to API documentation. Our qualitative analysis elicited five important factors to consider when designing API documentation: documentation of intent; code examples; matching APIs with scenarios; penetrability of the API; and format and presentation. We analyzed how these factors can be interpreted to prioritize API documentation development efforts.
Developers don’t live on Stack Overflow alone. For many API questions, there are still materials and documentation that can help them speed up their learning process (or there should be, I think, while I try to learn node.js…). Robillard and DeLine’s report is full of rich insights and practical implications relevant for anyone trying to improve the developer documentation of their products.
Software Requirements Change Taxonomy: Evaluation by Case Study
Sharon McGee and Des Greer, “Software Requirements Change Taxonomy: Evaluation by Case Study“, International Conference on Requirements Engineering, Trento, Italy, September 2011.
Although a number of requirements change classifications have been proposed in the literature, there is no empirical assessment of their practical value in terms of their capacity to inform change monitoring and management. This paper describes an investigation of the informative efficacy of a taxonomy of requirements change sources which distinguishes between changes arising from ‘market’, ‘organisation’, ‘project vision’, ‘specification’ and ‘solution’. This investigation was effected through a case study where change data was recorded over a 16 month period covering the development lifecycle of a government sector software application. While insufficiency of data precluded an investigation of changes arising due to the change source of ‘market’, for the remainder of the change sources, results indicate a significant difference in cost, value to the customer and management considerations. Findings show that higher cost and value changes arose more often from ‘organisation’ and ‘vision’ sources; these changes also generally involved the co-operation of more stakeholder groups and were considered to be less controllable than changes arising from the ‘specification’ or ‘solution’ sources. Overall, the results suggest that monitoring and measuring change using this classification is a practical means to support change management, understanding and risk visibility.
Many people have considered how best to classify requirements changes: for example, Harker or Zowghi. In this paper, the authors conducted a single case study to understand not only whether their taxonomy could capture the various changes that occurred during an industrial software development project, but also whether such a classification could help with project management concerns.
It is a well-worn truth that changes in requirements can be very expensive to fix later in a project. However, one thing that is typically not considered is the opportunity that a change affords. We tend to focus on the negative, but as McGee demonstrates in her case study, these changes are a key part of business strategy.
In particular, the highest-value/highest-cost changes came from the strategic, organization level, and the lowest-value/lowest-cost changes to the system from the detail-oriented, solution implementation level. The classification of change origins provides evidence that the context of the change is important in understanding how to manage that change.
During her research presentation, Sharon McGee also commented on the challenges of this type of research. While valuable, the organization she embedded with found the research process time-consuming. She doubted they would be willing to undergo a follow-up study. This is a major barrier to obtaining case study opportunities that go beyond the anecdotal.
Author Response: Quorum vs Perl vs Randomo Novice Accuracy Rates
Hi Greg and Jorge,
Thanks for mentioning our work on your site. My team and I have been astonished at how far and wide our results have spread in just a day or two. It’s amazing how emotional people have become about our experiment. Anyway, I’m a working scientist, so I don’t have a ton of time, but I’ll try to respond to a few user comments:
1. Claim: We tested with novices. This would never apply in the field.
Response: As scientists and practitioners, it would behoove us to objectively test such claims instead of just declaring their truth-value. I think that we should be testing languages with novices, professionals, and everyone in between.
2. Claim: $a to $c initializations are non-idiomatic and borked (or old). The syntax ($a,$b,$c) = @_; would be better. Similarly, different examples might have been chosen.
Response: Testing with other examples or other versions of Perl could reveal different accuracy rates. That said, I find it pretty unlikely that ($a,$b,$c) = @_; would have much meaning to a novice. There is no way to know without more formal experiments, but it wouldn’t surprise me if someone discovered that novices did even worse with such syntax.
3. Claim: We should trust our gut instincts over empirical studies.
Response: Gut instincts can be valuable, but in programming language design, people’s guts rarely seem to agree. By using the scientific method, we can obtain more reproducible, and frankly more accurate, answers.
4. Claim: A larger sample size might show Perl did better than a language designed by chance.
Response: This is true, as we clearly discuss in the paper. Keep in mind that, if this is the case, it would only mean that novices using Perl were afforded 26% greater accuracy than those using Randomo. That’s very poor.
5. Claim: Quorum users were not more accurate than Perl or Randomo users.
Response: This is false. Results show there is a 95.3% chance that novice Quorum users were more accurate than Perl users and a 99.6% chance that they were more accurate than Randomo. To say otherwise is misrepresentative of our results.
6. Claim: Two of the languages are made up.
Response: Three: so is Perl. Quorum is implemented, though. We’ll release 1.0 on SourceForge in a few months. Randomo is clearly a thought experiment, but would be easy to implement.
In Summary:
If anything, from reading the responses, what I think our community really needs to do is to move away from a largely pseudo-scientific view of programming language design toward one based on evidence. The scientific method has a much better chance of ending the programming language wars someday than does continuing to argue about it.
Finally, as one last point, for those readers that absolutely must send hate mail, please send it only to me, not my students.
Andreas Stefik, Ph.D.
Assistant Professor
Department of Computer Science
Southern Illinois University Edwardsville
An Empirical Comparison of the Accuracy Rates of Novices using the Quorum, Perl, and Randomo Programming Languages
Andreas Stefik, Susanna Siebert, Melissa Stefik, and Kim Slattery: An Empirical Comparison of the Accuracy Rates of Novices using the Quorum, Perl, and Randomo Programming Languages. PLATEAU 2011.
We present here an empirical study comparing the accuracy rates of novices writing software in three programming languages: Quorum, Perl, and Randomo. The first language, Quorum, we call an evidence-based programming language, where the syntax, semantics, and API designs change in correspondence to the latest academic research and literature on programming language usability. Second, while Perl is well known, we call Randomo a Placebo-language, where some of the syntax was chosen with a random number generator and the ASCII table. We compared novices that were programming for the first time using each of these languages, testing how accurately they could write simple programs using common program constructs (e.g., loops, conditionals, functions, variables, parameters). Results showed that while Quorum users were afforded significantly greater accuracy compared to those using Perl and Randomo, Perl users were unable to write programs more accurately than those using a language designed by chance.
In the early 1990s, when I was teaching parallel programming to scientists, I discovered very quickly that they found some programming systems much easier to learn than others. Data parallelism and Linda’s tuple spaces? They could get something working in half an hour. Message passing? It took hours to get as far. When Brent Gorda and I started teaching software engineering to scientists a few years later at Los Alamos National Laboratory, we initially used Perl; after switching to Python, we found that it only took two days to cover material that had previously taken three, and that students seemed to remember it better weeks or months later.
But everyone has stories like that about their favorite programming language. Haskell’s fans swear that strong typing makes all the difference, while fans of Scheme are wont to claim that strong typing is for people with weak memories. If anything deserves empirical study (if only to put such claims to rest), it’s this. That’s why I enjoyed this paper so much. It isn’t just their finding that novices using Perl were no more likely to write a correct program than novices using a language whose syntax was generated randomly (although I did smile quite broadly when I read that). This paper’s real contribution is to show that such studies are possible—that we can and should put such claims to the test, just as Rossbach et al. did for transactional programming.
Three Empirical Studies From ESEC/FSE’11
As our previous post said, a lot of interesting work was presented at the joint ESEC/FSE conference in September. Three of my favorites that report empirical studies are:
- Sven Apel, Jörg Liebig, and Christian Kästner: “Semistructured Merge: Rethinking Merge in Revision Control Systems”.
An ongoing problem in revision control systems is how to resolve conflicts in a merge of independently developed revisions. Unstructured revision control systems are purely text-based and solve conflicts based on textual similarity. Structured revision control systems are tailored to specific languages and use language-specific knowledge for conflict resolution. We propose semistructured revision control systems that inherit the strengths of both: the generality of unstructured systems and the expressiveness of structured systems. The idea is to provide structural information of the underlying software artifacts — declaratively, in the form of annotated grammars. This way, a wide variety of languages can be supported and the information provided can assist in the automatic resolution of two classes of conflicts: ordering conflicts and semantic conflicts. The former can be resolved independently of the language and the latter using specific conflict handlers. We have been developing a tool that supports semistructured merge and conducted an empirical study on 24 software projects developed in Java, C#, and Python comprising 180 merge scenarios. We found that semistructured merge reduces the number of conflicts in 60% of the sample merge scenarios by, on average, 34%, compared to unstructured merge. We found also that renaming is challenging in that it can increase the number of conflicts during semistructured merge, and that a combination of unstructured and semistructured merge is a pragmatic way to go.
Almost all version control systems treat files as lines of text, ignoring whatever program structure they contain. The few that diff and merge at the logical level work only that way, and are usually available only as part of all-or-nothing programming environments. In this paper, the authors look at a hybrid approach that tries to combine the good features of both pure alternatives. The tool itself is interesting, but I was equally interested in the empirical study they did to see how much of a difference it made. That study told them that when their tool underperformed, it was most often because it couldn’t handle renamings well, which in turn tells them what they need to work on next.
- Andrew Meneely, Pete Rotella, and Laurie Williams: “Does Adding Manpower Also Affect Quality? An Empirical Longitudinal Analysis”.
With each new developer to a software development team comes a greater challenge to manage the communication, coordination, and knowledge transfer amongst teammates. Fred Brooks discusses this challenge in The Mythical Man-Month by arguing that rapid team expansion can lead to a complex team organization structure. While Brooks focuses on productivity loss as the negative outcome, poor product quality is also a substantial concern. But if team expansion is unavoidable, can any quality impacts be mitigated? Our objective is to guide software engineering managers by empirically analyzing the effects of team size, expansion, and structure on product quality. We performed an empirical, longitudinal case study of a large Cisco networking product over a five year history. Over that time, the team underwent periods of no expansion, steady expansion, and accelerated expansion. Using team-level metrics, we quantified characteristics of team expansion, including team size, expansion rate, expansion acceleration, and modularity with respect to department designations. We examined statistical correlations between our monthly team-level metrics and monthly product-level metrics. Our results indicate that increased team size and linear growth are correlated with later periods of better product quality. However, periods of accelerated team expansion are correlated with later periods of reduced software quality. Furthermore, our linear regression prediction model based on team metrics was able to predict the product’s post-release failure rate within a 95% prediction interval for 38 out of 40 months. Our analysis provides insight for project managers into how the expansion of development teams can impact product quality.
The Mythical Man-Month is the most-quoted book in software engineering. Here, the authors test its central claim by looking at what effect expanding a development team has on downstream fault rates; in particular, they look at how the rate of team expansion correlates with defects later on. Their finding is that growth on its own doesn’t hurt quality: it’s rapid growth that causes problems.
- Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram: “How Do Fixes Become Bugs?”
This paper presents a comprehensive characteristic study on incorrect bug-fixes from large operating system code bases including Linux, OpenSolaris, FreeBSD and also a mature commercial OS developed and evolved over the last 12 years, investigating not only the mistake patterns during bug-fixing but also the possible human reasons in the development process when these incorrect bug-fixes were introduced. Our major findings include: (1) at least 14.8%∼24.4% of sampled fixes for post-release bugs in these large OSes are incorrect and have made impacts to end users. (2) Among several common bug types, concurrency bugs are the most difficult to fix correctly: 39% of concurrency bug fixes are incorrect. (3) Developers and reviewers for incorrect fixes usually do not have enough knowledge about the involved code. For example, 27% of the incorrect fixes are made by developers who have never touched the source code files associated with the fix. Our results provide useful guidelines to design new tools and also to improve the development process. Based on our findings, the commercial software vendor whose OS code we evaluated is building a tool to improve the bug fixing and code reviewing process.
This paper’s starting point is something every seasoned developer knows: bug fixes are often buggy themselves. But how buggy? And are fixes for some kinds of bugs more error-prone than others? This paper examines 12 years of data from four operating systems to produce the statistics and recommendations summarized in the abstract. (Not surprisingly, concurrency and memory-management bugs are the hardest ones to fix correctly.) Given that testing and code review resources are always in short supply, this kind of information can help teams focus their efforts where they’ll do the most good.




