In Memory of Jim Gray
Mr. Jim Gray
ACM Turing Award Laureate
Author: Paul McJones
James Nicholas Gray was born in San Francisco, California on 12 January 1944. He was raised by his mother, an English teacher, who encouraged her two children to read and make frequent visits to the aquarium or planetarium or to a museum. In 1961 Gray graduated from Westmoor High School in San Francisco.
Gray spent most of the next decade across the bay, at the University of California, Berkeley. His initial plan was to major in physics. Two stints in a co-op program at an aerospace company gave him a greater appreciation for the academic environment. He took graduate courses and carried out research work, graduating in 1966 with a bachelor’s degree in mathematics and engineering. After spending a year in New Jersey working at Bell Laboratories in Murray Hill and attending classes at the Courant Institute in New York City, he returned to Berkeley and enrolled in the newly formed computer science department, earning a Ph.D. in 1969 for work on context-free grammars and formal language theory. He spent the next two years as a postdoctoral fellow, sponsored by IBM Corporation. During this time he directed the CAL Timesharing System project, a research effort to build a secure and reliable operating system.
In 1971 Gray became a research staff member of IBM’s Watson Research Center in Yorktown Heights, NY. Inspired by work he’d done during his postdoctoral fellowship on urban modeling, he joined the General Science Department and worked on land-use mapping. During this period he met John Cocke, who suggested the challenge of scalable computing: finding a way to interconnect computers so that spending twice as much for computer hardware would allow a large computing problem to be solved twice as fast.
After a winter in New York, Gray decided he wanted to move back to California. But first he spent the summer of 1972 as a UNESCO Expert at the Polytechnic Institute of Bucharest in Romania. Back in California, he accepted an offer from IBM’s San Jose Research Laboratory (now IBM Almaden). The San Jose Research Laboratory was collocated with IBM’s General Products Division, which designed and manufactured computer disk drives. Database management was a research focus, and one of the lab members, Edgar F. Codd, had published an influential paper proposing a new way to organize database systems, called the relational model.
Several projects at IBM and elsewhere were begun with the goal of building practical systems based on Codd’s relational model. In 1973, IBM research management decided to combine people from the Watson and San Jose Laboratories into a single project located in San Jose. Gray soon joined this project, which became known as System R. The project continued for five years and, together with the Ingres project at the University of California, Berkeley, served as the foundation for the relational database industry. Ray Boyce and Don Chamberlin designed the widely used SQL query language for System R.
Gray played a major role in System R, combining his experience with systems and theory to create a unified approach to the interrelated problems of concurrency control and crash recovery. He defined the transaction as a unit of work, such as moving money from one bank account to another, that must leave the bank’s database in a consistent state whether or not the transaction succeeds: either the money moves, or it stays in the original account. Gray developed techniques that allowed concurrent execution of many transactions, as well as restart after crashes, while maintaining the consistency of the database. He proved the correctness of the approach. This work was the foundation for his Turing Award. As the research component of System R wound down, Gray helped transfer the technology to IBM product groups and began thinking about how to extend transactions to a distributed network of communicating computers.
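The bank-transfer example above can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module and illustrates only the all-or-nothing property of a transaction. It is not Gray's System R implementation, and it does not show concurrency control or crash recovery. The table name, the account names, and the helper functions transfer and demo are assumptions made for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Consistency rule (an assumption of this sketch): no overdrafts.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo():
    """Run one successful and one failed transfer on hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    ok_small = transfer(conn, "alice", "bob", 30)   # succeeds: money moves
    ok_large = transfer(conn, "alice", "bob", 500)  # fails: money stays put
    balances = dict(conn.execute("SELECT id, balance FROM accounts"))
    return ok_small, ok_large, balances
```

In this sketch the second transfer debits and credits both accounts before the overdraft check fails, yet the rollback restores both balances, so the database is consistent whether or not the transaction succeeds.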
In 1980, Gray made a career change, moving to Tandem Computers, where he spent the next decade. Tandem had pioneered the use of fault-tolerant hardware and software in commercial systems. Its approach of interconnecting isolated computers with a high-speed network promised the scalability that John Cocke had proposed a decade earlier. The scope of Gray’s work at Tandem moved beyond the research-to-production transfer he’d done at IBM, to extensive involvement in product development and activities involving Tandem customers and the entire field of data processing. An important example of his product development activity was the leadership role he played in the NonStop SQL relational database management system, which was tightly integrated with Tandem’s operating system and communication software, and featured fault tolerance, high availability, and scalable performance. Gray was involved from the initial conception: obtaining management approval, recruiting engineers, leading the architecture design, and participating in coding and tuning.
Gray believed that the relational database model and the SQL data access language were sound foundations for online applications, but he was concerned about the difficulty customers had comparing the offerings from various hardware and software vendors. He designed end user-oriented performance benchmarks, and helped establish a vendor-neutral organization, the Transaction Processing Performance Council, to oversee their impartial implementation. This led to more than a decade of strong competition between vendors to improve their products.
Gray’s interest in fault tolerance led him to work with Tandem customers to study the cause of system failures. He published one of the first papers with statistics from production fault tolerant systems, demonstrating that the most common sources of system failure were system administration and software bugs. As customer need for geographical distribution of computer systems and terminal networks increased, Gray studied how such distribution interacted with availability, consistency, and other desirable properties of the overall system. The many technical reports and papers he published while at Tandem helped customers plan their applications, helped Tandem engineers plan enhancements to their products, and contributed to the open literature on a wide variety of topics related to performance, reliability, availability, and ease of use.
After a decade at Tandem, Gray moved to Digital Equipment Corporation in 1990, where he started a small laboratory in San Francisco. Over the next four years he consulted with product groups for the Rdb relational database management system and the ACMS transaction-processing monitor. In addition, he and his co-author Andreas Reuter completed the book Transaction Processing: Concepts and Techniques. They had begun the work in 1986 as preparation for a one-week seminar; it evolved into a 1000-page book published in 1992. Building on their combined experience designing and teaching algorithms and systems, the authors presented an integrated view of the overall architecture as well as many details faced by the implementers of transaction processing systems. Completing the book marked a turning point for Gray, ending his focus on transaction processing systems.
In 1994 Gray resigned from Digital and accepted a Mackay Fellowship at the University of California, Berkeley. He participated in the Sequoia 2000 project, which was designing a Geographic Information System to support global change research.
During 1994-1995, Gray and Gordon Bell proposed to Microsoft that they establish an advanced development laboratory in San Francisco dedicated to servers and scalability. Microsoft agreed, and a small staff was hired. Over the next dozen years, Gray set the goal for himself “to put all the world’s scientific data online, along with tools to analyze the data.” He worked with colleagues at Microsoft and several universities to build a series of systems that applied the growing power of commodity hardware and software to a series of applications allowing access, search, and computation on large-scale scientific data. TerraServer allowed access to newly-available satellite imagery with resolution of 1.5 meters/pixel. SkyServer, a collaboration with Alexander Szalay and his colleagues at Johns Hopkins, allowed access to astronomical data from the Sloan Digital Sky Survey. SkyServer led to additional work with astronomical data, and Gray also worked with others to show how to apply the approach to other fields such as molecular biology, sensor networks for environmental science, and oceanography.
Gray had a lifelong interest in teaching and mentoring others. He taught formal and informal courses at Stanford University, gave lectures at universities around the world, and served on a wide range of program committees, editorial boards, and advisory boards. Taken together, his research, system building, mentoring, writing, teaching, and speaking had a large positive impact on almost everyone involved commercially or academically in the field of online transaction processing. Our modern society depends on online transaction processing for banking, ecommerce, and a host of other applications.
On January 28, 2007, Gray failed to return from sailing his 40-foot sloop Tenacious around the Farallon Islands. The Coast Guard conducted a comprehensive search but found no signs of the boat. Friends and colleagues of Gray conducted an innovative search using satellite imagery and cloud computing, but were also unsuccessful. A four-month-long search of the seabed, covering approximately 1000 square kilometers and using state-of-the-art technology (including multibeam echosounders and remotely operated vehicles), was equally unsuccessful. After the legally mandated waiting period, a court granted a petition to have him declared dead as of January 28, 2012.
From the ACM
Thoughts and Memories from Friends
Steve Jones: Another legend I never met, but from whom I learned a few things about relational databases. I’ve read some of his work, and I admired his passion.