ConE: A Concurrent Edit Detection Tool for Large Scale Software Development

ConE: A Concurrent Edit Detection Tool for Large Scale

Soware Development

CHANDRA MADDILA, Microsoft Research

NACHIAPPAN NAGAPPAN

∗

, Microsoft Research

CHRISTIAN BIRD, Microsoft Research

GEORGIOS GOUSIOS

†

, Facebook

ARIE van DEURSEN, Delft University of Technology

Modern, complex software systems are being continuously extended and adjusted. The developers responsible

for this may come from dierent teams or organizations, and may be distributed over the world. This may

make it dicult to keep track of what other developers are doing, which may result in multiple developers

concurrently editing the same code areas. This, in turn, may lead to hard-to-merge changes or even merge

conicts, logical bugs that are dicult to detect, duplication of work, and wasted developer productivity. To

address this, we explore the extent of this problem in the pull request based software development model.

We study half a year of changes made to six large repositories in Microsoft in which at least 1,000 pull

requests are created each month. We nd that les concurrently edited in dierent pull requests are more

likely to introduce bugs. Motivated by these ndings, we design, implement, and deploy a service named

ConE (Concurrent Edit Detector) that proactively detects pull requests containing concurrent edits, to help

mitigate the problems caused by them. ConE has been designed to scale, and to minimize false alarms while

still agging relevant concurrently edited les. Key concepts of ConE include the detection of the Extent of

Overlap between pull requests, and the identication of Rarely Concurrently Edited Files. To evaluate ConE, we

report on its operational deployment on 234 repositories inside Microsoft. ConE assessed 26,000 pull requests

and made 775 recommendations about conicting changes, which were rated as useful in over 70% (554) of

the cases. From interviews with 48 users we learned that they believed ConE would save time in conict

resolution and avoiding duplicate work, and that over 90% intend to keep using the service on a daily basis.

CCS Concepts:

• Software and its engineering → Integrated and visual development environments

;

Software maintenance tools; Software conguration management and version control systems.

Additional Key Words and Phrases: Pull-based software development, pull request, merge conict, distributed

software development

ACM Reference Format:

Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen. 2021. ConE:

A Concurrent Edit Detection Tool for Large Scale Software Development. ACM Trans. Softw. Eng. Methodol. 1,

1 (September 2021), 26 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

∗

Work done while at Microsoft Research.

†

Work done while at Delft University of Technology.

Authors’ addresses: Chandra Maddila, Microsoft Research, Redmond, WA, USA, chmaddil@microsoft.com; Nachiappan

Nagappan, Microsoft Research, Redmond, USA, [email protected]; Christian Bird, Microsoft Research,

Redmond, WA, USA, [email protected]; Georgios Gousios, Facebook, Menlo Park, CA, USA, gousiosg@fb.com; Arie van

Deursen, Delft University of Technology, Delft, The Netherlands, [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee

provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and

the full citation on the rst page. Copyrights for third-party components of this work must be honored. For all other uses,

contact the owner/author(s).

1049-331X/2021/9-ART

https://doi.org/10.1145/nnnnnnn.nnnnnnn

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

arXiv:2101.06542v3 [cs.SE] 25 Sep 2021

2 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

1 INTRODUCTION

In a collaborative software development environment, developers, commonly, work on their indi-

vidual work items independently by forking a copy of the code base from the latest main branch

and editing the source code les locally. They then create pull requests to merge their local changes

into the main branch. With the rise of globally distributed and large software development teams,

this adds a layer of complexity due to the fact that developers working on overlapping parts of

the same codebase might be in dierent teams or geographies or both. While such collaborative

software development is essential for building complex software systems that meet the expected

quality thresholds and delivery deadlines, it may have unintended consequences or ‘side eects’

[

]. The side eects can be as simple as syntactic merge conicts, which can be handled

by version control systems [

] and various techniques/tools [

], to semantic conicts [

Such bugs can be very hard to detect and may cause substantial disruptions [

]. Primarily, all of

this happens due to lack of awareness and early communication among developers editing the

same source code le or area, at the same time, through active pull requests.

There is no substitute to resolving merge or semantic conicts (or xing logical bugs or refactoring

duplicate code) when the issue is manifested. Studies show that pull requests getting into merge

conicts is a prevalent problem [

]. Merge conicts have a signicant impact on code

quality and can disrupt the developer workow [

]. Sometimes, the conict becomes so

convoluted that one of the developers involved in the conict has to abandon their change and start

afresh. Because of that, developers often defer resolving their conicts [

] which makes the conict

resolution even harder at a later point of time [

]. Time spent in conict resolution or refactoring

activities is going to take away valuable time and prohibits developers from fullling their primary

responsibility, which is to deliver value to the organization in the form of new functionality, bug

xes and maintaining the service. In addition to loss of time and money, this causes frustration

[

]. Studies have shown that these problems can be avoided by following strategies such as

eective communication within the team [

], and developing awareness about others’ changes

that have a potential to incur conicts [22].

Our goal is to design a method to help developers discover changes made on other branches

that might conict with their own changes. This goal is particularly challenging for modern, large

scale software development, involving thousands of developers working on a shared code base.

One of the design choices that we had to make was to minimize the false alarms by making it more

conservative. Studies have shown that, in large organizations, tools that generate many false alarms

are not used and eventually deprecated [46].

The direct source of inspiration for our research is complex, large scale software development as

taking place at Microsoft. Microsoft employs ~166K employees worldwide and 58.6% of Microsoft’s

employees are in engineering organizations. Microsoft employs ~69K employees outside of the

United States making it truly multinational [

]. Because of the scale and breadth of the organization,

tools and technologies used across the company, it is very common for Microsoft’s developers to

constantly work on overlapping parts of the source code, at the same time, and encounter some of

the problems explained above.

Over a period of twelve months, we studied pull requests, source control systems, code review

tools, conict detection processes, and team and organizational structures, across Microsoft and

across dierent geographies. This greatly helped us assess the extent of the problem and practices

followed to mitigate the issues induced by the collaborative software development process. We

make three key observations:

(1)

Discovering others’ changes is not trivial. There are several solutions oered by source control

systems like GitHub or Azure DevOps [

] that enable developers to subscribe to email

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 3

notications when new pull requests are created or existing ones are updated. In addition,

products like Microsoft Teams or Slack can show a feed of changes that are happening in

a repository a user is interested in. The notication feed becomes noisy over time and it

becomes very hard for developers to digest all of this information and locate pull requests

that might cause conicts. This problem is aggravated when a developer works on multiple

repositories.

(2)

Tools have to t into developers’ workows. Making developers install several client tools and

making them switch their focus between dierent tools and windows is a big obstacle for

adoption of any solution. There exists a plethora of tools [

] that aim to solve this

problem in bits and pieces. Despite this, usability is still a challenge because none of them t

naturally into developers’ workows. Therefore, they cause more inconvenience than the

potential benets they might yield.

(3)

Suggestions about conicting changes must be accurate and scalable. There exist solutions

which attempt to merge changes proactively between a developer’s local branch and the

latest version of main branch or two developer branches. These tools notify the developers

when they detect a merge conict situation [

]. Such solutions are impractical to

implement in large development environments as the huge infrastructure costs incurred by

them may outweigh the gains realized in terms of saved developer productivity.

Keeping these observations in mind, we propose ConE, a novel technique to i) calculate the

Extent Of Overlap (EOO) between two pull requests that are active at the same time frame, and ii)

determine the existence of Rarely Concurrently Edited les (RCEs). We also derived thresholds to

lter out noise and implemented ranking techniques to prioritize conicting changes.

We have implemented and deployed ConE on 234 repositories across dierent product lines

and large scale cloud development environments within Microsoft. Since deployed, in March 2020,

ConE evaluated 26,000 pull requests and made 775 recommendations about conicting changes.

This paper describes ConE and makes the following contributions:

•

We characterize empirically how concurrent edits and the probability of source code les

introducing bugs vary based on the fashion in which edits to them are made, i.e., concurrent

vs non-concurrent edits (Section 3).

•

We introduce the ConE algorithm that leverages light-weight heuristics such as the extent of

overlap and the existence of rarely concurrently edited les, and ConE’s thresholding and

ranking algorithm that lters and prioritizes conicting changes for notication (Section 4).

•

We provide implementation and design details on how we built ConE as a scalable cloud

service that can process tens of thousands of pull requests across dierent product lines every

week (Section 5).

•

We present results from our quantitative and qualitative evaluation of the ConE system

(Section 6).

To the best of our knowledge, this is the rst study of an early conict detection system that

is also deployed, in a large scale, cloud based, enterprise setting comprised of a diverse set of

developers who work with multiple frameworks and programming languages, on multiple disparate

product lines and who are from multiple geographies and cultural contexts. We have observed

overwhelmingly positive response to this system with a 71.48 % positive feedback provided by the

end users: A very good user interaction rate (2.5 clicks per recommendation that is surfaced by

ConE to learn more about conicting changes) and 93.75% of the users indicating their intent to

use or keep using the tool on a daily basis.

Our interactions and interviews with developers across the company made us realize that

developers nd it valuable to have a service that can facilitate better communication among them

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

4 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

about edits that are happening elsewhere (to the same les or functions that are being edited by

them) through simple and non-obtrusive notications. This is reected strongly in the qualitative

feedback that we have received (explained in detail in section 6).

2 RELATED WORK

The software engineering community has extensively studied the impact of merge conicts on

software quality [

], investigated various methodologies and tools that can help developers

discover conicting changes through interactive visualizations, and developed speculative analysis

tools [

]. While ConE draws inspiration from some of this prior work, it is more ambitious,

targeting a method that is eective while not resource intensive, can be easily scaled to work on

tens of thousands of repositories of all sizes, is easy to integrate and ts naturally into existing

software development workows and tools with very little to no disruption.

A conict detection system that has to work for large organizations with disparate sets of

programming languages, tools, product portfolio and has thousands of developers that are also

geographically distributed, has to satisfy the requirements listed below:

•

language-independent: the techniques and tooling built should be language-independent in

nature and support repositories that hosts code written in any programming language and

should support new languages with no or minimal customization.

•

non-intrusive: the recommendations passed by the tool should naturally t into developer

workows and environment.

•

scalable: nally, the techniques proposed and the system should be performant and responsive

without consuming a lot of computing resources and demanding lot of infrastructure to scale

them up.

We now explain some of the prior work that is relevant and explain why they do not satisfy

some or all of the requirements.

Tools base d on edit activity.

Manhattan [

] is a tool that generates visualizations about team

activity whenever a developer edits a class and noties developers through a client program, in

real time. While this shows useful 3D visualizations about merge conicts in the IDE itself (thus

being non-intrusive and natural to use), it is not adaptive (it does not automatically reect any

changes to the code in the visualization, unless the user decides to re-import the code base), not

generic (it works only for Java and Eclipse) and not scalable as it operates on the client side and

has to go through the cycle of import-analyze-present again and again for every change that is

made, inside the IDE environment. Similarly, FASTDash [

] is a tool that scans every single le

that is edited/opened in every developer local workspace and communicates about their changes

back and forth through a central server. This is impractical to implement across large development

teams. It requires tracking changes at the client side with the help of an agent program that runs

on each client. Furthermore it then keeps listening to every le edit activity in the workspace, then

communicating that information with a central server which mediates communication between

dierent workspaces. This is prone to failures and runs into scale issues even with a linear increase

in developers and pull requests in the system.

Tools based on early merging.

Some tools were built upon the idea of attempting actual

merging and notifying the developers through a separate program that runs on the client [

These solutions are very resource intensive because the system needs to perform the actual source

code merge for every pull request or developer branch with the latest version of the main branch

(despite implementing optimization techniques like caching and tweaking the algorithm to compute

relationships between changes when there is a change to the history of the repository). It is not

possible to implement and scale this at a company like Microsoft where tens of thousands of pull

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 5

requests are created every week. Additionally, these solutions do not attempt to merge between

two dierent user branches or two dierent active pull requests but attempt to merge a developer

branch with the latest version of the main branch. This will not nd conicting changes that exist

in independent developer branches and thus cannot trigger early intervention. Palantir [

] is a tool

that addresses some of the performance issues by leveraging a cache for doing dependency analysis.

It is, however, still hard to scale due to the fact that there is client-server communication involved

between IDEs and centralized version control servers to scan, detect, merge and update every

workspace with information about remote conicting changes. Some solutions explore speculative

merging [

] but the concerns with scalability, non-obtrusiveness remain valid with all of

them.

Predictive tools.

Owhadi-Kareshk et al. explored the idea of building binary classiers to predict

conicting changes [

]. Their model consists of nine features, of which the number of jointly

edited les is the dominant one. The model has been evaluated on a dataset of syntactic merge

conicts reverse engineered from git histories. The model’s reported performance in terms of

precision ranges from 0.48 to 0.63 (depending on the programming languages).

While one of our proposed metrics, our Extent of Overlap, is akin to the dominant feature

in Owhadi-Kareshk’s model, unfortunately their proposed approach cannot be applied in our

context. In particular the reported precision is too low and would generate too many false alarms

which would render our tool unused [

]. Furthermore, the reported precision and recall are

measured based on a gold standard of syntactic changes. Instead, we target an evaluation with

actual developers, based on a service deployed on repositories they are working with on a daily

basis. As we will see in our evaluation, these developers not only value warnings about syntactic

changes, but also semantic conicts [

], or even cases of code/eort duplication (as explained in

Section 6.3).

Empirical studies of merge conicts and collaboration.

There exists many studies that do

not propose tools, but study merge conicts or present methods to predict conicts or recommend

coordination. Zhang et al. [

] conducted an empirical study of the eect of le editing patterns on

software quality. They conducted their study on three open source software systems to investigate

the individual and the combined impact of the four patterns on software quality. To the best of our

knowledge ours is the rst empirical study that is conducted at scale, on industry data. We perform

analysis on 67K bug reports, from 83K les (in comparison to the studies conducted by Zhange et

al. which looked at 98 bugs, from 2,140 les).

Ashraf et al. presented reports from mining cross-task artifact dependencies from developer

interactions [

]. Dias et al. proposed methods to understanding predictive factors for merge conicts

[

], i.e., how conict occurrence is aected by technical and organizational factors. Studies

conducted by Blincoe et al. and Cataldo et al. [

] show the importance of timely and ecient

recommendations and the implications for the design of collaboration awareness tools. Studies

like this form a basis for building solutions that are scalable and responsive (the large-scale ConE

service that we deployed at Microsoft) and their importance in creating awareness of the potential

conicts.

Costa et al. proposed methods to recommend experts for integrating changes across branches

[

] and characterized the problem of developers’ assignment for merging branches [

]. They

analyzed merge proles of eight software projects and checked if the development history is an

appropriate source of information for identifying the key participants for collaborative merge. They

also presented a survey on developers about what actions they take when they need to merge

branches, and especially when a conict arises during the merge. Their studies report that the

majority of the developers (75%) prefer collaborative merging (as opposed to merging and taking

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

6 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

decisions alone). This reiterates the fact that tools that facilitate collaboration, by providing early

warnings, are important in handling merge conict situations.

3 CONCURRENT VERSUS NON-CONCURRENT EDITS IN PRACTICE

The dierences in the fashion in which edits are made to source code les (concurrent vs non-

concurrent) can cause various unintended consequences (as explained in section 1). We performed

large scale empirical analysis of source code edits to understand the ability of concurrent edits

to cause bugs. We picked bugs as a candidate for our case study because it is relatively easy to

mine and generate massive amounts of ground truth data about bugs and map them back to the

changes that induced the bugs, by leveraging some of the techniques proposed by Wang et al.

[

], at Microsoft’s scale. Understanding the extent of the problem, i.e., the side eects caused by

concurrent source code edits in a systematic way, is an essential rst step towards making a case

for building an early intervention service like ConE. This allows us to quickly sign up customers

inside the company and deploy the ConE system on thousands of repositories, for tens of thousands

of developers, across Microsoft. To that extent, we formulate two research questions that we would

like to nd answers for.

• RQ1

How do concurrent and non-concurrent edits to les compare in the number of bugs

introduced in these les?

• RQ2

To what extent are concurrent, non-concurrent, and all edits, correlated with subsequent

bug xes to these les?

Answering the questions above allows us to assess the urgency of the problem. The methods,

techniques and outcomes used can also be employed to inform decision makers, when investments

in the adoption of techniques like ConE need to be made.

We performed an empirical study on data that is collected from multiple, dierently sized

repositories. For our study, we focused on one of the important side eects that is induced by

collaborative software development, i.e., the “number of bugs introduced by concurrent edits”. We

chose this scenario as we have an option to generate an extensive set of ground truth data, by

leveraging techniques proposed by Wang et al. [

], to tag pull requests as bug xes. They employ

two simple heuristics to tag bug xes: the commit message should contain the words “bug” or

“x”, but not “test case” or “unit test”. Tagging changes that introduce bugs is not a practice that

is followed very well in organizations. Studies have shown that les changed in bug xes can be

considered as a good proxy to les that introduced the bugs in the rst place [33, 45]. Combining

both ideas we created a ground truth data set which we used in our empirical analysis. We broadly

classify our empirical study into three main steps.

(1)

Data collection: Collect data using the data ingestion framework that we have built which

ingests meta data about pull requests (author, created/closed dates, commits, reviewers etc),

iterations/updates of pull requests, le changes in pull requests, and intent of the pull request

(feature work, bug x, refactoring etc).

(2)

Use the data collected in Step 1 to analyze the impact of concurrent edits on bugs or bug xes

in comparison to non-concurrent edits.

(3)

Explain the dierences in correlations between concurrently versus non-concurrently edited

les to the number of bugs that they introduce.

For the purpose of the empirical analysis we dene concurrently and non-concurrently edited les

as follows:

•

Concurrently edited les: Files which have been edited in two or more pull requests, at the

same time, while the pull requests are active. A pull request is in an ‘active’ state when it is

being reviewed but not completed or merged.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 7

•

Non-concurrently edited les: Files which have never been edited in two pull requests while

they both are in active state. So, we are sure that changes made to these les are always

made in the latest version and are merged before they are edited through another active pull

request

3.1 Data Collection

We collected data about le edits (concurrent and non-concurrent) from the pull request data,

for six months, from six repositories. We picked repositories in which at least 1,000 pull requests

are created every month. After reducing the repositories to a subset, we randomly selected six

repositories for the purpose of the analysis. We made sure our data set is representative in various

dimensions like size (small (1), medium (2), large (3)), the nature of the product (on-prem product

(2) vs cloud service (4)), geographical distribution of the teams (US only (2) versus split between

dierent countries and time zones (4)), and programming languages (as listed in Table 3). We

performed data cleansing by applying the lters listed below:

•

Exclude PRs that are open for more than 30 days: the majority of these pull requests are ‘Stale

PRs’ which will be left open forever or abandoned at a later point of time. Studies shows that

70% of the pull requests gets completed within a week after creation [25].

•

Exclude PRs with more than 50 les (this is the 90th percentile for le counts in our pull

request data set). This is one of the proxies that we use to to exclude PRs which are created

by non-human developers that do mass refactoring or styling changes etc.

•

Exclude edits made to certain le types. We are primarily interested in understanding the

eects of concurrent edits on source code changes as opposed to les like conguration

or initialization les which are edited by lot of developers through lot of concurrent pull

requests, all the time. For the purpose of this study, we consider only the following le types:

.cs, .c, .cpp, .ts, .py, .java, .js, .sql.

•

Exclude les that are edited a lot: For example, les that contain global constants, key value

pairs, conguration values, or enums are usually seen in a lot of active pull requests at the

same time. We studied 200 pull requests to understand the concurrent edits to these les.

They typically are in the order of a few thousands of lines in size, which is well above the

median le size. In all cases the edits are localized to dierent areas of the les and surgical

in nature. Sometimes, the line numbers of the edits are far away (few thousands of lines

away, at least). Therefore, we impose a lter on the edit count of fewer than twenty times in

a month (90th percentile of edit counts for all source code les) and exclude any les that

are edited more than this. Without this lter, these frequently edited les would dominate

the results of the ConE recommendations thus yielding too many warnings for harmless

concurrent edits.

We started with a data set of 208,556 pull requests. As bug xes is our main concentration for

the empirical analysis, we removed all the pull requests that are not bug xes. That reduced the

data set to 67,155 pull requests (32.2% of the pull requests are bug xes). Then we applied other

lters mentioned above, which further reduced the data set to 54,127 pull requests (25.95%). Table 1

shows the distribution of concurrently and non-concurrently edited les per repository.

3.2 RQ1: Concurrent versus non-concurrent bug inducing edits

We take every (concurrently or non-concurrently) edited le, and check whether the nature of

the edit has any eect on the likelihood of that le appearing in bug xes after the edit has been

merged. We compare how the percentage of edited les that are seen in bug xes (within a day, a

week, two weeks and a month), varies with the nature of the edit (concurrent vs non-concurrent).

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

8 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Table 1. Distribution of concurrently and non-concurrently edited files per repository

Repo Distinct

number

of con-

currently

edited

les

Distinct

number

of non-

concurrently

edited les

Number

of bug

x pull

requests

Percentage

of con-

currently

edited les

Percentage

of non-

concurrently

edited les

Repo-1

3500 4875 4781 41.7 58.2

Repo-2

10470 16879 15678 38.2 61.8

Repo-3

2907 4119 5467 41.3 58.7

Repo-4

5560 7550 8972 42.4 57.6

Repo-5

4110 7569 9786 35.2 64.8

Repo-6

5987 9541 9443 38.5 61.5

Total

32534 50533 54127 39.1 60.9

Figure 1 shows the impact of concurrent versus non-concurrent edits on the number of bugs

being introduced. Across all six repositories, the percentage of bug inducing edits is consistently

higher for concurrently edited les (blue bars) than for non-concurrently edited ones (orange bars).

3.3 RQ2: Edits in files versus bug fixes in files

We use Spearman’s rank correlation to analyze how the total number of edits, concurrent edits,

and non-concurrent edits to les each correlate with the number of bug xes seen in those les.

While Figure 1 shows that more concurrently edited les are seen in bug x pull requests

(compared to non-concurrently edited ones), this might also be because these les are frequently

edited and seen in bug x pull requests naturally. To validate this, we performed Spearman rank

correlation analysis for each le that is ever edited with respect to how many times it is seen in

bug xes (the numbers of data points from the six repositories are listed in Table 1):

•

The total number of times a le is seen in all completed pull requests vs the number of bug

xes in which it is seen

•

The total number of times a le is seen in concurrent pull requests vs the number of bug xes

in which it is seen

•

The total number of times a le is seen in non-concurrent pull requests vs the number of bug

xes in which it is seen

The results are in Table 2. We observe that concurrent edits (third column) consistently are

correlated with bug xes, more so than non-concurrent edits (column 4) and all edits (column 2).

For all repositories except Repo-4, there exists almost no correlation between non-concurrent edits

(column 4) and bug xes.

For Repo-4, frequently edited les are not necessarily the ones seen in more bug xes: there

exists a negative correlation between total edits (column 2) and the number of bug xes. However,

les that are concurrently edited (column 3) do have a positive correlation with the number of bug

xes.

The variety in the correlations can be explained by the fact that concurrent editing is just

one of many factors related to the need for bug xing. Other factors might include the level of

modularization, developer skills, the test adequacy, engineering system eciency, and so on.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 9

(a) (b)

(e) (f)

Fig. 1. Graphs showing how the percentage of files seen in bug fixes (within a day, a week, two weeks

and a month) is changing. Blue and orange bars represent concurrently and non-concurrently edited files,

repsectively.

4 SYSTEM DESIGN

Backed by the correlation analysis suggesting that concurrent edits maybe prone to causing issues.

Also, there exists a huge demand from engineering organizations, inside Microsoft, for a better tool

that can detect conicting changes early on and facilitate better communication among developers,

we moved forward to materialize the idea of ConE into reality. We then performed large scale

testing and validation by deploying ConE on 234 repositories. Details about the implementation,

deployment and scale-out are provided in section 5.

In this section we describe ConE’s conict change detection methodology, algorithm and system

design in detail. We will use the following terminology:

•

Reference pull request is a pull request in which a new commit/update is pushed thus triggering

the ConE algorithm to be run on that pull request.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

10 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Table 2. Spearman rank correlation analysis for total edits, concurrent edits, non-concurrent edits vs bug

fixes.

Repo

Total Edits to

Bug Fixes

Concurrent Edits to

Bug Fixes

Non-Concurrent Edits

to Bug Fixes

Repo-1 0.145

∗∗∗

0.298

∗∗∗

0.034

∗∗

Repo-2 0.072

∗∗∗

0.140

∗∗∗

0.057

∗∗

Repo-3 0.140

∗

0.330

∗

0.120

∗

Repo-4 −0.077

∗∗∗

0.451

∗∗∗

−0.461

∗∗∗

Repo-5 0.164

∗∗∗

0.472

∗∗∗

0.091

∗∗∗

Repo-6 0.084

∗∗

0.196

∗∗∗

0.005

∗

***

𝑝 < 0.001,

𝑝 < 0.01,

𝑝 < 0.05

•

Active pull request is a pull request whose state is ‘active’ when the ConE algorithm is

triggered to be run on a reference pull request.

A key design consideration is that we want to avoid false alarms. In the current state of the

practice developers never receive warnings about potentially harmful concurrent edits. Based on

this we believe it is acceptable to miss a few warnings. On the other hand, giving false warnings

will likely lead to rejection of a tool like ConE. For that reason, ConE has several built-in heuristics

that are aimed at reducing such false alarms.

Due to the nature of the problem, the domain we are operating in, and the algorithm we have

in-place, it is possible to see notications that are false alarms. One of the design choices that

we had to make was to minimize the false alarms by making it more conservative. A side eect

of this is our coverage (number of pull requests for which we send a notication) will be lower.

Studies have shown that, in large organizations, tools that generate many false alarms are not used

and eventually deprecated [

]. However, recent techniques proposed by Brindescu et al. [

], can

potentially aid in facilitating a decision by determining the merge conict situations to ag, based

on the complexity of the merge conict.

4.1 Core Concepts

ConE constantly listens to events that happen in an Azure DevOps environment [

]. When any new

activity is recorded (e.g., pushing a new update or commit) in a pull request, the ConE algorithm is

run against that pull request. Based on the outcome, ConE noties the author of the pull request

about conicting changes. We describe two novel constructs that we came up with for detecting

conicting changes and determining candidates for notications: Extent of overlap (EOO) and the

existence of ‘Rarely Concurrently Edited’ les (RCEs). Next, we provide a detailed description of

ConE’s conict change detection algorithm and the parameters we have in place to tune ConE’s

algorithm.

4.1.1 Extent of Overlap (EOO). ConE scans all the active pull requests which meet our ltering

criteria (explained in section 3.1) and for each such pull request (reference pull request) calculates

the percentage of les edited in the reference pull request that overlap with each of the active pull

requests.

Extent of Overlap =

| 𝐹

𝑟

∩ 𝐹

𝑎

− 𝐹

𝑒

| 𝐹

𝑟

∗ 100

where F

= Files edited in reference pull request, F

= Files edited in a given active pull request, F

= Files excluded i.e., les that are not of types listed in the paragraph below. The idea is to nd

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 11

Table 3. Distribution of file types seen in bug fixes

File type Percentage On ConE allow list?

.cs 44.32 yes

.cpp 18.55 yes

.c 11.27 yes

.sql 6.20 yes

.java 5.36 yes

.js 3.98 yes

.ts 3.79 yes

.ini 0.20 no

.csproj 0.04 no

others 6.29 no

Table 4. Number of Bug Fixes with RCEs and No RCEs

Edit type Count Percentage

Bug x PRs with no RCEs 1617 78.3

Bug x PRs with at least one RCE 446 21.7

the percentage of items that are commonly edited in multiple active pull requests and create a

pairwise overlap score for each of the active and reference pull request pairs. Intuitively, if the

overlap between two active pull requests is high, the probability of them doing duplicate work or

causing merge conicts when they are merged is also going to be high. We use this technique to

calculate the overlap in terms of number of overlapping les for now. This can be easily extended

to calculate the overlap between two active pull requests in terms of number of classes or methods

or stubs if that data is available.

A milder version of EOO is used by the model proposed by Owhadi-Kareshk et al [

], which

looks at the number of les that are commonly edited in two pull requests when determining

conicting changes. While calculating extent of overlap it is important to exclude edits to certain

le types whose probability of inducing conicts is minimal. This helps in reducing false alarms

in our notications signicantly. Based on a manual inspection of 500 randomly selected bug x

pull requests, by the rst three authors, we concluded that concurrent edits to initialization or

conguration les are relatively safe, but that concurrent edits to source code les are more likely

to lead to problems. Therefore, we created an allow list based on le types as shown in Table 3. As

can be seen, this eliminates around 6.4% of the les. Note that such an allow list is programming

language-specic. When ConE is to be applied in dierent contexts, dierent allow lists are likely

needed.

4.1.2 Rarely Concurrently Edited files (RCEs). These are the les which typically are not edited

concurrently, recently. Usually all the updates or edits to them are performed, in a controlled

fashion, by a single person or small set of people. Seeing RCEs in multiple active pull requests is an

anomalous phenomenon. For example, a le foo.cs is always edited by a given developer, through

one active pull request at any point. The ConE system keeps a track of such les and tags them as

RCEs. In the future, if multiple active pull requests are seen editing this le simultaneously, ConE

ags them. Our intuition is that, if a lot of RCEs are seen in multiple active pull requests, which is

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

12 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

unusual, changes to these les should be reviewed carefully and everyone involved in editing them

should be aware of others’ changes.

We performed an empirical analysis, from our shadow mode deployment data (as explained in

Section 5.2), to understand how pervasive RCEs really are. As explained in Table 4, 21.7% of bug

xes contains at least one RCE in them while the total number of RCEs in these repositories is

just 2%. Based on this data and anecdotal feedback from developers, we realized that concurrent

edits to RCEs is an unusual activity which should not be seen a lot. But, if observed, it should be

notied to all the developers involved.

For building the ConE system, we ran the RCE detection algorithm that looks at the pull requests

that are created in a repository within the last three months from when the algorithm runs. The

duration can be increased or decreased based on how big or how active the system is. This process,

after each run, creates a list of RCEs. Once the initial bootstrapping is done and a list of RCEs is

prepared, that list is used by the ConE algorithm when checking for the existence of the RCEs in a

pair of pull requests. The RCE list is updated and refreshed once every week, through a separate

process. The process of detecting and updating RCEs is resource intensive. So, we need to strike a

balance between how quickly we would like to update the RCE list versus how many resources we

need to throw at the system, without compromising the quality of the suggestions. We picked one

week as the refresh interval through multiple iterations of experiments. This process guarantees

that the ConE system reacts to the changes in the rarity of concurrent edits, especially the cases

where an RCE becomes a non-RCE due to the concurrent edits it experiences. The steps involved

in creating and updating RCEs are listed below.

Creating the RCE list:

(1)

Get all the pull requests created in the last three months from when the algorithm is run.

Create a list of all the les that are edited in these pull requests by applying the lters

explained in the paragraph above on le types.

(2)

Prepare sets of pull requests that overlap with others. Prepare a list of les edited in the

overlapping pull requests by applying the lters explained in the paragraph above on le

types.

(3)

The list of les created in step-1 minus the list of les created in step-2 constitutes the list of

rarely concurrently edited les (RCEs).

Updating the RCE list:

(4)

Remove les from the RCE list if they are seen in overlapping pull requests when the algorithm

is run the next time. Because, if they are seen in overlapping pull requests, they will not be

qualied to be RCEs anymore.

(5)

Refresh the list by adding the new RCEs discovered in the latest edits, when the algorithm is

run again.

4.2 The ConE Algorithm

ConE’s algorithm to select candidate pull requests that developers need to be notied about

primarily leverages the techniques explained above: Extent of Overlap (EOO) and existence of

Rarely Concurrently Edited les (RCEs). Together these serve to reduce the total number of active

pull requests under consideration, in order to pick the pull requests that need to be notied about.

The ConE algorithm consists of seven steps listed below:

Step 1

: Check if the reference pull request’s age is more than 30 days. Studies have shown that

pull requests which are active for so long may not even be completed [

]. Exclude all such pull

requests.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 13

Step 2

: Construct a list of les that are being edited in the reference pull request. While con-

structing this set, we exclude any les of types that are not in the allow list from Table 3.

Step 3

: Construct a set of les that are being edited in each of the active pull requests, using

the methodology mentioned in Steps 1 and 2. One extra lter that we apply here is to exclude PRs

which are being interacted by the author of the reference pull request. If the author of the reference

pull request is already aware of this pull request there is no need to notify thems.

Step 4

: Calculate the extent of overlap using the formula described in section 4.1. For every pair

of reference pull request PR

and active pull request PR

, calculate the tuple T

ea1

〈

, PR

〉

, where E

is the extent of overlap between the two pull requests. Do this for all the active pull

requests with respect to a reference pull request. At the end of this step we have a list of tuples, T

= [〈PR

, PR

, 55〉, 〈PR

, PR

, 95〉, 〈PR

, PR

, 35〉....].

Step 5

: Check for the existence of rarely concurrently edited les (RCEs) and the number of

RCEs between each pair of reference and active pull request. Create a tuple T

〈

, PR

, R

〉

where PR

is the reference pull request, PR

is active pull request and R

is the number of RCEs in

the overlap of reference and active pull requests. Do this for all reference and active pull request

combinations. At the end of this step we have a list of tuples, T

= [

〈

, PR

, 2

〉

〈

, PR

, 2

〉

〈PR

, PR

, 9〉....]

Step 6

: Apply thresholds on the values for extent of overlap and the number of RCEs, as explained

in section 4.3. For example, we can apply a threshold that we select the pull requests whose extent

of overlap is greater than 50% OR there should be at least two RCEs. We go through the list of

tuples that we have generated in Steps 4 and 5 above and apply the thresholding criteria.

Step 7

: Apply a ranking algorithm to prioritize the pull requests that need to be looked at rst

if multiple pull requests are selected by the algorithm. We rank candidate pull requests based on

the number of RCEs present and then by the extent of overlap. This is because RCEs being edited

through multiple active pull requests is an anomalous phenomenon which needs to be prioritized.

4.3 Default Thresholds and Parameter Tuning

In this section we describe the thresholding criteria, and the rationale that needs to be applied

while choosing parameter values for large scale deployment. The parameters that we have in place

are: the extent of overlap (EOO), the number of rarely concurrently edited les (RCEs), the window

of time period (i.e., the number of months to consider for determining RCEs), and the total number

of le edits in the reference PR.

In line with our objectives, we are searching for parameter settings that nd actual conicts, yet

minimize false alarms. Furthermore, we target settings that are easy to explain (e.g., “this PR was

agged because half of the les changed it are also touched in another PR”).

Threshold for EOO:. For Extent of Overlap, we explored what would happen if we put the threshold

at 50%: if at least half of the les edited in another pull request, consider it for notication. To

assess the consequences of this, we randomly selected 1654 pull requests, which have at least

one le overlapped with another pull request. This data set is a subset of the data collected to

perform empirical analysis on concurrent edits (see Section 3). We manually inspected each of

these 1654 pull requests to make sure the overlap we observe is indeed correct. Our empirical

analysis (see Table 5), shows that 50% of the pull requests have an overlap of 50% or less. Thus,

this simple heuristic eliminates half of the candidate pull requests for notication, substantially

reducing potential false alarms, and keeping the candidates that are more likely to be in conict.

Threshold for RCEs: For RCEs we again followed a simple rule: if the active-reference pull request

pair contains at least two les that are modied in them, which are always edited in isolation,

select the active pull request as a candidate. As shown in Figure 2, the majority of the pull requests

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

14 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Table 5. Distribution of extent of overlap (EOO)

Percentage of overlap

(range)

Number

of PRs

0-10

309

11-20

223

21-30

137

31-40

41-50

51-60

359

61-70

71-80

81-90

91-100

378

Fig. 2. Distribution of the number of PRs with a

given number of RCEs

Fig. 3. Distribution of the number of PRs with a

given number of overlapping files.

contains fewer than two RCEs. To be conservative, we imposed a threshold on RCE

≥

2, i.e., to

select a PR as a candidate, that pull request needs to have at least two RCEs that are commonly

edited between the reference and active pull requests.

Number of overlapping les: Assume a developer creates a pull request by editing two les and one

of them is also edited in another active pull request. Here EOO is 50%. This means this pull request

qualies to be picked as a candidate for notication. Editing just one le in two active pull requests

might not be enough to reasonably make an assumption about the potential of conicts arising.

Therefore, we impose a threshold on the “number of les” that needs to be edited, simultaneously,

in both pull requests. As a starting point, we imposed a threshold of two, i.e., every candidate pull

request should have more than two overlapping les (in addition to satisfying the EOO condition

of >= 50%). We plotted the distribution of the number of overlapping les in Figure 3. As shown in

Figure 3, the number of PRs (on the Y-axis) drops sharply after the number of overlapping le edits

is two. Therefore, we picked two as the default threshold.

Threshold Customization: In addition to the empirical analysis, we collected initial feedback

from developers working with the production systems through our shadow mode deployment

(Section 5.2). One of the prominent requests from the developers was to enable the repository

administrators to change the values of the parameters explained above based on the developer

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 15

Fig. 4. ConE System design: Pull requests from Azure DevOps are listened to by the ConE change scanner,

suggestion generator, notification engine, and decorator

feedback. Therefore, we provided customization provisions to make ConE system suit each reposi-

tory’s needs. Based on the pull request patterns and needs of the repository, system administrators

can tune the thresholds to optimize the ecacy of the ConE system for particular reporsitories.

5 IMPLEMENTATION AND DEPLOYMENT

5.1 Core Components and Implementation

The core ConE components are displayed in Figure 4. ConE is implemented on Azure DevOps

(ADO), the DevOps platform provided by Microsoft. We chose to develop ConE on ADO due to

its extensibility that allows third party services to interact with pull requests through various

collaboration points such as adding comments in pull requests, a rich set of APIs provided by ADO

to read meta data about pull requests, and service hooks which allow a third party application to

listen to events such as updates that happen inside the pull request environment.

Within Azure DevOps, as shown in the left box of Figure 4, ConE listens to events triggered

by pull requests, and has the ability to decorate pull requests with notications about potentially

conicting other pull requests. The ConE service itself, shown at the right in Figure 4, runs within

the Azure Cloud. The ConE change scanner listens to pull request events, and dispatches them to

workers in the ConE suggestion generator. Furthermore, the scanner monitors telemetry data from

interactions with ConE notications. The core ConE algorithm is oered as a scalable service in

the Suggestion Generator, with parameters tunable as explained in Section 4.3.

The ConE Service is implemented using C# and .NET 4.7. It has been built on top of Microsoft

Azure cloud services: Azure Batch [

] for compute, Azure DevOps service hooks for event noti-

cation, Azure worker roles and its service bus for processing events, Azure SQL for data storage,

Azure Active Directory for authentication and Application Insights for telemetry and alerting.

5.2 ConE Deployment

We selected 234 repositories to pilot ConE in the rst phase. Some of the key attributes based on

which the repository selection process has taken place are listed below:

•

Prioritize repositories where we have developers and managers who volunteered to try ConE,

since we expect them to be willing to provide meaningful feedback.

•

Include repositories that are of dierent sizes (based on the number of les present in them):

very large, large, medium, and small.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

16 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

•

Include repositories that host source code for diverse set of products and services. That

includes client side products, mobile apps, enterprise services, cloud services, and gaming

services.

•

Consider repositories which have cross-geography and cross-timezone collaborators, as well

as repositories that have most of the collaborators from a single country.

•

Consider repositories that host source code written in multiple programming languages

including combinations of Java, C#, C++, Objective C, Swift, Javascript, React, SQL etc.

•

Include repositories which contain a mix of developers with dierent levels of experience

(based on their job titles): Senior, mid-level and junior.

We enabled ConE in shadow mode on 60 repositories for two months (with a more liberal set of

parameters to maximize the number of suggestions we generate). In this mode we actively listen

to pull requests, run the ConE algorithm, generate suggestions, and save all the suggestions in

our SQL data store for further analysis, without sending the notications to the developers. We

generated and saved 1200 suggestions by enabling ConE in this mode for two months. We then

went through the suggestions and the telemetry collected to optimize the system before a large

scale roll out.

The primary purpose of shadow mode deployment is to validate whether operationalizing a

service like ConE is even possible at the scale of Microsoft. Furthermore, it allowed us to check

whether we indeed can ag meaningful conicting pull requests, and what developers would think

of the corresponding notications. The telemetry we collected includes the time it takes to run

the ConE algorithm, resource utilization, the number of suggestions the ConE system would have

made, etc. We experimented with tuning our parameters (explained in subsection 4.3) and their

impact on the processing time and system utilization. This helped us in understanding the scale

and infrastructure requirements and overall feasibility.

We collected feedback from the developers by reaching out to them directly. We have shown them

the suggestions we would have made if the ConE system was enabled on their pull requests, format

of the suggestions and the mode of notications. We iterated over the design of the notication

based on the user feedback before settling on the version of the notication as shown in Figure 5.

After multiple iterations of user studies and feedback collection, on the design, frequency, and

the quality of the ConE suggestions as validated by the developers participated in our shadow

mode deployment program, we turned on the notications on 234 repositories.

5.3 Notification Mechanism

We leveraged Azure DevOps’s collaboration points to send notications to developers. A notication

is a comment placed by our system in Azure DevOps pull requests. Figure 5 shows a screenshot

of a comment placed by ConE on an actual pull request. It displays the key elements of a ConE

notication: a comment text which provides a brief description of the notication, the id of the

conicting pull request, the name(s) of the author(s) of the conicting pull request, les that are

commonly edited in the pull requests, a provision to provide feedback by resolving or not xing a

comment (marked as “Resolved” in the example), and the option to reply to the comment inline to

provide explicit written feedback.

While ConE actively monitors every commit that is being pushed to a pull request, it will only

add a second comment on the same pull request again if the state of the active or the reference

pull request is signicantly changed in subsequent updates and ConE nds a dierent set of pull

requests as candidates for notication.

In a ConE comment, elements like pull request id, le names, author name are actually hyperlinks.

The pull request id hyperlink points to the respective pull request’s page in Azure DevOps. The le

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 17

Fig. 5. ConE notificationt that a pull request has significant overlap with another pull request.

name hyperlink points to a page that shows the di between the versions of the le in the current

and conicting pull requests. The author name element, upon clicking, spins up a chat window

with the author of the conicting pull request instantly. When people interact with these elements

by clicking them, we track those telemetry events (which is consented by the users of the Azure

DevOps system, in Microsoft) to better understand the level of interaction developers are having

with the ConE system.

5.4 Scale

The ConE system has been deployed on 234 repositories in Microsoft. The repositories have been

picked based to maximize the diversity and variety of the repositories in various dimensions as

explained in Section 5.2. Since enabled in March 2020, until September 2020 (when we pulled the

telemetry data) ConE evaluated 26,000 pull requests which were created in all the repositories on

which ConE has been enabled. Within these 26,000 pull requests, an additional 156,000 update

events (commits on the same branch, possibly aecting new les) occurred. Thus, ConE had to

react to and process a total of 182,000 events that were generated, within Azure DevOps, in those

six months. For every update, ConE has to compare the reference pull request with all active pull

requests that match ConE’s ltering criteria. In total ConE made a total of approximately two

million comparisons.

The scale of operations and processing is expected to grow as we onboard new and large

repositories. The simple and lightweight nature of the ConE algorithm combined with the scalable

architecture and ecient design, and its engineering on Azure cloud has given us the ability to

process events at this scale with a response rate of less than four seconds per event. The time

it takes to process an event end to end, i.e., receiving the pull request creation or update event,

running the ConE algorithm and passing the recommendations back (if any) has never taken more

than four seconds. ConE employed a single service bus queue and four worker roles in Azure

to handle the current scale. As per our monitoring and telemetry (resource utilization on Azure

infrastructure, processing latency, etc.) ConE still had bandwidth left to serve the next hundred

repositories of similar scale with the current infrastructure setup.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

18 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Fig. 6. Distribution of notifications per repository

6 EVALUATION: DEVELOPERS PERCEPTIONS ABOUT CONE’S USEFULNESS

Out of the 26,000 pull requests under analysis during ConE’s six month deployment (Section 5.4),

ConE’s ltering algorithm (Section 4.2) excluded 2,735 pull requests. In the remaining 23,265 pull

requests, ConE identied 775 pull requests to send notications to (3.33%). In this section, we

evaluate the usefulness of these 775 notications.

All repositories were analyzed with the standard conguration; No adjustments were made to

the parameters. Though the service is enabled to send notications in 234 repositories, during

the six-month observation period, ConE raised alerts on just 44 distinct repositories. As shown in

Figure 6, the notication volume varies between repositories.

6.1 Comment resolution percentage

ConE oers an option for users to provide explicit feedback on every comment it placed, within

their pull requests. Users can select the “Resolved” option if they like or agree with the notication,

and the “Won’t x” option if they think it is not useful. A subset of users were given instructions

and training on how to use these options. The notication itself also contains instructions, as shown

in Figure 5. A user can choose not to provide any feedback by just leaving the comment as is, in

the “Active” state. Through this we collect direct feedback from the users of the ConE system.

Figure 7 shows the distribution of the feedback received. The vast majority (554 out of 775, for

71.48 %) of notications was agged as “Resolved”. For 147 (18.96%) of the notications, no feedback

was provided. Various studies have shown that users tend to provide explicit negative feedback

when they do not like or agree with a recommendation, while tend not be so explicit about positive

feedback [35, 43]. Therefore, we cautiously interpret this as neutral to positive.

We manually analyzed all 74 (9.5%) cases where the developers provided negative feedback. For

the majority of them, the developer was already aware of the other conicting pull request. In

some cases the developers thought that ConE is raising a false alarm as they expect no one else

to be making changes to the same les as the ones they are editing. When we show them other

overlapping pull requests that were active while they were working on their pull request, to their

surprise, the notication were not false alarms. We list some of the anecdotes in subsection 6.4.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 19

Fig. 7. Number of positive (Resolved), negative (Won’t Fix), and neutral (Active) responses to ConE notifications

Fig. 8. Number of notifications (orange), and number of interactions (blue) with those notfications, per month.

As developers become more familiar with ConE, they increasingly interact with its notifications

6.2 Extent of interaction

As discussed in Section 5.3, a typical ConE notication/comment has multiple elements that

a developer can interact with: For each conicting pull request, the pull request id, les with

conicting changes, and the author name are shown. These are deep links. Developers can just

take a look at the comment and ignore it or interact with it by clicking on one of the “clickable

elements” in the ConE notication. If the user decides to pursue further clicking on one of these

elements, that action is also logged as telemetry (in Azure AppInsights).

From March 2020 to September 2020, we logged 2170 interactions on 775 comments that ConE

has placed, which amounts to 2.8 clicks per notication on average. Measured over time, as shown

in Figure 8, the number of interactions and the “clicks per notication” are clearly increasing as

more and more people are getting used to ConE comments, and are using it to learn more about

conicting pull requests recommended by ConE.

Note that the extent of interaction does not include additional actions developers can take to

contact authors of conicting pull requests once ConE has made them aware of the conicts, such

as reaching out by phone, walking into each other’s oce, or a simple talk at the water cooler.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

20 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Table 6. Distribution of qualitative feedback responses

Category # of responses

Favorable

I’d love to use ConE 25 (52.08%)

I will use ConE 20 (41.67%)

Unfavorable I don’t want to use ConE 3 (6.25%)

6.3 User Interviews

The quantitative feedback discussed so far captures both direct (comment resolution percentage) and

indirect (extent of interaction) feedback. To better understand the usefulness we directly reached

out (via Microsoft Teams, asynchronously) to authors of 100 randomly selected pull requests for

which ConE placed comments. The user feedback for these 100 pull requests is 45% positively

resolved, 35% won’t x, and 20% no response. The interviewers did not know these authors, nor

had worked with them before, also because the teams working on the systems under study are

organizationally far away from the interviewers.

The interview format is semi-structured where users are free to bring up their own ideas and

free to express their opinions about the ConE system. We posed the following questions:

(1) Is it useful to know about these other PRs that change the same le as yours?

(2)

If yes, roughly how much eort do you estimate was saved as a result of nding out about

the overlapping PRs? If not, is there other information about overlapping PRs that could be

useful to you?

(3)

Does knowing about the overlapping PRs help you to avoid or mitigate a future merge

conict?

(4) What action (if any) will you likely take now that you know about the overlapping PRs?

(5)

Would you be interested in keeping using ConE which noties you about overlapping PRs in

the future? (Note that we aim to avoid being too noisy by not alerting if the overlapping les

are frequently edited by many people, if they are not source code les, etc.)

We did not receive the responses in a uniform format directly based on the structure of the

questions. We used Microsoft Teams to reach out to the developers and the questions are open

ended. Therefore, we could not enforce a strict policy on the number of questions the respondents

should answer and on the length of the answers. Some of the participants answered all questions,

while some answered only one or two. Some respondents were detailed in their response, while

some were succinct with ‘yes’ or ‘’no’ answers. Some of the respondents provided a free-form

response, with an average word count of just 47 words per response. So, we could not calculate

the distribution of responses for all questions. However, we see that for question-5, there were

responses. We coded and categorized the responses we received for question-5 as explained below.

The rst three authors, together, grouped the responses that we received (48 out of 100), until

consensus was reached, into two categories: Favorable (if the users would like to continue using

ConE, i.e., the answer to question 5 is along the lines of ‘I will use ConE’ or a ‘I’d love to use/keep

using ConE’) and Unfavorable (users do not nd the ConE system to be useful and do not want

to continue using it.). Table 6 shows the distribution of the feedback: 93.75% of the respondents

indicated their interest and willingness to use ConE.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 21

6.4 Representative otes

To oer an impression, we list some typical quotes (positive and negative) that we received from the

developers. In one of the pull requests where we sent a ConE notication notifying about potential

conicting changes, a developer said:

"I wasn’t aware about other 2 conicting PRs that are notie d by ConE. I believe that

would be very helpful to have a tool that could provide information about existence of

other PRs and let you know if they perform duplicate work or conicting change!!"

It turned out that the other two developers (the authors of the conicting pull requests agged by

ConE) are from entirely dierent organizations and geographies. Their common denominator is

the CEO of the company. It would be very dicult for the author of the reference pull request to

know about the existence of the other two pull requests without ConE bringing it to their notice.

Several remarks are clear indicators of the usefulness of the ConE system:

"Yes, I would be really interested in a tool that would notify overlapping PRs."

"Looking forward to use it! Ver y promising!"

"ConE is such a neat tool! Very simple but super eective!"

"ConE is a great tool, looking forward to seeing more recommendations from ConE"

"This is an awesome tool, Thank you so much for working to improve our engineering!"

"It is a nice feature and when altering les that are critical or very complex, it is great to

know."

Some developers mentioned that ConE helped them saving time and/or eort signicantly by

providing early intervention:

"ConE is very useful. It saved at least two hours to resolve the conicts and smoke again"

"This would save a couple of hours of dev investigation time a month"

"ConE would have saved probably an hour or so for PR <XYZ>"

We also received feedback from some developers who expressed a feeling that a tool like ConE

may not necessarily be useful for their scenarios:

"For me no, I generally have context on all other ongoing PRs and work that might cause

merge issues. No, thank you!"

"For my team and the repositories that I work in, I don’t think the benet would be that

great. I can see where it could be useful in some cases though"

"It’s not helpful for my specic change, but don’t let that discourage you. I can see how

something like ConE be denitely useful for repositories like <XYZ> which has a lot of

common code"

Another interesting case we noticed is, ConE’s ability to help in detecting duplication of work.

ConE notied a developer (D1) about an active pull request authored by another developer (D2).

After the ConE notication was sent to D1, they realized that D2’s pull request is already solving

the same problem and D2 made more progress. D1 ended up abandoning their pull request and

pushed several code changes in D2’s pull request, which was eventually completed and merged.

When we reached out to D1, they said:

"Due to poor communication / project planning D2 and I ended up working on the same

work item. Even if I was not notied about this situation, I would have eventually learned

about it, but that would have costed me so much time. This is great!"

Though we do not observe scenarios like this frequently, this case demonstrates an example of

the kind of potential conicts ConE can surface, in addition to agging syntactic conicts.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

22 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

Table 7. Distribution of quantitative feedback based on size of the repository

Feedback Large repositories Small repositories Total

Positively resolved 404 (77.69%) 150 (58.82%) 554 (71.48%)

Won’t x 33 ( 6.34%) 41 (16.08%) 74 ( 9.54%)

No response 83 (15.96%) 64 (25.10%) 147 (18.96%)

Total 520 (67.09%) 255 (32.90%) 775 (100.0%)

6.5 Factors Aecting ConE Appreciation

After analyzing all the responses from our interviews, analyzing the pull requests on which we

received ‘Won’t Fix’ and interviewing respective pull request authors, we identied the following

main factors as to what makes a developer incline towards using a system like ConE.

Developers who found the ConE notications useful: These are the developers who typically work

on large services with distributed development teams across multiple organizations, geographies

and time zones. They also tend to work on core platforms or common infrastructure (as opposed to

the ones who make changes to the specic components of the product or service). To corroborate

this, the rst author classied the repositories into large and small manually, based on the size and

the activity volume in those repositories. We then, programmatically, categorized the 628 responses

based on their repository sizes. The results, in Table 7), show that for large repositories developers

are positive for 77.69% (404/520) of the cases, whereas for small repositories this is 58.82% (150/255).

Developers who found ConE not so useful: These developers are the ones who work on small micro

services or small scale products and typically work in smaller teams. These developers, and their

teams, tend to have delineated responsibilities. They usually have more control over who makes

changes to their code base. Interestingly, there were cases where some of these developers were

surprised to see another active pull request, created by a dierent developer, from a dierent team

sometimes, which was editing the same area of the source code as their pull request. This could be

a result of underestimating the pace with which service dependencies are introduced, product road

maps change, and codebases are re-purposed in large scale organizations.

7 DISCUSSION

In this section we describe the outlook and future work. We also explain some of the limitations of

the ConE system and how we plan to address them.

7.1 Outlook

One of the immediate goals of the ConE system is to expand its reach beyond the initial 234 on

which it is enabled, and eventually on every source code repository in Microsoft. Furthermore, in

the long run, Microsoft may consider oering ConE as part of its Azure DevOps pipeline, making

it available to its customers across the world. Likewise, GitHub may consider to develop a free

version of ConE as an extension on the GitHub marketplace for the broader developer community

to benet from this work.

As explained, ConE is expected to generate false alarms because of the fact that it is a heuristics

based system.To improve the system and reduce the number of false alarms at this point, ConE

checks for very simple but eective heuristics (see Section 4.2) and conditions to ag conicting

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 23

changes that causes unintended consequences. We oer three conguration parameters (see Sec-

tion 4.3), that help us make the solution eective by striking a suitable balance between the rate of

false alarms and coverage, and customize the solution based on individual repository needs.

To further improve precision, we would like to investigate the options that let us go one level

deeper from le level to, e.g., analyze actual code dis. Understanding code dis and performing

semantic analysis on them is a natural next a step for a system like ConE. Providing di information

across every developer branch is fundamentally expensive, so it is not oered by Azure DevOps, the

source control system on which ConE is operationalized, nor by other commercial or free source

control systems like GitLab or GitHub. A possible remedy is to bring the di information into the

ConE system. This involves checking out two versions of the same le, within ConE, and nding

dierences. This has to happen in real-time, in a scalable and language agnostic fashion.

Once we have the di information, another idea is to apply deep learning and code embeddings to

develop better contextual understanding of code changes. We can use the semantic understanding

in combination with the historical data about concurrent and non-concurrent edits to develop

better prediction models and raise alarms when concurrent edits are problematic.

ConE was found to be useful by facilitating early intervention about the potential conicting

change. However, this does not fully solve the problem i.e., xing the merge conicts or merging

the duplicate code. Exploring auto-xing of conicts or code duplication as a natural extension to

ConE’s conict detection algorithm will help in alleviating the problems caused by the conicts

and xing them in an automated fashion.

7.2 Threats to validity

Concerning internal validity, our qualitative analysis was conducted by reaching out to the de-

velopers via Microsoft Teams, asynchronously. None of the interviewers know the people that

were reached out neither worked with them before. We purposefully avoided deploying ConE on

repositories that are under the same organization as any of the researchers involved in this work. As

Microsoft is a huge company and most of the users of the ConE service are organizationally distant

from the interviewers, the risk of response bias is very minimal. However, there is a small chance

that respondents may be positive about the system because they want to make the interviewers,

who are from the same company, happy.

Concerning external validity, the empirical analysis, design and deployment, evaluation and

feedback collection are done specically in the context of Microsoft. The correlations we reported

in Table 2 can vary based on the setting of the organization in which the analysis is performed. As

Microsoft is one of the world’s largest concentration of developers, and developers at Microsoft

uses very diverse set of tools, frameworks, programming languages, our research and the ConE

system will have a broader applicability. However, at this point the results are not veried in the

context of other organizations.

8 CONCLUSION AND FUTURE WORK

In this paper, we seek to address problems originating from concurrent edits to overlapping les in

dierent pull requests. We start out by exploring the extent of the problem, establishing a statistical

relationship between concurrent edits and the need for bug xes in six active industrial repositories

from Microsoft.

Inspired by these ndings we set out to design ConE, an approach to detect concurrently edited

les in pull requests at scale. It is based on heuristics like the extent of overlap and the presence of

rarely concurrently edited les between pairs of pull requests. To make sure the precision of the

system is suciently high, we deploy various lters and parameters that help in controlling the

behavior of the ConE system.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

24 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

ConE has been deployed on 234 repositories inside Microsoft. During a period of six months,

ConE generated 775 notications, from which 71.48 % received positive feedback. Interviews with

48 developers showed 93% favorable feedback, and applicability in avoiding merge conicts as well

as duplicate work.

In the future, we anticipate ConE will be employed at substantially more systems within Microsoft.

As ConE has been deployed and found to be useful by the developers in a large and diverse (in

terms of programming languages used, tools, engineering systems, geographical presence, etc)

organization like Microsoft, we believe the techniques and the system has applicability beyond

Microsoft. Furthermore, we see opportunities for implementing a ConE service for systems like

GitHub or GitLab. Future research surrounding ConE might entail improving its precision by

learning from past user feedback or by leveraring dis without sacricing scalability. Beyond

warnings, future research could also target automating actions to be taken to address the pull

request conicts detected by ConE.

REFERENCES

[1]

Paola Accioly, Paulo Borba, and Guilherme Cavalcanti. 2018. Understanding Semi-Structured Merge Conict Character-

istics in Open-Source Java Projects. Empirical Softw. Engg. 23, 4 (Aug. 2018), 2051–2085. https://doi.org/10.1007/s10664-

017-9586-1

[2]

Iftekhar Ahmed, Caius Brindescu, Umme Ayda Mannan, Carlos Jensen, and Anita Sarma. 2017. An Empirical Examina-

tion of the Relationship between Code Smells and Merge Conicts. In 2017 ACM/IEEE International Symposium on

Empirical Software Engineering and Measurement (ESEM). 58–67. https://doi.org/10.1109/ESEM.2017.12

[3]

Usman Ashraf, Christoph Mayr-Dorn, and Alexander Egyed. 2019. Mining Cross-Task Artifact Dependencies from

Developer Interactions. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering

(SANER). 186–196. https://doi.org/10.1109/SANER.2019.8667990

[4] Azure Batch Accessed 2020. Azure Batch. https://azure.microsoft.com/en-us/services/batch/

[5] Azure Devops Accessed 2020. Azure DevOps. https://azure.microsoft.com/en-us/services/devops/?nav=min.

[6]

Steve Berczuk and Brad Appleton. 2002. Software Conguration Management Patterns: Eective Teamwork, Practical

Integration.

[7]

Jacob T. Biehl, Mary Czerwinski, Greg Smith, and George G. Robertson. 2007. FASTDash: A Visual Dashboard for

Fostering Awareness in Software Teams. In Proceedings of the SIGCHI Conference on Human Factors in Computing

Systems (San Jose, California, USA) (CHI ’07). Association for Computing Machinery, New York, NY, USA, 1313–1322.

https://doi.org/10.1145/1240624.1240823

[8]

Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, and Prem Devanbu. 2009. The

promises and perils of mining git. In 2009 6th IEEE International Working Conference on Mining Software Repositories.

1–10. https://doi.org/10.1109/MSR.2009.5069475

[9]

Christian Bird and Thomas Zimmermann. 2012. Assessing the Value of Branches with What-If Analysis. In Proceedings

of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (Cary, North Carolina)

(FSE ’12). Association for Computing Machinery, New York, NY, USA, Article 45, 11 pages. https://doi.org/10.1145/

2393596.2393648

[10]

Caius Brindescu, Iftekhar Ahmed, Carlos Jensen, and Anita Sarma. 2019. An empirical investigation into merge conicts

and their eect on software quality. Empirical Software Engineering 25 (09 2019). https://doi.org/10.1007/s10664-019-

09735-4

[11]

Caius Brindescu, Iftekhar Ahmed, Rafael Leano, and Anita Sarma. 2020. Planning for untangling: predicting the

diculty of merge conicts. 801–811. https://doi.org/10.1145/3377811.3380344

[12]

Yuriy Brun, Reid Holmes, Michael Ernst, and David Notkin. 2011. Proactive detection of collaboration conicts.

SIGSOFT/FSE 2011 - Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering, 168–178.

https://doi.org/10.1145/2025113.2025139

[13]

Yuriy Brun, Reid Holmes, Michael D. Ernst, and David Notkin. 2011. Proactive Detection of Collaboration Conicts.

In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software

Engineering (Szeged, Hungary) (ESEC/FSE ’11). Association for Computing Machinery, New York, NY, USA, 168–178.

https://doi.org/10.1145/2025113.2025139

[14]

Yuriy Brun, Reid Holmes, Michael D. Ernst, and David Notkin. 2011. Proactive Detection of Collaboration Conicts.

In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software

Engineering (Szeged, Hungary) (ESEC/FSE ’11). Association for Computing Machinery, New York, NY, USA, 168–178.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

ConE: A Concurrent Edit Detection Tool for Large Scale Soware Development 25

https://doi.org/10.1145/2025113.2025139

[15]

Yuriy Brun, Reid Holmes, Michael D. Ernst, and David Notkin. 2013. Early Detection of Collaboration Conicts and

Risks. IEEE Trans. Softw. Eng. 39, 10 (Oct. 2013), 1358–1375. https://doi.org/10.1109/TSE.2013.28

[16]

Marcelo Cataldo, Patrick A. Wagstrom, James D. Herbsleb, and Kathleen M. Carley. 2006. Identication of Coordination

Requirements: Implications for the Design of Collaboration and Awareness Tools. In Proceedings of the 2006 20th

Anniversary Conference on Computer Supported Cooperative Work (Ban, Alberta, Canada) (CSCW ’06). Association for

Computing Machinery, New York, NY, USA, 353–362. https://doi.org/10.1145/1180875.1180929

[17]

Catarina Costa, Jose Figueiredo, Gleiph Ghiotto lima de Menezes, and Leonardo Murta. 2014. Characterizing the

Problem of Developers’ Assignment for Merging Branches. International Journal of Software Engineering and Knowledge

Engineering 24 (12 2014), 1489–1508. https://doi.org/10.1142/S0218194014400166

[18]

Catarina Costa, Jair Figueiredo, Leonardo Murta, and Anita Sarma. 2016. TIPMerge: Recommending Experts for

Integrating Changes across Branches. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on

Foundations of Software Engineering (Seattle, WA, USA) (FSE 2016). Association for Computing Machinery, New York,

NY, USA, 523–534. https://doi.org/10.1145/2950290.2950339

[19]

Cleidson R. B. de Souza, David Redmiles, and Paul Dourish. 2003. "Breaking the Code", Moving between Private

and Public Work in Collaborative Software Development. In Proceedings of the 2003 International ACM SIGGROUP

Conference on Supporting Group Work (Sanibel Island, Florida, USA) (GROUP ’03). Association for Computing Machinery,

New York, NY, USA, 105–114. https://doi.org/10.1145/958160.958177

[20]

Cleidson R. B. de Souza, David Redmiles, and Paul Dourish. 2003. “Breaking the Code”, Moving between Private

and Public Work in Collaborative Software Development. In Proceedings of the 2003 International ACM SIGGROUP

Conference on Supporting Group Work (Sanibel Island, Florida, USA) (GROUP ’03). Association for Computing Machinery,

New York, NY, USA, 105–114. https://doi.org/10.1145/958160.958177

[21]

Klissiomara Dias, Paulo Borba, and Marcos Barreto. 2020. Understanding Predictive Factors for Merge Conicts.

Information and Software Technology 121 (05 2020), 106256. https://doi.org/10.1016/j.infsof.2020.106256

[22]

H. Christian Estler, Martin Nordio, Carlo A. Furia, and Bertrand Meyer. 2014. Awareness and Merge Conicts in

Distributed Software Development. In 2014 IEEE 9th International Conference on Global Software Engineering. 26–35.

https://doi.org/10.1109/ICGSE.2014.17

[23] Martin Fowler. Accessed 2020. Semantic Conict. https://martinfowler.com/bliki/SemanticConict.html.

[24] GitHub Accessed 2020. GitHub. https://github.com/about

[25]

Georgios Gousios, Martin Pinzger, and Arie van Deursen. 2014. An exploratory study of the pull-based software

development model.. In ICSE, Pankaj Jalote, Lionel C. Briand, and André van der Hoek (Eds.). ACM, 345–355. http:

//dblp.uni-trier.de/db/conf/icse/icse2014.html#GousiosPD14

[26]

Rebecca E. Grinter. 1995. Using a Conguration Management Tool to Coordinate Software Development. In Proceedings

of Conference on Organizational Computing Systems (Milpitas, California, USA) (COCS ’95). Association for Computing

Machinery, New York, NY, USA, 168–177. https://doi.org/10.1145/224019.224036

[27]

Mário Luís Guimarães and António Rito Silva. 2012. Improving early detection of software merge conicts. In 2012

34th International Conference on Software Engineering (ICSE). 342–352. https://doi.org/10.1109/ICSE.2012.6227180

[28]

Anja Guzzi, Alberto Bacchelli, Yann Riche, and Arie van Deursen. 2015. Supporting Developers’ Coordination in

the IDE. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work and Social Computing

(Vancouver, BC, Canada) (CSCW ’15). Association for Computing Machinery, New York, NY, USA, 518–532. https:

//doi.org/10.1145/2675133.2675177

[29]

Susan Horwitz, Jan Prins, and Thomas Reps. 1989. Integrating Noninterfering Versions of Programs. ACM Trans.

Program. Lang. Syst. 11, 3 (July 1989), 345–387. https://doi.org/10.1145/65979.65980

[30]

Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2014. The

Promises and Perils of Mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories

(Hyderabad, India) (MSR 2014). Association for Computing Machinery, New York, NY, USA, 92–101. https://doi.org/10.

1145/2597073.2597074

[31]

Bakhtiar Khan Kasi and Anita Sarma. 2013. Cassandra: Proactive Conict Minimization through Optimized Task

Scheduling. In Proceedings of the 2013 International Conference on Software Engineering (San Francisco, CA, USA) (ICSE

’13). IEEE Press, 732–741.

[32]

Bakhtiar Khan Kasi and Anita Sarma. 2013. Cassandra: Proactive conict minimization through optimized task

scheduling. In 2013 35th International Conference on Software Engineering (ICSE). 732–741. https://doi.org/10.1109/

ICSE.2013.6606619

[33]

Sunghun Kim, Thomas Zimmermann, Kai Pan, and E. James Jr. Whitehead. 2006. Automatic Identication of Bug-

Introducing Changes. In 21st IEEE/ACM International Conference on Automated Software Engineering (ASE’06). 81–90.

https://doi.org/10.1109/ASE.2006.23

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.

26 Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, and Arie van Deursen

[34]

Michele Lanza, Marco D’Ambros, Alberto Bacchelli, Lile Hattori, and Francesco Rigotti. 2013. Manhattan: Supporting

real-time visual team activity awareness. In 2013 21st International Conference on Program Comprehension (ICPC).

207–210. https://doi.org/10.1109/ICPC.2013.6613849

[35]

Dugang Liu, Chen Lin, Zhilin Zhang, Yanghua Xiao, and Hanghang Tong. 2019. Spiral of Silence in Recommender

Systems. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC,

Australia) (WSDM ’19). Association for Computing Machinery, New York, NY, USA, 222–230. https://doi.org/10.1145/

3289600.3291003

[36]

Shane McKee, Nicholas Nelson, Anita Sarma, and Danny Dig. 2017. Software Practitioner Perspectives on Merge

Conicts and Resolutions. In 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

467–478. https://doi.org/10.1109/ICSME.2017.53

[37] microsoft Accessed 2020. Microsoft Facts. https://news.microsoft.com/facts-about-microsoft/#EmploymentInfo.

[38]

Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma, and Danny Dig. 2019. The life-cycle of merge conicts:

processes, barriers, and strategies. Empirical Software Engineering 24 (02 2019). https://doi.org/10.1007/s10664-018-

9674-x

[39]

Antti Nieminen. 2012. Real-time collaborative resolving of merge conicts. In 8th International Conference on Collabo-

rative Computing: Networking, Applications and Worksharing (CollaborateCom). 540–543. https://doi.org/10.4108/icst.

collaboratecom.2012.250435

[40]

Moein Owhadi-Kareshk, Sarah Nadi, and Julia Rubin. 2019. Predicting Merge Conicts in Collaborative Software

Development. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

1–11. https://doi.org/10.1109/ESEM.2019.8870173

[41]

Marc J. Rochkind. 1975. The Source Code Control System. IEEE Transactions on Software Engineering 1, 4 (March 1975),

364–370. https://doi.org/10.1109/TSE.1975.6312866

[42]

Anita Sarma, Zahra Noroozi, and Andre van der Hoek. 2003. Palantír: Raising Awareness among Conguration

Management Workspaces. In Proceedings of the 25th ACM/IEEE International Conference on Software Engineering (ICSE).

IEEE, USA, 444–454. https://doi.org/10.1109/ICSE.2003.1201222

[43]

Harald Steck. 2011. Item Popularity and Recommendation Accuracy. In Proceedings of the Fifth ACM Conference on

Recommender Systems (Chicago, Illinois, USA) (RecSys ’11). Association for Computing Machinery, New York, NY, USA,

125–132. https://doi.org/10.1145/2043932.2043957

[44]

Song Wang, Chetan Bansal, Nachiappan Nagappan, and Adithya Abraham Philip. 2019. Leveraging Change Intents for

Characterizing and Identifying Large-Review-Eort Changes. In Proceedings of the Fifte enth International Conference on

Predictive Models and Data Analytics in Software Engineering (Recife, Brazil) (PROMISE’19). Association for Computing

Machinery, New York, NY, USA, 46–55. https://doi.org/10.1145/3345629.3345635

[45]

Chadd Williams and Jaime Spacco. 2008. SZZ Revisited: Verifying When Changes Induce Fixes. In Proceedings of the

2008 Workshop on Defects in Large Software Systems (Seattle, Washington) (DEFECTS ’08). Association for Computing

Machinery, New York, NY, USA, 32–36. https://doi.org/10.1145/1390817.1390826

[46]

Titus Winters, Tom Manshreck, and Hyrum Wright. 2020. Software Engine ering at Google: Lessons Learned from

Programming Over Time. O’Reilly Media, USA.

[47]

Feng Zhang, Foutse Khomh, Ying Zou, and Ahmed E. Hassan. 2012. An Empirical Study of the Eect of File Editing

Patterns on Software Quality. In 2012 19th Working Conference on Reverse Engineering. 456–465. https://doi.org/10.

1109/WCRE.2012.55

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: September 2021.